Big data, the Global South, and digital transformation

The University of the Witwatersrand (Wits), in Johannesburg, recently made the news for the launch of one of the most innovative centers in Africa seeking to leverage academic research to promote new applications in the fields of big data, digital business, and social innovation.

In parallel with this effort, we (the Media Department, the Journalism Department, and the LINK Center) launched a series of seminars reflecting on the transformative power of the Internet. The focus of the first seminar was big data and development. In resonance with the spirit informing the Big Data and Development Incubator at Oxford, scholars and practitioners from different disciplines – from information systems to journalism, media studies, and urban planning – reflected on the promises and drawbacks of the abundance of digitally born data, and its relevance for countries seeking to combat poverty and exclusion and to guarantee basic services.

While the format was a typical round-table, the debate and discussion that followed offered unusual lessons that are important to share more broadly with the community reflecting on – and seeking to shape – the relationship between big data and human development.

The first part of the seminar highlighted continuities and discontinuities between the rhetoric characterizing “new” debates on big data and development and “older” debates on the “digital divide”, those that dominated the late 1990s and early 2000s. The language used by most international organizations reflects these continuities. For example, the argument, “Big data gives unprecedented power to understand, analyze, and ultimately change the world we live in”, included in the United Nations Big Data and the 2030 Agenda for Sustainable Development in 2015, strikingly resembles “The rapid progress of ICTs opens completely new opportunities to attain higher levels of development”, which could be found in the Declaration of Principles of the World Summit on the Information Society, twelve years earlier.

Obvious discontinuities are the amount of data produced every day, both in the Global North and in the Global South, and the processing power that can turn even digital exhaust into potentially useful information for understanding human behavior. This quality, well exemplified by the Data for Development Challenge launched by Orange Telecom in Ivory Coast, can prove particularly important in countries striving to build institutions strong enough to offer reliable statistics (a challenge well explained in Morten Jerven’s Poor Numbers).

A less obvious discontinuity, and the one that I want to emphasise here, depends on the different tone that can – potentially – characterize the relationship between the powerful (e.g. global corporations, donor countries) and the less powerful (e.g. low-income users, developing countries). In the 1990s and 2000s, most initiatives aimed at reducing the global digital divide built on the assumption that those at the margin should – and wanted to? – catch up with the innovations created at the center. The dominant imagery was somewhat bulkier than what we are used to today. It revolved around hardware that had to be moved from one place to another, and around the idea of a gap, or a time that had to pass for an innovation to reach the less innovative. Even the most well-meaning initiatives were built on a relationship of dependency between those who had the technology and the skills, and those who did not, but might aspire to get some of it.

The big data and development agenda does share some of these traits, but, as the conversation at Wits highlighted, it does not have to. So far users, academics, entrepreneurs, and even governments in the Global South seem to have been relatively compliant with the status quo, accepting in this case too that they depend on the goodwill of those that appear to be driving the development agenda. In the case of Orange’s Data for Development Challenge, for example, while the data released were those of Ivorian users, as highlighted by Taylor and Schroeder, the teams making use of them were all from institutions outside of Ivory Coast, mostly in the Global North. When compared to the digital transformations of the 1990s and 2000s, however, it is important to note that the raw materials fuelling this new “revolution” are of a different kind. Data are not hardware that needs to be moved, or innovations that need to be mastered. They are already “there”: they are produced every day, by – almost – everyone, skilled and unskilled. Orange was generous in releasing some of the data in its possession, allowing experimentation. But users, when collectively organized, can push the agenda a bit further. They can turn someone else’s generosity into a duty. They can demand that some of the data, those with the greatest public relevance, be released into the public domain. This logic does not apply only to telecom operators, but to many other players that are contributing to shaping the information society.

Facebook has recently boosted its philanthropic and developmental rhetoric, with initiatives like Free Basics, aimed at using satellites, drones, and other innovations to connect the disconnected, even in the most remote areas. But, as we illustrated in a recent study we published with UNESCO, Facebook has also strongly resisted any request from academics, civil society groups, and other organizations to release some of the data it possesses, even those with a clear public utility. Uses of Facebook to incite hatred and possibly lead to violence are a good example. Facebook has developed a notice and takedown approach, asking users to flag content they consider abusive or dangerous. The aggregated data emerging from this process could offer a useful early warning system to understand whether and to what extent a minority group is at greater risk of being attacked in a specific national context. Like Free Basics, this issue is of great relevance to those interested in the impact of digital technology in the Global South and in the uses of that technology that can support human development. Facebook, however, appears to have shown little interest in responding to these types of requests, or in following agendas that are not of its own making.

More than in the past, however, the relationship of digital dependency – one that assumes that a more powerful actor will generously relinquish some of its rights, or act in support of the less powerful – could be broken. Playing with the contradictions some of the tech giants (e.g. Facebook, Twitter, Google) are creating for themselves as they seek to boost their image by helping the “poor”, or seek to combat injustices, can be a way to do so. Users in the Global South may be the first to ask for their data to be freed for “development”, seeking to set the rules by which global corporations help them, rather than waiting for their innovations and generosity. After all, as the global resonance of the #Rhodesmustfall and #feesmustfall movements that first emerged in South Africa has indicated, innovations (in imagery, not just in technology) do not have to travel only in one direction to be effective.

Symposium on Big Data and Human Development – closing remarks

It has been an extremely rewarding two days at the Symposium on Big Data and Human Development that Eduardo Lopez and I organised. We had a full room of people from academia, government, and the development sector – all speaking about how we might better use big data in the contexts of development.

There are many threads that we’ll try to tie up over the next few weeks (an edited book, some workshop reports, perhaps another conference next year, etc.). But in the meantime, it might be useful if I reproduce the notes that I used to sum up the event here. Those of you who attended, please do comment if you see that I omitted anything. Those of you who didn’t, please feel free to use this as a prompt to get involved.


This has been a much-needed conversation at a moment in which we’re awash with hype about ‘big data’.

We’ve learnt a lot about some of the potentials of big data: We’ve got new sorts of early warning signals. And – as we move from data to information to knowledge – we seem to be getting better at figuring out what to look for when it comes to disease tracking, or predicting things like student failure rates or corruption.

The fact that so much data comes from mobile phones has also created a specific opportunity to look at human mobility. And the relative democratisation of connectivity has important implications for deliberation and public participation at scales that have never before been possible.

But, with all that in mind, I want to pick up on areas that I think we still need to find ways to resolve as we all move forwards at this intersection of topics:

First, one theme that keeps coming up is that of data presences and absences really mattering. We have great data about some places, processes, people. But there are still big gaps – and, going forwards, we’ll really need to address this head-on. If we’re using data to deploy scarce resources or deliver essential services, but there are blank spots on our map – then what strategies should we be employing to deal not just with our known unknowns, but also our unknown unknowns? Some of this might entail getting really good at asking questions about outliers in our models: Where are they, who are they, when are they?

Second, another important theme is not just data presences and absences – but even within the presences, there is the question of open versus closed data. So, for instance – many of us – me included – tend to use Twitter data to ask and answer a range of questions. And we do this because it is easily available and free and relatively straightforward to use.

But we should be careful that we don’t get into the sort of situation in which the tail wags the dog rather than the dog wags the tail – as my colleague Ralph Schroeder puts it. What sorts of questions are we prevented from asking because of a lack of open, available data sources? What sorts of questions or topics are we perhaps focusing too much energy on? And what sorts of questions do our data lend or not lend themselves to?

Third, and relatedly, we’re faced with some tension between issues of privacy, ownership, and control. How do we balance the desire to have more open data with best practices that prevent data leakage and still afford citizens some control over their own data shadows?

There was an interesting discussion in the session that I organised with Richard Heeks at the DSA conference earlier this week about what we might learn from the literature on resource management – if we treat data as a resource.

And more broadly, are we happy with the current political-economy of development data? What current rights of access, control, and use should be rethought and challenged?

Fourth, how do we ourselves operate with maximum transparency – especially when we’re not just dealing with descriptive analytics, but predictive analytics, and even prescriptive analytics? If our research, and the data we use, impacts on real people in real ways – are we happy with the current scientific models of dissemination that we use – or do we need alternative strategies that better engage with the communities that are the users – or subjects – of development?

Fifth, what can, or should, we learn across contexts? Or specifically, what should we rethink and relearn in different places or contexts? What sorts of things aren’t transferable? This is maybe where the repeated call throughout this conference for all of us to be thinking and collaborating in a multidisciplinary way comes in useful.