New Big Data and Development Fellowship: Apply to work with our Data Scientist

I am happy to report that the Big Data and Human Development Incubator has just recruited a data scientist to work with the network for the next 9 months.

Fabian Braesemann has a PhD in Social and Economic Sciences from the University of Vienna and has been working as a data scientist over the last year. He has experience and skills in diverse aspects of data science and data mining, but his main expertise is in datafication and the application of data science to social science questions. He will be using the computing facilities at the Oxford Internet Institute.

In order to kick-start some pilot projects within our network, we are proposing that teams from around the university can bid to work with Fabian on small projects that use computational/big data approaches and speak to key questions in human development (we can take a broad view of what constitutes human development, and will consider research that focuses on poverty, inequality, economic development, political voice and representation, activism, and other topics).

We ask that any interested teams submit a 1-2 page pitch. Each pitch should outline what questions the group seeks to answer, what data they seek to use, and what further work this might result in. It should also outline the cross-disciplinary or cross-departmental nature of the work (these bids can be a way of building links or deepening existing links, but should not just be used by teams or scholars within a single department). Please also include short bios of the proposed team members. More details about what we need are included in the bullet point list below.

Proposals could be speculative in nature (scraping data, trialing analysis), but do not have to be.

We hope to ask Fabian to work with three successful teams over his 9-month contract at Oxford, so please pitch an idea that will take roughly 2-4 months of his time, either full-time or part-time. We ask that all pitches be submitted by March 1.

Please feel free to circulate this CFP. Submit your pitches to Mark Graham, and please get in touch if you have any questions.

 

  • Projects can address questions in any area of human development.
  • Projects should be completed by March 2018 at the latest.
  • There is no further funding available from the Big Data and Human Development Incubator. In other words, your proposal will be a bid to work with Fabian, not a bid for any additional funding from us.
  • Projects can be exploratory or at early stages of development. Most important is that you have a vision for how this might turn into a larger funded project after completion of your pilot work.
  • You may work with collaborators outside of Oxford, but the core team should be based at the university.
  • Anyone at the university who is eligible to apply for external funding will be eligible to lead one of these bids.
  • If we receive more than three three-month bids, we will distribute all applications to the lead researchers on each bid, and to other scholars in the Big Data and Development network, and ask them to rank the submissions.

Webcasts for the Symposium on Big Data and Development

The Symposium on Big Data and Development that we hosted in Oxford last month was a great success. There were some excellent papers, and a range of follow-up ideas and initiatives have now been spun out of it (including an edited book containing some of the contributions to the meeting).

If you weren’t able to make it to the symposium (and I know many of you couldn’t, because we were massively oversubscribed), we managed to film the three keynotes and the concluding panel. You can watch them all below:

Big data, the Global South, and digital transformation

The University of the Witwatersrand (Wits), in Johannesburg, recently made the news for the launch of one of the most innovative centers in Africa seeking to leverage academic research to promote new applications in the fields of big data, digital business, and social innovation.

In parallel with this effort, we (the Media Department, the Journalism Department, and the LINK Center) launched a series of seminars reflecting on the transformative power of the Internet. The focus of the first seminar was on big data and development. In resonance with the spirit informing the Big Data and Development Incubator at Oxford, scholars and practitioners from different disciplines – from information systems to journalism, media studies, and urban planning – reflected on the promises and drawbacks of the abundance of born-digital data, and on its relevance for countries seeking to combat poverty and exclusion and to guarantee basic services.

While the format was a typical round-table, the debate and discussion that followed offered unusual lessons that are important to share more broadly with the community reflecting on – and seeking to shape – the relationship between big data and human development.

The first part of the seminar highlighted continuities and discontinuities between the rhetoric characterizing “new” debates on big data and development and “older” debates on the “digital divide” that dominated the late 1990s and early 2000s. The language used by most international organizations reflects these continuities. For example, the argument that “Big data gives unprecedented power to understand, analyze, and ultimately change the world we live in”, included in the United Nations’ Big Data and the 2030 Agenda for Sustainable Development in 2015, strikingly resembles “The rapid progress of ICTs opens completely new opportunities to attain higher levels of development”, found in the Declaration of Principles of the World Summit on the Information Society twelve years earlier.

Obvious discontinuities are the amount of data produced every day, in both the Global North and the Global South, and the processing power that can turn even digital exhaust into potentially useful information for understanding human behavior. This quality, well exemplified by the Data for Development Challenge launched by Orange Telecom in Ivory Coast, can prove particularly important in countries still striving to build institutions strong enough to offer reliable statistics (a challenge well explained in Morten Jerven’s Poor Numbers).

A less obvious discontinuity, and the one that I want to emphasise here, depends on the different tone that can – potentially – characterize the relationship between the powerful (e.g. global corporations, donor countries) and the less powerful (e.g. low-income users, developing countries). In the 1990s and 2000s, most initiatives aimed at reducing the global digital divide were built on the assumption that those at the margin should – and wanted to? – catch up with the innovations created at the center. The dominant imagery was somehow bulkier than what we are used to today. It revolved around hardware that had to be moved from one place to another, and around the idea of a gap, or a time that had to pass for an innovation to reach the less innovative. Even the most well-meaning initiatives were built on a relationship of dependency between those who had the technology and the skills, and those who did not, but might aspire to acquire some of them.

The big data and development agenda does share some of these traits, but, as the conversation at Wits highlighted, it does not have to. So far, users, academics, entrepreneurs, and even governments in the Global South seem to have been relatively compliant with the status quo, accepting in this case too a dependence on the goodwill of those who appear to be driving the development agenda. In the case of Orange’s Data for Development Challenge, for example, while the data released were those of Ivorian users, as highlighted by Taylor and Schroeder, the teams making use of them were all from institutions outside of Ivory Coast, mostly in the Global North. When compared to the digital transformations of the 1990s and 2000s, however, it is important to note that the raw materials fuelling this new “revolution” are of a different kind. Data are not hardware that needs to be moved, or innovations that need to be mastered. They are already “there”; they are produced every day, by – almost – everyone, skilled and unskilled. Orange was generous in releasing some of the data in its possession, allowing experimentation. But users, when collectively organized, can push the agenda a bit further. They can turn someone else’s generosity into a duty. They can demand that some of the data, those with the greatest public relevance, be released into the public domain. This logic applies not only to telecom operators, but to many other players that are helping to shape the information society.

Facebook has recently boosted its philanthropic and developmental rhetoric, with initiatives like Free Basics, aimed at using satellites, drones, and other innovations to connect the disconnected, even in the most remote areas. But, as we illustrated in a recent study we published with UNESCO, Facebook has also strongly resisted requests from academics, civil society groups, and other organizations to release some of the data it possesses, even data with a clear public utility. Uses of Facebook to incite hatred, and possibly lead to violence, are a good example. Facebook has developed a notice and takedown approach, asking users to flag content they consider abusive or dangerous. The aggregated data emerging from this process could offer a useful early warning system for understanding whether, and to what extent, a minority group is at greater risk of being attacked in a specific national context. Like Free Basics, this issue is of great relevance to those interested in the impact of digital technology in the Global South and in uses of it that can support human development. Facebook, however, appears to have shown little interest in responding to these types of requests, or in following agendas that are not of its own making.
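To make that idea concrete, here is a minimal sketch of what such an early warning signal could look like. It is purely hypothetical – Facebook does not release these data – and it assumes access to aggregated counts of flagged content by region and week, flagging a region whose latest weekly count spikes well above its own history:

```python
from collections import defaultdict
import statistics

def detect_spikes(flag_events, k=3.0, min_history=4):
    """Given (week, region) pairs of hypothetical content-flag events,
    return regions whose latest weekly count exceeds their historical
    mean by more than k standard deviations."""
    counts = defaultdict(lambda: defaultdict(int))
    for week, region in flag_events:
        counts[region][week] += 1
    alerts = []
    for region, weekly in counts.items():
        weeks = sorted(weekly)
        if len(weeks) <= min_history:
            continue  # not enough history to establish a baseline
        history = [weekly[w] for w in weeks[:-1]]
        latest = weekly[weeks[-1]]
        mean = statistics.mean(history)
        std = statistics.pstdev(history)
        if latest > mean + k * max(std, 1.0):  # floor the std for quiet regions
            alerts.append((region, latest, round(mean, 1)))
    return alerts

# Toy data: region "A" is stable; region "B" spikes in week 5.
events = [(w, "A") for w in range(1, 6)] + \
         [(w, "B") for w in range(1, 5) for _ in range(2)] + \
         [(5, "B")] * 20
print(detect_spikes(events))  # -> [('B', 20, 2.0)]
```

The threshold k and the weekly granularity are illustrative choices; a real system would need careful calibration against base rates of flagging and reporting behaviour in each context.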

More than in the past, however, the relationship of digital dependency – one that assumes a more powerful actor will generously relinquish some of its rights, or act in support of the less powerful – could be broken. Playing with the contradictions that some of the tech giants (e.g. Facebook, Twitter, Google) are creating for themselves as they seek to boost their image by helping the “poor” or combating injustices can be a way to do so. Users in the Global South may be the first to ask for their data to be freed for “development”, seeking to set the rules by which global corporations help them, rather than waiting for their innovations and generosity. After all, as the global resonance of the #Rhodesmustfall and #feesmustfall movements that first emerged in South Africa has indicated, innovations (in imagery, not just in technology) do not have to travel in only one direction to be effective.

Symposium on Big Data and Human Development – closing remarks

It has been an extremely rewarding two days at the Symposium on Big Data and Human Development that Eduardo Lopez and I organised. We had a full room of people from academia, government, and the development sector – all speaking about how we might better use big data in the contexts of development.

There are many threads that we’ll try to tie up over the next few weeks (an edited book, some workshop reports, perhaps another conference next year, etc.). But in the meantime, it might be useful if I reproduce the notes that I used to sum up the event here. Those of you who attended, please do comment if you see that I omitted anything. Those of you who didn’t, please feel free to use this as a prompt to get involved.

***

This has been a much-needed conversation at a moment in which we’re awash with hype about ‘big data’.

We’ve learnt a lot about some of the potentials of big data: We’ve got new sorts of early warning signals. And – as we move from data to information to knowledge – we seem to be getting better at figuring out what to look for when it comes to disease tracking, or predicting things like student failure rates or corruption.

The fact that so much data comes from mobile phones has also created a specific opportunity to look at human mobility. And the relative democratisation of connectivity has important implications for deliberation and public participation at scales that have never before been possible.

But, with all that in mind, I want to pick up on areas that I think we still need to find ways to resolve as we all move forwards at this intersection of topics:

First, one theme that keeps coming up is that data presences and absences really matter. We have great data about some places, processes, and people. But there are still big gaps – and, going forwards, we’ll really need to address this head-on. If we’re using data to deploy scarce resources or deliver essential services, but there are blank spots on our map – then what strategies should we be employing to deal not just with our known unknowns, but also our unknown unknowns? Some of this might entail getting better at asking questions about the outliers in our models: Where are they, who are they, when are they?

Second, another important theme is not just data presences and absences – but even within the presences, there is the question of open versus closed data. So, for instance – many of us – me included – tend to use Twitter data to ask and answer a range of questions. And we do this because it is easily available and free and relatively straightforward to use.

But we should be careful that we don’t get into the sort of situation in which the tail wags the dog rather than the dog wags the tail – as my colleague Ralph Schroeder puts it. What sorts of questions are we prevented from asking because of a lack of open, available data sources? What sorts of questions or topics are we perhaps focusing too much energy on? And what sorts of questions do our data lend or not lend themselves to?

Third, and relatedly, we’re faced with some tension between issues of privacy, ownership, and control. How do we balance the desire to have more open data with best practices that prevent data leakage and still afford citizens some control over their own data shadows?

There was an interesting discussion in the session that I organised with Richard Heeks at the DSA conference earlier this week about what we might learn from the literature on resource management – if we treat data as a resource.

And more broadly, are we happy with the current political-economy of development data? What current rights of access, control, and use should be rethought and challenged?

Fourth, how do we ourselves operate with maximum transparency – especially when we’re not just dealing with descriptive analytics, but predictive analytics, and even prescriptive analytics? If our research, and the data we use, impacts on real people in real ways – are we happy with the current scientific models of dissemination that we use – or do we need any sort of alternate strategies that better engage with the communities that are the users – or subjects – of development?

Fifth, what can, or should, we learn across contexts? Or, specifically, what should we rethink and relearn in different places or contexts? What sorts of things aren’t transferable? This is maybe where the repeated call throughout this conference for all of us to be thinking and collaborating in a multidisciplinary way comes in useful.

Symposium on Big Data and Human Development | Sept 15-16 2016 | Final Programme

This workshop aims to move forward the debate about the ways in which big data is used, can be used, and should be used in development.

This symposium will also serve as a bridge between methodological knowledge about big data, critical academic research on the topic, and the desires of stakeholders and practitioners to achieve key developmental outcomes and goals.

This conference will use the hashtag #datahumdev

With keynotes by:

  • Professor Bitange Ndemo, Former Permanent Secretary of Kenya’s Ministry of Information and Communication, and Lecturer at the University of Nairobi
  • Professor Alex (Sandy) Pentland, Academic Director of Data-Pop Alliance, and Director of the MIT Human Dynamics Lab
  • Dr Linnet Taylor, Assistant Professor in Data Ethics, Law & Policy, Tilburg Institute for Law, Technology and Society (TILT)

Organizers:

  • Mark Graham
  • Eduardo Lopez

Programme

Thursday 15 September
Start End Schedule
12:30 13:00 Registration
13:00 13:15 Welcome & Opening Remarks

13:15 14:15 Keynote

  • Dr. Bitange Ndemo, Former Permanent Secretary of Kenya’s Ministry of Information and Communication and University of Nairobi Business School
14:15 14:45 Coffee Break
14:45 15:15
15:15 15:45
15:45 17:45 ‘Health and Big Data’

  • Speakers TBC
Friday 16 September
Start End Schedule
09:00 10:00 Keynote

10:00 10:30 Coffee Break
10:30 11:30 Parallel Session A (Lecture Theatre 04): Title TBC

“The Diffusion of Ultrasound Technology and Missing Women: An Analysis based on Google Searches for India”

10:30 11:30 Parallel Session B (Seminar Room): “Exploring the Potential of Open Data and Aid Transparency for Development”

“Building the Online Labour Index”

“Emergency Event Detection Using Mobile Phone Data”

11:30 12:30 Parallel Session A (Lecture Theatre 04): “The Economic Geography of the Internet 2.0: Digital Social Capital and Cities”

“Data, Visualisation and Human Development”

“Measuring the Hidden Contours of the Global Knowledge Economy with Big Data”

11:30 12:30 Parallel Session B (Seminar Room): “The Making of Beneficiaries: On the Datification of Anti-Poverty Programmes”

“Big data for development research in the Global South: Experiential lessons from LIRNEasia”

“Combining Big and Traditional Data Sources to Enhance Public Policy Decision Making for Sustainable Development”

  • Dr. Jonggun Lee, Pulse Lab Jakarta – United Nations Global Pulse
12:30 13:30 Lunch (provided)
13:30 14:30 Keynote

14:30 15:30 Parallel Session A (Lecture Theatre 04): “Algovernance: Can Open Algorithms Revive Democratic Principles and Processes?”

“Digging into big data and development: Findings from three Indian cases”

14:30 15:30 Parallel Session B (Seminar Room): “Using IATI Data in “big data for human development” Research”

“Big Data and Development: the Role of Competition Law and Policy”

Title TBC

15:30 15:50 Coffee Break
15:50 16:50 Panel & Keynote

16:50 17:00 Closing Remarks

Symposium on Big Data and Human Development

We are happy to announce a two-day symposium (Sept 15-16) that we are running in Oxford on the topic of big data and human development. This workshop aims to move forward the debate about the ways in which big data is used, can be used, and should be used in development.

This symposium will also serve as a bridge between methodological knowledge about big data, critical academic research on the topic, and the desires of stakeholders and practitioners to achieve key developmental outcomes and goals.

We are lucky to have keynotes lined up from the following speakers:

  • Professor Bitange Ndemo, Former Permanent Secretary of Kenya’s Ministry of Information and Communication, and Lecturer at the University of Nairobi
  • Professor Alex (Sandy) Pentland, Academic Director of Data-Pop Alliance, and Director of the MIT Human Dynamics Lab
  • Dr Linnet Taylor, Fellow at the Department of International Development, University of Amsterdam

Call for abstracts

We welcome the submission of abstracts (of max 250 words) for talks, panels, and sessions at the workshop. Submit them to christopher.dobyns@oii.ox.ac.uk by 15 July 2016.

Papers presented at the conference will be considered for an edited volume on big data and human development.

Please contact Mark Graham (mark.graham@oii.ox.ac.uk) with any questions.

The Oxford Human Development and Big Data Incubator is working to stimulate policy-oriented research. Topics that we seek to focus on in our workshop include (but are not limited to):

  • What ‘big data’ can tell us about human development; how we can facilitate better decision-making and accountability in previously data-sparse environments;
  • What presences and absences of data tell us about issues of participation and exclusion among marginalised populations;
  • What tools have emerged globally that can maximise citizen ownership of big data, by making data meaningful within the cultures of participation that characterise different localities;
  • Research results of projects employing big data in the contexts of development.

Submissions may include:

Talks: Contributors are invited to submit full-length talks (15 min) related to the conference themes

Panels: Contributors are invited to pitch a panel discussion on core conference themes

Demonstrations: Contributors are invited to submit an idea for a demonstration (which may be facilitated as part of a panel or as a stand-alone event)

To attend, please email your name and affiliation to events@oii.ox.ac.uk. Attending this conference is free of charge, but please note that space is limited and registration preference will be given to contributors of selected abstracts. Check out our official events page for up-to-date information – we hope to see you in Oxford!

Jobs in Big Data and Development

The Alan Turing Institute is now hiring 3-year fellows in social data science and the digital humanities. One of the specific areas they are looking to cover is research at the intersection of big data and development studies.

Check out specifics on their website (closing date July 13).

Historicizing Big Data and Geo-information

I was asked by my colleague Oliver Belcher to act as a discussant in a session that he put together at the 2016 meeting of the Association of American Geographers: ‘Historicizing Big Data and Geo-information’.

The session contained a set of truly excellent and provocative talks by Louise Amoore, Matthew Hannah, Patrick McHaffie, and Eric Huntley. I’ve now had a chance to type up my discussant notes (although apologies for the hastily-put-together nature of the text).

I think that this has been a much-needed set of papers at a moment in which we’re awash with hype about ‘big data’. We hear that we’re at a significant moment of change; that there’s a big data revolution that will transform science, society, the economy, security, politics, and just about everything else.

And so, it’s important that these sorts of conversations are brought together. To allow us to think about continuities and discontinuities. To allow us to think about what is and what isn’t truly new here. And to do that in order to hone our responses as critical scholars.

One way to start – perhaps – is to recognize, as all of the papers in this session have, that while ‘big data’ may not be new, we’re in, have been in, or at least have long been moving towards, what Danny Hillis refers to as an Age of Entanglement. I think it is maybe useful as a starting point for me to quote him here.

He says “We are our thoughts, whether they are created by our neurons, by our electronically augmented minds, by our technologically mediated social interactions, or by our machines themselves. We are our bodies, whether they are born in womb or test tube, our genes inherited or designed, organs augmented, repaired, transplanted, or manufactured. We are our perceptions, whether they are through our eyes and ears or our sensory-fused hyper-spectral sensors, processed as much by computers as by our own cortex. We are our institutions, cooperating super-organisms, entangled amalgams of people and machines with super-human intelligence, processing, sensing, deciding, acting.”

In other words, while big data may not be new, we do now undoubtedly live in a digitally infused, digitally-augmented world. One in which we’re entangled in complex digital ecosystems; hybrid complex ecosystems in which it is increasingly hard to disentangle agency and intent.

Why’s my phone telling me to go left and not right? Why is the supermarket creating some personalized economic incentives for me and not others? Why is the search engine prioritizing some knowledge and not others? As researchers, it is hard to address questions like these because there is often no straightforward way of knowing the answers. Do we look to the code embedded within algorithms? Do we look to the people or institutions who created the algorithms? Do we look to the databases? Do we look to the people who manage the databases? Or do we look to the people, processes, and places emitting the data?

What today’s talks have all usefully done is point to the fact that we need to be addressing some combination of all of those questions.

So, let me just pick up with a few general reflections and concerns about what the histories of big data mean for the futures of big data – that emerge from listening to these talks. Like all of the speakers, what I’ll especially focus on here is what geography can bring to the table.

First, is a thought about our role as geographers. Many geographers, me included, spend a lot of time thinking about the geographies of the digital; thinking about how geographies still matter.

We probably do a lot of this to counter some of the ongoing, relentless, Silicon Valley narratives of things like the cloud. Narratives that claim that – provided they are connected – anyone can do anything from anywhere at any time. So, we end up spending a lot of our energy pushing back: arguing that there is no such thing as the cloud. That there are just computers. Computers all in other places.

But I wonder if we’re missing a trick by not also asking more questions about the contextual, specific, but likely present ways in which some facets of geography might actually matter less in a world of ever more digital connectivity. Not as a way of ceding ground to the breathless ‘distance is dead’ narrative – but in a critical and grounded way. Are there any economic, social, or political domains in which distance, or spatial context, is actually becoming less relevant?

Second, when we speak about big data, or the cloud, or the algorithms that govern code-spaces, we often envision the digital as operating in a black box. In many cases, that is unavoidable.

But we can also draw more on the work from computational social science, critical internet studies, human computer interaction, and critical GIS. In all of those domains, research is attempting to open the black boxes of cloud computing; of big data; of opaque algorithms. Scholars are asking and answering questions about the geographies of the digital. Where is it; who produces it; who does it represent; who doesn’t it represent; who has control; to whom is it visible.

There is much more that clearly needs to be done, and this work needs to be relentless, ongoing, and – of course – critical. But, one hope for the future is to see more cross-pollination with those communities who are developing tools, methods, and strategies to empirically engage with geography in a more data-infused world. So, yes – there are black boxes. But those boxes are sometimes hackable.

Third, and relatedly. A key critique of ‘big data’ that I see in the critical data studies community is the one about correlative reasoning (in other words, the idea that if your dataset is big enough, you no longer need theory, no longer need to understand context, and can just look for correlations in the dataset). And, relatedly, a critique about the lack of reflexivity within those practices of data analysis. But I wonder if we aren’t also overplaying our hand a little here. Some big data practitioner work does stop at correlations, but a fair amount of it can be quite reflexive and aware of its limitations.

These researchers are still building multilevel models, social network analytics, community detection models, influence analyses, predictive analytics, and machine learning systems. My point here is that, whilst a lot of ‘big data’ work is undoubtedly naïve, let’s also not underestimate the power of those with access to the right datasets, the computing resources to analyse those datasets, and the methods to do so.
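As one illustration of the kind of method in that list, here is a minimal sketch of community detection using the networkx library; the bundled toy graph and the greedy modularity algorithm are my choices for illustration, not anything specific to the work discussed here:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A bundled toy social network stands in for, say, a call-detail-record graph.
G = nx.karate_club_graph()

# Greedy modularity maximisation: one standard community detection method.
communities = greedy_modularity_communities(G)
for i, members in enumerate(communities):
    print(f"Community {i}: {sorted(members)}")
```

Run on a call-record or platform interaction graph instead of the toy network, the same handful of lines already yields the kind of group structure that can feed influence analysis or targeting – which is exactly the power worth not underestimating.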

My broader point is that we really need to find the balance of not understating and not overstating the work being done by governments and corporations in the domain of big data. Yes, some big data practices out there are naïve and dumb. And yes, some are terrifyingly precise in the ways that they can anticipate human behavior.

To get that balance, I think we need a few things. The first is to pay attention to what has been called the Fourth Law of Thermodynamics: that the amount of energy required to refute bullshit is an order of magnitude larger than that required to create it. Let’s make sure our energy is wisely spent.

To get the right balance, it also seems clear that all of us need not just to try to better understand the nuances of the key techniques and methods being employed, but also to think about what we can specifically add to the debate as geographers – and on this latter point, this is something that I think the papers in this session did very well.

Fourth, when thinking about the political economy of data, it’s becoming ever more clear that we need a full-throated response to reclaim our digital environments (a point that Agnieszka Leszczynski has been forcefully making). Privacy and security scholars and activists have been especially vocal here. But let’s again think about what our role as geographers can be in this debate.

The way in which my colleague Joe Shaw and I are thinking about this (and – my advertising pitch here – it is something we’re speaking about in three sessions we’ve organized on the topic on Friday morning) is to argue that we need to translate some of the ongoing ‘right to the city’ debate into the digital sphere. The point being that if the places we live in are bricks, mortar, and digital data, then we need to think about rights of access, control, and use in our digitally-infused world.

This is just one type of intervention; and I’m sure that building on the foundations of critical historicisations of big data can offer us fertile ground for reimagining what we actually want our data-infused futures to look like.

Fifth, something that I saw, and really appreciated, in all of the papers was a forceful reflection on how data are always political. Too often data, and especially ‘big data’, get presented as a neutral and objective reflection of the object of their measurement. That big data can speak for themselves. That big data are a mirror of the world around them. What a lot of today’s work has done is reflect on not just how data reflect the world, but also how they produce it; how they drive it. As we tread deeper into Danny Hillis’ ‘Age of Entanglement’, this is something we’ll need much more of.

As Trevor Barnes mentioned in the last session, the best kinds of papers leave you hungry for more detail – and a few things I would have loved to hear more about are:

From Louise – a bit more about what our vision of the cloud enables beyond the cloud itself. The cloud can, in many ways, make some facets of life perceptible – it is deployed to study life online. But how much of the cloud vision is about moving beyond the cloud – being deployed to study life offline, the facets of life that aren’t directly emitting digital data shadows? Also, the empirical work you spoke about sounds fascinating – and I hope the questions give you some more time to bring out the ways in which you’ve gone behind the algorithms – and underneath the cloud – to look at how these knowledges are created.

From Matthew – it was interesting to see how some of our contemporary concerns about the power of big data to aid the surveillance powers of the powerful – are far from new. So what might protests against contemporary regimes learn from the earlier moments you spoke about? There are many of us who want to opt out; is this now less possible because of the more encompassing nature of contemporary data collection regimes?

From Eric – I wonder whether the idea of the ‘world inventory’ in the 80s – the details of it, what it meant in practice – was similar to the vision that a large tech firm like Google has of a world inventory of geospatial information today. Does a world inventory now mean something significantly different from what it used to?

From Patrick – You didn’t use the term ‘smart city’. But I wonder if you’ve looked into any so-called ‘smart city’ initiatives – and whether you could say more about how we should be honing our inquiry into the so-called ‘smart city’ based on what you’ve learnt here; based on what we know about the visions that brought the Cartographatron into being?

For all of us – scholars in this field – I wonder if we’re all speaking about the same thing in this session when we talk about ‘big data’. Are we talking about datasets that are a census rather than a sample of a population? Are we just using ‘big data’ as a proxy for ‘digital data’? Are we using the term to refer to the whole contemporary apparatus of data trails, shadows, storage, analysis, and use? Are we using it to refer to digital unknown unknowns – the digital black box? Is the term actually helping us as shorthand for something else? Or do we need more precise language if we want to make sure we’re genuinely having a conversation?

And finally, for all of us, I want to ask why this seems to continue to be such a male-dominated field. In two sessions, with seven speakers and two discussants, we had only one female speaker. Are we reproducing the gendered nature of earlier computational scholarship? One of the dangers of telling these histories is that they can end up being white men speaking about white men. This is not a critique of the organiser, as I know Oliver is well attuned to these issues, but rather a question about how and why we might be (re)producing masculinist knowledges.

So, to end – I want to again thank Oliver and the speakers for putting together this session on historicizing Big Data. We need more conversations like this; and we need more scholarship like this. And this is work that will have impacts beyond the boundaries of geography.

We know that we can’t go backwards; and I think the goal that many of us have is a more just, more democratic, more inclusive data-infused world. And to achieve that, one thing we all need to be doing is participating in ongoing debates about how we govern, understand, and make sense of our digitally-augmented contexts.

And perhaps one thing that we can all take away from this session is that if we want to take part in the debate – to influence it – we’ll need to understand big data’s history if we want to change its futures.

Cross-posted from Geonet: Investigating the Changing Connectivities and Potentials of Sub-Saharan Africa’s Knowledge Economy

Using Alternative Data Sources to Validate International Surveys?

The narrative of misleading development statistics has gained traction in recent years. Morten Jerven’s well-known text, Poor Numbers: How We Are Misled by African Development Statistics and What to Do about It, challenges the reliability of development statistics and calls for new approaches to data collection. More recently, Michael Robbins and Noble Kuriakose have determined that approximately 1 in 5 international surveys contain fabricated data.

By reviewing responses from more than 1000 surveys, Robbins and Kuriakose identified 17% of these surveys as “likely to contain a significant portion of fabricated data. For surveys conducted in wealthy westernized nations, that figure drops to 5%, whereas for those done in the developing world it shoots up to 26%.”

The two researchers came to these conclusions by identifying duplicate responses within surveys. One existing hypothesis is that many survey responses are biased due to the presence of data collection assistants (who often collect data door to door). Although some researchers have challenged Robbins and Kuriakose’s methods, many others believe that the problem is even larger than the stated 17 percent.
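For illustration, here is a minimal sketch of the underlying idea – not Robbins and Kuriakose’s actual procedure, whose statistics and thresholds are more sophisticated. For each respondent, compute the highest share of identical answers shared with any other respondent, then flag a survey in which an implausibly large share of respondents have such a near-duplicate:

```python
import itertools
import numpy as np

def max_percent_match(responses):
    """For each respondent, the highest share of identical answers
    shared with any other respondent in the same survey."""
    responses = np.asarray(responses)
    n = len(responses)
    best = np.zeros(n)
    for i, j in itertools.combinations(range(n), 2):
        match = np.mean(responses[i] == responses[j])
        best[i] = max(best[i], match)
        best[j] = max(best[j], match)
    return best

def flag_survey(responses, match_threshold=0.85, share_threshold=0.05):
    """Flag a survey if an unusually large share of respondents have a
    near-duplicate elsewhere in the data (thresholds here are
    illustrative assumptions, not the published ones)."""
    best = max_percent_match(responses)
    share_near_duplicates = float(np.mean(best >= match_threshold))
    return share_near_duplicates > share_threshold, share_near_duplicates

# Toy example: the last two respondents answer identically.
survey = [
    [1, 2, 3, 1, 4, 2, 5, 1],
    [2, 2, 3, 1, 4, 1, 5, 3],
    [2, 2, 3, 1, 4, 1, 5, 3],
]
print(flag_survey(survey))  # -> (True, 0.666...)
```

On real data one would also have to guard against legitimate duplicates (short questionnaires, homogeneous populations), which is one reason the method has been debated.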

Clearly, there is a strong argument to be made that additional data sources should be used to validate survey responses. One of our objectives at the Big Data and Development Incubator is to understand how data have been effectively harnessed in the context of development. By furthering the conversation on data and development, we aim to support development professionals, practitioners, and scholars.