Supersize my data

Notes from the Global Social Science Research Meeting, University of Pittsburgh, July 22-23, 2016

The world of Big Data. Source: Slideshare

We live in the Digital Age. The development of communications and research technologies in recent years has made the world smaller: travel and communication are faster, easier, and wider-reaching, connecting the world in ways never seen before. The flip side to living in an ever-shrinking world, though, is that information keeps growing. The easier it becomes to collaborate with colleagues living in disparate parts of the globe, the better digital technologies get at storing massive amounts of data, and the more refined the tools we develop to connect different sources of knowledge, the more information we have at our disposal.

What this means is that we are now able, for the first time in human history, to explore properly and systematically some of life’s truly huge questions: How and why did humans first form states? Why is there so much inequality in the world, not only economic but also in terms of social well-being, health, and access to resources? Why do people move around so much, and what effects do different sorts of migrations have on society? We’ve dealt with some of these issues in previous blog posts here. The crucial point is that answering such Big Questions requires truly Big Data: massive amounts of information on human society, offering a global sample and extending as far back into history as possible.[1]

Well in Bhaktapur, Nepal. Source: Pixabay

Take, for instance, inequality, one of the most pressing problems in the world today. Understanding how social and economic inequalities develop requires that we look beyond the recent past. Deep historical factors play a huge role in how we live today: the tax laws in place in the early days of industrialization, long-run changes in the environment that alter the wealth of nations, and cultural norms, evolving over many centuries, that dictate how well people cooperate and share resources, to name only a few. It is crucial, then, to take into account as many different societies as possible and to track the development of key variables over as long a time frame as possible.

This is precisely what the Seshat: Global History Databank[2] and a handful of other large-scale social science projects are attempting to achieve. The issue now is not what we can do with all of the information out there, but how best to bring it all together so we can start to answer some of these key questions and, hopefully, improve the lives of people around the world.

Cathedral of Learning, University of Pittsburgh. Source: Wikipedia

Recently, a few of these groups got together for two days at the University of Pittsburgh to discuss these critical issues. Mainly, we wanted to see whether there was any overlap between what we are all trying to do and, if so, how we can best pool our resources to get the most out of our collective efforts.

The meeting was organized by Prof. Pat Manning, director of the World History Center at Pitt and one of the world’s most vocal and productive champions of large-scale, data-intensive social science research. Together with his colleague Vladimir Zadorozhny (who was also in attendance) and many others, he has put together the CHIA database, which seeks to streamline the process of connecting different historians’ datasets. Ruth Mostern, who will be taking over the World History Center now that Pat is retiring, was in attendance as well. Ruth is an expert on medieval China and has been working on geo-referencing projects to provide stable, authoritative lists of places online. This sort of geo-referencing is a key part of the ‘Big Data’ research we are all doing: because all history occurs in particular places, the places described and defined by projects like Ruth’s become nodes through which the rest of us can link up our data. Molly Warsh, a professor of World History at Pitt, and PhD candidate Bennet Sherry were also able to join in the discussions.

We were also very fortunate that Leo Lucassen and Auke Rijpma were able to fly in from the Netherlands for the workshop. Leo is research director at the International Institute of Social History in Amsterdam and has for years been working with colleagues on the Global Collaboratory on the History of Labour Relations 1500-2000[3], a huge effort to digitally combine datasets that deal with labour history—types of production, different types of labour relations in a society, the ways that different segments of the population fit into the labour system, etc. Auke is an assistant professor at Utrecht University who works on the long-term historical dynamics of economic and social well-being. He also helps run the CLARIAH project, an enormous digital infrastructure project funded by the Dutch government that is developing a flexible, user-friendly system for integrating social science datasets.

Hiroko Inoue flew in from California to represent the IROWS center and to discuss the database of settlement and urbanization that she is constructing with Chris Chase-Dunn, who joined us by Skype. Also joining us from California was Dennis Flynn, an economist who is working on a large project exploring the history of silver, from its use as money, to its transport around the world as a valuable commodity, to the process of production from mining to minting to spending in different historical societies. Steve Ruggles of IPUMS and the Minnesota Population Center Skyped in as well. Steve works on integrating micro-historical datasets drawn from census records from the United States, an invaluable source of information on social history.

For Seshat, overall coordinator Peter Turchin along with founding editor Pieter Francois and research associate Daniel Hoyer were in attendance. We were very happy to be involved with such a lively and accomplished group. It became clear quite early on in our discussions that there is a lot of overlap between all of our research.

Mainly, we have all come to realize that recent developments in digital technologies offer the possibility of a new kind of research: one that is data-driven, with a global and historical scope unprecedented in social scientific research, and therefore one that allows us to answer these Big Questions. We are all interested in exploring how and why different types of inequalities develop over time and, of course, how to reverse those trends. Answering this involves developing measures of well-being and access to resources for as many historical societies as possible, tracking the movement of people and goods across huge distances, exploring the dynamics of urban growth and decline, and a whole host of other factors that we are, collectively, gathering data on.

In the end, we decided to formalize our association so that we can keep in touch and coordinate our work, under the name Big Questions, Big Data Association. We hope to keep adding more groups to the association to be part of the discussion and to collaborate on even bigger joint research. Anyone who uses the available digital tools to answer Big historical questions, not just about inequality but about any important issue of human life, will be a welcome addition to the group. Look for the Association’s website to go online in the near future, and stay tuned for some interesting and innovative research coming out of the group in the next few years!

[1] Peter Turchin, “Arise ‘Cliodynamics,’” Nature 454, no. 7200 (2008): 34–35, doi:10.1038/454034a.

[2] Peter Turchin et al., “Seshat: The Global History Databank,” Cliodynamics: The Journal of Quantitative History and Cultural Evolution 6, no. 1 (2015),

[3] Karin Hofmeester et al., “The Global Collaboratory on the History of Labour Relations, 1500-2000: Background, Set-Up, Taxonomy, and Applications,” 2016.

