Seshat: A brief look at 150,000 data points

It has taken a long time – five years of data input and the assistance of many researchers – to acquire 150,000 sourced data points for the Seshat databank: an epic historical time series which incorporates hundreds of variables.

Over this period the Seshat databank has overcome technical obstacles, refined its research methods, and at every step of the way recruited leading academic experts to advise on chronologies and the state of knowledge.

Still, none of this would have happened without the vision of the sponsors who have funded the project.

Now, in autumn 2016, we are finally at a stage where a critical mass of data has been gathered and Seshat begins to speak.

All the numbers the database now produces are interesting – even the numbers about the numbers.

Chart 1 below graphs one portion of the Seshat data.

[Chart 1: Seshat data points by NGA]

Light Orange: Total figure for data points for all polity sheets within the NGA.
Dark Grey: Average number of data points per polity sheet within the NGA.

To understand this chart, one needs to know that Seshat data on polities (historical kingdoms, states, empires and sometimes time periods and archaeological traditions) is collected by sample regions called Natural Geographic Areas (NGAs).

Each NGA sample region is a time-porthole through which the researcher travels to observe and jot down the social, religious, economic, political, military and environmental changes as they occur, dynamically, throughout the region's unique history.

The chart further samples this NGA data by including only the data for polities that existed at one of the 100-year sampling points between about 9000 BCE and 1900 CE (the actual number of polities sampled is written next to each NGA name along the y-axis).
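The sampling and the two chart measures can be sketched roughly as follows. This is a minimal illustration only, not Seshat's actual pipeline: the record layout, the function names and every number in `POLITY_SHEETS` are hypothetical, and it assumes a polity counts as sampled if its date range covers at least one 100-year mark.

```python
from collections import defaultdict

# Hypothetical polity sheets: (NGA, polity, start year, end year, data points).
# Negative years are BCE. All names and values are invented for illustration.
POLITY_SHEETS = [
    ("Upper Egypt", "Polity A", -3100, -2650, 400),
    ("Upper Egypt", "Polity B", -2650, -2150, 600),
    ("Upper Egypt", "Polity C", -2140, -2110, 50),  # falls between century marks
    ("Konya Plain", "Polity D", -9000, -7000, 120),
]


def sample_years(start=-9000, end=1900, step=100):
    """Century marks from 9000 BCE to 1900 CE (year 0 is kept for simplicity)."""
    return range(start, end + 1, step)


def summarize_by_nga(sheets, years=None):
    """Return, per NGA, the sampled polity count plus total and average data points."""
    years = list(sample_years()) if years is None else list(years)
    sampled = defaultdict(list)
    for nga, _name, first, last, points in sheets:
        # Assumption: a polity is sampled if it existed at any century mark.
        if any(first <= year <= last for year in years):
            sampled[nga].append(points)
    return {
        nga: {
            "polities": len(points),
            "total": sum(points),                 # the light orange bar
            "average": sum(points) / len(points),  # the dark grey bar
        }
        for nga, points in sampled.items()
    }


if __name__ == "__main__":
    for nga, stats in summarize_by_nga(POLITY_SHEETS).items():
        print(nga, stats)
```

With this invented input, "Polity C" is dropped because it existed only between two century marks, so Upper Egypt is summarized from two polity sheets rather than three.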

Chart 1 holds two forms of interest: the constructive and the academic.

Constructive: we identify areas where we have a lot of data coded and areas that may need more data. For example, our Seshat coding sheets for polities contain about 700 different variables, split into three main sections called Phase I, Phase II and Ritual variables.

Whilst all polities have received Phase I and Ritual coding, work on Phase II variables is ongoing: complete for Cambodia, much done for Upper Egypt, the Middle Yellow River Valley and Latium, but a lot more to do for, e.g., the Konya Plain and Kachi Plain.

The academic interest is that this metadata is itself beginning to reflect the general pattern of global history.

That is, regions with longer histories of statehood have handed down to us much more data about themselves.

We may also notice less data than expected for some NGAs compared with others, even though they have equally long histories.

For example, the Konya Plain and Susiana regions have 9,000-year histories that exceed those of Egypt and China, but this is not yet reflected in our accumulated data – perhaps it never will be.

These regions, in modern-day Turkey and Iran, were always at a geopolitical crossroads where Persians, Romans, Mongols and Arabs all clashed, and so suffered comparatively greater destruction of archaeological remains and written records.

For both these regions, the lack of data may also reflect the fact that they have received less attention from archaeologists and historians.

In the case of Susiana particularly, but for various reasons true everywhere, early archaeological work and discoveries were unfortunately lost, a fact that emphasizes how important digitized historical databases will be for the preservation of our knowledge of historical societies.

Chart 2 shows more Seshat data, which is clearly weighted towards the Middle Yellow River Valley in terms of completeness.

[Chart 2: Seshat data]

The total figure of 150,000 data points for the Seshat database is an undercount. Not included in these figures: our data on wars and battles, the Archaeological database, and a dynamic record on Crop Yields and Carrying Capacity for each NGA.

All this data will help the Seshat databank serve its purpose as a resource against which historical and social scientific theories may be tested.

If you are an expert interested in advising us (especially on regions currently underrepresented in our data points) please contact 


November 14, 2016

How do you download the dataset? I couldn’t find a link.

    Jill Levine
    November 14, 2016

    Hi Tim. We will start to publish data this spring. At this time we only have a sample polity on the site (

    You can sign up for our newsletter (on the sidebar) or follow us on social media to be among the first to find out when the databank becomes open access.

      November 15, 2016