ALIGNED: Quality-Centric Software and Data Engineering
Seshat: Global History Databank is a partner in the international ALIGNED consortium, which was recently funded by Horizon 2020, the EU framework program for research and innovation.
ALIGNED develops models and tools to convert Big Data sources into high-quality, structured knowledge, using the Linked Data approach. The Seshat project will provide a rich, real-world case study to test the tools developed by the ALIGNED consortium for collecting high-quality historical and archaeological data.
Through the ALIGNED grant, Seshat will be able to tackle, in innovative ways, three issues common to large-scale database projects in the social sciences and humanities, answering questions such as these:
- How do we structure and control the uploading of data by very large numbers of users?
- How can we speed up data collection?
- How can we be sure of the data quality?
ALIGNED will help tackle these issues by developing a software platform that will underpin the Seshat database. The platform will be based on the RDF standard, with Seshat data stored in a triplestore. It will be scalable and suited to very large numbers of users and many user types. It will also structure the data-gathering process and keep track of how every single data point is entered, augmented, approved, or challenged.
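As a rough illustration of the idea, the sketch below models one data point as an RDF-style triple (subject, predicate, object) together with a log of the actions taken on it. It is not the ALIGNED platform or its data model: the class names, the `seshat:` identifiers, and the user names are all hypothetical, and a real triplestore would hold this information as RDF rather than as Python objects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The four actions named in the text; everything else here is an assumption.
ALLOWED_ACTIONS = {"entered", "augmented", "approved", "challenged"}


@dataclass(frozen=True)
class Triple:
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


@dataclass
class DataPoint:
    """A triple plus the history of who did what to it, and when."""
    triple: Triple
    history: list = field(default_factory=list)

    def record(self, action: str, user: str) -> None:
        """Append one (action, user, timestamp) event to the history."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        self.history.append((action, user, datetime.now(timezone.utc)))

    @property
    def status(self):
        """The most recent action, or None if nothing has been recorded."""
        return self.history[-1][0] if self.history else None


# Hypothetical identifiers, for illustration only.
point = DataPoint(Triple("seshat:ExamplePolity", "seshat:hasCoinage", "present"))
point.record("entered", "research_assistant_1")
point.record("challenged", "expert_reviewer_1")
point.record("approved", "editor_1")
```

Keeping the history as an append-only list means no event is overwritten, so the full trail of a data point can be inspected later.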
ALIGNED will also speed up data gathering considerably by building state-of-the-art web crawlers, integrated within the Seshat workspace, that scan large digital text corpora. These crawlers will allow Seshat contributors to quickly locate precise information in a very large body of text.
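The following minimal sketch shows the general kind of lookup this describes: finding every mention of a term in a set of texts and returning it with some surrounding context. It is only a toy stand-in under stated assumptions. The function name `find_passages`, the tiny in-memory corpus, and the plain keyword match are all hypothetical and say nothing about how the ALIGNED crawlers actually work.

```python
import re


def find_passages(corpus, term, window=40):
    """Return (document id, snippet) pairs for each occurrence of `term`.

    `corpus` maps a document id to its full text; `window` is the number
    of characters of context kept on each side of a match.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for doc_id, text in corpus.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.append((doc_id, text[start:end].strip()))
    return hits


# Made-up example texts, for illustration only.
corpus = {
    "doc_a": "Excavations suggest the city minted bronze coinage.",
    "doc_b": "This report covers irrigation works only.",
}
hits = find_passages(corpus, "coinage")
```

A contributor could then review the returned snippets instead of reading each document in full.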