Ph.D. Lausanne


Posted on 19/08/2013

from digitalhumanities.org

Open Ph.D. Position: Semantic extraction in large-scale databases of newspaper articles

Website: http://dhlab.epfl.ch/page-88103-en.html

Description of the project

The Digital Humanities Laboratory (dhlab.epfl.ch) at EPFL works on several large-scale digitization projects. One of the lab’s major long-term goals is a fine-grained reconstruction of Switzerland based on news archives. As a first step, EPFL has established a collaboration with Le Temps, which holds an archive containing digital versions of several important newspapers of French-speaking Switzerland – about 4 million articles covering a 200-year period. We estimate that several billion pieces of spatiotemporal information can be extracted from this rich resource alone.

Using text mining techniques, we will set up a process for extracting semantic information about places, people and events from the documents in these archives. The resulting large database of linked data, reconstructed from these disparate sources, could be used, for instance, to query the variety of events on a particular day at a particular place, enriching historical understanding with a larger socioeconomic, political, cultural, and even meteorological context. The database could also be exploited for more complex services, such as the reconstruction of socio-biographical networks (a “Facebook” of the past).
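To give a rough idea of the kind of queries such a database could support, here is a minimal sketch in Python. It is an illustration only, not the project's design: the `Event` record, its field names, the helper functions and the sample records are all hypothetical placeholders, and a real system would rely on a linked-data store rather than an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """One extracted fact. All field names are hypothetical."""
    day: date
    place: str
    people: tuple
    summary: str


def events_on(events, day, place):
    """Return the events recorded for a given day at a given place."""
    return [e for e in events if e.day == day and e.place == place]


def co_mentions(events):
    """Count how often two people appear in the same event.

    This is a crude stand-in for a socio-biographical network:
    each pair is an edge, the count is its weight.
    """
    pairs = Counter()
    for e in events:
        for a, b in combinations(sorted(set(e.people)), 2):
            pairs[(a, b)] += 1
    return pairs


# Placeholder records, invented purely for illustration.
SAMPLE = [
    Event(date(1900, 1, 1), "Lausanne", ("Person A", "Person B"), "placeholder event 1"),
    Event(date(1900, 1, 1), "Lausanne", ("Person B", "Person C"), "placeholder event 2"),
    Event(date(1900, 1, 2), "Geneva", ("Person A", "Person B"), "placeholder event 3"),
]

if __name__ == "__main__":
    for e in events_on(SAMPLE, date(1900, 1, 1), "Lausanne"):
        print(e.summary)
    print(co_mentions(SAMPLE).most_common(1))
```

Storing people as a tuple on each event keeps the day-and-place query and the network reconstruction working from the same records.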

Each step of this process will take into account that journalists’ records may diverge from one another, intentionally or not, and that the information extracted from the database is intrinsically uncertain. A probabilistic framework will therefore be created to deal with such inconsistencies and uncertainties, in order to reason about more plausible reconstructions of past events and to manage the existence of alternative hypotheses. Creating a system capable of dealing with the intrinsic uncertainty of such historical records is an unsolved and challenging issue for this type of Digital Humanities project.
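As an illustration of what reasoning over alternative hypotheses might look like, the sketch below weighs conflicting reports of a single fact using a simple Bayesian update. This is one possible toy model, not the framework the project will develop. It assumes, for illustration, that each source has a known reliability (the probability that it reports the true value), that errors are spread uniformly over the other candidate values, that sources are independent, and that the prior over candidates is uniform. The function name and the example values are hypothetical.

```python
def posterior(reports):
    """Posterior probability of each candidate value.

    reports: list of (reported_value, reliability) pairs,
             with 0 < reliability < 1 for every source.
    Assumes a uniform prior, independent sources, and errors
    spread uniformly over the other candidate values.
    """
    if not reports:
        raise ValueError("at least one report is required")
    for _, r in reports:
        if not 0.0 < r < 1.0:
            raise ValueError("reliability must be strictly between 0 and 1")

    candidates = sorted({value for value, _ in reports})
    k = len(candidates)
    if k == 1:
        # All sources agree, and no alternative hypothesis was proposed.
        return {candidates[0]: 1.0}

    scores = {}
    for hypothesis in candidates:
        likelihood = 1.0
        for value, r in reports:
            # A source is right with probability r, otherwise it picks
            # one of the k - 1 wrong candidates uniformly at random.
            likelihood *= r if value == hypothesis else (1.0 - r) / (k - 1)
        scores[hypothesis] = likelihood

    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


if __name__ == "__main__":
    # Two sources give "date A", one gives "date B" (placeholder values).
    reports = [("date A", 0.8), ("date A", 0.8), ("date B", 0.8)]
    for hypothesis, p in posterior(reports).items():
        print(f"{hypothesis}: {p:.2f}")
```

The minority hypothesis keeps a non-zero probability, so the alternative reading stays available instead of being discarded.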

The extracted information will then be mapped into a large historical geographical information system. Relations between places, people and events will be mapped geographically (i.e. associated with georeferenced points, lines and polygons). The objective is to build a kind of “Google Maps” of the past, allowing users to zoom into a particular region at a particular time and visualize different layers of information. It will also permit other forms of spatiotemporal visualization, such as dynamic representations of people's biographies.
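The sketch below illustrates the idea of zooming into a region at a given time. It is a simplification under stated assumptions: features are stored as GeoJSON-style points (coordinates in longitude, latitude order), the `start` and `end` year properties and both helper functions are hypothetical, the coordinates are approximate, and the validity years are placeholders. Lines and polygons, which the project also foresees, are left out.

```python
def make_feature(name, lon, lat, start_year, end_year):
    """Build a GeoJSON-style point feature with a validity interval.

    The "start" and "end" properties are a hypothetical convention,
    not part of the GeoJSON format itself.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "start": start_year, "end": end_year},
    }


def visible(features, bbox, year):
    """Names of the features inside bbox that are valid in the given year.

    bbox is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    names = []
    for f in features:
        lon, lat = f["geometry"]["coordinates"]
        props = f["properties"]
        in_region = west <= lon <= east and south <= lat <= north
        in_period = props["start"] <= year <= props["end"]
        if in_region and in_period:
            names.append(props["name"])
    return names


# Approximate coordinates; the validity years are placeholders.
FEATURES = [
    make_feature("Event near Lausanne", 6.63, 46.52, 1890, 1910),
    make_feature("Event near Geneva", 6.14, 46.20, 1890, 1910),
    make_feature("Later event near Lausanne", 6.63, 46.52, 1950, 1960),
]

if __name__ == "__main__":
    lausanne_area = (6.5, 46.4, 6.8, 46.6)
    print(visible(FEATURES, lausanne_area, 1900))
```

Filtering on both the bounding box and the year is what turns a static map layer into a view of one region at one moment.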

Requirements

The ideal PhD candidate will have a strong Computer Science background, experience with and interest in text mining and semantic web technologies, an interest in history, and a good knowledge of French.

Starting Date

The position is available from September 1, 2013.

To Apply

Given the imminent starting date of this project, please contact Prof. Frédéric Kaplan directly (frederic.kaplan@epfl.ch), indicating your interest in the project and enclosing a CV.

Information about the EPFL DHLAB.

Digital Humanities is an interdisciplinary domain that applies computational methods to research in the humanities. The Digital Humanities Laboratory (DHLAB), founded in 2012 by Professor Frédéric Kaplan, develops new computational approaches for rediscovering the past and anticipating the future. Projects conducted at the lab range from building "Google maps of ancient places" to studying how algorithms transform the way we write.

Benefiting from EPFL's strong technological expertise, the DHLAB conducts research projects in collaboration with prestigious heritage institutions and museums all over Europe. The lab's interdisciplinary team includes computational scientists, mathematicians, experts in geographical information systems and interaction designers - all with transdisciplinary backgrounds that facilitate interaction with humanities scholars from all disciplines.


