Wikiversity:Fellow-Programm Freies Wissen/Einreichungen/Building on Past Places: Using Wikidata to Connect Historical Sources
Building on Past Places: Using Wikidata to Connect Historical Sources
Open data and open science are often viewed in the context of either the natural or life sciences, or the social sciences. But Humanities researchers, whose scholarly practice has traditionally been highly individual and solitary, are increasingly realising the advantages of working collaboratively, using large-scale sources of openly available data. Digitised texts, images, manuscripts and maps have become the subject of enquiries using data mining, natural language processing, and new transcription technologies. Unlike scientific data, however, humanities data often needs to be extracted from heterogeneous sources, and the data itself is highly heterogeneous. While Linked Open Data makes it possible to connect these varied sources, it is essential to have a central, verified source against which to resolve the data.
Wikidata offers scholars in the Humanities a powerful source of structured, linked open data with which to conduct new research in established fields such as History, Classics, and Area Studies, as well as in emerging fields such as the Digital and Spatial Humanities. However, the sheer volume of the data and its heterogeneous nature have made it difficult to see where to begin an enquiry.
At the same time, projects which use linked open data to connect historical sources have suffered from a lack of structured data sources. The most frequently used sources take the form of gazetteers: geographical dictionaries or directories used in conjunction with maps or atlases. These sources, however, are difficult to compile, and tend to reflect the biases of their context, i.e. they are mostly Western in orientation (although there is a long tradition of gazetteer compilation in China) and focus on distant historical periods. The increasing use of Wikidata by GLAMs and researchers has highlighted an opportunity to explore using Wikidata as a source of geo-coordinates for a gazetteer. The richness of Wikipedia (and therefore Wikidata) for regions where detailed gazetteers are lacking (e.g. the Global South) and for specific periods (e.g. Imperial-era India) means that there is an opportunity to create valuable new datasets and resources.
This project will explore how Wikidata can be used to produce a comprehensive gazetteer of world-historical places for a specific period. It will do so by linking place names from Wikidata (and potentially from other sources, as a test of the interoperability of the model), in order to produce an open resource for Humanities scholars, GLAM data managers and any other users of digital historical sources.
Project description
Open data and open science are often viewed in the context of either the natural or life sciences, or the social sciences. But Humanities researchers, whose scholarly practice has traditionally been highly individual and solitary, are increasingly realising the advantages of working collaboratively, using large-scale sources of openly available data. Digitised texts, images, manuscripts and maps have become the subject of enquiries using data mining, natural language processing, and new transcription technologies. Unlike scientific data, however, humanities data often needs to be extracted from heterogeneous sources, and the data itself is highly heterogeneous. While Linked Open Data makes it possible to connect these varied sources, it is essential to have a central, verified source against which to resolve the data. Many projects have used GeoNames as this source. However, Wikipedia's underlying requirement that each entity be "notable" makes the gazetteer subset of Wikidata smaller than GeoNames, but also better defined and cleaner. For this reason, this project aims to extract and collate a set of Wikidata identifiers for a specific period and region, which may be used to link together a federation of historical sources, using the framework created by the gazetteers being constructed by the Mellon Foundation-funded Pelagios project (http://commons.pelagios.org). This set can then be added to their Linked Data ecosystem and search tool, Peripleo (http://peripleo.pelagios.org/), as a contribution to the larger linked data cloud.
By identifying a dataset which would be useful to the community, extracting the place names from Wikidata and creating a spinal gazetteer which can then be added back into the ecosystem, the project will create new linked data resources for use in scholarly research. At the same time, the project will document this extraction and resolution process in a step-by-step cookbook, to enable other researchers and GLAM practitioners to replicate it using Wikidata for their own research purposes. This will help to expose more humanities researchers to Wikidata (and vice versa).
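As an illustration of what the extraction step might look like, the following minimal Python sketch builds a query for the Wikidata Query Service. It is not the project's cookbook. The function names `build_place_query` and `query_url` are hypothetical, using "country" (P17) as a stand-in for a region is a simplifying assumption, and no period filter is applied. P625 (coordinate location) and P1584 (Pleiades ID) are Wikidata properties; Q668 (India) is only a sample value.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_place_query(region_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for places in a country that have coordinates.

    Assumption: the region is approximated by 'country' (P17). A real
    gazetteer would also need a filter for the historical period.
    """
    if not (region_qid.startswith("Q") and region_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item identifier: {region_qid!r}")
    return (
        "SELECT ?place ?placeLabel ?coord ?pleiades WHERE {\n"
        f"  ?place wdt:P17 wd:{region_qid} ;\n"
        "         wdt:P625 ?coord .\n"
        # The Pleiades ID is optional: most places will not have one.
        "  OPTIONAL { ?place wdt:P1584 ?pleiades . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    # Print the query text; fetching the URL is left to the reader.
    print(build_place_query("Q668", limit=10))
```

The sketch only constructs the query and its URL, so it can be inspected or pasted into the query service's web interface before any request is sent.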
This project depends entirely on open data, and everything it produces will be open: the principles are baked in from the start.
All the resources created will be openly available for use, reuse and modification, as well as being added to the global linked open data cloud. Likewise, the how-to cookbook will be made freely available on both the Pelagios and Wikiversity platforms (and any other appropriate spaces) in order to maximise the number of people who will be able to use it to create their own Wikidata-based gazetteers. At the conceptual level, this project aims to bring the ideas of open scholarship and linked open data into the world of traditional humanities scholarship and GLAMs, in order to show that using openly available sources and creating openly available knowledge products is compatible with the objectives of humanities research.
There is already a great deal of theoretical discussion in Linked Data and GLAM communities about how to use Wikidata in their institutions, but less discussion of how to extend this use beyond institutional walls. During the fellowship, I plan to continue these discussions with concrete examples, using the gazetteer and its uses to illustrate how open data in general, and Wikidata in particular, can be used. At the same time, scholarly communities, both in Berlin and further afield, are beginning to show an interest in using Linked Open Data, but lack the practical know-how. Through interaction with these communities, I hope to expose them to the value and importance of open data, and convince them of the need to contribute their data to the network.
Intermediate report (in English only)
Final report (in English only)
I. Information on your own research project (max. 3000 characters)
A. Summary and results
There were several changes to my original research plan during the course of the project. However, I am happy that they took place. Rather than extracting data from Wikidata in order to create a new resource, I was able to add data to Wikidata by building connections between Wikidata entities and their corresponding identifiers in Linked Data resources such as the Pleiades gazetteer of ancient places and the Digital Atlas of the Roman Empire. These connections are critical for strengthening the web of Linked Open Data: they allow researchers to connect resources across the web, thus breaking down the silos of information about the ancient world.
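The report does not say which tool was used to add these links. One common route is QuickStatements, whose version 1 format accepts tab-separated lines of item, property and value, with string values in double quotes. The sketch below is an assumption about how a list of hand-checked matches could be turned into such lines. The helper `to_quickstatements` is hypothetical, and the pairing of Q220 (Rome) with Pleiades ID 423025 is a sample value that should be verified before use.

```python
def to_quickstatements(matches, property_id="P1584"):
    """Format (Wikidata QID, external ID) pairs as QuickStatements V1 lines.

    Assumptions: every pair has already been reconciled by hand, and the
    external identifiers are numeric, as Pleiades IDs (P1584) are.
    """
    lines = []
    for qid, external_id in matches:
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata item identifier: {qid!r}")
        if not str(external_id).isdigit():
            raise ValueError(f"unexpected external identifier: {external_id!r}")
        # V1 syntax: item <TAB> property <TAB> "string value"
        lines.append(f'{qid}\t{property_id}\t"{external_id}"')
    return lines


if __name__ == "__main__":
    # Sample pair for illustration only.
    for line in to_quickstatements([("Q220", "423025")]):
        print(line)
```

Validating each identifier before any line is generated keeps a malformed row in a spreadsheet of matches from becoming a bad statement on Wikidata.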
B. Contribution to Open Science
Building connections between sources is critical for strengthening the web of Linked Open Data. This web is based on the concept of data being free, open, and reusable, so that researchers can connect resources across the web, thus breaking down the silos of information about the ancient world. Linked Open Data, as the data format which underlies both Wikidata and the gazetteers referenced above, is an entirely open endeavour.
II. Collaboration with Fellows and mentors (max. 3000 characters)
A. Collaboration with your mentor
I met regularly with my mentor, via face-to-face meetings, over video calls and through Twitter. We met face-to-face twice, once in Berlin at my institution and once at their institution in Hannover. All of these interactions were extremely valuable, both for the project (I picked up mad Wikidata skills) and more generally for me, as an early career researcher. We both intend to keep the interaction going, since we are part of many of the same communities of librarians, information architects and cultural heritage scholars.
B. Exchange with other Fellows
I think I was lucky to be in Berlin: I was able to meet regularly with several of the other Fellows face-to-face, both at formal Wikimedia DE events and socially. I also met with two other Fellows who were not based in Berlin, but who worked on similar projects. We used video conferencing tools for these meetings, which took place twice during the Fellowship.
III. Communication and networking (max. 3000 characters)
A. Communication activities related to the Fellow Programme
After participating in one of the Wikidata training workshops at the FU, I wrote a blog post for Wikimedia DE, describing my project and how I planned to use Wikidata. I also described my research and how to use Wikidata for cultural heritage research as part of a session I taught in the spring at the University of Graz, and during a session on cultural heritage data at the DARIAH annual meeting in Warsaw.
B. Sharing knowledge
I found it very easy to share my experiences and knowledge with external institutions and at external events. The Wikimedia DE team provided me with materials which I was able to take with me to the lectures I gave, and there is a great deal of material online for those looking for Wikidata how-tos.
C. New contacts in the Open Science community
Within the Fellows programme, I made many contacts in the Open Science community - with other Fellows and with the other mentors. However, I have also been working in the Open Science space for several years. This is not to say that I know everyone in this environment - the movement is growing all the time, but I already have a fairly extensive network of contacts who work on these topics within the Linked Data and cultural heritage worlds.
D. New contacts with representatives of the Wikimedia communities
Until I began this project, I understood the basic concepts behind Wikidata, but I had no idea of how powerful it is. There is a lot of discussion in the cultural heritage field about how to use Wikidata for our research, and this Fellowship, by allowing me to work with Wikidata and the community behind it, has made it possible for me to participate in these discussions from a position of understanding and experience.
I would like to remain active via the Fellows Alumni list, and intend to continue working with the Wikidata community, hopefully at WikidataCon.
IV. Promoting Open Science (max. 4000 characters)
A. New initiatives to promote Open Science
My institution is already committed to publishing an open access journal and an open science blog; these were established before my fellowship began. Within my project, we already publish all our code openly on GitHub, and use open licenses for all our other outputs.
B. Initiatives to promote Open Science
I'm not sure it is possible to open up the work we do much more, and this is a good thing! But I think we will continue to discuss the usefulness of Open Science, and the ways in which open data, and in particular Linked Open Data, needs to be managed in order to be useful as well as open. Open for the sake of open is not always the best route to take. Data must be usable and manageable if its open nature is to be exploited.
C. Interest in Open Science
There is certainly interest in the programme - less from my institution, and more from cultural heritage professionals (archivists, librarians and museum curators) who I work with, and who understand the value of open data.
D. Applying Open Science principles
The Programme was extremely valuable to me as a way of taking what I know theoretically about open data, and applying it, by teaching me how to work with Wikidata. The learning curve for tools and resources like Wikidata is steep, and it is important that we do not underestimate how critical it is to provide opportunities for people to learn how to use them, if we want to encourage the use of such open resources and tools.
Your personal overall conclusion
I have enjoyed my time as a Fellow enormously. It allowed me time to develop technical skills, as well as introducing me to a network of like-minded scholars. I hope that the end of the fellowship is not the end of my participation - I would like to continue working on my project, and to get more embedded with the Wikidata GLAM community.