Legal theory knowledge graph
Research Project
With the legal theory knowledge graph project at the Max Planck Institute for Legal History and Legal Theory, I aim to produce publicly accessible, machine-readable data on legal theory scholarship (focusing on socio-legal theory), mapping scholars, institutions, publications and citations in a network graph. I collect data for this graph by harvesting metadata, using text mining techniques and manually curating datasets on actors and knowledge production in (socio-)legal theory.
The project provides methodological input for my other projects, including Socio-legal Trajectories and A different legal science - the sociology of law in German-speaking countries after 1945. It consists of several phases:
- Exploratory phase (completed 2022-2023): in addition to researching existing scholarship and available technologies, I conducted a pilot study to create a text corpus and metadata collection for the German-language Zeitschrift für Rechtssoziologie and the UK-based Journal of Law and Society. The primary aim of the pilot study was to learn the necessary technologies, identify relevant data sources, and connect with other researchers and institutions working on similar issues. As a result, I developed code and workflows that generate statistical and network graph data on the journals. I also co-organised a workshop on reference extraction technologies.
- Improving and validating methods (ongoing): while the code and workflows generated in the first phase were good enough for a proof of concept, the resulting datasets are still deficient. Current open and proprietary bibliometric databases offer inadequate and often faulty coverage of non-English literature as well as of the humanities and social sciences in general. This is particularly evident in the area of (socio-)legal theory. Moreover, the performance of open-source reference mining software has to improve considerably before it can compensate for this lack of existing data. This phase will focus on optimising the available workflows using newly emerging technologies based on Large Language Models. It will conclude with a methodologically rigorous validation of the reference extraction algorithm, which is to be used in the next phase.
- Graph data ingestion: in this phase, a more substantial number of scholarly works and their authors will be ingested into a consolidated knowledge graph using the validated and optimised workflows. While the ingestion will primarily serve my own socio-legal research, its scope can easily be broadened to include legal theory and legal philosophy more generally, to the extent that the relevant data is available or can be generated in collaboration with scholars who want to analyse the data for their own research questions.
- Open-sourcing of methods and data: in the final phase of the project, all suitable data and the code will be made public. Preferably, the data will be stored in an institutionally maintained repository of open knowledge graph data, such as Wikidata.
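To illustrate what validating reference extraction in the second phase could involve, the following minimal Python sketch compares a list of extracted references against a manually curated gold standard and reports precision, recall and F1. It is not the project's actual validation code: the function names, the simple normalisation step and the placeholder reference strings are all hypothetical assumptions chosen for illustration.

```python
def normalise(ref: str) -> str:
    """Lower-case and collapse whitespace so that trivial formatting
    differences are not counted as extraction errors (assumed rule)."""
    return " ".join(ref.lower().split())


def evaluate(extracted: list[str], gold: list[str]) -> dict[str, float]:
    """Compare extracted references with a gold standard by exact match
    on the normalised strings and return precision, recall and F1."""
    extracted_set = {normalise(r) for r in extracted}
    gold_set = {normalise(r) for r in gold}
    true_positives = len(extracted_set & gold_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical placeholder data, not taken from the project's datasets.
gold = [
    "Author A 1972 First Title",
    "Author B 1989 Second Title",
    "Author C 1913 Third Title",
]
extracted = [
    "author a 1972   first title",   # matches after normalisation
    "Author B 1989 Second Title",    # exact match
    "Author D 1922 Fourth Title",    # not in the gold standard
]

scores = evaluate(extracted, gold)
print(scores)
```

Exact matching on normalised strings is the simplest possible criterion; a real validation would probably compare parsed fields (author, year, title) and tolerate minor spelling variation.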
For the project, I collaborate with Andreas Wagner, David Carreto Fidalgo (MPCDF) and Christine Rimmert (Kompetenzzentrum Bibliometrie). Code and datasets produced within the project are published at zenodo.org.