Humanities Literature Navigation

Problem Statement

Over the last twenty years, materials from the humanities domain have been stored digitally and made available with associated metadata but have not been utilized effectively by scholars. Identifying associate articles and authors is challenging due to the interdisciplinary nature of digital humanities, but essential for effective collaboration. This poster showcased a search engine (https://humanities.lairhub.com/) to identify related scholarly works and authors for improving collaboration within the humanities domain.

Dataset

Before identifying the relationship between articles and authors, our study developed an Extract-Transform-Load (ETL) pipeline. Inputs of the pipeline were the bibliography images collected from scholarly articles. Input images collected using iPhone and screenshot tool in Windows 11. The articles were retrieved using the Humanities Index tool with “digital humanities” as a query term. The pipeline leveraged the Google Vision API tool for optical character recognition, extracted the text from the images, and stored them in a staging area (figure 2). The pipeline sends extracted text to the Open AI API for collecting article title, authors, and DOI (GPT-3.5-turbo model) using the prompt below:

Figure 1: Prompt to collect bibliography information
Figure 2: Pseudocode for bibliography Extraction
Figure 3: Pseudocode for abstract collection

The pipeline performs the initial search through the Crossref API using the title and author attributes generated by the ChatGPT and selects the top document. In the second step, the pipeline stores the document metadata (title, abstract, authors, creation date, document type) in a relation database management system in a normalized form following the Dublin Core metadata properties. The pipeline sequentially requests the Scopus, IEEE, and Semantic Scholars API if the abstract is unavailable on Crossref. (figure 3).

Scope of the Database

Number of Documents3162
Number of Abstracts2167
Number of Journal Articles2002
Number of Books78
Last Publication DateJune 2024