Category Archives: Databases

What Can Digital Humanities Do With Crowdsourcing?

Crowdsourcing in the Digital Humanities (DH) can entail many different actions and concepts, so this post explores the effect the public can have in contributing to the field, whether they realize it or not! In DH, crowdsourcing is voluntary and takes the form of contributing to some kind of project. This can mean a variety of things, but I want to focus on the two areas in which I believe the public can make the most significant difference: the transcription and/or digitization of texts in existing collections run by an educational or non-profit institution, and community-driven collections.

Digitizing a collection to its fullest extent can be a daunting task for one group to accomplish, so some projects seek the aid of the community. A newer example is the Library of Congress (LOC), whose “By The People” project asks anyone to transcribe text from scanned images of its collections. The types of sources vary: they include handwritten letters, sheet music (with lyrics and some indicators for the performers), promotional material, and more! Without making an account, one can go in and participate in transcribing texts. After an entire scan is transcribed, it is sent for review by users and the LOC. Some projects liken this process to a “puzzle,” and it is completely voluntary and non-committal.

The site will be linked below if you wish to explore the collection, or contribute!

Link: https://crowd.loc.gov/

Another way that the community engages in crowdsourcing is by doing it themselves! There are plenty of forums and projects dedicated to involving the community by offering a place to store and showcase collections digitally or traditionally. These collections often center on common narratives: digitized family memories that tell a greater story, records of lineage, and ties to the military, to name a few examples. When the public feels like they can contribute to something that honors memory, provides entertainment, or informs others, there is value in contributing to a body of work or a project.

What seems challenging is selling that to the public. In other words, it is all about the framing. As I mentioned before, formal projects sometimes characterize the transcription of scans as “puzzles,” which highlights an interesting approach: contributing to DH without FEELING like one is contributing to something “boring” or perhaps “nerdy.” There must be an emphasis on both non-committal entertainment and the true value of the work. Everyone is different, which makes outreach such a meticulous endeavor.

Overall, crowdsourcing can be full of uncertainty surrounding retention and garnering an adequate audience. When one expects small contributions from hundreds, one may instead receive dozens of dedicated users who see the vision of the creator(s). The Digital Humanities needs the public, and can always learn from them and innovate in involving them. Crowdsourcing is a great step towards a sort of hands-off engagement, but more and more steps are being taken to ensure that voices are heard and contributions are recognized across the Internet!

Why Metadata Matters

Metadata, to me, can be defined simply as “the details of data points.” By this, I mean that metadata serves as an organizational tool, while also providing context surrounding an object or text. If one were to manually fill in the metadata on an image of one’s own common frying pan, for instance, one would record its dimensions, its raw materials, when (and potentially where) the image was taken, the file format, copyright information, and so on. If the object or text is not one’s own, one would also need to record where the item was found and analyzed, whether that is an archive, a collection, or another such repository.
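To make the frying pan example concrete, here is a minimal sketch of what such a record could look like in Python. The field names and sample values are my own assumptions for illustration; they are not a template taken from Omeka, Tropy, or any metadata standard.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional, Tuple


@dataclass
class ImageMetadata:
    """Hypothetical metadata record for a photographed object."""
    title: str
    dimensions_cm: Tuple[float, float]       # (diameter, depth) of the object
    materials: List[str]
    date_taken: str                          # ISO 8601 date of the photograph
    place_taken: Optional[str]               # may be unknown
    file_format: str
    rights: str
    source_repository: Optional[str] = None  # only needed if the item is not one's own


def missing_fields(record: ImageMetadata) -> List[str]:
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items()
            if value in (None, "", [])]


# Illustrative values only
pan = ImageMetadata(
    title="Frying pan",
    dimensions_cm=(26.0, 5.0),
    materials=["cast iron"],
    date_taken="2024-01-15",
    place_taken=None,
    file_format="image/jpeg",
    rights="Photograph by the author",
)

print(missing_fields(pan))
```

A check like `missing_fields` is one simple way to notice which pieces of context still need to be filled in before an item is shared with others.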

Digital tools assist immensely in keeping this information together in an ethical and efficient way that provides proper context and credit. However, the effectiveness of these tools is dependent on the user compiling their primary and secondary sources. Omeka and Tropy, for instance, provide premade and customizable templates to fit the needs of the source one is adding to their online exhibit or archive, respectively.

In trying to understand the importance of proper, manually generated metadata, we can start with the reliability of records versus human memory when one must utilize these kinds of online tools. Research requires a multitude of sources to make a convincing and holistic argument or narrative. When considering the arduous task of conducting research itself, let alone turning it into a coherent piece, one must consider that these programs are here for a reason. The field of history and the Digital Humanities in general depend on ethical citation. They are fields that build on centuries of analysis and research to improve our understanding of the world. The manual creation of metadata, in my eyes, is a two-step process: the creation and the observation. If one values one’s peers, it is vital to understand where one’s objects originate, which will help others build on one’s findings.

George Mason University Database Review (African American Periodicals: Voices of Black society and culture, 1825-1995)

Link to Database: https://infoweb-newsbank-com.mutex.gmu.edu/apps/readex/?p=EAPX

Overview: The African American Periodicals: Voices of Black society and culture, 1825-1995 database features “news, commentary, advertisements, literature, drawings and photographs” from African American society and culture in the United States, as described by their “How to use this database” page. This comes from the curation of “170 periodicals from 26 states” originating from collections at Harvard and the Wisconsin Historical Society. These are digitized texts that have been transcribed for user analysis.

History: This database is based on the work of award-winning historian James P. Danky (1947-Present) at the University of Wisconsin.

Info from Publisher: https://infoweb-newsbank-com.mutex.gmu.edu/apps/readex/product-help/eapx?p=EAPX

Search: Search options include simple and advanced searches with filters based on origin of publication, date ranges, location, “eras in American History,” and presidential administrations, as well as a text explorer. The text explorer is designed as a three-step process: the user searches for a topic, selects any relevant documents, and then analyzes those documents by the frequency of words, people, phrases, and similar features. Once the user has found their documents, they can export them through a built-in email service.
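For readers curious about what "frequency of words" analysis involves, here is a rough Python sketch of the idea. It is not how the database's text explorer is implemented; the function name and the two sample sentences are invented for illustration and do not come from the periodicals.

```python
import re
from collections import Counter


def word_frequencies(documents, top_n=3):
    """Count words across the selected documents and return the most common."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and keep runs of letters and apostrophes as words
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


# Invented sample "documents" standing in for the user's selected results
selected = [
    "The press is free. The press endures.",
    "A free press informs the people.",
]

for word, count in word_frequencies(selected):
    print(word, count)
```

A real tool would likely also filter out very common words such as "the" so that the more meaningful terms rise to the top.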

Citations: One can use the following citation styles on the site: MLA, APA, AGLC, ASA, CMS, Harvard, and Turabian. If the preferred citation style is not available on the site, the user can export the citation information into a different service or tool and edit it to fit their formatting needs.

Reviews: Based on available reviews, the database is held in high regard. The following is an example of one review from a University of Oxford blog: https://blogs.bodleian.ox.ac.uk/history/2020/06/18/new-african-american-periodicals-1825-1995/

Access: The database is accessible through universities that have opted in to or purchased access to the archives. It appears to be a paid service that students and researchers reach through their respective universities.