This is Julaine, and my assignment is with the Digital Public Library of America (DPLA). The DPLA has more than 3 million unique subject headings, only a portion of which come from controlled vocabularies. This can cause problems when records use slight term variations or synonyms for the same concept.
The aim of my project is to continue developing and testing an effective method for analyzing record content and matching that content, including keywords, with relevant controlled terms from a defined list. The goal is a consistent vocabulary that aids users, can be reliably re-ingested, and consistently supports analytics.
I have spent the last couple of days reading through a ton of documentation about the work already completed on this project, familiarizing myself with the DPLA Metadata Application Profile, and getting set up with the software and data recommended for use. I have been exploring Apache Spark for the first time and am slowly finding my way around it (downloading, installing, and setting up the environment on my machine, and reviewing tutorials), so I haven’t really done much yet in terms of coming up with solutions to this problem, as I am still getting to know the tools and the data.
My mentors have been incredibly supportive and helpful, and they make themselves available to me in several ways. I expect I will learn a lot from working with them, and I am really thankful for that. We use various tools such as Slack, Zoom, and email to stay in touch, so I am feeling positive about having access to direction or support if and when I need it.
Well, that is about all I have to report at this time.
I wish everyone the best of luck going forward with their projects.