In recent weeks, my project has taken an unexpected turn from data storytelling and visualization towards one of data processing. As it turns out, our partner organization (Densho.org) has already done some data cleaning in Open Refine, created a database, and began preliminary data processing. I’ll be using Python and Jupyter Notebook to continue the work they’ve started, first by testing previous processes and then by creating new processes. I also found out that the data doesn’t have unique identifiers so I’ll be using the following workaround for attempting to isolate pockets of data.
In this partial example (there’s more to it than what’s seen in this screenshot), I’ll need to query the data using a for loop that searches for a combination of first name, last name, family number, and year of birth in order to precisely locate data in a way that potentially replicates the use of a unique identifier. I’m finding that not having a unique identifier makes it much more difficult to access data quickly and accurately, but hopefully this for loop will do the trick. I’m looking forward to playing with the code more and seeing what can be discovered.