This past week, I’ve been working on a function in Python that merges the two different datasets (WRA and FAR) so as to simplify the process of querying the data.
The reason for merging the data was to find a simpler alternative to the previous function for searching developed by Densho which involved if/else for loops to pull data from each dataset.
Now, one can search the data for a particular person and recover all of the available information about that person in a simple query. After the merge, the data output looks something like this when formulated as a list:
In addition to this, I’ve also played with some basic visualizations using Python to display some of the data in pie charts. I’m hoping to wrap up the last week working on more visualizations and functions for querying data.
Decoration information of the manuscripts is one of the most complex categories of information in the dataset, and to visualize it needs much work of data pre-processing. There are two layers of information that is contained in the dataset: a) one is what decorations the manuscripts include; and b) the other is how those decorations are arranged across the manuscripts. Delivering such information in the dataset may potentially communicate the decorative characteristics of the book of hours. For the what part, I identified several major decorative elements of the manuscripts from the dataset and color-coded each element in the Excel sheet, such as the illuminated initial, miniature (large and small), foliate, border (border decorations), bookplate (usually indicating the ownership of the book), catalog, notation, and multiple pictorial themes and imageries (e.g., annunciation, crucifixion, Pentecost, betrayal, and lamentation, Mary, Christ). Figure 1 demonstrates my preliminary attempt to visualize the decorative information of the manuscripts. I coded the major decorative patterns of the visualizations for the left half of the coding graph and the major pictorial themes (e.g., Virgin, Christ, Annunciation) for the right half of the graph. From this preliminary coding graph, we could see that there appears two general decorative styles for the book of hours. One type of decoration focuses on making the manuscripts beautiful and the other type focuses on displaying stories and the meaning behind them using pictorial representations of the texts. I then went back to check the original digitized images of the manuscript collection and found that the patterns were mostly utilized to decorate texts (appear surrounding the texts) while the other style appears mostly as full-leaf miniatures supplementing the texts. A preliminary analysis of the two styles’ relationship with the geographic information also suggests that the majority of the first decoration style is associated with France while the other that’s more emphasized on the miniature storytelling is more associated with the production locations such as Bruges.
For the second step, I explored the transitions as well as relationships among different decorative elements using Tableau, Voyant, and Wordle. Figure 2 is a word cloud that demonstrates the frequency of the major decoration elements across the whole manuscript collection. The Voyant Tools, in comparison, provides a way to further demonstrate the strengths of relationships among different decorative elements across the dataset. Here is an example. Treating all the decoration information as texts, the “links” feature in Voyant demonstrates the relationships among different elements. For instance, we could see that the strength of the link between the “illuminated” and “initial” is the strongest and there are also associations between different elements of decoration, such as “decorated,” “line,” “miniature,” “border,” “bookplate,” and “vignette.” The dataset has also attested that patterns such as illuminated initials, miniature, and bookplates demonstrating the ownership of the book, are the most common elements. The links, however, do not present any of the relationships among different themes.
Finally I met my mentor, Peter Logan last Monday, and it was great to see him in person. In this meeting I presented the progress of the project and figured out that perhaps a TEI format would be a good data format for me to move forward. As pending action item, TEI format will be generated and provided by Peter.
Additionally, earlier today, LEADS-4-NDP 1-minute madness was held. I presented the progress of the project to co-fellow and the LEADS-4-NDP advisory board.
In recent weeks, my project has taken an unexpected turn from data storytelling and visualization towards one of data processing. As it turns out, our partner organization (Densho.org) has already done some data cleaning in Open Refine, created a database, and began preliminary data processing. I’ll be using Python and Jupyter Notebook to continue the work they’ve started, first by testing previous processes and then by creating new processes. I also found out that the data doesn’t have unique identifiers so I’ll be using the following workaround for attempting to isolate pockets of data.
In this partial example (there’s more to it than what’s seen in this screenshot), I’ll need to query the data using a for loop that searches for a combination of first name, last name, family number, and year of birth in order to precisely locate data in a way that potentially replicates the use of a unique identifier. I’m finding that not having a unique identifier makes it much more difficult to access data quickly and accurately, but hopefully this for loop will do the trick. I’m looking forward to playing with the code more and seeing what can be discovered.