Week 8-10: Re-organizing place and date information. Based on the problems that appeared in the current versions of the visualizations, I performed another round of data cleaning and modification, especially of the date and geography information. To reduce the number of categories in each visualization, I merged some categories into broader ones. For example, all city information was merged into countries; single dates (e.g., 1470) were merged into the corresponding time period (the year 1470, for instance, went into the 1450-1475 period); and inconsistencies across the time and geography categories were further reconciled. As the following example demonstrates, the new versions of the visualizations are “cleaner” in terms of the number of categories and more readable. Over the last couple of weeks, I have also discussed the visualizations and the problems I encountered with my mentor, and worked with my mentor on the data merge. I’m also working on a potential poster submission to iConference 2020.
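The two merging rules above (single year into its period, city into country) can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used, which was done by hand in the spreadsheet: the 25-year period width is inferred from the 1470 → 1450-1475 example, and the city-to-country table is a hypothetical stand-in.

```python
# Hypothetical city-to-country table: the project's actual mapping is not
# published, so these entries are illustrative only.
CITY_TO_COUNTRY = {
    "Paris": "France",
    "Rouen": "France",
    "Bruges": "Flanders",
    "Ghent": "Flanders",
}


def year_to_period(year: int, width: int = 25) -> str:
    """Merge a single year into the fixed-width period that contains it.

    Periods are half-open: 1450 <= year < 1475 maps to "1450-1475",
    so a boundary year such as 1475 goes to the later period (an assumption).
    """
    start = (year // width) * width
    return f"{start}-{start + width}"


def merge_city(place: str) -> str:
    """Replace a known city with its country; leave other names unchanged."""
    name = place.strip()
    return CITY_TO_COUNTRY.get(name, name)


print(year_to_period(1470))  # 1450-1475
print(merge_city("Paris"))   # France
```

The half-open convention is one choice among several; whichever is used, applying it consistently keeps a boundary year from being counted in two periods.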
Decoration information is one of the most complex categories of information in the manuscript dataset, and visualizing it requires substantial data pre-processing. The dataset contains two layers of information: a) which decorations the manuscripts include; and b) how those decorations are arranged across the manuscripts. Delivering this information may help communicate the decorative characteristics of the book of hours. For the “what” part, I identified several major decorative elements of the manuscripts from the dataset and color-coded each element in the Excel sheet: the illuminated initial, miniature (large and small), foliate, border (border decorations), bookplate (usually indicating the ownership of the book), catalog, notation, and multiple pictorial themes and imagery (e.g., Annunciation, Crucifixion, Pentecost, Betrayal, Lamentation, Mary, and Christ). Figure 1 shows my preliminary attempt to visualize the decorative information of the manuscripts. I coded the major decorative patterns of the manuscripts on the left half of the coding graph and the major pictorial themes (e.g., Virgin, Christ, Annunciation) on the right half. This preliminary coding graph suggests that there are two general decorative styles in the book of hours. One focuses on making the manuscripts beautiful; the other focuses on telling stories and conveying their meaning through pictorial representations of the texts. I then went back to the original digitized images of the manuscript collection and found that the decorative patterns were mostly used to decorate texts (appearing around the text), while the other style appears mostly as full-leaf miniatures supplementing the texts.
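The two-layer coding described above can be approximated with simple keyword matching. The sketch below is hypothetical: the keyword lists are assumed from the elements named in this post, while the real coding was done manually by color in Excel. It splits a decoration description into “patterns” (the ornamental layer) and “themes” (the pictorial layer).

```python
import re

# Keyword lists assumed from the elements named in this post; the actual
# color-coding was done by hand in Excel, so these are illustrative only.
DECORATIVE_PATTERNS = ["illuminated initial", "miniature", "foliate", "border", "bookplate"]
PICTORIAL_THEMES = ["annunciation", "crucifixion", "pentecost", "betrayal",
                    "lamentation", "virgin", "mary", "christ"]


def find_terms(text: str, terms: list[str]) -> list[str]:
    """Return the terms that occur in text as whole words (a plural -s is allowed)."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"s?\b", lowered)]


def code_decoration(description: str) -> dict[str, list[str]]:
    """Split one decoration description into the two coding layers."""
    return {
        "patterns": find_terms(description, DECORATIVE_PATTERNS),
        "themes": find_terms(description, PICTORIAL_THEMES),
    }


print(code_decoration("Illuminated initials and foliate borders; miniature of the Annunciation"))
```

A record that yields patterns but no themes would lean toward the ornamental style, and one dominated by themes toward the narrative style. Whole-word matching keeps a short keyword from matching inside an unrelated longer word.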
A preliminary analysis of how the two styles relate to the geographic information also suggests that most manuscripts in the first (ornamental) style are associated with France, while the second style, which emphasizes storytelling through miniatures, is more often associated with production locations such as Bruges.
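Comparing styles against places is essentially a cross-tabulation. The sketch below assumes that each manuscript has already been given one style label and one production place. These are hypothetical inputs, since the post does not say how mixed or multi-place cases were handled. For each style, it reports the share of manuscripts coming from each place.

```python
from collections import Counter, defaultdict


def style_by_place(rows):
    """Cross-tabulate (style, place) pairs.

    Returns {style: {place: share}}, where the shares for one style sum to 1.
    """
    counts = defaultdict(Counter)
    for style, place in rows:
        counts[style][place] += 1
    return {
        style: {place: n / sum(c.values()) for place, n in c.items()}
        for style, c in counts.items()
    }
```

Reporting shares within each style, rather than raw counts, keeps a large group of French manuscripts from overwhelming a smaller group when the two styles are compared.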
For the second step, I explored the transitions and relationships among different decorative elements using Tableau, Voyant, and Wordle. Figure 2 is a word cloud showing the frequency of the major decoration elements across the whole manuscript collection. Voyant Tools, in comparison, provides a way to further show the strength of the relationships among different decorative elements across the dataset. Here is an example. Treating all the decoration information as text, the “links” feature in Voyant displays the relationships among different elements. For instance, the link between “illuminated” and “initial” is the strongest, and there are also associations among other decorative terms, such as “decorated,” “line,” “miniature,” “border,” “bookplate,” and “vignette.” The data also confirm that elements such as illuminated initials, miniatures, and bookplates (indicating the ownership of the book) are the most common. The links, however, do not reveal any relationships among the different pictorial themes.
Figure 3. Voyant analysis of the decorating information.
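The two views above can be approximated in code: word-cloud counts, plus a simple measure of how strongly two terms are linked. This is a simplified stand-in, not Voyant's algorithm. Voyant's Links tool is based on collocates (words appearing near each other in the text). The sketch below instead counts how many decoration records mention each pair of terms together, with each record treated as one hypothetical text string.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lowercase a description and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(records: list[str]) -> Counter:
    """Word-cloud style counts: how often each word appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(tokenize(record))
    return counts


def cooccurrence(records: list[str], vocabulary: set[str]) -> Counter:
    """Count how many records mention each pair of vocabulary terms together."""
    pairs = Counter()
    for record in records:
        present = sorted(set(tokenize(record)) & vocabulary)
        pairs.update(combinations(present, 2))  # pairs are alphabetical tuples
    return pairs
```

Calling `cooccurrence(records, {"illuminated", "initial", "border", "miniature", "bookplate"}).most_common(5)` would list the strongest pairs. Restricting the counts to a chosen vocabulary plays the role of a stopword list, leaving out filler words such as “with” or “the.”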
During week 3, I focused on the date information of the manuscript data. As with the geographical data, working with date information means working with variants. The dates are given as descriptive text (e.g., early 15th century), and the ways of describing them vary across the collection. Most dates appear as ranges (e.g., 1425-1450), and many of the ranges overlap. Ambiguity runs throughout the dataset, mostly because the date information was collected and pieced together from the texts of the manuscripts. Additionally, some manuscripts appear to have been produced and refined over different periods; for example, the texts were created earlier, during the 14th and 15th centuries, and the illustrations or decorations were added later. So my first task was to regroup the date information and make it clearer for visualization. This graph shows how I color-coded the dataset and grouped the data into five general categories: before the 15th century, 1400-1450 (first half of the 15th century), 1450-1500 (second half of the 15th century), 16th century, and cross-temporal/multiple periods.
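The regrouping into five categories can be sketched as two steps: turn each descriptive date into an approximate year span, then assign the span to a group. This is a hypothetical reconstruction, since the actual grouping was done by hand through color-coding. The parser only understands three assumed phrasings: a year range, an “early/late Nth century” phrase, and a single year. Anything else is flagged as “unparsed” for manual review.

```python
import re

# Assumed phrasings; real descriptions vary more, so unmatched text is flagged.
RANGE = re.compile(r"(\d{4})\s*[-\u2013]\s*(\d{4})")     # "1425-1450" (hyphen or en dash)
CENTURY = re.compile(r"(early|late)?\s*(\d{2})th century")  # "early 15th century"
YEAR = re.compile(r"\b(\d{4})\b")                          # "ca. 1470"


def parse_span(text: str):
    """Turn a descriptive date into an approximate (start, end) year span, or None."""
    t = text.lower()
    if m := RANGE.search(t):
        return int(m.group(1)), int(m.group(2))
    if m := CENTURY.search(t):
        start = (int(m.group(2)) - 1) * 100  # "15th century" starts at 1400
        if m.group(1) == "early":
            return start, start + 50
        if m.group(1) == "late":
            return start + 50, start + 100
        return start, start + 100
    if m := YEAR.search(t):
        year = int(m.group(1))
        return year, year
    return None


def categorize(text: str) -> str:
    """Assign one of the five groups; spans that cross a boundary count as multiple periods."""
    span = parse_span(text)
    if span is None:
        return "unparsed"
    start, end = span
    if 1400 <= start and end <= 1450:
        return "1400-1450"
    if 1450 <= start and end <= 1500:
        return "1450-1500"
    if 1500 <= start and end <= 1600:
        return "16th century"
    if end <= 1400:
        return "before the 15th century"
    return "cross-temporal/multiple periods"
```

Because the checks run in order, a range ending on a boundary (e.g., 1425-1450) falls in the earlier half. A plain “15th century” spans both halves and lands in the cross-temporal group, which may or may not match how such entries were coded by hand.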
Based on these groupings, I created multiple line graphs, histograms, and bar charts to visualize the temporal distribution of book of hours production from different angles. The static visualizations helped me find some interesting insights. For example, production of the book of hours increased from the 1450s onward, roughly the same period in which the printing press was invented.
One problem with the static graphs, however, is that they cannot effectively combine the date information with other information in the dataset. As a result, they cannot explore the relationships among various aspects of the manuscript data or display the “ecosystem” of book of hours production and circulation in medieval Europe. Questions that interactive graphs might help answer include: During certain periods, was book of hours production especially popular in certain countries or regions? And did the decorations or stylistic features of the genre change over time? To explore more interactive approaches, I am also experimenting with TimelineJS to create a chronological gallery of the book of hours collection. TimelineJS is a storytelling tool that lets me integrate time information, images of sample books of hours, and descriptive text into one presentation. I am currently discussing this idea with my mentor and look forward to sharing more about it in the blog posts over the next few weeks.
During week 2, I started working with the data describing the locations where the manuscripts were produced and used. Something I didn’t quite realize before I delved into the data is that these are not simply place names. They are geo-information represented in different formats and with different connotations. The variation in the geolocation data takes the following forms: a) missing data (i.e., N/A); b) different units, namely regions, nation-states, and cities; c) uncertain information (marked with “?”) indicated in the original manuscripts; d) geographies that changed across historical periods, which makes it hard to visualize them consistently over time; and e) single vs. multiple locations in one data entry. Facing this situation, I spent some time cleaning and reformatting the data and thinking about strategies to visualize this part of it. I merged all the city information into country/nation-state information and also researched historical geographies such as Flanders (and found its complexities…). Because the geographies also shifted over time, they are hard to present in one single visual. I created a pie chart that shows the proliferation and popularity of the book of hours in certain areas, and multiple bar charts showing the merged categories (e.g., city information, different sections of the Flanders area). I also found a map of Europe in the Middle Ages (the time period represented in the dataset) and added other information (e.g., percentages) to it. I think this may be a more straightforward way to communicate the geographical distribution of book of hours production. The geographical data are closely related to the temporal data and to the other categories describing the content and decorations of the manuscripts. For my next step, therefore, I aim to create more interactive visualizations that connect different categories of the dataset.
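Several of the cleaning steps listed above (missing values, “?” markers, city-to-region merging, and multiple places in one entry) can be handled by a single normalizing function. The sketch below is hypothetical. The lookup table and the separator list (semicolon, slash, comma, “or”, “and”) are assumptions about how entries might be written, and the uncertainty flag is recorded for the whole entry rather than for each place.

```python
import re

# Hypothetical lookup for illustration; the project's real city list, and how
# it treated historical regions such as Flanders, are not given here.
CITY_TO_REGION = {"paris": "France", "rouen": "France", "bruges": "Flanders", "ghent": "Flanders"}

MISSING = {"", "N/A", "NA"}
SEPARATORS = r"\s*(?:;|/|,|\bor\b|\band\b)\s*"  # assumed ways of listing several places


def normalize_place(raw):
    """Return (regions, uncertain) for one geolocation entry.

    Missing values give ([], False); a "?" anywhere marks the whole entry
    as uncertain; cities are merged into regions; duplicates are dropped.
    """
    if raw is None or raw.strip().upper() in MISSING:
        return [], False
    uncertain = "?" in raw
    regions = []
    for part in re.split(SEPARATORS, raw.replace("?", ""), flags=re.IGNORECASE):
        name = part.strip()
        if not name:
            continue
        region = CITY_TO_REGION.get(name.lower(), name)
        if region not in regions:  # keep first-seen order, drop duplicates
            regions.append(region)
    return regions, uncertain
```

Keeping the uncertainty flag separate from the place list means questionable attributions can still be counted, but drawn differently (for example, hatched) on a map or chart instead of being silently merged with certain ones.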
I’m excited to work with the complexities of the manuscript data. They reminded me of a similar case I encountered before with Chinese manuscripts, where the date information was represented in various formats, especially a combination of the traditional Chinese dating style and the Western calendar. Standardization is not always a good way to communicate the ideas behind the data, and visualizing that complexity is itself a challenge.
My LEADS fellowship placement is with the University of Pennsylvania Libraries, Digital Research Services. This year's project aims to visualize a digitized collection of book of hours manuscripts produced in medieval Europe. The main idea behind the project is to better introduce and communicate this specific genre of book production to audiences through visual forms and languages.
During the Drexel University boot camp on June 6-8, I made the most of the time by visiting the UPenn Library and meeting with my project mentor, Ms. Dot Porter. We discussed the project goals and the major tasks needed to deliver the project successfully. We identified two possible ways to present and share our major project outcomes: a research paper and an interactive website displaying and communicating the visualizations.
I spent the first week of the LEADS project getting familiar with the “book of hours” as a genre and an artifact, reading secondary sources recommended by my mentor. These readings gave me a better understanding of the book of hours in terms of its history, major characteristics, and uniqueness in the religious life of the Middle Ages, which has helped me think of ways to visualize the manuscript data. I spent most of week 2 browsing the dataset and proposing visualization strategies. The initial book of hours dataset contains information on 185 digitized manuscripts, including their dates of production, the provenance of production and circulation, the contents (i.e., passages of prayer), and the decorations. Thinking about visualization strategies, my mentor and I had a Skype check-in and discussed which types of visualizations and graphs to create and some potential problems in the visualization process. I also reflected on the ideas and theories from the information visualization session at the boot camp when trying to identify the most effective visualization strategies for the manuscript data. Following the discussion with my mentor, I started working with the initial dataset, in particular the provenance data of manuscript production. As the visualization work goes on, I find that each graph tends to be more complex than it first appears, and manuscript data visualization is quite a craft.