The LIS Education and Data Science Integrated Network Group (LEADING) is a Laura Bush 21st Century Librarian (LB21) National Digital Infrastructures and Initiatives project supported by the Institute of Museum and Library Services (IMLS). The LEADING project scales up the highly successful LEADS-4-NDP initiative and will prepare a diverse, nationwide cohort of 50 LIS doctoral students and early- to mid-career librarians for data science endeavors.
LEADING’s model includes community hubs at Montana State University Library and UC San Diego Library, a co-educational hub at OCLC, and 14 member nodes that also serve as project mentoring sites. Drexel University’s Metadata Research Center serves as the central coordinating hub; it will oversee the data science curriculum and bring together all project partners.
LEADING’s key goals include:
- Extending the Education Pipeline by leveraging Drexel University’s data science curriculum and faculty expertise.
- Collaborating and Community Development by bringing together educators, researchers, library leaders, frontline early- and mid-career information professionals, and doctoral students to advance data science in LIS education and practice.
- Facilitating Diversity and Inclusivity across the LEADING Fellowship program in partnership with LEADING’s Diversity and Inclusivity Task Force.
- Building a Sustainable and Extensible Partnership that enables a culture of mutual growth and continued sharing across the LEADING network.
LEADING Fellowship Projects at Member Node Sites
| Project Site | Project Title | Project Goals | Data |
|---|---|---|---|
| Academy of Natural Sciences (ANS) | From Natural History Literature to Linked Open Data Biodiversity Knowledge Graph | Automate the identification and disambiguation of specimen descriptions in the historical full text of the Proceedings of the Academy of Natural Sciences of Philadelphia. | Source: Link; Type: Plain text (OCR), GBIF datasets |
| AI-Collaboratory, University of Maryland iSchool (AIC-MD) | Revisiting the WWII Japanese American Incarceration Camp Experience: Spatial and Temporal Representations from the “Inside Out” | Analyze Densho data records to identify people who were in the camps, how they lived, and how they connected to resist and endure these circumstances. | Type: CSV, shapefile, graph database, text; Size: 50K files, under 1 TB |
| California Digital Library, University of California, Office of the President (CDL) | Moving a Metadata Meritocracy into Production | Work with the yamz.net database of terms and the yamz source code to refine workflows for gaining acceptance of proposed metadata terms in a stable production environment. | Source: Link; Type: Postgres RDB, 10k rows |
| Digital Scholarship, Tish Library, Tufts University (DTTu) | Data & Decision-Making for Consortial Ebook Acquisitions | Analyze and visualize consortium-wide ebook usage data to examine use patterns across the individual institutions. | Type: Tabular title lists and usage reports |
| Kislak Center for Special Collections, Univ. of Pennsylvania Libraries (UPenn) | Authority Creation and Exploitation in the Schoenberg Database of Manuscripts | Integrate or link UPenn authority records for manuscript name and place metadata into the VIAF or Wikidata knowledge bases. Visualize authority-related data for publication and experiment with the dataset. | Documentation: Link; Type: Structured metadata, MySQL relational database |
| Loretta C. Duckworth Scholars Studio, Temple University Libraries (LCDSS) | Enhancing and Visualizing Philadelphia Black Artist Records in Wikidata | Develop SPARQL queries for Wikidata records about Black artists in Philadelphia to identify gaps and disparities that can be filled with library catalog data, analyzed, and visualized. | Type: Wikidata records (RDF) |
| LYRASIS Project 1 | Measuring the Impact of the LYRASIS Catalyst Fund: A Data Visualization/Program Impact Assessment | Survey development, data mining, and data visualization to measure the impact of the LYRASIS Catalyst award program. Impact areas to assess include the development of innovative solutions; synergistic outgrowths; the impact on libraries, archives, and museums; sustainability; and the impact on the broader LYRASIS community. | Type: Structured and unstructured data, CSV |
| LYRASIS Project 2 | Assessing Overlap and Aggregation Potential of Open-Source Software Platforms and Their Data | Data mining to assess the overlap between the services available through five open-source software platforms and the LYRASIS-hosted platform data. Development of a single combined data model toward a unified search portal for research access. | Type: Structured and unstructured data, CSV |
| Montana State University Library | #ROIStats: Connecting Subscription Library Resources to the Research Enterprise | Develop a workflow for extracting and visualizing specific data types from grant award data files to demonstrate how subscription library services are used to secure research dollars. | Source: Link; Type: .xlsx, CSV, XML, JSON, PDF (OCR) files |
| Movement Alliance Project (MAP) | People’s Media Record: Activist Media Metadata | Develop workflows for enhancing, analyzing, or generating quality descriptive metadata records for audio/video media files and associated production materials. | Sample: Link; Type: PBCore XML files |
| OCLC Research and Development (OCLC-R): Project 1 | Metadata Record Similarity: Identification and Clustering | Use string similarity and match-scoring approaches to develop an algorithm and implementation that can identify match candidates or small fuzzy clusters of MARC records. | Sample: Link; Type: XML MARC records |
| OCLC Research and Development (OCLC-R): Project 2 | Detect Missing Marks (Diacritics) in Text | Examine instances of missing diacritic marks in the title fields of WorldCat MARC records and analyze the associated bibliographic records to identify patterns that predict this occurrence. | Sample: Link; Type: XML MARC records |
| Smithsonian Libraries (SL) | Enhancing the Museum Data Ecosystem through Linking Research Publications to Museum Systems | Extract textual entities from Smithsonian publication PDF files and identify patterns among specimens, taxonomic names, localities, and other entities that can enable linking to museum systems. | Type: PDF research publications, Wikidata |
| UC San Diego Library: Project 1 | Transformation and Enhancement of the Farmworker Movement Collection | Develop text analysis and data mining techniques for extracting quality metadata to augment digital objects in the Farmworker Movement Documentation Project collection. | Source: Link |
| UC San Diego Library: Project 2 | Creating a Community of Expertise among Fellows | Analyze existing data science collaborative groups to identify models for supporting a learning community. | TBD |
| University of New Mexico (UNM) with Montana State University Library | The Scholarly Elite: Characterizing Uneven Distributions in Access to Institutional Repository (IR) Content | Analyze a subset of aggregated institutional repository data to identify trends and disparities in access and use. | Documentation: Link; Type: JSON, tabular data, 100+ GB |
| University of North Texas (UNT) | Developing a Framework for Identifying Gaps in Large Newspaper Collections | Analyze the UNT digitized newspaper collection to identify underrepresentation of specific regions, time periods, languages, and ethnic groups. | Type: XML, CSV, tabular, MARC, JSON, Markdown |
| University of Rochester (ROC) | Using Data to Make Impactful Collection Decisions | Bibliographic and network analysis of ROC publications in Web of Science and Scopus to understand journal usage and impact. | Type: CSV, .xlsx |
1) Northeastern University Libraries
2) Charles Widger School of Law Library, Villanova University
Please reach out to firstname.lastname@example.org if you are interested in learning more about joining the LEADING network.