LEADING: LIS Education And Data Science Integrated Network Group

The LIS Education and Data Science Integrated Network Group (LEADING) is a Laura Bush 21st Century Librarian (LB21) National Digital Infrastructures and Initiatives project supported by the Institute of Museum and Library Services (IMLS). The LEADING project scales up the highly successful LEADS-4-NDP initiative and will prepare a diverse, nationwide cohort of 50 LIS doctoral students and early- to mid-career librarians for data science endeavors.

To hear previous LEADS-4-NDP fellows discuss their internship experience, watch the videos below:

2019 LEADS Fellow Jessica Cheng
2019 LEADS Fellow Julaine Clunis
2018 LEADS Fellow and LEADING Project Manager Sam Grabus

LEADING’s model includes community hubs at Montana State University Library and UC San Diego Library, a co-educational hub at OCLC, and 14 member nodes that also serve as project mentoring sites. Drexel University’s Metadata Research Center serves as the central coordinating hub; it will oversee the data science curriculum and bring together all project partners.

LEADING’s key goals include:

  1. Extending the Education Pipeline by leveraging Drexel University’s data science curriculum and faculty expertise.
  2. Collaborating and Community Development by bringing together educators, researchers, library leaders, frontline early and mid-career information professionals, and doctoral students, who will collaborate and advance data science in LIS education and practice.
  3. Facilitating Diversity and Inclusivity across the LEADING Fellowship program in partnership with LEADING’s Diversity and Inclusivity Task Force.
  4. Building a Sustainable and Extensible Partnership that enables a culture of mutual growth and continued sharing across the LEADING network.

LEADING Fellowship Projects at Member Node Sites

Project Site: Academy of Natural Sciences (ANS)
Project Title: From Natural History Literature to Linked Open Data Biodiversity Knowledge Graph
Project Goals: Automate the identification and disambiguation of specimen descriptions in the historical full text of the Proceedings of the Academy of Natural Sciences of Philadelphia.
Data: Source: Link; Type: Plain text (OCR), GBIF datasets

Project Site: AI-Collaboratory, University of Maryland iSchool (AIC–MD)
Project Title: Revisiting the WWII Japanese American Incarceration Camp Experience: Spatial and Temporal Representations from the “Inside Out”
Project Goals: Analyze Densho data records to identify people who were in camps, how they lived, and how they connected in order to resist and endure these circumstances.
Data: Type: CSV, shapefile, graph database, text; Size: 50K files, less than 1 TB

Project Site: California Digital Library, University of California, Office of the President (CDL)
Project Title: Moving a Metadata Meritocracy into Production
Project Goals: Work with the yamz.net database of terms and the yamz source code to refine workflows for gaining acceptance of proposed metadata terms in a stable production environment.
Data: Source: Link; Type: Postgres RDB, 10k rows

Project Site: Digital Scholarship, Tisch Library, Tufts University (DTTu)
Project Title: Data & Decision-Making for Consortial Ebook Acquisitions
Project Goals: Analyze and visualize consortium-wide ebook usage data to examine use patterns across the individual institutions.
Data: Type: Tabular title lists and usage reports

Project Site: Kislak Center for Special Collections, Univ. of Pennsylvania Libraries (UPenn)
Project Title: Authority Creation and Exploitation in the Schoenberg Database of Manuscripts
Project Goals: Integrate or link UPenn authority records for manuscript name and place metadata into VIAF or Wikidata knowledge bases. Visualize authority-related data for publication, and experiment with the dataset.
Data: Documentation: Link; Type: Structured metadata, MySQL relational database

Project Site: Loretta C. Duckworth Scholars Studio, Temple University Libraries (LCDSS)
Project Title: Enhancing and Visualizing Philadelphia Black Artist Records in Wikidata
Project Goals: Develop SPARQL queries for Wikimedia records about Black artists in Philadelphia to identify gaps and disparities in the records that can be enhanced with library catalogs, analyzed, and visualized.
Data: Type: Wikidata database RDFs

Project Site: LYRASIS Project 1
Project Title: Measuring the impact of the LYRASIS Catalyst Fund: A Data Visualization/Program Impact Assessment
Project Goals: Survey development, data mining, and data visualization to measure the impact of the LYRASIS Catalyst award program. Impact areas to assess include the development of innovative solutions; synergistic outgrowths; the impact on libraries, archives, and museums; sustainability; and the impact on the broad LYRASIS community.
Data: Type: Structured and unstructured data, CSV

Project Site: LYRASIS Project 2
Project Title: Assessing Overlap and Aggregation Potential of Open-Source Software Platforms and Their Data
Project Goals: Data mining to assess the overlap between the services available through five open-source software platforms and the LYRASIS-hosted platform data. Development of a single combined data model toward a unified search portal for research access.
Data: Type: Structured and unstructured data, CSV

Project Site: Montana State University Library
Project Title: #ROIStats: Connecting Subscription Library Resources to the Research Enterprise
Project Goals: Develop a workflow for extracting and visualizing specific data types from grant award data files to demonstrate how subscription library services are used to secure research dollars.
Data: Source: Link; Type: .xlsx, CSV, XML, JSON, PDF (OCR) files

Project Site: Movement Alliance Project (MAP)
Project Title: People’s Media Record: Activist Media Metadata
Project Goals: Develop workflows for enhancement, analysis, or generation of quality descriptive metadata records for audio/video media files and associated production materials.
Data: Sample: Link; Source: Link; Type: PBCore XML files

Project Site: OCLC Research and Development (OCLC-R): Project 1
Project Title: Metadata Record Similarity: Identification and Clustering
Project Goals: Use string similarity and match-scoring approaches to develop an algorithm and implementation that can identify match candidates or small fuzzy clusters for MARC records.
Data: Sample: Link; Type: XML MARC records

Project Site: OCLC Research and Development (OCLC-R): Project 2
Project Title: Detect Missing Marks (diacritics) in Text
Project Goals: Examine instances of missing diacritic marks among title fields in WorldCat MARC records and analyze the associated bibliographic records to identify predictive patterns for this occurrence.
Data: Sample: Link; Type: XML MARC records

Project Site: Smithsonian Libraries (SL)
Project Title: Enhancing the museum data ecosystem through linking research publications to museum systems
Project Goals: Extract textual entities from Smithsonian publication PDF files and identify patterns among specimens, taxonomic names, localities, and other entities that can enable linking to museum systems.
Data: Type: PDF research publications, Wikidata

Project Site: UC San Diego Library: Project 1
Project Title: Transformation and enhancement of the Farmworker movement collection
Project Goals: Develop text analysis and data mining techniques for extracting quality metadata to augment digital objects in the Farmworker Movement Documentation Project collection.
Data: Source: Link

Project Site: UC San Diego Library: Project 2
Project Title: Creating a Community of Expertise among Fellows
Project Goals: Analyze existing data science collaborative groups to identify models to support a learning community.
Data: TBD

Project Site: University of New Mexico (UNM) with Montana State University Library
Project Title: The Scholarly Elite: Characterizing Uneven Distributions in Access to Institutional Repository (IR) Content
Project Goals: Analyze a subset of aggregated institutional repository data to identify trends and disparities across access and use.
Data: Documentation: Link; Source: Link; Type: JSON, tabular data, 100+ GB

Project Site: University of North Texas (UNT)
Project Title: Developing a Framework for Identifying Gaps in Large Newspaper Collections
Project Goals: Analyze the UNT digitized newspaper collection to identify collection underrepresentation of specific regions, time periods, languages, and ethnic groups.
Data: Type: XML, CSV, tabular, MARC, JSON, markdown

Project Site: University of Rochester (ROC)
Project Title: Using Data to make Impactful Collection Decisions
Project Goals: Bibliographic and network analysis for ROC publications in Web of Science and Scopus to understand journal usage and impact.
Data: Type: CSV, .xlsx

*Confirmed additional partners for project year 2 include:
1) Northeastern University Libraries
2) Charles Widger School of Law Library, Villanova University

Please reach out to mrc.metadata@drexel.edu if you are interested in learning more about joining the LEADING network.