Past Research


Dryad is a curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable, and citable.

(project wiki: Dryad wiki)

Funding: NSF (2008-2012, 2012-2016)


The Data-at-Risk Initiative (DARI) is a project of the Committee on Data for Science and Technology (CODATA) Data at Risk Task Group (DARTG) to create an inventory of valuable scientific data that are at risk of being lost to posterity. Examples of at-risk data include analog formats, such as loose paper documents, laboratory notebooks, photographs, or glass plates; digital data stored in obsolete and deteriorating formats, such as magnetic tapes or floppy disks; and any formats that face disposal due to inadequate storage. The DARI is not a repository for data. It is a descriptive inventory of endangered data that are held by others: individuals and research institutions. Our goal is for the DARI to be an initial step in identifying and locating at-risk data. We are currently developing a prototype inventory using Omeka, an open source software package for online collections. Using this prototype, SILS students, in conjunction with DARTG, conducted an initial case study wherein scientists contributed dataset descriptions directly to the inventory through an online form.

Digging into Metadata

Digging into Metadata is a project that aims to aggregate metadata from three heterogeneous digital libraries in order to improve interoperability.
Funding: IMLS, Jisc (UK), ESRC (UK) and AHRC (UK)


Subpopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS) is a study supporting “the prospective collection and analysis of phenotypic, biomarker, genetic, genomic, and clinical data from subjects with COPD for the purpose of identifying subpopulations and intermediate outcome measures.”

Funding: NIH (2009-2014)


Dublin Core Metadata Initiative Science and Metadata Community (DCMI-SAM) is a forum for individuals and organizations to exchange information and knowledge about metadata describing scientific data (data methodologically collected for research, analysis, tracking, forecasting, and other uses).
Sponsorship: Dublin Core Metadata Initiative (2009-current)

Automatic Metadata Maintenance for NC Health Info and Go Local (AMMGO)

NC Health Info (NCHI) is an exemplary and leading Web resource developed by the Health Sciences Library and the School of Information and Library Science, both at the University of North Carolina at Chapel Hill. NCHI provides North Carolina citizens with Web access to information about health conditions, diseases, wellness, and services (e.g., local doctors, hospitals, support groups and organizations). NCHI also connects to MedlinePlus, a global service of the National Library of Medicine and the National Institutes of Health.

The overarching goal of this project was to address growing metadata maintenance needs threatening to stem the accelerated growth of NC Health Info (NCHI) and the entire Go Local Initiative. Incorporating automatic techniques into NCHI’s metadata generation/quality control processes has allowed NCHI catalogers to direct more time to the metadata challenges requiring human skill and intellect and ultimately assist users who are increasingly using NCHI metadata to help them find health information.

The work included two phases: Phase 1, an evaluation of automatic metadata applications to identify the application(s) we could enhance, and Phase 2, an automatic evaluation of metadata quality that determined the effectiveness of the automatic metadata maintenance process.

Automatic Metadata Generation Applications (AMeGA)

The goal of the AMeGA project was to identify and recommend functionalities for applications supporting automatic metadata generation in the library / bibliographic control community. The project was conducted in connection with Section 4.2 of the Library of Congress Bibliographic Control Action Plan, which is providing leadership to libraries and other information centers in this new millennium.

The Action Plan’s charge for Section 4.2 was to “Develop specifications for a tool that will enable libraries to extract [and harvest] metadata from Web-based resources in order to create catalog records and that can detect and report changes in resource content and bibliographic data in order to maintain those records, communicate the specifications to the vendor community and encourage their adoption.”

The metadata generation task force advised the AMeGA project staff.

Project Goals

Evaluate current automatic metadata generation applications. (The categories of tools being investigated include: document presentation software, tools created specifically for metadata generation, and online library cataloging modules for creating metadata.)

Survey metadata professionals to determine which aspects of metadata generation are most amenable to automation and semi-automation. Other metadata creators may also participate in the study.

Compile a final report of recommended functionalities for automatic metadata generation applications. The final report will be reviewed and endorsed by the Metadata Generation Task Force (MGTF) and be made publicly accessible via the Library of Congress.

Memex Metadata (M2) for Student Portfolios

The Memex Metadata (M2) for Student Portfolios project (hereafter referred to as M2) advanced work in the areas of student portfolio management, metadata generation, and contextual retrieval. Students need to manage an ever-increasing amount of digital information disseminated during class sessions (e.g., lectures, whiteboard notes, slides, handouts, Web pages, emails, and exams) or self-authored as part of their educational experience (e.g., personal notes and term papers, including revisions).

We developed a series of metadata schemas to help university students manage their individual educational portfolios. The metadata schemas are based on learning scenarios and use cases identified for students in a field biology course. The schemas are interoperable with the Context Awareness Framework (CAF) currently being developed by UNC’s Information Technology Services (ITS). The CAF links a software agent on a student’s computer with ontologies and rule sets specific to the university environment. We used the Microsoft Memex research kit to pilot test the effectiveness of M2 schemas and the CAF, and to explore annotation and automatic capturing of contextual metadata for personal educational information.

The long-term goal of our work is to advance contextual metadata activities for personal educational portfolio management and facilitate effective retrieval. The project was supported by a University of North Carolina at Chapel Hill partnership that includes the School of Information and Library Science, Metadata Research Center, ITS, and the Biology Department.

U-PLanT (University of North Carolina Plant Language Team)

The project explored vocabulary solutions for accessing digital plant information. The project was an extension of Project OpenKey and UNC’s Plant Information Center (PIC).

Vocabulary solutions included a suite of vocabulary tools, a preliminary process model with steps for the development of vocabulary tools, and guiding principles for the development of descriptive plant vocabulary. We worked with SKOS to address the student/scientist vocabulary gap and facilitate student access to primary scientific resources found in education digital initiatives.

MGR: Metadata Generation Research

The Metadata Generation Research (MGR) project examined efficient and effective means of metadata production by integrating human and automatic processes. The research was conducted in collaboration with the National Institute of Environmental Health Sciences (NIEHS), an Institute of the National Institutes of Health (NIH), which is a component of the U.S. Department of Health and Human Services (DHHS). The MGR project has been funded by Microsoft Research, OCLC (Online Computer Library Center, Inc.), and UNC’s University Research Council.

Project Goals

Study human and automatic metadata generation processes.

Develop protocols for collaboration between resource authors and metadata professionals during the metadata generation process.

Evaluate the integration of collaborative human metadata generation processes and automatic generation processes.

Consider implications for the development of the Semantic Web.

PIC: Plant Information Center

The PIC project developed a web-based center that links digital images of herbarium specimens, associated data, the outreach programs of the North Carolina Botanical Garden, and the database and website expertise of the School of Information and Library Science with a unique library that is a shared facility of Orange County Public Library and McDougle Middle School. Central to the project is the development and employment of a series of applications that facilitate resource discovery, interactive learning, and contributory opportunities within the PIC system. Initial testing of PIC was integrated into public school sixth-grade science curriculum activities involving plant identification and classification. On a larger scale, PIC promotes the flow of scientific information to researchers, amateur botanists, students (elementary through higher education), and other communities interested in botanical science.

Project Goals

Demonstrate a successful cooperation between the university, the public school system, and the public library

Create and test an interactive Plant Information Center for the general public, libraries, and public schools

Test the usefulness of digital images of herbarium specimens for plant identification and for inspiring the public and public school children with the aims and methods of professional botanical science

Develop educational experiences for sixth-grade students using primary research materials from the herbarium