The Metadata Research Center is hosting a Data Science Foundations Carpentry on January 23rd, 2020 at Drexel University’s College of Computing and Informatics.
The workshop will cover:
- GitHub basics
- Best data practices
The full carpentry abstract is provided below.
The Metadata Research Center at the College of Computing and Informatics, Drexel University, is pleased to host the Data Science Carpentry workshop. The Carpentry workshop is part of the Library Carpentry program advancing “software and data skills for people working in library- and information-related roles.” The workshop is also connected with Drexel University’s LEADS (Library Education and Data Science) program, supported by the Institute of Museum and Library Services (IMLS).
The Metadata Research Center: Data Science Foundations Carpentry is a hands-on workshop that takes participants through a short experience of the data science lifecycle, incorporating tools and best practices for versioning, sharing, analyzing, and manipulating data.
Git, an open source tool for collaboration and versioning, is widely used by groups of people working on a shared document or code. Beyond that, it can be used to create websites, discuss issues, and tag problems, helping a group work together effectively. Participants will create their own repositories, add documents, and learn how to suggest changes to others’ work as well as incorporate suggested changes into their own.
OpenRefine is another open source tool, used to view, analyze, and manipulate data. It may look like Excel or Google Sheets, but it is much more powerful and is often used by those who need to do reconciliation on linked data. Participants will learn how to create various date formats, use an API to extract Crossref title data from ISSNs, and clean data using the variety of methods that OpenRefine offers.
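For readers curious about what an ISSN-to-title lookup involves, here is a minimal Python sketch. It is not the workshop's exercise, which uses OpenRefine. It assumes the public Crossref REST endpoint `https://api.crossref.org/journals/{issn}` and a JSON response whose `message.title` field holds the journal title. The helper names (`normalize_issn`, `journal_url`, `title_from_response`, `fetch_title`) are hypothetical and chosen only for illustration.

```python
import json
import re
import urllib.request

# Assumed base URL of the public Crossref journals endpoint.
CROSSREF_JOURNALS = "https://api.crossref.org/journals/"


def normalize_issn(raw):
    """Return an ISSN as NNNN-NNNC, or None if the input does not look like one."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    return compact[:4] + "-" + compact[4:]


def journal_url(issn):
    """Build the lookup URL for a normalized ISSN."""
    return CROSSREF_JOURNALS + issn


def title_from_response(body):
    """Extract the journal title from a JSON response body, if present."""
    return json.loads(body).get("message", {}).get("title")


def fetch_title(raw_issn):
    """Look up a journal title over the network (requires internet access)."""
    issn = normalize_issn(raw_issn)
    if issn is None:
        return None
    with urllib.request.urlopen(journal_url(issn), timeout=10) as resp:
        return title_from_response(resp.read().decode("utf-8"))
```

Calling `fetch_title` with an ISSN string performs the live lookup. The other helpers work offline, which makes it easy to check the cleaning and parsing steps on their own before sending any requests.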
The last hour will be dedicated to a discussion of open research and how Git and OpenRefine can be used in the context of the participants’ daily workflow.
Chris Erdmann is the Engagement, Support, and Training Expert at RENCI (Renaissance Computing Institute) in North Carolina. RENCI develops and deploys advanced technologies to enable research discoveries and practical innovations.
Juliane Schneider is the Lead Data Curator for Harvard Catalyst’s eagle-i. She has had a long, weird library career, starting as a traditional cataloger, and currently uses ontologies and data harmonization to maximize discovery and data sharing/reuse.
Workshop made possible with IMLS Support: RE-70-17-0094-17