CCI Senior Design

Citemantic

Project Description

Extracting Scientific Claims using Deep Discourse Model, Transfer Learning, and Contextual Embeddings

Team Logo

Abstract

The automated manipulation of scholarly data can benefit from the identification and extraction
of excerpts that directly describe an article’s claim. Recent work on extracting claims from
scholarly articles relies on natural language models based on word embeddings. The accuracy of
these models, however, is still considered low, which hinders advances in the automated
manipulation and analysis of scholarly data, as well as in applications that help users search
for relevant related work. The increasing volume of publications makes it substantially harder
for researchers to keep abreast of all relevant research, which makes accurate claim extraction
all the more important. Research in natural language models has led to a breakthrough allowing for
more accurate language models. State-of-the-art language models make use of what are known as
contextual embeddings. Previously, embeddings were limited to producing a single vector
representation for each word in a vocabulary, regardless of its context. Contextual embeddings
improve upon this by producing a vector representation of a word that depends on the context in
which the word appears. These models have demonstrated increased accuracy on several natural
language tasks. In this project, we will re-implement a previously proposed claim extraction
algorithm, originally based on non-contextual word embeddings, using contextual word embeddings
to determine whether they improve the accuracy of claim extraction. Our hypothesis is that the
use of contextual embeddings will allow for greater accuracy in extracting claims from research
papers.
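
The difference between non-contextual and contextual embeddings described in the abstract can be illustrated with a small sketch. This is a toy example, not the project's model or any published architecture. The vectors in `STATIC` are made-up numbers, and `contextual_embed` is a hypothetical stand-in that averages a word's vector with those of its neighbors, whereas real contextual models learn this dependence with deep networks. It shows only the property that matters here: a static lookup gives "bank" the same vector in every sentence, while a context-dependent representation does not.

```python
from typing import Dict, List

# Hypothetical 3-dimensional static embeddings (made-up values, for illustration only).
STATIC: Dict[str, List[float]] = {
    "the": [0.1, 0.0, 0.1],
    "river": [0.9, 0.1, 0.0],
    "bank": [0.5, 0.5, 0.5],
    "loan": [0.0, 0.9, 0.2],
    "approved": [0.1, 0.8, 0.3],
    "flooded": [0.8, 0.0, 0.2],
}


def static_embed(tokens: List[str]) -> List[List[float]]:
    """Non-contextual lookup: one fixed vector per vocabulary word."""
    return [STATIC[t] for t in tokens]


def contextual_embed(tokens: List[str], window: int = 2) -> List[List[float]]:
    """Toy stand-in for a contextual model (an assumption, not a real architecture):
    each token's vector is the mean of the static vectors within `window` positions."""
    base = static_embed(tokens)
    out = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        neighbors = base[lo:hi]
        dim = len(base[i])
        out.append([sum(v[d] for v in neighbors) / len(neighbors) for d in range(dim)])
    return out


s1 = ["the", "river", "bank", "flooded"]
s2 = ["the", "bank", "approved", "the", "loan"]

# "bank" is at index 2 in s1 and index 1 in s2.
static_same = static_embed(s1)[2] == static_embed(s2)[1]
contextual_same = contextual_embed(s1)[2] == contextual_embed(s2)[1]

print("static vectors identical:", static_same)
print("contextual vectors identical:", contextual_same)
```

In this sketch the static lookup returns identical vectors for both occurrences of "bank", while the context-dependent version returns different ones. That context sensitivity is the property the project hypothesizes will help claim extraction.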

Video Presentation

Screenshot 1