College of Computing and Informatics
Extracting Scientific Claims using a Deep Discourse Model, Transfer Learning, and Contextual Embeddings
The automated processing of scholarly data can benefit from the identification and extraction of excerpts that directly describe an article's claim. Recent work on extracting claims from scholarly articles relies on modern natural language models based on word embeddings; the accuracy of these approaches, however, remains low. This low accuracy hinders advances in the automated processing and analysis of scholarly data, as well as in applications that support users in searching for relevant related work. The increasing volume of publications makes it difficult for researchers to keep abreast of all relevant related research, which makes accurate claim extraction all the more crucial. Recent research in natural language processing has led to a breakthrough: state-of-the-art language models make use of what are known as contextual embeddings. Previously, embeddings were limited to producing a single vector representation for each word in the vocabulary, regardless of context. Contextual embeddings improve upon this by providing a vector representation for each context in which a word appears. These models have demonstrated increased accuracy on several natural language processing tasks. In this project, we will re-implement a previously proposed claim extraction algorithm, originally based on non-contextual word embeddings, using contextual word embeddings to determine whether they improve the accuracy of claim extraction. Our hypothesis is that the use of contextual embeddings will allow for greater accuracy in extracting claims from research papers.
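
To make the distinction between static and contextual embeddings concrete, the sketch below illustrates the idea only; it is not the project's implementation. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which the proposal commits to, and shows that a contextual model assigns different vectors to the same word in different sentences, whereas a static embedding table would return one vector in both cases.

```python
# Minimal illustration of contextual embeddings (assumed setup: Hugging Face
# `transformers` with the `bert-base-uncased` checkpoint; not the project's
# actual claim-extraction pipeline).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the last-hidden-layer vector of `word` as it occurs in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]                   # vector for this occurrence

# The word "claim" receives a different representation in each context;
# a static embedding would give the same vector for both occurrences.
v1 = embed_word("the paper's central claim is supported by experiments", "claim")
v2 = embed_word("she filed an insurance claim after the storm", "claim")
print(torch.cosine_similarity(v1, v2, dim=0).item())    # < 1.0: context-dependent
```

In the claim-extraction setting, such context-dependent vectors would replace the single-vector-per-word inputs of the original algorithm while the rest of the pipeline stays the same.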