Drexel University

College of Computing and Informatics

FOREST

Research
Rule-based Named-Entity Recognition (NER) system using efficient fuzzy search and phonetic algorithms
An increasing number of domains and industries have begun to digitize their documents. As a result, there is now an overwhelming abundance of digital text data. Increasingly sophisticated techniques, such as machine learning (ML) and deep learning (DL), are needed to use this wealth of data effectively. However, training DL models is computationally expensive, and not all domains have labeled data or the capacity to label data for training them.

In this project, we build upon past work that leverages efficient fuzzy string matching along with the trie data structure and phonetic algorithms. We extend that work by constructing a system that uses multiple tries with varying architectures, and we identify a way to use a logistic regression model to help filter the candidate results the system produces.
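
To make the general idea concrete, here is a minimal sketch of trie-based fuzzy lookup combined with a phonetic key. It is an illustration only, not the project's implementation: the names `Trie`, `soundex`, and `find_candidates`, the use of Levenshtein distance, the choice of Soundex as the phonetic algorithm, and the three-entry gazetteer are all assumptions made for this example. The logistic regression filter and the multiple-trie design are not shown.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.term = None  # full dictionary entry if one ends at this node


class Trie:
    """Hypothetical trie over lowercase surface forms of dictionary terms."""

    def __init__(self, terms=()):
        self.root = TrieNode()
        for term in terms:
            self.insert(term)

    def insert(self, term):
        node = self.root
        for ch in term.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.term = term

    def fuzzy_search(self, query, max_dist):
        """Return {term: Levenshtein distance} for terms within max_dist."""
        query = query.lower()
        results = {}
        first_row = list(range(len(query) + 1))
        for ch, child in self.root.children.items():
            self._walk(child, ch, query, first_row, max_dist, results)
        return results

    def _walk(self, node, ch, query, prev_row, max_dist, results):
        # Compute one row of the edit-distance table for this trie character.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
        if node.term is not None and row[-1] <= max_dist:
            results[node.term] = row[-1]
        # Prune: descend only while some prefix alignment is still within budget.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, query, row, max_dist, results)


_CODES = {
    c: d
    for d, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1)
    for c in letters
}


def soundex(word):
    """American Soundex key (first letter plus three digits)."""
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    out = word[0].upper()
    prev = _CODES.get(word[0])
    for c in word[1:]:
        code = _CODES.get(c)
        if code is not None and code != prev:
            out += str(code)
        if c not in "hw":  # 'h' and 'w' do not separate repeated codes
            prev = code
    return (out + "000")[:4]


def find_candidates(token, trie, phonetic_index, max_dist=1):
    """Union of spelling-based and sound-based candidate terms."""
    candidates = set(trie.fuzzy_search(token, max_dist))
    candidates |= phonetic_index.get(soundex(token), set())
    return candidates


# Illustrative gazetteer (made up for this sketch).
gazetteer = ["Philadelphia", "Pittsburgh", "Drexel"]
trie = Trie(gazetteer)
phonetic_index = {}
for term in gazetteer:
    phonetic_index.setdefault(soundex(term), set()).add(term)

print(trie.fuzzy_search("Philadelpia", 1))
print(find_candidates("Pitsburg", trie, phonetic_index))
```

In this sketch, "Philadelpia" is recovered by the trie because it is one edit away from a gazetteer entry, while "Pitsburg" is two edits away from "Pittsburgh" and is recovered only through the shared phonetic key. A downstream filter could then score and prune such candidates.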

We find that our system achieves improved accuracy compared to past work.
...
...

Team Members

...

Behind The Scenes

...
...