Recognition of Named Entities and Categories in Text using Stacked Embeddings

Published in IEEE 5th International Conference on Computing Communication and Automation (ICCCA), 2020

[url] [GitHub]


Named entities identify the key elements in a text, while sentence classification provides a summary of it. Together, sequence labeling and sentence classification enable deeper extraction of information from text. Embeddings trained on a domain-specific corpus tend to yield stronger vector representations and therefore better classification models. We propose custom fastText embeddings trained on a large Indian English news corpus. Stacked with state-of-the-art Pooled Flair embeddings, they achieve an F1-score of 79 on a custom FIRE English NER dataset and an F1-score of 93.05 on a subset of the OntoNotes 5.0 dataset. The embeddings were also used for sentence classification over 20 news categories, reaching a best multi-class accuracy of 88.1%. We also propose two Indian news datasets: one based on the FIRE NER dataset, and a custom multi-class sentence classification dataset.
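
To make the idea of stacking concrete, the sketch below shows it as per-token concatenation of vectors from two embedding sources. It is an illustration only, not the paper's implementation: the helper names (`make_lookup_embedder`, `make_context_embedder`, `stack`), the toy vectors, and the tiny dimensions are all assumptions. The lookup embedder stands in for static fastText-style vectors, and the second embedder is a deliberately simplistic placeholder for a contextual model such as Pooled Flair.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[List[str]], List[Vector]]


def make_lookup_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Hypothetical static embedder: one fixed vector per word, zeros if unknown."""
    def embed(tokens: List[str]) -> List[Vector]:
        return [list(table.get(tok.lower(), [0.0] * dim)) for tok in tokens]
    return embed


def make_context_embedder() -> Embedder:
    """Toy stand-in for a contextual embedder.

    Each vector depends on the token's relative position and on the length of
    the preceding token, so the same word gets different vectors in different
    sentences. A real contextual model is far richer than this.
    """
    def embed(tokens: List[str]) -> List[Vector]:
        n = len(tokens)
        vectors = []
        for i in range(n):
            prev_len = float(len(tokens[i - 1])) if i > 0 else 0.0
            vectors.append([i / n, prev_len])
        return vectors
    return embed


def stack(embedders: List[Embedder], tokens: List[str]) -> List[Vector]:
    """Concatenate, token by token, the vectors produced by every embedder."""
    per_source = [embed(tokens) for embed in embedders]
    stacked = []
    for i in range(len(tokens)):
        vec: Vector = []
        for source in per_source:
            vec.extend(source[i])
        stacked.append(vec)
    return stacked


if __name__ == "__main__":
    static = make_lookup_embedder({"delhi": [1.0, 2.0, 3.0]}, dim=3)
    contextual = make_context_embedder()
    for token, vec in zip(["Delhi", "is", "big"],
                          stack([static, contextual], ["Delhi", "is", "big"])):
        print(token, vec)
```

Each stacked vector has the summed dimensionality of its sources (here 3 + 2 = 5), and a downstream tagger or classifier consumes it as a single token representation. Concatenation keeps each source's information intact and leaves it to the downstream model to weight them.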