Recognition of Named Entities and Categories in Text using Stacked Embeddings
Published in IEEE 5th International Conference on Computing Communication and Automation (ICCCA), 2020
Abstract
Named entities enable the identification of key elements in text, while sentence classification provides a summary of its content. Together, sequence labeling and sentence classification enable deeper extraction of information from text. Embeddings trained on a corpus from a specific domain tend to yield strong vector representations, which in turn support better classification models. We propose custom fastText embeddings trained on a large Indian English news corpus. Stacked with state-of-the-art Pooled Flair embeddings, these embeddings achieve an F1-score of 79 on a custom FIRE English NER dataset and an F1-score of 93.05 on a subset of the OntoNotes 5.0 dataset. The embeddings were also used for sentence classification over 20 news categories, reaching a best multi-class accuracy of 88.1%. We also propose two Indian news datasets: one based on the FIRE NER dataset and a custom multi-class sentence classification dataset.
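As a rough illustration of what stacking embeddings means, the sketch below concatenates the per-token vectors from two embedding sources into one longer vector. It is a toy example and not the authors' implementation: the lookup tables, the dimensions, and the helper names `make_lookup_embedder` and `stack_embeddings` are all hypothetical, and a real contextual embedding such as Pooled Flair depends on the surrounding sentence rather than on a fixed table.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


def make_lookup_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Return an embedder backed by a table; unknown tokens map to a zero vector."""
    def embed(token: str) -> Vector:
        return table.get(token, [0.0] * dim)
    return embed


def stack_embeddings(tokens: List[str], embedders: List[Embedder]) -> List[Vector]:
    """For each token, concatenate the vectors produced by every embedder."""
    stacked: List[Vector] = []
    for token in tokens:
        vector: Vector = []
        for embed in embedders:
            vector.extend(embed(token))
        stacked.append(vector)
    return stacked


# Hypothetical toy tables standing in for a domain-specific embedding (2-d)
# and a second, general-purpose embedding (3-d).
domain_table = {"Delhi": [0.1, 0.2], "visited": [0.3, 0.4]}
general_table = {"Delhi": [0.5, 0.6, 0.7]}

domain = make_lookup_embedder(domain_table, dim=2)
general = make_lookup_embedder(general_table, dim=3)

# Each token ends up with a 2 + 3 = 5 dimensional stacked vector.
stacked = stack_embeddings(["Delhi", "visited"], [domain, general])
```

The stacked vector's dimensionality is the sum of the component dimensions, so a downstream tagger or classifier sees both the domain-specific and the general-purpose signal for every token.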