Abstract

Currently, as a basic task of military document information extraction, Named Entity Recognition (NER) for military documents has received great attention. In 2020, China Conference on Knowledge Graph and Semantic Computing (CCKS) and System Engineering Research Institute of Academy of Military Sciences (AMS) issued the NER task for test evaluation, which requires the recognition of four types of entities including Test Elements (TE), Performance Indicators (PI), System Components (SC) and Task Scenarios (TS). Due to the particularity and confidentiality of the military field, only 400 items of annotated data are provided by the organizer. In this paper, the task is regarded as a few-shot learning problem for NER, and a method based on BERT and two-level model fusion is proposed. Firstly, the proposed method is based on several basic models fine tuned by BERT on the training data. Then, a two-level fusion strategy applied to the prediction results of multiple basic models is proposed to alleviate the over-fitting problem. Finally, the labeling errors are eliminated by post-processing. This method achieves F1 score of 0.7203 on the test set of the evaluation task.

This content is only available as a PDF.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Article PDF first page preview

Article PDF first page preview