Data Set and Evaluation of Automated Construction of Financial Knowledge Graph

With the technological development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, research on knowledge graphs has flourished in recent years. To better promote the development of knowledge graphs, especially for the Chinese language and the financial industry, we built a high-quality data set named the financial research report knowledge graph (FR2KG), and organized the automated construction of financial knowledge graph evaluation at the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples, covering 10 entity types, 19 relationship types, and 6 attributes. Participants are required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs, and introduce the methods used by the winners and the results of this evaluation.


The properties of the FR2KG data set are described in Section 3. Section 4 surveys current knowledge graph construction technologies, and Section 5 introduces the methods used by the winners and the results of this evaluation. Finally, the challenges and prospects for the automated construction of domain knowledge graphs are discussed in Section 6.

Task Definition
The content of this evaluation is constructing a financial knowledge graph from the text of unstructured financial research reports, based on a given knowledge graph schema:
• Given: unstructured text of the financial research reports
• Given: the schema of the knowledge graph
• Given: a seed knowledge graph
• Participants are required to develop a constructor that extracts entities, attribute triples, and relationship triples that conform to the schema from the unstructured text provided.
There are 1,200 financial research reports. After removing tables, images, headers, footers, and other useless or repetitive information, the remaining content was converted into plain text. Simultaneously, we worked with financial research experts to analyze these reports, and designed a schema for the financial knowledge graph based on the characteristics of financial research and of the evaluation. Next, the unstructured text was annotated by trained annotators, and the results were reviewed by financial research experts; the reviewed data set constitutes the annotated knowledge graph (annotated KG). The annotated KG was randomly divided into a seed knowledge graph (seed KG) and an evaluation knowledge graph (evaluation KG) as follows:
1) Randomly select 200 of the 1,200 TXT files;
2) Take the extracted entities, relationship triples, and attribute triples corresponding to these 200 TXT files as the seed KG; and
3) Remove all the data in the seed KG from the annotated KG, and use the remaining data as the evaluation KG.
The above is a complete description of the data processing and annotation process, which yields the entire FR2KG data set: the knowledge graph schema, the unstructured text of the financial research reports, the seed KG, and the evaluation KG. The goal of this evaluation is to use FR2KG to develop a financial knowledge graph constructor that automatically extracts entities, attribute triples, and relationship triples from unstructured text. The constructed financial knowledge graph excludes data that already exist in the seed KG, and the evaluation procedure uses the metrics described in the next section to measure its quality. The entire process is shown in Figure 1. Given the FR2KG data set, the goal of the participants is to develop a constructor that extracts entities, attribute triples, and relationship triples from the unstructured text of financial research reports as accurately as possible, producing a financial knowledge graph that is as consistent as possible with the one annotated by experts. To stay as close to real application scenarios as possible, and to be fair to all participants, this evaluation allows all participants to use any open or public data, including, but not limited to, pretrained models and open knowledge graphs from OpenKG. Participants who wish to use private data must make those data publicly available so that other participants can use them as well.
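The random segmentation described above can be sketched in a few lines of Python. All names and the data layout here are hypothetical, not the organizers' code:

```python
import random

def split_annotated_kg(annotated_kg, report_ids, n_seed=200, seed=0):
    """Split an annotated KG into seed and evaluation KGs by report.

    annotated_kg: dict mapping a report id to the list of annotated items
    (entities, relationship triples, attribute triples) extracted from it.
    """
    rng = random.Random(seed)
    # Step 1: randomly select n_seed of the report files.
    seed_ids = set(rng.sample(report_ids, n_seed))
    # Step 2: items from the selected reports form the seed KG.
    seed_kg = [item for rid in seed_ids for item in annotated_kg[rid]]
    # Step 3: everything not in the seed KG becomes the evaluation KG.
    eval_kg = [item for rid in report_ids if rid not in seed_ids
               for item in annotated_kg[rid]]
    return seed_kg, eval_kg
```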

Evaluation Metrics
This evaluation task uses the F1 score, defined as follows, to evaluate the performance of the knowledge graph constructor; the higher the F1 score, the better the performance. The data of the knowledge graph are divided into three types: entities, attribute triples <entity, attribute key, attribute value>, and relationship triples <entity, relationship, entity>. Precision (p), recall (r), and F1 score (F1) are defined as follows. First, we define the following variables: y_E: the set of all pairs of <entity, entity type> extracted by the constructor; |y_E| denotes the number of entities.
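The original equations did not survive extraction. Based on the definitions above, the per-type metrics plausibly take the following form, shown here for entities; this is a hedged reconstruction, and the gold-standard set, written g_E, is our own notation:

```latex
% Hedged reconstruction of the per-type metrics (original equations lost).
% y_E: <entity, entity type> pairs extracted by the constructor;
% g_E: the expert-annotated (gold) pairs -- g_E is our notation.
p_E = \frac{|y_E \cap g_E|}{|y_E|}, \qquad
r_E = \frac{|y_E \cap g_E|}{|g_E|}, \qquad
F1_E = \frac{2\, p_E\, r_E}{p_E + r_E}
```

Attribute triples and relationship triples would be scored analogously over their respective sets of triples.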

Attribute triples:

Relationship triples: Finally, we define the overall F1 score of the entire knowledge graph, as in Equation (10), as the weighted average of the F1 scores of the three types (entities, attribute triples, and relationship triples), in which the weights of attribute triples and relationship triples are twice that of entities, because we believe extracting attribute and relationship triples is twice as difficult as extracting entities. For an attribute triple, both the entity and the attribute value must be extracted correctly, and the attribute value must match the attribute key, which is comparable to identifying an entity and matching its entity type; we therefore set the weight of attribute extraction to twice that of entity extraction. For a relationship triple, two entities must be extracted correctly and matched to the corresponding relationship type. Studying the relative difficulty of entity, attribute, and relationship extraction in detail is a significant topic in its own right; when determining the evaluation metrics, we surveyed the literature and found no corresponding research, so we chose the weight of 2 based on our experience.
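The weighted combination can be sketched as follows. This is a sketch of the scoring logic as described, not the official evaluation script:

```python
def f1_score(predicted, gold):
    """Micro precision/recall/F1 between predicted and gold sets of items
    (entities, attribute triples, or relationship triples)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                    # exact-match true positives
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def overall_f1(f1_entity, f1_attribute, f1_relation):
    """Weighted average with weights 1:2:2 for entities, attribute triples,
    and relationship triples, as described in the text."""
    return (f1_entity + 2 * f1_attribute + 2 * f1_relation) / 5
```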

FR2KG Overview
Among the data sets that have been constructed, some are manually annotated [8,9], some are collaboratively annotated by humans and algorithms, and others are labeled with higher precision through better algorithms. However, most data sets concentrate either on general domains, such as news and common-sense articles (e.g., Wikipedia), or on specific domains, such as biomedical and scientific literature. Data sets for financial knowledge graphs are rare, and Chinese financial knowledge graph data sets are even rarer. This evaluation publishes the FR2KG data set and the corresponding unstructured texts for the first time, aiming to promote the development of distant-supervision and weak-supervision technologies for automatically constructing domain knowledge graphs.
The construction process of the FR2KG data set is described in the previous section, as shown in Figure 1. First, 1,200 financial research reports were collected. Experts in the financial field analyzed these reports, extracted the plain text from the main body, and saved it in the TXT format as the basic unstructured
text corpus. Then, the experts and the knowledge graph team studied these corpora together, designed the schema of the knowledge graph from the perspective of financial business, iteratively optimized it according to the characteristics of the evaluation, and finally determined that it contains 10 entity types, 6 entity attributes, and 19 relationships between entities. Subsequently, these corpora were annotated with the annotation system of the Yuanhai Knowledge Graph Platform, a product of DataGrand Inc. The system is designed specifically for knowledge graph annotation, and supports the annotation of entities, entity attributes, and relationships between entities. Before annotating, all annotators were trained by financial experts to align their understanding of the schema. All annotated data were reviewed by experts, and then divided into the seed KG and the evaluation KG, as described in the previous section. Examples of the FR2KG data set are shown in Figures 2 and 3. A summary of FR2KG is shown in Table 1; it is currently the largest data set for the automatic construction of Chinese financial knowledge graphs. Table 1 describes the data, and the following sections introduce FR2KG in detail.

Financial Research Reports
The length of the financial research reports varies. As shown in Table 2, the longest text has 13,857 characters, while the shortest has only 242 characters; the longest is close to 60 times the shortest. However, most texts fall in the range of 1,000-3,000 characters, accounting for 70% of the total. In terms of paragraphs, the shortest report has 4 paragraphs and the longest has 74; most reports have between 10 and 30 paragraphs, accounting for 82% of the total. Figure 4 shows the schema of FR2KG. There are 10 entity types in total, represented by ellipses, and 19 relationships between entity types, represented by directed arrows. For example, in the relationship <人物/person, 投资/invest, 机构/organization>, the directed arrow in the figure points from "person" to "organization". Three of these entity types have attributes (Table 3). Notably, attribute values of the time type are normalized to the "YYYY-mm-dd" format during annotation, and participants were also required to normalize time data when constructing the knowledge graph. The FR2KG schema, as shown in Figure 4, supports rich applications in investment research, financial risk assessment and control, product analysis, industrial chain analysis, and other fields. For example, the relationships <人物/person, 投资/invest, 机构/organization> and <机构/organization, 投资/invest, 机构/organization> can be used for in-depth investment and financing analysis. As another example, the relationships <机构/organization, 生产销售/sale, 产品/product> and <机构/organization, 采购买入/buy, 产品/product> can be used for supply chain analysis, mining a company's advantages in the supply chain and assessing supply chain risks.
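The time normalization mentioned above can be done with a small pattern match. This is a minimal sketch; real report text would need more patterns (date ranges, two-digit years, missing days, and so on):

```python
import re

def normalize_date(text):
    """Normalize a Chinese date mention such as '2019年3月5日' to 'YYYY-mm-dd'.

    Returns None when no full year-month-day mention is found.
    """
    m = re.search(r"(\d{4})年(\d{1,2})月(\d{1,2})日", text)
    if not m:
        return None
    year, month, day = m.groups()
    # Zero-pad month and day to match the required YYYY-mm-dd format.
    return f"{year}-{int(month):02d}-{int(day):02d}"
```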

Entities and Attributes
Tables 4 and 5 summarize the entities and their attribute triples in FR2KG. Table 6 summarizes the statistics of relationship triples in FR2KG.

Properties of FR2KG
As a complete domain knowledge graph data set, FR2KG is currently the largest Chinese financial knowledge graph data set dedicated to the automated construction of knowledge graphs. In the future, we plan to continue expanding and enriching its content.
Scale: FR2KG is dedicated to promoting the automated construction of domain knowledge graphs, with rich and diverse data types and the largest data scale among comparable data sets at present. Its content is also abundant, including common stock research reports, industry research reports, and macroeconomic research reports from the financial industry.

Relationship:
The data set provides 19 common relationships in the financial field, as shown in Figure 4. These relationships enable multiple and diverse analyses in the financial industry. For example, through the relationships <机构/organization, 拥有/has, 风险/risk> and <行业/industry, 拥有/has, 风险/risk>, industry or enterprise risk analysis, risk assessment, and risk early warning can be performed more effectively. The relationships <人物/person, 任职于/work for, 机构/organization>, <人物/person, 投资/invest,
机构/organization>, and <机构/organization, 投资/invest, 机构/organization> can be applied to deep equity-relationship mining, which is of great value in bank lending and investment analysis.

Professionalism:
The schema of FR2KG was jointly designed by financial industry experts and knowledge graph experts. The annotators were trained by financial experts before annotation, and the results were reviewed by financial experts, ensuring the high professionalism of the data set.

Diversity:
The goal of FR2KG is to evaluate the performance of the automated construction of financial knowledge graphs; however, the application of the data set is not limited to this objective. Various other technologies related to the knowledge graphs can also be evaluated using FR2KG. For example, common tasks, such as link prediction and node classification in graph neural networks, tasks related to various graph algorithms, and tasks based on deep learning techniques to implement traditional graph algorithms, can be evaluated.

Entity Extraction
Entity extraction, also known as named entity recognition (NER), aims to recognize mentions of rigid designators in text belonging to predefined semantic types such as person, location, and organization. Two popular data sets from recent work are CoNLL03 [14] and OntoNotes5.0. CoNLL03 contains annotations of Reuters news in English and German; the English data set contains a large portion of sports news annotated with four entity types: person, location, organization, and miscellaneous entities. OntoNotes5.0 contains annotations for a large corpus comprising various genres with structural information and shallow semantics, annotated with 18 entity types. BOSON [15], People's Daily [16], and MSRA [17] are Chinese entity extraction data sets in general fields, while the FR2KG proposed in this article focuses on the Chinese financial field.
In recent years, research on supervised entity extraction has mainly focused on input representations and the design of neural models, including context encoders and tag decoders. In addition, unsupervised and semi-supervised entity extraction have achieved remarkable progress.

Supervised Entity Extraction
Input representation is the first step in entity extraction. In this subsection, we summarize word-level representations, character-level representations, language models, and other representations. Since Mikolov et al. [18] proposed word2vec, many studies on entity extraction have used the word2vec toolkit to train word-level representations on different corpora, such as PubMed [19], Gigaword [20], NYT [21], and SENNA [22]. In addition, GloVe [23] and FastText [24] are widely used. Instead of considering only word-level representations, character-level representations have been found useful for exploiting explicit sub-word-level information and naturally handling out-of-vocabulary words [21,25,26]. Both word-level and character-level representations capture only the meaning of the word, without its context.
Therefore, many studies have added context-dependent language model representations to the input representation. Peters et al. [27] proposed TagLM, a language-model-augmented sequence tagger that considers both pre-trained word embeddings and bidirectional language model embeddings for each token in the input sequence. Building on TagLM, Peters et al. [26] proposed the well-known pre-trained bidirectional language model ELMo. The key difference is that ELMo allows the task model to learn a weighted average of all bidirectional LM layers, whereas TagLM uses only the top bidirectional LM layer. In contrast to CNNs and recurrent neural networks (RNNs), transformers [28] utilize stacked self-attention and point-wise, fully connected layers to build the basic blocks of the encoder and decoder. Based on the transformer, BERT [6] pre-trains a deep bidirectional transformer by jointly conditioning on both the left and right contexts in all layers. Combining pre-trained language model embeddings with traditional embeddings has become a de facto standard [29,30,31,32,33]. In addition, novel input representations are still being explored, such as external knowledge from Wikidata [30], dependency trees [31], and global contextual embeddings [32,33].
After the input sentence is converted into a representation, the context encoder captures context dependencies, and the tag decoder predicts tags for the tokens in the input sequence. Collobert et al. [34] used a CNN to produce local features around each word and applied a max or averaging operation to extract global features. Strubell et al. [35] proposed the iterated dilated CNN (ID-CNN), in which four stacked dilated convolutions of width three capture more contextual information. Compared with a CNN, a bidirectional RNN makes full use of the forward and backward information in a sentence and can effectively extract features of the entire sentence; it is therefore the most popular encoder for entity extraction tasks. Although the hidden layer of a bidirectional RNN can be connected directly to a softmax layer, adding a CRF layer as the tag decoder imposes sentence-level constraints between adjacent tags, ensuring valid predictions. Huang et al. [22] were the first to utilize the BiLSTM-CRF architecture for sequence tagging tasks, including POS tagging, chunking, and NER. Similarly, many studies use BiLSTM as the encoder and CRF as the decoder [21,36,37]. In addition to CRF, RNN [38] and pointer network (PtrNet) [39] tag decoders have also been explored. Shen et al. [40] reported that RNN tag decoders outperform CRF and are faster to train when the number of entity types is large. However, a major disadvantage of RNN and PtrNet decoders is greedy decoding: the input of the current step requires the output of the previous step. Because pre-trained models can capture sufficient semantic information, some studies use only BERT and abandon BiLSTM-CRF entirely. In particular, Li et al. [41] framed the NER task as a machine reading comprehension (MRC) problem, which can be solved by fine-tuning the BERT model.
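Whichever encoder and tag decoder are used, the predicted tag sequence must be converted back into typed entity spans. A minimal decoder for the common BIO scheme, illustrative and not tied to any cited model:

```python
def bio_to_spans(tokens, tags):
    """Convert a BIO tag sequence into (entity_text, type, start, end) spans.

    tags use the 'B-TYPE' / 'I-TYPE' / 'O' scheme; an 'I-' tag that does not
    continue a span of the same type starts a new span (lenient decoding).
    Tokens are joined without spaces, which suits Chinese character tokens.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):        # sentinel flushes last span
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:   # close the open span
            spans.append(("".join(tokens[start:i]), etype, start, i))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]             # open a new span
    return spans
```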

Semi-supervised Entity Extraction
Semi-supervised entity extraction starts from a small number of manually provided entities of the predefined types, uses pattern learning for continuous iterative learning with manual adjustments, and finally generates a named-entity data set, thereby reducing dependence on manually annotated corpora. Liu et al. [42] used a small amount of existing labeled data to train initial KNN and CRF models, performed semi-supervised learning on
tweet data, and improved the training data by learning from and supplementing them with a large amount of unlabeled text. Etzioni et al. [43] proposed the KNOWITALL system, which, given a set of predicate inputs, performs NER on unlabeled data using pattern learning, subclass extraction, list extraction, and other techniques. In addition, Zhang and Elhadad [44] proposed a method for extracting named entities from biomedical text based on terminology, corpus statistics (such as inverse document frequency and context vectors), and shallow syntactic knowledge (such as noun phrase chunks), and verified the method through experiments on two mainstream biomedical data sets.

Unsupervised Entity Extraction Methods
Unsupervised methods infer named entities by clustering, based on vocabulary resources, lexical patterns, and statistics computed over a large corpus, combined with similarity of sentence context. Nadeau et al. [45] proposed an unsupervised system for building gazetteers and resolving named entity ambiguity. Zhang and Elhadad [44] proposed an unsupervised method that uses terminology, corpus statistics, and shallow grammatical knowledge to extract named entities from biomedical texts, and demonstrated its effectiveness and versatility. Brooke et al. [46] performed Brown clustering on pre-segmented corpora, combined it with the rank value of each cluster, and constructed bootstrap seeds for training, enabling entity extraction for specific domains. Jia et al. [47] used cross-domain language modeling to obtain task and domain vectors, completing NER in unsupervised and supervised settings, respectively. Collins and Singer [48] used only seven simple "seed" rules to perform NER on raw data, and proposed two unsupervised named entity classification algorithms.

Relation Extraction
Relation extraction is usually considered a classification task that predicts semantic relationships between pairs of nominals, defined as follows: given a sentence S with an annotated pair of nominals e1 and e2, identify the relationship between e1 and e2. Relation extraction is usually divided into supervised, unsupervised, and distant-supervision settings; end-to-end entity and relation extraction is also popular. Supervised data sets are of high quality and contain almost no noise, but are often small. SemEval2010 Task 8 [49] contains nine directed relation types and 10,717 samples, of which 8,000 are used for training and 2,717 for testing. ACE2005 contains 599 documents related to news and e-mail, divided into seven main relation types, each with an average of 700 instances for training and testing. In addition to ACE2005, which includes a Chinese corpus, DuIE [50] is another large-scale Chinese data set for information extraction; the FR2KG proposed in this study focuses on relation extraction in the Chinese financial field. For distant-supervision relation extraction, the New York Times (NYT) data set was formed by aligning relations with Freebase. It contains 52 possible relation categories plus a special category NA (indicating that there is no relation between the entities); the training data contain 522,611 sentences, 281,270 entity pairs, and 18,252 relational facts.

Supervised Relation Extraction
Zeng et al. [51] used a CNN to extract lexical- and sentence-level features for relation extraction. The lexical-level feature vector concatenates the word vectors of the labeled entities with their context and their WordNet semantic-category features, while the sentence-level feature representation is extracted automatically by a max-pooling CNN. To eliminate the impact of the artificial "Other" class, Santos et al. [52] trained with a pairwise ranking loss function instead of cross entropy. Because sentences contain much irrelevant information, methods that extract only sequence features cannot always accurately predict the relationship between two entities. Xu et al. [53] therefore noted that the shortest dependency path (SDP) is beneficial for determining the relationship between two entities: they took the SDP from the subject to the object as input, passed it through a lookup table layer, produced local features around each node on the dependency path, and combined these features into a global feature vector through a CNN fed to a softmax classifier. Similarly, Xu et al. [54] used a four-channel LSTM to extract word, part-of-speech, grammatical-relation, and WordNet semantic features along the SDP. However, studies based on the SDP may neglect crucial information. Zhang et al. [55] encoded the complete dependency structure over an input sentence with an efficient graph convolutional network (GCN), and then extracted entity-centric representations to make robust relation predictions. To avoid introducing irrelevant information between entities from the complete dependency tree, Guo et al. [56] proposed AGGCN, which automatically generates substructures for relation extraction.

Distant Supervision Relation Extraction
Supervised relation extraction requires a large amount of expert-labeled data, which limits its application. Therefore, Mintz et al. [57] proposed the distant supervision hypothesis: if two entities have a relationship in a known knowledge base, then any sentence that mentions these two entities expresses that relationship in some way. They applied this assumption to align documents with an existing database and automatically generate a large amount of training data. Zeng et al. [58] proposed a piecewise convolutional neural network (PCNN) to extract features, and used multi-instance learning to alleviate the noise problem. However, they failed to fully utilize information across different sentences, and ignored the fact that there can be multiple relationships between the same entity pair. Therefore, after using a PCNN to extract the features of each sentence in a bag, Jiang et al. [59] used cross-sentence max pooling to select features across sentences and aggregated the most important features into the representation of each entity pair; finally, a sigmoid instead of a softmax is applied to judge the possibility of multiple labels. Since different sentences make different contributions, [60,61] focused on using attention mechanisms to select sentences. Inspired by the transE model [62], for the two entities e1 and e2 of each bag, e1 − e2 is used to represent the relation between them; the features extracted by the PCNN and the relation vector e1 − e2 are concatenated to obtain the weight of each sentence, and the feature of the bag is the weighted sum of all sentence feature vectors. Du et al. [63] proposed a new multi-layer structured self-attention model based on BiLSTM.
Among them, a word-level attention mechanism based on a two-dimensional matrix can focus on different aspects of a sentence to better learn the contextual representation, while a two-dimensional sentence-level attention mechanism for multi-instance learning can focus on different valid instances to better select sentences. Many studies use existing knowledge bases to add information that alleviates the mislabeling problem in distant supervision. Vashishth et al. [64] added entity type and relation information from the knowledge base (KB) as side information to improve prediction performance. In addition, Wang et al. [65] proposed a label-free distant supervision method that does not use relation labels under this inadequate assumption, but instead uses prior knowledge derived from the KB to supervise the learning of the classifier directly and softly.
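The distant-supervision alignment step itself can be sketched as a simple labeling function over raw sentences and an existing KB. Names here are illustrative; real pipelines add entity linking and the noise-reduction techniques discussed above:

```python
def distant_label(sentences, kb_triples):
    """Generate (noisy) training samples by aligning a KB with raw text.

    kb_triples: iterable of (head, relation, tail) facts from an existing KB.
    Any sentence mentioning both head and tail is labeled with that relation,
    which is exactly the distant-supervision assumption (and its noise source).
    """
    samples = []
    for sent in sentences:
        for head, rel, tail in kb_triples:
            # Naive mention detection by substring match.
            if head in sent and tail in sent:
                samples.append((sent, head, tail, rel))
    return samples
```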

Unsupervised Relation Extraction
Unsupervised methods assume that entity pairs with the same semantic relationship have similar contextual information, so the context of each entity pair can be used to represent its semantic relationship. Hasegawa et al. [66] clustered entity pairs with the same contextual semantics, and then selected core vocabulary to label the semantic relationship of each cluster. [67] improved on Hasegawa's hypothesis by eliminating candidate entity pairs with multiple relationships and performing multi-level clustering to extract relationships. Davidov et al. [68] used Google search as background knowledge to define concept words and, without pre-defining any relationship types, automatically extracted related entities and semantic relationships. Yan et al. [69] combined dependency features and shallow grammatical templates, using clustering methods to extract the semantic relationships of entities in Wikipedia entries from a large-scale corpus. In addition, Bollegala et al. [70] analyzed templates after clustering, found implicit semantic relationships between entity pairs, and selected suitable extraction templates from the candidates, which expanded the scope of extractable entity relationships and improved precision and recall to a certain extent.

End-to-end Entity and Relation Extraction
Entity and relation extraction can be pipelined as NER followed by relation classification. This independent framework is flexible but ignores the correlation between the two tasks: errors in entity recognition propagate to relation classification. In contrast, a joint learning framework uses a single model to extract entities and relations, effectively integrating information about both.
One end-to-end approach is to share model parameters between the entity recognition and relation classification tasks. Miwa et al. [71] proposed an end-to-end model that captures both word sequence and dependency tree substructure information by stacking a tree-structured LSTM on a BiLSTM. Zheng et al. [21] designed novel tags that encode both entity information and the relationships entities hold; based on this tagging scheme, the joint extraction of entities and relations can be transformed into a tagging problem. However, this scheme has difficulty with overlapping triples of different relations in
sentences. Zeng et al. [72] divided sentences into three types according to the degree of triple overlap: Normal, EntityPairOverlap, and SingleEntityOverlap. They proposed an end-to-end sequence-to-sequence model with a copy mechanism: the encoder converts a natural language sentence into a fixed-length semantic vector, and the decoder reads this vector and generates multiple triples. To better model the interaction of different relations in a sentence, especially overlapping relations, Fu et al. [73] proposed the end-to-end model GraphRel. In the first stage, GraphRel automatically extracts hidden features for each word by stacking a BiLSTM sentence encoder and a GCN dependency tree encoder to tag entity mention words and predict relation triples. In the second stage, it uses a novel relation-weighted GCN to better predict the interactions between triples.
The advantage of shared model parameters is that no constraints need to be attached to the two subtasks; however, independent sub-models do not fully exploit the relationship between them, so further work is needed to achieve global optimization of joint extraction. Building on shared parameters, Sun et al. [74] proposed a global loss function to explore the mutual influence of the entity and relation models. Most existing methods determine relation types only after all entities have been recognized, so the interaction between relation types and entity mentions is not fully modeled. Takanobu et al. [75] applied a hierarchical reinforcement learning framework to enhance this interaction: a high-level process detects a relation indicator at a specific position; if a relation is detected, a low-level process is triggered to identify the entities corresponding to that relation; and when the low-level task completes, the high-level process continues searching for the next relation in the sentence. Li et al. [76] transformed entity and relation extraction into multiple rounds of question answering, that is, into a task of determining answers from context. This method better captures hierarchical label dependencies, but it is computationally inefficient because it must scan all entity template questions and related relation template questions for a single sentence.

PARTICIPANTS OVERVIEW AND EVALUATION RESULTS
A total of 740 teams participated in the evaluation. Among the top 18 teams, three were from companies, 10 were from universities, three were university-company collaborations, and the remaining two did not disclose their affiliations. Table 7 summarizes the prize-winning teams. The top five teams submitted brief descriptions of the methods they used; these methods are analyzed and summarized below.

• All teams used rule-based methods or labeling functions to produce a training corpus. Only one team manually labeled 20 research reports as supplementary training and validation samples, in addition to the automatically generated ones.
• All teams used BERT-based models for entity extraction, supplemented by rule-based methods for specific entity types. One team used the BERT-softmax model, three teams used the BERT-CRF architecture, and one team used the BERT-MRC [38] architecture.
• For relationship and attribute extraction, all teams used methods based on co-occurrence, the basic assumption of distant supervision: when two entities appear together in a short text, they can be assumed to hold the corresponding relationship. On top of this assumption, three teams used rule-based methods to decide whether the relationship actually held, and the other two teams used BERT-based models to classify the relationships.
• One team used a clustering method to group research reports on the same or similar topics.
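The co-occurrence assumption underlying these relation-extraction pipelines can be sketched as follows. The type-pair-to-relation mapping and the sentence window are illustrative assumptions, not any team's actual rules:

```python
# Sketch of co-occurrence-based relation extraction with a rule filter.
# The schema fragment and window size below are hypothetical.

TYPE_TO_RELATION = {
    ("company", "person"): "executive_of",
    ("company", "industry"): "belongs_to",
}

def cooccurrence_triples(sentences, typed_entities, window=1):
    """Emit a candidate triple whenever two typed entities co-occur
    within `window` consecutive sentences and their type pair maps
    to a relation in the schema."""
    triples = set()
    for i in range(len(sentences)):
        context = " ".join(sentences[i:i + window])
        present = [(e, t) for e, t in typed_entities if e in context]
        for (e1, t1) in present:
            for (e2, t2) in present:
                rel = TYPE_TO_RELATION.get((t1, t2))
                if rel and e1 != e2:
                    triples.add((e1, rel, e2))
    return triples

sents = ["DataCorp appointed Alice Li as chairwoman.",
         "The firm operates in the software industry."]
ents = [("DataCorp", "company"), ("Alice Li", "person"),
        ("software", "industry")]
print(cooccurrence_triples(sents, ents))
# {('DataCorp', 'executive_of', 'Alice Li')}
```

As the text notes, such candidates are then either kept or discarded by hand-written rules, or re-scored by a BERT-based relation classifier.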
Summarizing the methods used by these teams: for entity, relationship, and attribute extraction, approaches based on the BERT pre-trained model remain the strongest and most widely used. Because this evaluation closely mirrors real industrial scenarios, rule-based methods are still very effective in some cases and serve as a useful complement to the models.
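The BERT-based taggers mentioned above (BERT-softmax and BERT-CRF) emit per-token BIO labels that must be decoded into entity spans; a minimal decoder is sketched below. The entity type name and the sample sentence are illustrative:

```python
def decode_bio(tokens, tags):
    """Convert per-token BIO tags (e.g. from a BERT-CRF tagger)
    into (entity_text, entity_type) spans."""
    spans, current, ctype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(("".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(token)
        else:
            if current:
                spans.append(("".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append(("".join(current), ctype))
    return spans

# Chinese text is typically tokenized per character, so spans join
# without spaces.
tokens = list("贵州茅台发布年报")
tags = ["B-company", "I-company", "I-company", "I-company",
        "O", "O", "O", "O"]
print(decode_bio(tokens, tags))  # [('贵州茅台', 'company')]
```

The rule-based supplements the teams describe typically operate after this step, adding or filtering spans for entity types the model handles poorly.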

CHALLENGES AND LOOKING AHEAD
The results of this evaluation show that the best F1 score for automatically constructing a financial knowledge graph under the predefined schema is approximately 0.5, which is far from the requirements of real applications. This points to several challenging topics and new directions for knowledge graph research.
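An F1 score for this kind of task is typically computed at the triple level, comparing the extracted triples against the gold knowledge graph. The exact-match scoring below is a simplifying assumption; the official metric may be more lenient about entity boundaries:

```python
def triple_f1(predicted, gold):
    """Precision, recall, and F1 over exact-match triples."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)  # triples both extracted and in the gold graph
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [("A", "owns", "B"), ("A", "CEO_of", "C"), ("B", "in", "D")]
pred = [("A", "owns", "B"), ("A", "CEO_of", "X")]
print(triple_f1(pred, gold))  # precision 0.5, recall ~0.333, F1 0.4
```

Because a triple is only counted when the head, relation, and tail all match, errors in entity extraction propagate directly into the relationship score, a dependency discussed in the challenges below.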

• For automatically constructing a knowledge graph from a given schema and seed knowledge graph, existing methods are not very effective. Developing end-to-end methods or multi-step frameworks to automate knowledge graph construction remains difficult.
• Given a knowledge graph with a schema, how to automatically annotate training data for entity, attribute, and relationship extraction is worth studying. Moreover, building a model that constructs a high-precision, high-recall knowledge graph from such automatically annotated data remains a challenge.
• For entity extraction, the prize-winning participants combined BERT-based models with rule-based methods. Further research is needed on end-to-end models and unified frameworks for this real-world scenario.
• Relationship and attribute extraction currently relies on co-occurrence with a rule-based or model-based filter, which depends heavily on the performance of entity extraction: high-precision, high-recall entity extraction yields good relationship and attribute extraction. Methods that extract relationships and attributes well despite considerable entity noise are therefore worth developing.
• No end-to-end model for the joint extraction of entities and relationships was used in this evaluation, possibly because there are many types of entities and relationships but little training data. Developing an end-to-end model under these conditions is challenging.
• Extending the knowledge graph schema, for example to 50 entity types and hundreds of entity attributes and inter-entity relationships, should also be studied.
• Further research on the automatic construction of multilingual knowledge graphs should be conducted. This evaluation did not take multilingualism into account, since our goal was a financial research report knowledge graph (FR2KG) in Chinese; in particular, the FR2KG data set does not involve the fusion of entities across multiple languages. Constructing a knowledge graph from a multilingual corpus raises many new topics, including entity alignment and the extraction of relationships between entities in different languages. It would likewise be meaningful to evaluate the automatic construction of multilingual knowledge graphs.
• This evaluation implicitly involved the disambiguation and fusion of a small number of entities, but did not evaluate this area explicitly. Evaluations of knowledge disambiguation and fusion are likely to become increasingly active in the future.
• Studying the relative difficulty of entity extraction, attribute extraction, and relationship extraction in detail is a significant topic. Setting reasonable metrics for the automatic construction of knowledge graphs is also valuable and meaningful.

CONCLUSIONS
In this paper, we introduced a high-quality data set, named financial research report knowledge graph (FR2KG), which consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples covering 10 entity types, 19 relationship types, and 6 attributes. We presented an overview of the evaluation task of automated construction of a financial knowledge graph at CCKS2020. In addition, we summarized the technologies for automatically constructing knowledge graphs and discussed some challenging topics and new directions for knowledge graph research.

AUTHOR CONTRIBUTIONS
W.G. Wang (wangwenguang@datagrand.com) is the team leader for this project. He provided overall technical leadership, designed the data schema, performed curation work, and contributed to the writing and editing of the manuscript. Y.L. Xu (xuyonglin@datagrand.com) investigated the latest progress in relation extraction and wrote the relevant sections. C.H. Du (duchunhui@datagrand.com) investigated the latest progress in entity extraction and wrote the relevant sections. Y.W. Chen (chenyunwen@datagrand.com) organized the data annotation and contributed to the writing and editing of the manuscript as a senior author. Y.Y. Wang (wangyijie@datagrand.com) and H. Wen (wenhui@datagrand.com) contributed to the statistics and review of the evaluation results. All authors made meaningful and valuable contributions to revising and proofreading the resulting manuscript.

DATA AVAILABILITY STATEMENT
All the data are available at Data Intelligence's data repository at the Science Data Bank, https://doi.org/10.11922/sciencedb.01060, under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.