Integrating Manifold Knowledge for Global Entity Linking with Heterogeneous Graphs

Abstract
Entity Linking (EL) aims to automatically link the mentions in unstructured documents to the corresponding entities in a knowledge base (KB), and has recently been dominated by global models. Although many global EL methods attempt to model the topical coherence among all linked entities, most of them fail to exploit the correlations among the manifold knowledge helpful for linking, such as the semantics of mentions and their candidates, the neighborhood information of candidate entities in the KB, and the fine-grained type information of entities. As we will show in the paper, interactions among these types of information are very useful for better characterizing the topic features of entities and for more accurately estimating the topical coherence among all the referred entities within the same document. In this paper, we present a novel HEterogeneous Graph-based Entity Linker (HEGEL) for global entity linking, which builds an informative heterogeneous graph for every document to collect various linking clues. HEGEL then utilizes a novel heterogeneous graph neural network (HGNN) to integrate the different types of manifold information and model the interactions among them. Experiments on the standard benchmark datasets demonstrate that HEGEL captures global coherence well and outperforms the prior state-of-the-art EL methods.


INTRODUCTION
Entity Linking (EL) is the task of mapping entity mentions, each with a specified context in an unstructured document, to the corresponding entities in a given Knowledge Base (KB). It bridges the gap between the abundant unstructured text in large corpora and structured knowledge sources, and therefore supports many knowledge-driven natural language processing (NLP) tasks, such as question answering [1], text classification [2], information extraction [3] and knowledge graph construction [4].
Recently, the EL task has been dominated by global methods [5,6,7,8,9,10,11,12,13,14,15], which model the topical coherence among the linked entities of the mentions in the same document. Global information relies on the semantic and topical coherence of the entities related to the various mentions in the same document, and most state-of-the-art models integrate it with local mention-context information to alleviate the biases of the local context. For instance, as shown in Figure 1, when linking the mention "England", it is difficult to decide between the candidate entities England national football team and England national rugby union team using only the surrounding sports-related local context, which consists of match scores and a stadium name. Such context may be noisy and may lead the linking result to the more popular but wrong candidate England national football team. However, if an EL model can capture the common topic "rugby" shared by the mentions "Scotland", "Murrayfield", "Cuttitta" and "England" in the current paragraph, for example by taking into consideration the nearby mention "Cuttitta", which is linked to the candidate Marcello Cuttitta, a former Italian rugby union player, the model can correctly link the mention "England" to the candidate England national rugby union team.
Figure 1. The illustration example. By considering the topical coherence, an EL model can accurately link the mentions "Scotland", "Murrayfield", "Cuttitta" and "England" to their corresponding entities (in bold) that share the common topic "rugby".
Although prior global EL approaches have greatly boosted the performance of local models, most of them do not simultaneously consider multiple types of useful information and the interactions among them when modeling the global coherence. Such information includes the semantics of mentions and their candidates, the neighborhood information of candidate entities in the KB, and the fine-grained type information of entities. As a result, these approaches fail to precisely estimate the coherence among the referred entities. As we will show in the paper, effectively modeling the interactions among the manifold linking knowledge helps to better model the topical coherence and achieve more accurate EL.

Most recently, some global methods [14,16] construct a document-level graph with the candidate entities of the mentions as nodes and apply Graph Convolutional Networks (GCN) [17] on the graph to integrate the global information, delivering promising results. Inspired by the effectiveness of using GCN to model the global signal, we present HEterogeneous Graph-based Entity Linker (HEGEL), a novel global EL framework designed to model the interactions among manifold heterogeneous information from different sources by constructing a document-level informative heterogeneous graph and applying a heterogeneous architecture in the GNN aggregation operation. We first construct a document-level informative heterogeneous graph with mentions, candidate entities, neighbors of entities, and extracted keywords as nodes, and create different types of edges to link these different types of nodes. We then apply a carefully designed heterogeneous graph neural network (HGNN) on the constructed graph to encode the global coherence, which allows information to propagate along the informative graph structure and encourages sufficient interactions among the different types of information. Followed by the traditional score combination and ranking procedure, our model can be trained to use this information in an end-to-end fashion.
Our contributions can be summarized as follows:
• We designed a novel approach to construct a document-level informative heterogeneous graph to collect manifold linking knowledge from different sources to support the linking process.
• We proposed a meticulously designed heterogeneous graph neural network on the constructed graph, which integrates different sources of information and encourages sufficient interactions among them, more precisely characterizing the topic features of candidate entities and better capturing the topical coherence. To the best of our knowledge, this is the first work to employ a heterogeneous graph neural network in Entity Linking tasks.
• Extensive experiments and analysis on six standard EL datasets demonstrate that our HEGEL achieves state-of-the-art performance over mainstream EL methods.

Entity Linking
Most existing models combine local methods, which rely on the local context of each mention independently [18,19,20,21,22,23], with global methods, which consider the coherence among the linked entities of all mentions by jointly linking over the whole document [9,13]. Most local methods use local features extracted through feature engineering, including pair-wise statistical features, such as the Wikipedia linking frequency, and similarity scores between mentions and candidate entities, such as the mention-entity similarities implemented in [13] as cosine similarities between local document contexts and entity Wikipedia titles. Recently, Pretrained Language Models (PLMs), which achieve leading performance in other natural language processing tasks, have also been used in local linking models. The PLM-based linking models focus on specific settings, such as zero-shot [22] and multilingual [23] scenarios, to exploit the superiority of PLMs in understanding tasks under these settings. To alleviate the noise introduced by the local information, global methods try to model the semantic coherence and relationships between the linked entities within the same document. As the global coherence optimization problem is NP-hard, different approximation methods are often used. Apart from traditional methods such as loopy belief propagation [8,11], several works approximate the problem as a sequential decision problem [6] or as graph learning [5,7,14,16].
Following the graph-based neural network modeling methods, HEGEL extends the use of graphs in the EL task to the heterogeneous setting, which not only benefits from the strong representation ability of the heterogeneous graph structure, but also remains efficient because it avoids the additional inference steps required by sequence-style models.

Graph Neural Networks
The Graph Neural Network (GNN) is a strong and flexible framework for learning on graph-structured data. Since the Graph Convolutional Network (GCN) [17] appeared, GNNs have been used more and more widely in many tasks, and several popular GNN architectures, such as GraphSAGE [24] and GAT [25], have been proposed to learn representations on graphs. The natural graph structure entailed in the EL task is a favorable condition for applying GNN methods to model the global information. NCEL [5] performs GCN on a subgraph constructed for every mention, where the nodes are the entity candidates of the current mention and of the surrounding mentions, with edges linked from the former to the latter. SGEL [7] combines the features of a mention-by-mention sequential model and GAT by building a graph containing previously predicted entities, current candidates and the candidates of later unpredicted mentions as nodes. GNED [16] builds a homogeneous graph by embedding the entities and words into the same vector space, and extracts words from the description and context in the KB for every candidate to form the nodes and edges.
With the emergence of massive heterogeneous information, many works on heterogeneous GNNs (HGNNs) have proved to be effective. The mainstream HGNN models are based on the construction of metapaths [26], but several metapath-free HGNN architectures have been proposed recently [10]. Our HEGEL follows these works and utilizes the heterogeneous structure to model the interactions among different types of linking information.

PROBLEM FORMULATION
Given a list of entity mentions $M = \{m_1, \dots, m_{|M|}\}$ in a document $D$, the EL task can be formulated as linking each mention $m_i$ to its corresponding entity $e_i$ from the entity collection $E$ of the KB, or to NIL (i.e., $e_i = \mathrm{NIL}$, which means that the mention $m_i$ cannot reasonably be linked to any entity in $E$). EL methods usually consist of two stages.

Candidate Generation Stage
EL tasks generally start with generating a small list of candidate entities $C_i \subset E$ for each mention $m_i$.
For candidate generation, we used the method proposed in [8,11], which simply uses (1) the mention-entity prior $\hat{p}(e \mid m)$, computed by averaging probabilities from the mention-entity hyperlink statistics of Wikipedia; and (2) the local context-entity similarity, calculated as the similarity between the candidate entity embeddings and the average embedding of the context words.
This stage aims to include the correct entity $e_i$ in $C_i$, and the ratio of candidate lists containing the corresponding entity is referred to as the recall of candidate generation.
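The recall defined above is simple to compute once gold entities and candidate lists are aligned per mention. The following is a minimal sketch; the function name `candidate_recall` and the toy entity identifiers are illustrative assumptions, not part of the original method.

```python
from typing import List


def candidate_recall(gold: List[str], candidates: List[List[str]]) -> float:
    """Fraction of mentions whose gold entity appears in its candidate list."""
    if not gold:
        return 0.0
    hits = sum(1 for g, cands in zip(gold, candidates) if g in cands)
    return hits / len(gold)


# Toy data (hypothetical identifiers): the third mention misses its gold entity.
gold = ["England_national_rugby_union_team", "Marcello_Cuttitta", "Murrayfield_Stadium"]
candidates = [
    ["England_national_football_team", "England_national_rugby_union_team"],
    ["Marcello_Cuttitta"],
    ["Murrayfield"],
]
recall = candidate_recall(gold, candidates)
```

Since the disambiguation stage can only choose among the generated candidates, this recall is an upper bound on the accuracy any ranking model can reach on the same candidate lists.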

Candidate Disambiguation Stage
In this stage, EL methods assign a score, calculated by the EL model, to each candidate $e^i_k$ and select the top-ranked candidate as the predicted answer, or predict NIL under some specified situations. Most EL methods, including this work, focus on improving performance at this stage. As mentioned in Section 2.1, different local and global models are used to calculate the linking scores. Local methods focus on the corresponding mention itself, regardless of other mentions or linked entities. That is to say, local methods handle the linking problem by an independent calculation for every mention, using a scoring function $\Psi_{Local}$ for the mention-entity pair. In contrast to local methods, which have no inter-mention interaction, global methods measure the interdependencies among mentions, which can generally be represented by a coherence scoring function that takes entity topical coherence into account.
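The two formulations can be written out as follows. This is a sketch in commonly used notation and not necessarily the exact form of the original equations; the symbols $\hat{e}_i$ and $\Phi_{Global}$ are assumed names introduced here for illustration.

```latex
% Local: each mention is resolved independently of the others.
\hat{e}_i = \arg\max_{e^i_k \in C_i} \Psi_{Local}(e^i_k, m_i)

% Global: all mentions are resolved jointly, with an added coherence term.
(\hat{e}_1, \dots, \hat{e}_{|M|}) =
  \arg\max_{(e_1, \dots, e_{|M|}) \in C_1 \times \dots \times C_{|M|}}
  \left[ \sum_{i=1}^{|M|} \Psi_{Local}(e_i, m_i) + \Phi_{Global}(e_1, \dots, e_{|M|}) \right]
```

The joint search space $C_1 \times \dots \times C_{|M|}$ grows exponentially with the number of mentions, which is why global methods rely on approximations.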

THE PROPOSED APPROACH
In addition to separately encoding the local features for every mention within a document as local models do, HEGEL constructs an informative heterogeneous graph for each document and then applies a heterogeneous GNN on it, which encodes the global coherence based on different types of information. Finally, HEGEL combines the local and global features and generates a final score for each mention-candidate pair. Figure 2 gives an overview of HEGEL, which follows a four-stage processing pipeline: (a) encoding local features for each candidate independently, (b) informative graph construction for the document, (c) applying the heterogeneous GNN on the graph, and (d) combining local and global features for scoring.
Figure 2. The overall framework of our proposed model HEGEL with a real experiment case. The blue nodes denote the mentions in one document; the orange nodes denote the candidate entities; the black nodes indicate the neighbors extracted from the Wikipedia KB; and the green nodes indicate the keywords extracted from the first sentence of entities in Wikipedia. The heterogeneous graph in the right part (c) can provide discriminative linking information through the flow along the topological connections.

Encoding Local Features
Given a mention $m_i$ in $D$ and a candidate entity $e^i_k \in C_i$, HEGEL encodes several local features of the pair. In the embedding-based features, $v_e$ and $v_w$ are the entity embeddings and word embeddings trained in [8], and the diagonal matrices $A$ and $B$ are both trainable. Feature (c) is the Type Similarity $\Psi_T(e^i_k, m_i)$, which estimates the similarity between the types (PER, GPE, ORG and UNK) of $m_i$ and $e^i_k$ by training a typing system proposed in [15], where $\mathrm{Emb}_T(t)$ is the trainable type embedding for type $t$. As the mentions and entities use the same embedding set, $m_i$ and $e^i_k$ with the same type will have a higher $\Psi_T$ than pairs with different types.

Informative Heterogeneous Graph Construction
For the document $D$, HEGEL builds an informative heterogeneous graph $G_D$ to collect different types of linking clues.
As shown in Figure 2, $G_D = \langle V_D, E_D \rangle$ contains three types of nodes: mention nodes $V_{Ment}$, entity nodes $V_{Ent}$ and keyword nodes $V_{Word}$. Therefore, the node set is $V_D = V_{Ment} \cup V_{Ent} \cup V_{Word}$. The entity nodes $V_{Ent}$ consist of two parts: $V_{Ent,1}$, the candidate entities of all mentions in $D$ with duplicates removed, and $V_{Ent,2}$, the common neighbors in the KB of at least two candidate entities in $V_{Ent,1}$. As retaining all KB neighbors of $V_{Ent,1}$ is computationally unacceptable, we eliminate from $V_{Ent,2}$ those nodes with only one neighbor in $V_{Ent,1}$, because neighbors bridging two candidates are more informative for determining the relation between candidates, which is theoretically explained and experimentally verified in [12,27]. $V_{Word}$ consists of the keywords extracted from the Wikipedia page of each candidate in $V_{Ent,1}$. We found that the first sentence on the Wikipedia page of an entity usually contains more fine-grained type information about the entity, which is a very useful linking clue. Therefore, for each $e$ in $V_{Ent,1}$, we extracted the first sentence $s$ from its Wikipedia page, found the first linking verb in $s$, and picked the continuous phrase immediately after the linking verb that contains only nouns, adjectives and conjunctions. We regarded the words in the picked phrase, except stopwords, as keywords characterizing the fine-grained type of $e$, and added them to $V_{Word}$.
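The keyword extraction heuristic described above can be sketched as follows. This is only an illustration under stated assumptions: the input is already tokenized and POS-tagged (Universal POS tags), the linking-verb list and the stopword list are tiny hypothetical stand-ins, and skipping determiners right after the linking verb is an assumed detail that the text does not specify.

```python
from typing import List, Tuple

LINK_VERBS = {"is", "was", "are", "were"}            # assumed linking-verb list
ALLOWED_TAGS = {"NOUN", "PROPN", "ADJ", "CCONJ"}     # nouns, adjectives, conjunctions
STOPWORDS = {"a", "an", "the", "and", "or", "of"}    # tiny illustrative stopword list


def extract_keywords(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the keywords of the phrase that follows the first linking verb."""
    # Position right after the first linking verb, or None if there is none.
    start = next(
        (i + 1 for i, (word, _) in enumerate(tagged) if word.lower() in LINK_VERBS),
        None,
    )
    if start is None:
        return []
    # Assumption: determiners directly after the verb are skipped.
    while start < len(tagged) and tagged[start][1] == "DET":
        start += 1
    phrase = []
    for word, tag in tagged[start:]:
        if tag not in ALLOWED_TAGS:
            break  # the phrase must be continuous
        phrase.append(word.lower())
    return [w for w in phrase if w not in STOPWORDS]


# Illustrative first sentence, tagged by hand.
sentence = [
    ("Marcello", "PROPN"), ("Cuttitta", "PROPN"), ("is", "AUX"),
    ("a", "DET"), ("former", "ADJ"), ("Italian", "ADJ"),
    ("rugby", "NOUN"), ("union", "NOUN"), ("player", "NOUN"), (".", "PUNCT"),
]
keywords = extract_keywords(sentence)
```

In practice the tags would come from a POS tagger; the sketch takes them as input so that it stays self-contained.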
After the node set $V_D$ is generated, HEGEL creates heterogeneous edges between nodes of the same or different types by the following rules: (a) the edges between two mention nodes, $E_{MM} \subset V_{Ment} \times V_{Ment}$, are created between adjacent mentions $(m_i, m_{i+1})$ in $D$; (b) the edges between two entity nodes, $E_{EE} \subset V_{Ent} \times V_{Ent}$, are created when there is a relation between them in the KB; (c) the edges between two word nodes, $E_{WW} \subset V_{Word} \times V_{Word}$, are created when the cosine similarity of the two word embeddings is higher than a given threshold $\epsilon$; (d) the edges from entities to mentions, $E_{EM} \subset V_{Ent,1} \times V_{Ment}$, follow the mention-candidate relation; (e) the edges from words to entities, $E_{WE} \subset V_{Word} \times V_{Ent,1}$, are created when the word is one of the keywords of the entity. Note that (d) and (e) are uni-directional while (a)-(c) are bi-directional; the performance of constructing bi-directional edges for (d) and (e) will be discussed later. In short, the entire edge set can be represented as $E_D = E_{MM} \cup E_{EE} \cup E_{WW} \cup E_{EM} \cup E_{WE}$.
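The edge rules (a)-(e) can be sketched in a few lines of plain Python. All names here (`build_edges`, `kb_related`, the node identifiers) and the toy vectors and KB relations are hypothetical; directed edges are stored as `(source, target)` pairs, and a bi-directional edge is stored as two directed pairs.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def both_ways(a, b):
    return {(a, b), (b, a)}


def build_edges(mentions, candidates, entities, kb_related, keywords, word_vecs, eps=0.5):
    edges = {"MM": set(), "EE": set(), "WW": set(), "EM": set(), "WE": set()}
    for a, b in zip(mentions, mentions[1:]):           # (a) adjacent mentions
        edges["MM"] |= both_ways(a, b)
    for a, b in combinations(entities, 2):             # (b) related in the KB
        if frozenset((a, b)) in kb_related:
            edges["EE"] |= both_ways(a, b)
    words = sorted({w for ws in keywords.values() for w in ws})
    for a, b in combinations(words, 2):                # (c) similar keywords
        if cosine(word_vecs[a], word_vecs[b]) > eps:
            edges["WW"] |= both_ways(a, b)
    for m in mentions:                                 # (d) entity -> mention only
        edges["EM"] |= {(e, m) for e in candidates[m]}
    for e, ws in keywords.items():                     # (e) word -> entity only
        edges["WE"] |= {(w, e) for w in ws}
    return edges


# Toy document (made-up relations and 2-d word vectors).
mentions = ["m_cuttitta", "m_england"]
candidates = {"m_cuttitta": ["Marcello_Cuttitta"],
              "m_england": ["England_rugby", "England_football"]}
entities = ["Marcello_Cuttitta", "England_rugby", "England_football", "Rugby_union"]
kb_related = {frozenset(("Marcello_Cuttitta", "Rugby_union")),
              frozenset(("England_rugby", "Rugby_union"))}
keywords = {"Marcello_Cuttitta": ["rugby", "player"],
            "England_rugby": ["rugby", "team"],
            "England_football": ["football", "team"]}
word_vecs = {"rugby": [1.0, 0.0], "football": [0.6, 0.8],
             "player": [0.0, 1.0], "team": [0.0, 1.0]}
edges = build_edges(mentions, candidates, entities, kb_related, keywords, word_vecs)
```

In this toy graph the neighbor node `Rugby_union` connects `Marcello_Cuttitta` and `England_rugby` but not `England_football`, which is the kind of bridging evidence the graph is meant to expose.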

Heterogeneous Graph Neural Network
Given a constructed heterogeneous informative graph $G_D$, HEGEL applies a purpose-built heterogeneous graph neural network (HGNN) on it to integrate the different sources of manifold information and encourage the interactions among them, generating information-augmented embeddings of $V_{Ment}$ and $V_{Ent,1}$ for later candidate scoring and ranking.
In order to avoid the need for expert knowledge and the information loss caused by earlier metapath-based HGNN methods, we designed a novel metapath-free HGNN model. For the heterogeneous graph $G_D$, we represent an edge $e \in E_D$ from node $i \in V_D$ to node $j \in V_D$ with edge type $r$ as $(i, j, r)$. Note that in our informative graph, the pair of node types $(t_i, t_j)$ uniquely determines the edge type $r$, and therefore we write $r = (t_i, t_j)$ in the following explanation.

Inter-Node Propagation
A node should receive different types of information from its heterogeneous neighborhood in different ways. Motivated by previous work on metapaths [26], HEGEL models the different types of information propagation with multiple feature transformations on different adjacency relations. Taking the edge type $r = (t_i, t_j)$ into consideration, a node $v_j$ of type $t_j$ collects information from its neighbors $v_i \in N(v_j)$ of type $t_i$ in the $l$-th layer through a Graph Convolutional Network (GCN): the embedding of each $v_i$ before the $l$-th layer is transformed by a trainable matrix of the $l$-th layer, and the results are combined, with a normalization factor $Z$, into a new embedding of $v_j$ related to $t_i$. Note that for edge types $(t_i, t_i)$ connecting nodes of the same type, self-loop connections are added to the edge set.
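Read as a GCN-style update, the propagation step could be written as follows. This is a sketch under assumptions rather than the original equation: the symbols $W^{l}_{(t_i, t_j)}$, $h^{l+1}_{v_j, t_i}$ and $N_{t_i}(v_j)$ (the neighbors of $v_j$ with type $t_i$) are assumed names, and the placement of the nonlinearity $\sigma$ is a common convention, not something the text specifies.

```latex
h^{l+1}_{v_j,\, t_i} = \sigma\!\left( \frac{1}{Z} \sum_{v_i \in N_{t_i}(v_j)} W^{l}_{(t_i, t_j)}\, h^{l}_{v_i} \right)
```

Using one matrix per type pair $(t_i, t_j)$ is what makes the layer heterogeneous: mention, entity and keyword neighbors are each transformed by their own parameters before they reach $v_j$.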

Intra-Node Aggregation
In order to preserve the information from the different types of relationships with its neighborhood, HEGEL aggregates, for each node $v_j$, the new type-specific embeddings to generate the input $h^{l+1}_{v_j}$ for the next layer.
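The two steps of one layer, inter-node propagation followed by intra-node aggregation, can be sketched in plain Python as below. Several details are assumptions made for illustration: the normalization factor is taken to be the number of neighbors of the given type, ReLU is used as the nonlinearity, the type-specific embeddings are aggregated by summation, and nodes without incoming edges are simply omitted from the output.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, v) for v in x]


def hgnn_layer(h, node_type, edges, weights):
    """One metapath-free heterogeneous layer.

    h: node -> embedding, node_type: node -> type name,
    edges: list of directed (source, target) pairs,
    weights: (source_type, target_type) -> weight matrix.
    """
    # Inter-node propagation: transform each message with the matrix of its
    # edge type and group messages by (target node, source type).
    messages = defaultdict(list)
    for src, dst in edges:
        r = (node_type[src], node_type[dst])
        messages[(dst, r[0])].append(matvec(weights[r], h[src]))

    per_type = defaultdict(list)
    for (dst, _), msgs in messages.items():
        z = len(msgs)  # assumed normalization factor
        mean = [sum(col) / z for col in zip(*msgs)]
        per_type[dst].append(relu(mean))

    # Intra-node aggregation: sum of the type-specific embeddings (assumption).
    return {dst: [sum(col) for col in zip(*embs)] for dst, embs in per_type.items()}


identity = [[1.0, 0.0], [0.0, 1.0]]
h = {"m": [1.0, 0.0], "e1": [1.0, 2.0], "e2": [0.0, -4.0], "w": [3.0, 1.0]}
node_type = {"m": "Ment", "e1": "Ent", "e2": "Ent", "w": "Word"}
edges = [("e1", "m"), ("e2", "m"), ("w", "e1"), ("e1", "e1")]  # last one is a self-loop
weights = {("Ent", "Ment"): identity, ("Word", "Ent"): identity, ("Ent", "Ent"): identity}
out = hgnn_layer(h, node_type, edges, weights)
```

With identity weights, the mention `m` receives the ReLU of the mean of its two candidates, and `e1` receives the sum of its keyword message and its own self-loop message.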

Global Score Calculation
After obtaining the information-augmented embeddings $h^L_{m_i}$ for mention $m_i$ and $h^L_{e^i_k}$ for the corresponding candidate $e^i_k$, and with the output dimensions set so that $d_{L,Ment} = d_{L,Ent}$, HEGEL applies a bilinear similarity, parameterized by a trainable diagonal matrix, to represent the global compatibility of the mention-candidate pair.

Feature Combining and Model Training
HEGEL combines the local features and the global compatibility score to compute the linking score for each candidate through $f$, a two-layer fully connected neural network. Following previous works, HEGEL attempts to rank the ground-truth entity $e_i$ higher than the other candidates, and therefore minimizes a margin-based ranking loss, in which $c > 0$ is the margin hyper-parameter and $[x]_+$ is equal to $x$ when $x > 0$ and to 0 otherwise.
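The bilinear global score and the margin-based ranking loss can be illustrated with a small sketch. The function names are hypothetical, the diagonal matrix is stored as a vector of its diagonal entries, and the loss is written in a commonly used hinge form over one mention's candidates, which is an assumption rather than the exact original equation.

```python
def bilinear_score(h_m, h_e, diag):
    """h_m^T * diag(diag) * h_e for equal-length vectors."""
    return sum(a * d * b for a, d, b in zip(h_m, diag, h_e))


def margin_ranking_loss(scores, gold_index, c=0.01):
    """Sum over wrong candidates of [c - score(gold) + score(wrong)]_+."""
    gold = scores[gold_index]
    return sum(
        max(0.0, c - gold + s) for k, s in enumerate(scores) if k != gold_index
    )


# Global compatibility of one mention with two candidates (toy numbers).
h_mention = [1.0, 2.0]
diag = [0.5, 1.0]
global_scores = [
    bilinear_score(h_mention, [2.0, 1.0], diag),
    bilinear_score(h_mention, [4.0, 0.0], diag),
]

# The loss is zero when the gold candidate already wins by at least the margin.
loss_gold_first = margin_ranking_loss(global_scores, gold_index=0)
loss_gold_second = margin_ranking_loss(global_scores, gold_index=1)
```

In the full model the loss is applied to the final linking scores produced by $f$; the global scores are used directly here only to keep the example short.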

Datasets
Following previous EL practice, we evaluated HEGEL on the benchmark dataset AIDA CoNLL-YAGO [19] for training, validation and in-domain testing. To examine its cross-domain generalization ability, we used five popular datasets for cross-domain testing: MSNBC [29], AQUAINT [30], ACE2004 [13], CWEB [9] and WIKIPEDIA [9]. Table 1 shows the statistics and the corresponding recall of candidate generation for all datasets used in our experiments.
Note: Recall represents the ratio of ground-truth entities appearing in the generated candidate lists of the corresponding mentions in the datasets.

Model Variant
To examine our claim that the heterogeneous feature of the GNN plays a crucial role in HEGEL, we implemented a semi-heterogeneous version of HEGEL, called HEGEL-semi, which shares the GCN parameters across the different node types in every layer except the first, since the input node embeddings have different dimensions and cannot be processed in a non-heterogeneous way.
As the $K - 1$ parameter-sharing layers do not use different parameters to deal with different types of nodes, they do not benefit from the heterogeneous graph structure. Therefore, according to our claim about the effect of a heterogeneous method, the performance of HEGEL-semi should be lower than that of HEGEL.

Experiment Settings
As we used the pre-trained Word2vec [31] word embeddings and the entity embeddings released by [8], the embedding dimension $d_h$ is fixed to 300. The hyper-parameters are manually tuned based on the validation performance on AIDA-A: CNN output dimension $d_{cnn} = 64$, all informative graph embedding dimensions $d_{l,t} = 32$ for $l = 1, \dots, L$, number of HGNN layers $L = 2$, margin $c = 0.01$, $K = 40$, dropout rate 0.5, and $E_{WW}$ threshold $\epsilon = 0.5$. To confine the graph size within a computable range, every document with more than 80 mentions is split into several documents of sizes as even as possible.
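The document splitting rule can be sketched as follows. The function name `split_mentions` is hypothetical, and two details are assumptions: the mention list is cut into contiguous chunks, and the smallest number of chunks that respects the limit is used.

```python
def split_mentions(mentions, max_size=80):
    """Split a mention list into contiguous chunks of near-equal size."""
    n = len(mentions)
    if n <= max_size:
        return [list(mentions)]
    parts = -(-n // max_size)          # ceiling division
    base, extra = divmod(n, parts)     # the first `extra` chunks get one more item
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(list(mentions[start:start + size]))
        start += size
    return chunks


sizes = [len(chunk) for chunk in split_mentions(list(range(170)))]
```

A document with 170 mentions is therefore split into three parts of 57, 57 and 56 mentions rather than 80, 80 and 10, which keeps the graphs of the resulting sub-documents comparable in size.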
We used the Adam optimizer to train HEGEL with a learning rate of $\alpha = 2 \times 10^{-4}$. The model is evaluated every 3 epochs, and the training process is terminated when the best validation performance has not improved for 10 consecutive evaluations.
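The evaluation and stopping schedule can be sketched as a generic training loop. The callbacks `train_one_epoch` and `evaluate` and the `max_epochs` safeguard are hypothetical placeholders, and the stopping rule follows one reading of the criterion: stop after 10 consecutive evaluations without a new best validation score.

```python
def train_with_early_stopping(train_one_epoch, evaluate,
                              eval_every=3, patience=10, max_epochs=1000):
    """Return (best validation score, number of epochs run)."""
    best, stale, epoch = float("-inf"), 0, 0
    while epoch < max_epochs:
        epoch += 1
        train_one_epoch()
        if epoch % eval_every:
            continue                   # evaluate only every `eval_every` epochs
        score = evaluate()
        if score > best:
            best, stale = score, 0     # new best: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                break
    return best, epoch


# Stub run: validation improves twice, then stays flat.
fake_scores = iter([0.5, 0.6] + [0.4] * 50)
best, epochs = train_with_early_stopping(lambda: None, lambda: next(fake_scores))
```

In this stub run the best score is reached at epoch 6, and training stops 10 evaluations (30 epochs) later.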
Because these settings also achieve the best performance on AIDA-A for HEGEL-semi, HEGEL-semi is implemented under the same settings as HEGEL.

Compared Baselines
To illustrate the effect of modeling the interactions among different types of information, we evaluated and compared the performance of our HEGEL with nine existing methods on in-domain and cross-domain datasets:
• AIDA [19] builds a graph whose weights are coherence scores and similarities, and applies traditional statistical methods on it.
• GLOW [13] designs several statistics features of both local and global with Wikipedia linking structure.
• RI [18] provides an Integer Linear Programming (ILP) formulation of Wikification and incorporates the entity-relation inference problem.
• WNED [9] builds disambiguation graphs and applies iterative random walks on it based on Information Theory.
• Deep-ED [8] leverages learned neural representations based on local context windows for joint document-level entity linking.

• Ment-Norm [11] treats and exploits relations between entities as latent variables based on Deep-ED [8].
• GNED [16] applies GCN and CRF on a homogeneous graph with extracted words and entities as nodes.
• NCEL [5] applies GCN on a bipartite to integrate both local contextual features and global information.
• SGEL [7] builds a graph for every mention sequentially, containing previously linked entities and the candidates of unpredicted mentions.
It is worth noting that GNED claims to be the first to construct a heterogeneous entity-word graph to model global information, but its nodes are not truly heterogeneous, as entity nodes share the same vector space with words. In addition, it does not apply any heterogeneous architecture in its GNN, as it regards all edges as the same type. Therefore, to the best of our knowledge, HEGEL is the first work to employ a heterogeneous GNN in EL tasks.

Main Results
We report the performance of all the compared baselines and our HEGEL in Table 2. The top part shows the performance of non-GNN-based baselines, and other baselines are GNN-based.

The in-domain test dataset AIDA-B, which shares a similar data distribution with the training dataset AIDA-train and the validation dataset AIDA-A, is the most important benchmark. By modeling the latent relations between mentions and injecting entity coherence into them, which can be regarded as a simple interaction between two types of information, Ment-Norm outperforms all other baselines on AIDA-B. This shows that the interactions of heterogeneous information are beneficial for capturing global coherence. We observed that HEGEL, which integrates manifold linking knowledge in a more interactive and effective way for capturing the global coherence, significantly outperforms the Ment-Norm method. This shows that HEGEL encourages richer interactions among different types of information and greatly improves the performance.
It should be pointed out that none of the models consistently achieves the best $F_1$-score on all five cross-domain datasets. HEGEL outperforms the other two GNN-based methods, NCEL and SGEL, on MSNBC and ACE2004, which shows that HEGEL can handle cross-domain linking cases better than them to some extent.
Our HEGEL performs extremely well on in-domain cases by making full use of different types of linking clues to better capture the global coherence, but it shows no clear advantage on the cross-domain datasets. We found that the ground-truth entities of the cross-domain test sets are less popular, so their linking clues are sparse. To improve the generalization ability on such tough cases, the only effective way seems to be introducing a large-scale corpus for training, so that the model can more or less "see" the linking clues of cross-domain entities at the training stage. In the future, we will try to introduce large-scale pre-trained language models, such as BERT, to improve the generalization ability of HEGEL.
As HEGEL-semi is also implemented with $L = 2$, it contains one heterogeneous layer and one parameter-sharing layer. The results in Table 2 confirm that, although HEGEL-semi outperforms the local model, its lack of heterogeneous information propagation in the second layer leads to an obvious drop in performance compared with HEGEL. The heterogeneous GNN is therefore important for HEGEL to achieve good performance.
Compared with our simpler and effective strategy of extracting keywords from the first sentence of the Wikipedia page of the corresponding entity, GNED searches the whole Wikipedia KB for hyperlinks to the entity and extracts their contexts in the preprocessing stage, which requires iterating through all $|E|$ entities and is very time-consuming. Even with less keyword evidence, our strategy still outperforms GNED on the in-domain dataset with lower time overhead. GNED accesses more additional linking clues and reaches better performance on the cross-domain datasets; we suppose that such richer information could also improve the generalization ability of HEGEL and further boost its performance on cross-domain datasets.

Ablation Study
As shown in the bottom part of Table 2, HEGEL boosts the performance of local model with an average improvement of 1.77%, which shows that HEGEL is able to greatly enhance the local model.

To further examine the effect of our heterogeneous model, we removed the keyword nodes $V_{Word}$ and the neighbor nodes $V_{Ent,2}$ from $V_D$, respectively, together with the related edges from $E_D$. In both cases, there is a significant drop in performance across datasets (0.89% and 0.61% on average, respectively), especially on the in-domain AIDA-B (1.71% and 1.43%). The results demonstrate the effectiveness of introducing the keyword (fine-grained type) information and the neighborhood information of candidate entities, and of modeling the interactions among them, which helps to accurately capture the topical characteristics of candidates.

The Impact of Edge Directions
As mentioned in Section 4.2, HEGEL keeps only one direction for $E_{EM}$ and $E_{WE}$. We suppose that adding edges from $V_{Ment}$ to $V_{Ent,1}$ and from $V_{Ent,1}$ to $V_{Word}$ would lead to the over-smoothing problem: the candidates to be disambiguated are connected to the same mention, and possibly to the same keywords, so their representations might become entangled with each other, making disambiguation harder. As expected, the results in Table 3 show that keeping these edges uni-directional alleviates over-smoothing and improves performance.

The Impact of the Number of GNN Layers
Despite the powerful ability of GNNs to process graph-structured data, most of them are shallow, that is, they do not have many propagation layers. As shown in [32], stacking many layers with nonlinear functions degrades the performance of GNN-based models due to the over-smoothing problem. Therefore, we examined the performance of HEGEL with different numbers of layers. The results shown in Figure 3 agree with previous GNN-related works, as HEGEL with $K = 2$ layers achieves the best performance on the EL task. Too many layers lead to the over-smoothing problem, while a 1-layer model is not enough to propagate the heterogeneous information required for aggregation and interaction on the graph. To alleviate the over-smoothing problem when training deeper GNNs, we also evaluated a variant that uses residual connections [33] between the hidden layers of the GNN to facilitate information retention through deeper models [17]. The residual connection, obtained by modifying Equation (14), enables HEGEL to carry over the heterogeneous information from the input embeddings of the previous layer. However, as shown in Figure 3, applying residual connections to the model with $K = 2$ causes a drop in both in-domain and cross-domain performance. Although residual connections boost the in-domain performance when $K \geq 3$, the results are still not comparable with the best performance at $K = 2$. We suspect that this is related to how information handling varies across the layers of the HGNN, as the heterogeneous structure in different propagation steps may be too different to be handled correctly by the same network layer.

Error Analysis
We randomly sampled and analyzed 100 mentions from all the mentions that were incorrectly linked by HEGEL in the in-domain dataset AIDA-B and in the most difficult cross-domain dataset CWEB, respectively. As shown in Table 4, there are four major error types: (1) Topic Errors, which happen when HEGEL links a candidate whose topic differs from (and is usually unrelated to) that of the gold entity, and which are the main challenge faced by current global methods; (2) Similar Entity Errors, in which the predicted candidate and the gold entity have semantics too similar to be disambiguated by local and global information, and which might be solved by introducing more information in future work; (3) Related Entity Errors, which happen when the predicted entity is semantically closely related to the gold one, such as a city and a stadium located in it, or a hypernym of the gold entity; (4) Dataset Annotation Errors, in which the gold entity given in the dataset is wrong and differs from the predicted one, and which occur only in CWEB.
Note: Square brackets denote the current target mentions. Italicized and underlined entities are the prediction results of HEGEL and the gold entities given in the datasets, respectively.

Case Study
As shown in Figure 2, HEGEL needs to map the mentions "Scotland", "Murrayfield", "Cuttitta" and "England" in the same document to their corresponding entities. "Murrayfield" and "Cuttitta" are not ambiguous, as each has only one candidate. However, "Scotland" and "England" are linked to wrong candidates by the local model, whereas HEGEL outputs the right answers by correctly modeling the interactions among heterogeneous types of information, especially from the neighborhood around "Marcello Cuttitta" (a former rugby union player) and "Rugby Union", and from the respective keywords related to "rugby". The ablation scores in Table 5 show that both the information from the keyword nodes $V_{Word}$ and the neighbor nodes $V_{Ent,2}$ and the correct handling of this information are important for HEGEL to capture the topical coherence and model the heterogeneous interactions.

CONCLUSION AND FUTURE WORK
In this paper, we presented HEGEL, a novel graph-based global entity linking method designed to model and utilize the interactions among heterogeneous types of information from different sources. We achieved this by constructing a document-level informative heterogeneous graph and applying a heterogeneous GNN to propagate and aggregate information on the graph, which is hard to achieve with previous homogeneous architectures. Extensive experiments on standard benchmarks show that HEGEL achieves state-of-the-art performance on the EL task.