Abstract
Within Open Relation Extraction (ORE) tasks, Zero-shot ORE methods generalize from predefined relations to undefined ones, while Unsupervised ORE methods extract undefined relations without any annotations. However, although predefined and undefined relations may both appear in the training data, a unified framework for Zero-shot and Unsupervised ORE has yet to be established. To address this gap, we propose U-CORE: A Unified Deep Cluster-wise Contrastive Framework for both Zero-shot and Unsupervised ORE, which leverages techniques from Contrastive Learning (CL) and Clustering. U-CORE overcomes the limitations of CL-based Zero-shot ORE methods by employing Cluster-wise CL that preserves both local smoothness and global semantics. Additionally, we employ a deep-cluster-based updater that optimizes the cluster centers, thus enhancing the accuracy and efficiency of the model. To increase the stability of the model, we adopt Adaptive Self-paced Learning, which effectively addresses the data-shifting problem. Experimental results on three well-known datasets demonstrate that U-CORE significantly improves upon existing methods, with an average improvement of 7.35% ARI on Zero-shot ORE tasks and 15.24% ARI on Unsupervised ORE tasks.
1 Introduction
Relation Extraction (RE) is a fundamental task in Natural Language Processing (NLP) that aims to extract the relationships between pairs of entities mentioned in a given text, such as identifying the Effect-Cause relation between “fire” and “fuel” in the sentence “The fire was caused by exploding fuel.” RE is an essential component of NLP systems that can facilitate diverse downstream tasks, including Question Answering (Soares and Parreiras, 2020), Knowledge Graphs (Ji et al., 2021), and Dialogue Systems (Chen et al., 2017).
While supervised methods have demonstrated great success in extracting predefined relations, in reality, new relations frequently arise, and it can be time-consuming and labor-intensive to define them manually. As a result, open-domain relation extraction has become a popular research topic. Based on prior studies, Open Relation Extraction (ORE) tasks can be categorized into two types, namely, Zero-shot Open Relation Extraction (ZORE) and Unsupervised Open Relation Extraction (UORE). ZORE aims to extract novel relational facts where the target relation types are not observed in the training set (Levy et al., 2017). On the other hand, UORE has the objective of extracting undefined relations without any annotation or prior knowledge (Elsahar et al., 2017).
In recent years, significant attention has been devoted to ORE tasks (Obamuyide and Vlachos, 2018; Hu et al., 2020; Chen and Li, 2021). Despite variations in complexity and annotations, both ZORE and UORE aim to develop an optimal encoder that can generate an appropriate relational representation using limited resources. Additionally, practical scenarios involving ZORE and UORE may overlap, such as when training data contain both predefined and undefined relations. Consequently, the previous ZORE and UORE methods, which concentrate on predefined or undefined relations, respectively, may yield suboptimal results due to their inability to account for all relations. Given the similarities between ORE techniques, as well as their potential overlap in practical applications, we propose a unified framework that addresses diverse ORE tasks.
Recent studies have attempted to improve relation representation in ZORE by leveraging Contrastive Learning (CL) approaches (Chen and Li, 2021; Wang et al., 2022). These methods typically rely on instance-wise CL, which aims to bring together relations from the same instances while separating those from different instances. However, Li et al. (2021) have pointed out that instance-wise CL may treat instances with similar semantic information as negative pairs, leading to a drift in their representations and resulting in performance degradation. In the ORE task, instances that belong to the same relation type share “similar semantic information” and should not be treated as negative pairs. To address this limitation, we propose employing a cluster-wise contrastive learning approach, which facilitates the alignment of relations within the same clusters and the separation of relations across different clusters.
In UORE tasks, by contrast, the training process is unsupervised, and existing models tend to learn structural information from clustering techniques (Hu et al., 2020; Liu et al., 2021). However, the majority of existing UORE methods depend on conventional clustering algorithms, such as k-means, for defining clustering centers. Re-clustering at the end of each epoch is typically required, which may be time-consuming and computationally intensive. Consequently, we incorporate deep clustering into our ORE framework to eliminate the need for frequent re-clustering and enhance the clustering performance.
The combination of cluster-wise contrastive learning and deep clustering plays a crucial role in our unified ORE framework. Deep clustering enhances the performance of cluster-wise CL by providing more accurate clusters, while cluster-wise CL improves deep clustering by generating better relation representations. However, during the training process, while the encoder is updated in each minibatch, the clustering assignment is only updated at the end of each epoch. This inconsistency between the relation representations and cluster centers in the feature space is known as the data-shifting problem, which has been identified in previous works (Liu et al., 2022). This problem requires careful attention in our unified framework to avoid the misalignment between the relation representations and the cluster centers.
Based on the above analysis, we propose U-CORE: A Unified Deep Cluster-wise Contrastive Framework for Open Relation Extraction in this article. The proposed framework aims to establish a unified approach to enhance the performance of both ZORE and UORE methods. Specifically, the Cluster-wise Contrastive Learning approach is employed in our ORE framework, which increases the inter-cluster spacing of clusters while minimizing intra-cluster spacing, enabling us to overcome the limitations of previous CL-based ZORE methods. To mitigate the need for regular re-clustering and enhance overall accuracy and efficiency, we introduce a deep Cluster Center Updater. Moreover, we propose the integration of Adaptive Self-paced Learning in the proposed U-CORE to address issues regarding data-shifting and produce a more stable model. The framework is capable of obtaining an effective representation of relations and able to handle both Zero-shot ORE and Unsupervised ORE tasks. Overall, our proposed U-CORE framework contributes towards the development of highly effective and versatile techniques for Open Relation Extraction. The architecture of our framework is shown in Figure 1.
We briefly summarize our contributions as follows:
We propose U-CORE, a novel deep cluster-wise contrastive framework that effectively addresses both zero-shot open relation extraction and unsupervised open relation extraction tasks.
We introduce the Cluster-wise Contrastive Module for the ORE task, which combines instance-wise and cluster-wise Contrastive Learning to optimize relation representations both locally and globally.
We introduce a deep-cluster-based Cluster Center Updater and Adaptive Self-paced Learning techniques, which enhance the efficiency and stability of our model.
We conduct experiments on 3 well-known datasets. The results demonstrate that U-CORE outperforms existing state-of-the-art methods in both ZORE and UORE tasks.
2 Related Work
In this section, we survey the related work on Open Relation Extraction, Contrastive Learning, and Deep Clustering.
Open Relation Extraction
In recent years, ORE has emerged as a significant research topic due to its practical applicability and downstream task potential. ORE tasks can be broadly classified as Zero-shot and Unsupervised open relation extraction. The former aims to distinguish novel relations without relying on prior training instances. Some researchers, such as Levy et al. (2017) and Obamuyide and Vlachos (2018), have drawn a parallel between this goal and reading comprehension or question answering. Zhao et al. (2021) have proposed a relation-oriented clustering method to solve the ZORE problem. In contrast, UORE is an unsupervised learning method that identifies semantic relation features from unannotated data. Several authors have pursued this strategy: Elsahar et al. (2017) have used re-weighted word embeddings for clustering free text, while Hu et al. (2020) have employed clustering-based techniques to generate pseudo-labels for new relation discovery. Both ZORE and UORE require robust representations of the relations. The key difference is that while ZORE is focused on extracting undefined relations from pre-existing ones, UORE optimizes representations from undefined relations themselves. Our proposed model U-CORE leverages supervised and self-supervised learning to optimize representation, making it well-suited for both ZORE and UORE tasks.
Contrastive Learning
Contrastive Learning (CL) is a powerful strategy that extracts common attributes for each data class by contrasting samples while simultaneously identifying distinguishing characteristics. The effectiveness of Contrastive Learning was first widely recognized in the field of computer vision (He et al., 2020; Chen et al., 2020). This strategy has also seen successful applications in NLP. For instance, SimCSE (Gao et al., 2021) has been developed for NLP tasks. Recently, some researchers have attempted to apply CL to ZORE tasks. For example, Wang et al. (2022) employed instance-wise CL and a relational classification module in the ZORE task. The CL methods used in ZORE rely on instance-wise Contrastive Learning, which is designed to emphasize the relationships among similar instances while distinguishing them from different instances. However, Li et al. (2021) have highlighted a critical issue with instance-wise CL: it can often regard entity pairs with similar semantic information as negative pairs, which yields local smoothness but ignores global semantics. Accordingly, in our ORE framework, U-CORE implements the cluster-wise contrastive learning approach, which aligns relations in the same clusters and separates relations in different ones. By doing so, we can avoid treating relations with similar semantics as negative pairs, thereby achieving improved performance.
Deep Clustering
Deep clustering (Ma et al., 2019; Guo et al., 2019) has demonstrated significant improvements over conventional algorithms in recent years. Subakti et al. (2022) have provided evidence that DEC (Xie et al., 2016) and IDEC (Guo et al., 2017) perform better than k-means in clustering sentence embeddings. In the past two years, several studies (Li et al., 2021; Caron et al., 2020; Liu et al., 2022) have attempted to integrate clustering into contrastive learning optimization. However, these works have primarily used conventional clustering algorithms such as k-means. These algorithms require re-clustering at the end of each epoch, which can significantly drain computational resources, particularly when processing massive datasets. Therefore, we have integrated deep clustering into our models. This approach results in two key benefits: 1) clustering centers can be refined in every epoch without excessive time or memory usage, and 2) the deep clustering approach further improves the performance of cluster-wise contrastive learning.
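As an illustration of what deep clustering computes, the following is a minimal sketch of the soft assignment and sharpened target distribution used by DEC (Xie et al., 2016), with the Student's t kernel at one degree of freedom. It is not U-CORE's Cluster Center Updater, whose details are not reproduced in this section; the function names and the toy embeddings are assumptions made for illustration only.

```python
import math

def soft_assign(embeddings, centers):
    """DEC-style soft assignment q_ij using a Student's t kernel (alpha = 1)."""
    q = []
    for z in embeddings:
        sims = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, c))) for c in centers]
        total = sum(sims)
        q.append([s / total for s in sims])
    return q

def target_distribution(q):
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    freq = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(len(row))]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all instances; DEC minimizes this quantity."""
    return sum(pi * math.log(pi / qi)
               for p_row, q_row in zip(p, q)
               for pi, qi in zip(p_row, q_row))

# Toy 2-D embeddings (hypothetical) forming two loose groups.
embeddings = [[0.0, 0.0], [0.5, 0.0], [4.0, 0.0], [4.5, 0.0]]
centers = [[0.0, 0.0], [4.0, 0.0]]

q = soft_assign(embeddings, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In DEC, the gradient of this KL term flows to both the encoder and the centers, so the centers move with every minibatch instead of being recomputed by k-means after each epoch.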
3 Proposed Model
3.1 Relation Representation
3.2 Instance-wise Contrastive Module
3.2.1 Data Augmentation
To implement instance-wise contrastive learning, an appropriate augmentation method is needed to generate positive pairs. Following prior work such as SimCSE (Gao et al., 2021), we employ Dropout noise as our data augmentation method. Specifically, each positive pair consists of the same sentence encoded twice, so that the two embeddings differ only in their dropout masks. Therefore, for a randomly sampled minibatch B, we use Dropout noise to generate a pair of augmentations for each relation instance in B. This results in an augmented batch of twice the size.
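The doubling of the batch can be sketched in a few lines. This is a simplified illustration under stated assumptions: in SimCSE-style training the dropout lives inside the encoder, whereas here it is applied directly to toy vectors, and the helper names `dropout` and `augment_batch` are hypothetical.

```python
import random

def dropout(vec, rate, rng):
    """Inverted dropout: zero each component with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]

def augment_batch(batch, rate=0.1, seed=0):
    """Return two dropout views per instance, so the output has 2 * len(batch) rows."""
    rng = random.Random(seed)
    views = []
    for vec in batch:
        views.append(dropout(vec, rate, rng))  # first view
        views.append(dropout(vec, rate, rng))  # second view: same input, new mask
    return views

# Toy stand-ins for relation representations (hypothetical values).
batch = [[0.5, -1.0, 2.0, 0.1], [1.5, 0.3, -0.7, 0.9]]
augmented = augment_batch(batch)
```

Rows `2*i` and `2*i + 1` of `augmented` form the positive pair for instance `i`; all other rows in the augmented batch act as negatives in instance-wise contrastive learning.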
3.2.2 Instance-wise Contrastive Loss
3.3 Cluster-wise Contrastive Module
As described in Section 3.2, instance-wise contrastive learning results in an embedding space where each instance is distinctly separated and exhibits local smoothness. However, instance-wise contrastive learning treats two samples as a negative pair as long as they are from different instances, irrespective of whether they belong to the same relation type, causing alienation of the instances from the same type in the embedding space. To tackle this challenge, we integrate Cluster-wise Contrastive Learning.
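To make the contrast with instance-wise learning concrete, the sketch below implements a generic prototype-based InfoNCE loss in which each embedding is pulled toward its assigned cluster center and pushed away from the other centers. It is an assumption-laden illustration, not the paper's Cluster-wise Contrastive Loss: it uses cosine similarity and a single fixed temperature, whereas the appendix indicates that U-CORE uses a per-cluster temperature ϕj. All names and toy values are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_contrastive_loss(embeddings, assignments, centers, temperature=0.05):
    """Mean cross-entropy of each embedding against all centers (fixed temperature)."""
    total = 0.0
    for z, k in zip(embeddings, assignments):
        logits = [cosine(z, c) / temperature for c in centers]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[k]
    return total / len(embeddings)

centers = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[0.9, 0.1], [0.1, 0.9]]

loss_matched = cluster_contrastive_loss(embeddings, [0, 1], centers)
loss_swapped = cluster_contrastive_loss(embeddings, [1, 0], centers)
```

Because negatives are cluster centers rather than other instances, two instances of the same relation type are never pushed apart directly, which is the failure mode of instance-wise learning described above.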
3.3.1 Cluster Centers Initialization
3.3.2 Cluster-wise Contrastive Loss
3.3.3 Cluster Centers Updater
3.4 Adaptive Self-Paced Learning
3.5 Labeled Relation Prediction
4 Experiment
4.1 Datasets
Following previous studies, we adopt two datasets to evaluate zero-shot ORE: SemEval2010 Task8 and FewRel. SemEval2010 Task8 (Hendrickx et al., 2019) was designed to classify a set of semantic relations between pairs of concepts, such as cause-effect or instrument-agency. It contains 9 relations and an “Other” relation. Each relation possesses a distinct direction (e.g., “son of” and “father of”). Following previous works (Wang et al., 2022), we do not consider the direction of the 9 relations or use the “Other” relation in experiments. We combine the instances of the training set and testing set for each relation to obtain the overall instances. This collection consists of 10,717 instances, with different numbers of instances allocated to each relation. FewRel (Han et al., 2018), a publicly available dataset that utilizes data from Wikipedia, is specifically designed to evaluate the model’s performance in carrying out few-shot relation extraction tasks. Unlike SemEval2010 Task8, FewRel is a balanced dataset comprising 80 relations, with 700 instances for each relation. Although FewRel is primarily utilized for a few-shot learning approach, it can also be effective for zero-shot learning if the relation labels between the training and testing data are distinct.
We also carry out our unsupervised open relation extraction experiments on TACRED (Zhang et al., 2017), which is one of the largest and most widely used datasets for relation classification. TACRED is a comprehensive supervised relation extraction dataset that focuses on Text Analysis Conference’s Knowledge Base Population (TAC KBP) relations. The dataset contains an extensive collection of 21,773 positive examples sourced through crowdsourcing, encompassing a wide range of relationships.
4.2 Evaluation Settings
Zero-shot ORE Settings
Following Wang et al. (2022), we randomly select m relations as the undefined relation set Rtest, and n relations as the predefined relation set Rtrain. Note that m + n equals the total number of relations in the dataset and that the two sets do not overlap. The training data contain only instances of predefined relations, while the testing data contain only instances of undefined relations. We repeat experiments 10 times on SemEval2010 Task8 and FewRel, then report the average clustering results obtained with k-means. To show an appropriate clustering result, the number of clusters is set to m.
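The split can be sketched as follows. This is a minimal illustration, assuming instances are `(sentence, relation)` tuples; the function names, the seed handling, and the placeholder relation names are hypothetical rather than taken from the released code.

```python
import random

def split_relations(relations, m, seed=0):
    """Randomly pick m relations as undefined (test); the remaining n are predefined (train)."""
    rng = random.Random(seed)
    shuffled = list(relations)
    rng.shuffle(shuffled)
    r_test = set(shuffled[:m])
    r_train = set(shuffled[m:])
    return r_train, r_test

def split_instances(instances, r_train, r_test):
    """Route each (sentence, relation) instance to train or test by its relation label."""
    train = [x for x in instances if x[1] in r_train]
    test = [x for x in instances if x[1] in r_test]
    return train, test

# FewRel-sized example: 80 relations, m = 10 undefined relations.
relations = [f"rel_{i}" for i in range(80)]
r_train, r_test = split_relations(relations, m=10)

instances = [("sentence a", "rel_0"), ("sentence b", "rel_1"), ("sentence c", "rel_2")]
train, test = split_instances(instances, r_train, r_test)
```

Repeating the experiment 10 times would correspond to calling `split_relations` with 10 different seeds and averaging the resulting clustering scores.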
Unsupervised ORE Settings
The TACRED dataset has been officially split into training, validation, and testing sets. Following Tran et al. (2020) and Liu et al. (2022), we train the unsupervised models on the training set and report the clustering results on the testing set. For our U-CORE model, we assume that the training and testing sets have the same relation types, so we directly use the clustering centers generated in the training process to assign cluster labels.
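Assigning test instances to centers learned during training can be sketched as a nearest-center lookup. The use of squared Euclidean distance is an assumption made for illustration (the similarity measure is not stated in this section), and the function names and toy vectors are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def assign_to_centers(embeddings, centers):
    """Label each embedding with the index of its nearest cluster center."""
    labels = []
    for z in embeddings:
        distances = [squared_distance(z, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Centers as they might look after training (toy values).
centers = [[0.0, 0.0], [5.0, 5.0]]
test_embeddings = [[0.2, -0.1], [4.8, 5.3], [1.0, 1.0]]

labels = assign_to_centers(test_embeddings, centers)
```

No clustering is re-run on the testing set in this scheme, which is only valid under the stated assumption that training and testing share the same relation types.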
Evaluation Metrics
Baselines
For the zero-shot experiment, we compare U-CORE against two distinct sets of models. The first set comprises supervised relation extraction models, including CNN (Zeng et al., 2014), Attention-BiLSTM (Zhou et al., 2016), and MTB (Baldini Soares et al., 2019). These models have demonstrated remarkable efficacy in supervised learning settings; however, their effectiveness in the zero-shot environment remains untested. The second set includes three zero-shot relation extraction models, namely, Supervised RSN (Wu et al., 2019), ZS-BERT (Chen and Li, 2021), and RCL (Wang et al., 2022). Following the previous setting (Wang et al., 2022), we modify the supervised relation extraction models to fit the zero-shot experiments: their outputs are replaced with vectors that have the same dimensionality as those of U-CORE. Subsequently, we use the k-means algorithm to predict undefined relations in our sample data.
For the Unsupervised Clustering experiment, we choose five representative models. 1) RAE (Marcheggiani and Titov, 2016) proposes a reconstruction-based method for ORE by reconstructing entities from pairing entities and predicted relations. 2) RW-HAC (Elsahar et al., 2017) involves re-weighting word vectors based on the sentence’s dependency parse tree. 3) EType+ (Tran et al., 2020) incorporates entity type knowledge into the relation extraction task. 4) SelfORE (Hu et al., 2020) leverages a pre-trained language model to detect weak self-supervised signals and group contextualized relational features into clusters. 5) HiURE (Liu et al., 2022) introduces a contrastive learning framework that utilizes cross-hierarchy attention to derive hierarchical signals from relational feature space. Note that in the studies of EType+ (Tran et al., 2020) and HiURE (Liu et al., 2022), the models were trained on the NYT-FB dataset (Marcheggiani and Titov, 2016) and tested on TACRED. However, we were unable to obtain the NYT-FB dataset, as it is private. Thus, we train and test EType+ and HiURE on TACRED. To ensure a fair comparison, the number of clusters for each baseline model is set to 16, following previous work (Tran et al., 2020).
4.3 Results
Results on Zero-shot Open Relation Tasks
Table 1 displays the results of our experiments on ZORE tasks. Our proposed method U-CORE outperforms other state-of-the-art models on the FewRel and SemEval datasets. U-CORE effectively learns the relation representations from both predefined relations and global semantics. For all models, performance decreases as the size of the undefined relation set Rtest increases. Moreover, our evaluation shows that SemEval is the more challenging dataset, with lower performance for all models. This is attributable to its imbalanced data and its weak connection to the general-domain data on which BERT was pre-trained, as also observed by Wang et al. (2022). Directly using pre-trained BERT for clustering only yields a 5.73% ARI.
Model | FewRel m = 5 F1 | FewRel m = 5 NMI | FewRel m = 5 ARI | FewRel m = 10 F1 | FewRel m = 10 NMI | FewRel m = 10 ARI | FewRel m = 15 F1 | FewRel m = 15 NMI | FewRel m = 15 ARI | SemEval m = 4 F1 | SemEval m = 4 NMI | SemEval m = 4 ARI |
---|---|---|---|---|---|---|---|---|---|---|---|---|
CNN | 74.47 | 68.51 | 66.31 | 60.87 | 64.59 | 53.79 | 55.3 | 62.35 | 49.87 | 38.42 | 17.06 | 15.43 |
Att-BiLSTM | 82.75 | 79.36 | 76.63 | 75.89 | 79.10 | 71.46 | 69.84 | 75.94 | 66.03 | 41.6 | 21.45 | 19.97 |
Supervised RSN | 73.33 | 67.89 | 64.49 | 59.11 | 64.96 | 48.66 | 50.99 | 59.98 | 39.74 | 38.41 | 11.98 | 10.96 |
ZS-BERT | 74.51 | 69.24 | 66.96 | 70.63 | 74.10 | 65.23 | 63.33 | 70.7 | 59.24 | 35.03 | 12.47 | 9.53 |
MTB | 88.06 | 85.32 | 84.03 | 82.7 | 84.16 | 79.19 | 76.72 | 77.66 | 71.65 | 44.35 | 25.25 | 20.59 |
RCL | 89.69 | 87.12 | 85.69 | 85.61 | 86.59 | 80.36 | 81.48 | 85.64 | 78.18 | 68.02 | 55.91 | 54.71 |
U-CORE | 96.38 | 95.04 | 95.33 | 90.37 | 90.08 | 82.45 | 83.35 | 89.03 | 79.55 | 78.83 | 66.79 | 70.88 |
The results of CNN, Att-BiLSTM, and Supervised-RSN are relatively low without the performance boost provided by Pre-trained Language Models (PLMs). Although ZS-BERT can achieve impressive ZORE performance, as demonstrated in the original paper, it relies on a manual description of novel relations, resulting in decreased clustering performance. While MTB can capture information from predefined relations effectively, its ability to generalize on undefined relations is insufficient. RCL, the previous state-of-the-art method, uses instance-wise CL to enhance performance, but it tends to separate similar semantics and only preserve the local smoothness of instances. The visualization of RCL in Section 4.9 reveals its failure to differentiate some similar relations. The performance of U-CORE proves that it can optimize the encoder both locally and globally to generate a better relation representation.
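For readers unfamiliar with the ARI values reported in Table 1, the following sketch computes the Adjusted Rand Index from pair counts using the standard contingency-table formula. It is an illustration on toy labels, not the evaluation script used for the paper, and the function name is hypothetical. The expression is undefined when the denominator is zero (for example, when both labelings put every item in a single group), which this sketch does not handle.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(gold, pred):
    """ARI from pair counts: (index - expected index) / (max index - expected index)."""
    n = len(gold)
    sum_joint = sum(comb(c, 2) for c in Counter(zip(gold, pred)).values())
    sum_gold = sum(comb(c, 2) for c in Counter(gold).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_gold * sum_pred / comb(n, 2)
    maximum = (sum_gold + sum_pred) / 2
    return (sum_joint - expected) / (maximum - expected)

gold = [0, 0, 0, 1, 1, 1]

# Cluster ids are arbitrary, so a relabeled but otherwise perfect clustering scores 1.
ari_perfect = adjusted_rand_index(gold, [1, 1, 1, 0, 0, 0])

# A clustering that splits both relations across three clusters scores much lower.
ari_partial = adjusted_rand_index(gold, [0, 0, 1, 1, 2, 2])
```

Unlike raw accuracy, ARI is corrected for chance agreement, which is why it tends to separate models more sharply than F1 or NMI on hard settings such as SemEval.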
Results on Unsupervised Open Relation Tasks
Table 2 displays the performance of various models on the UORE task. The challenge of TACRED lies in extracting undefined relations without annotations. TACRED has 41 relations, yet we use only 16 clusters, following previous work, which results in a higher B3 recall than B3 precision. Our proposed method, U-CORE, outperforms state-of-the-art models on the TACRED dataset with remarkable improvements of 6.62% B3 F1, 12.74% NMI, and 15.24% ARI. The proposed cluster-wise contrastive module of U-CORE minimizes intra-cluster distances while maximizing inter-cluster distances, leading to a clustering distribution that is closer to the actual distribution. This has led to substantial improvements, especially in the ARI value. The performance on the UORE task shows that U-CORE excels in self-training and can effectively learn relation representations from global semantics.
Model | TACRED F1 | TACRED P | TACRED R | TACRED NMI | TACRED ARI |
---|---|---|---|---|---|
RAE | 40.82 | 34.70 | 49.55 | 33.51 | 26.42 |
RW-HAC | 50.94 | 42.61 | 63.33 | 51.67 | 28.15 |
EType+ | 49.91 | 41.05 | 63.65 | 45.51 | 31.85 |
SelfORE | 54.16 | 51.06 | 57.64 | 61.91 | 44.70 |
HiURE | 57.12 | 54.13 | 60.46 | 63.03 | 46.16 |
U-CORE | 63.74 | 59.61 | 68.49 | 75.77 | 61.40 |
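The gap between B3 recall and B3 precision noted for Table 2 follows from the metric's definition and can be reproduced on a toy example in which the number of clusters is smaller than the number of gold relations. The sketch below uses the standard B-cubed definitions; the function name and the toy labels are assumptions for illustration.

```python
from collections import Counter

def b_cubed(gold, pred):
    """B-cubed precision, recall, and F1 averaged over all items."""
    n = len(gold)
    cluster_sizes = Counter(pred)
    class_sizes = Counter(gold)
    joint = Counter(zip(gold, pred))
    # Precision: share of an item's cluster that has the item's gold label.
    precision = sum(joint[(g, p)] / cluster_sizes[p] for g, p in zip(gold, pred)) / n
    # Recall: share of an item's gold class that landed in the item's cluster.
    recall = sum(joint[(g, p)] / class_sizes[g] for g, p in zip(gold, pred)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Four gold relations, but only two clusters, each merging two relations.
gold = [0, 0, 1, 1, 2, 2, 3, 3]
pred = [0, 0, 0, 0, 1, 1, 1, 1]

precision, recall, f1 = b_cubed(gold, pred)
```

Here every relation is kept together (recall 1.0) but each cluster mixes two relations (precision 0.5), mirroring in miniature the effect of using 16 clusters for 41 relations.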
4.4 Ablation Study
Effect of Cluster-wise Contrastive Module
We have introduced a Cluster-wise Contrastive Module (CCM) to prevent instances with similar semantics from being treated as negative pairs. As shown in Table 3, the performance of U-CORE decreases significantly without CCM. Additionally, the performance of U-CORE without CCM on the SemEval dataset is similar to that of RCL, which utilizes instance-wise contrastive learning. This highlights the ability of our cluster-wise contrastive module to capture global semantic structures and effectively generalize over undefined relations.
Dataset | Model | F1 | NMI | ARI |
---|---|---|---|---|
SemEval | w/o CCM | 68.85 | 53.17 | 54.17 |
SemEval | w/o Updater | 75.39 | 65.61 | 68.16 |
SemEval | w/o ASP | 73.44 | 60.35 | 62.71 |
SemEval | U-CORE | 78.83 | 66.79 | 70.88 |
SemEval | w/ self-training | 80.28 | 70.24 | 70.94 |
TACRED | w/o CCM | 60.17 | 71.43 | 54.22 |
TACRED | w/o Updater | 61.07 | 73.25 | 58.19 |
TACRED | w/o ASP | 62.49 | 74.11 | 60.39 |
TACRED | U-CORE | 63.74 | 75.77 | 61.40 |
TACRED | w/ self-training | 67.33 | 78.41 | 62.25 |
Effect of Cluster Center Updater
Our proposed Cluster Center Updater is a deep-cluster-based mechanism that enables U-CORE to update cluster centers in parallel with the training process. We show in Table 3 that U-CORE without Center Updater, which utilizes k-means to update the centers, results in an average performance loss of 3.06% B3 F1, 1.85% NMI, and 2.97% ARI compared to U-CORE. The experimental results demonstrate that the proposed module significantly improves the accuracy of clustering. Furthermore, the Center Updater also improves efficiency, which will be discussed further in Section 4.6.
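The averaged figures quoted above can be checked against the Table 3 rows for the two datasets. The short calculation below copies those rows and averages the per-dataset drops; the dictionary layout and function name are illustrative choices.

```python
# (F1, NMI, ARI) rows copied from Table 3.
table3 = {
    "SemEval": {"w/o Updater": (75.39, 65.61, 68.16), "U-CORE": (78.83, 66.79, 70.88)},
    "TACRED": {"w/o Updater": (61.07, 73.25, 58.19), "U-CORE": (63.74, 75.77, 61.40)},
}

def average_drop(table, ablation):
    """Mean over datasets of (full model score - ablated score), per metric."""
    datasets = list(table)
    drops = []
    for i in range(3):
        diffs = [table[d]["U-CORE"][i] - table[d][ablation][i] for d in datasets]
        drops.append(sum(diffs) / len(diffs))
    return drops

f1_drop, nmi_drop, ari_drop = average_drop(table3, "w/o Updater")
```

The averages come out at approximately 3.055 (F1), 1.85 (NMI), and 2.965 (ARI) points, which agrees with the reported 3.06%, 1.85%, and 2.97% up to rounding.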
Effect of Adaptive Self-paced Learning
The main objective of introducing the Adaptive Self-paced Learning (ASP) module is to enhance training stability by apprising the model of the optimal timing for learning. Our preceding experimental analyses reveal that SemEval represents a demanding dataset with unsatisfactory clustering outcomes in the absence of training. Additionally, the feature space undergoes rapid changes due to predefined relations, leading to significant data-shifting problems. The results presented in Table 3 demonstrate that U-CORE without ASP performs considerably worse, exhibiting a loss of 5.29% in B3 F1, 6.34% in NMI, and 8.11% in ARI. In comparison, the severity of the data-shifting problem in TACRED is relatively lower due to the self-supervised nature of UORE and consequently results in relatively smaller performance degradation in the absence of ASP.
4.5 Effect of Self-training on Testing Set
It is worth noting that U-CORE with self-training represents a special case of our proposed model. In Table 3, we present the results of conducting self-training on U-CORE with testing data, which yields improved performance over U-CORE in both SemEval and TACRED, as it can optimize relation representations without requiring any human annotations. This aspect is not featured in the baseline comparison section, as no other baseline in ZORE is capable of self-training in the absence of predefined relations. Furthermore, our analysis reveals that the scenarios of ZORE and UORE may converge in situations where both predefined and undefined relations are present. As a unified framework, U-CORE facilitates supervised training on predefined data and self-training on both predefined and undefined relations, leading to an enhanced performance by optimizing global semantics.
4.6 Efficiency Analysis
As previously discussed, U-CORE’s cluster center updater is more efficient compared to conventional clustering algorithms. To provide a comparison, we employed HiURE, which uses re-clustering with a k-means-based approach after every epoch. Results are presented in Table 4, indicating the average epoch time and epoch interval of both methods on the TACRED dataset. Despite having similar epoch time, U-CORE’s epoch interval is only a quarter of HiURE’s.
4.7 Effect of Predefined Relations Numbers
This section investigates the impact of predefined relation quantity on model performance on the FewRel dataset. To this end, we selected 10 undefined relations for the testing set and varied the number of predefined relations (n) in the training set from 10 to 70. The experimental outcomes are depicted in Figure 2 (Left). The results reveal that U-CORE displays significant performance improvements with increasing numbers of predefined relations, achieving nearly a 10% lead over both RCL and MTB models when trained on equivalent quantities of data. It is noteworthy, however, that the benefits of U-CORE are less pronounced in settings with a small number of relations, such as n = 5, highlighting potential areas for future research.
Moreover, as illustrated in Figure 2 (Right), the ARI score experiences a prominent upswing within the range of n = 10 to n = 30, indicating an intensifying impact of the cluster-wise contrastive loss, which is relation-based. This phenomenon can be attributed to the larger vocabulary of relational knowledge that emerges as the number of predefined relations rises, significantly amplifying the effect of this loss on the model.
4.8 Effect of Additional Complex Settings
In our previous experiments, we follow mainstream work to design our evaluation settings for fair comparisons. In this section, we delve deeper into assessing the robustness of U-CORE by exploring additional complex real-world scenarios. In realistic scenarios, the “no relation” type may appear in the dataset, and the test set may contain both predefined and undefined relation types. We present the experimental results for these two challenging real-world settings on the SemEval dataset in Table 5. Additionally, we include the results of the two best-performing models from our previous experiments, RCL and MTB. In the “w/ negative” setting, we add the “no relation” type based on the proportion of train and test sets. In the “Mixed Test Set” setting, we randomly allocated 20% of the data in predefined relation types to the test set. The experimental results demonstrate that the “Mixed Test Set” setting presents a greater challenge, as it involves reduced training data and an increased number of relation types in the test set. Consequently, all models experience a significant performance loss in this scenario. However, even under these more complex real-world conditions, U-CORE consistently outperforms the other models and achieves the best performance. This highlights the robustness and effectiveness of U-CORE in handling these intricate settings.
Model | w/ Negative Relation F1 | w/ Negative Relation NMI | w/ Negative Relation ARI | Mixed Test Set F1 | Mixed Test Set NMI | Mixed Test Set ARI |
---|---|---|---|---|---|---|
MTB | 32.03 | 16.10 | 11.87 | 30.06 | 14.02 | 10.22 |
RCL | 55.52 | 44.64 | 39.55 | 53.52 | 46.86 | 35.58 |
U-CORE | 64.63 | 55.57 | 52.23 | 58.17 | 54.68 | 49.54 |
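The “Mixed Test Set” construction can be sketched as moving a random 20% of the predefined-relation instances from the training data into the test data. Whether the sampling is done per relation type or over the pooled data is not specified in the text; the sketch assumes pooled sampling, and the function name and placeholder instances are hypothetical.

```python
import random

def make_mixed_test_set(train, test, fraction=0.2, seed=0):
    """Move a random `fraction` of training instances into the test set."""
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)
    n_moved = int(len(shuffled) * fraction)
    moved, kept = shuffled[:n_moved], shuffled[n_moved:]
    return kept, list(test) + moved

# Placeholder (sentence, relation) instances.
train = [(f"train sentence {i}", "predefined") for i in range(100)]
test = [(f"test sentence {i}", "undefined") for i in range(30)]

kept_train, mixed_test = make_mixed_test_set(train, test)
```

The resulting test set contains both predefined and undefined relation types while the training set shrinks, which matches the two sources of difficulty the text attributes to this setting.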
4.9 Visualization
To visually illustrate how our method enhances the understanding of undefined relations, we employ t-SNE (Van der Maaten and Hinton, 2008) to visualize the representation by mapping relation representation to a low-dimensional space. We choose undefined categories (m = 5) for the zero-shot experiment on the FewRel dataset. In each figure, the relation instances are colored according to their ground truth labels. As depicted in Figure 3a, the RCL struggles to differentiate between the five relationship types effectively. Due to Instance-wise CL implementation, blue dots representing the same relation are pushed away from each other. In contrast, U-CORE has effectively separated and categorized these five types, exhibiting a noteworthy capability in identifying differences. This success may be attributed to the cluster-wise contrastive module that collaborates with Adaptive Self-paced learning to optimize relation clustering performance by expanding inter-cluster spacing while minimizing intra-cluster spacing.
5 Conclusion
In this paper, we present a unified deep cluster-wise contrastive framework, U-CORE, for Open Relation Extraction tasks. Our proposed framework can tackle various ORE tasks and overcome the limitations of previous instance-wise CL-based methods. Furthermore, we introduce the cluster center updater and adaptive self-paced learning to enhance the stability and efficiency of our model. The results of our experiments on three datasets provide evidence of the effectiveness of our framework, which achieves new state-of-the-art performance. Recently, Large Language Models (LLMs) like ChatGPT have demonstrated remarkable performance in various NLP tasks, but Han et al. (2023) and Li et al. (2023) indicate that LLMs exhibit subpar performance in ORE tasks. In our view, LLMs nevertheless have the potential to address ORE tasks. In light of this, our future work is to further explore the potential of LLMs in ORE tasks.
Notes
The code can be found at: https://github.com/2kjiejie/U-CORE.
References
A Appendix
A.1 Implementation Details
In the U-CORE model, the encoder is BERT-base-uncased. It is trained for 10 epochs with the AdamW optimizer (Loshchilov and Hutter, 2019), using a learning rate of 1e-5, β1 = 0.9, β2 = 0.999, and a weight decay of 0.01. The values of τ and γ are set to 0.05 and 0.6, respectively, and the value of η is set to 10. Following Gao et al. (2021), the dropout rate in data augmentation is 0.1. Training uses two NVIDIA RTX 3090 GPUs with 24 GB of memory each, and the batch size is 128.
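For reference, the settings above can be gathered into a single configuration object. The class and field names below are hypothetical and are not taken from the released code; only the values come from the text. The symbols τ, γ, and η keep the names used in the paper, whose defining sections are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UCoreHyperparameters:
    """Values as reported in the implementation details; names are illustrative."""
    encoder: str = "bert-base-uncased"
    epochs: int = 10
    learning_rate: float = 1e-5
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    weight_decay: float = 0.01
    tau: float = 0.05           # τ in the paper
    gamma: float = 0.6          # γ in the paper
    eta: float = 10             # η in the paper
    dropout_rate: float = 0.1   # dropout used for data augmentation
    batch_size: int = 128

config = UCoreHyperparameters()
```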
A.2 Popular Relations only as Predefined
In this section, we conduct an experiment considering only popular relation types as predefined, and the corresponding results are presented in Table 6. In this scenario, each model exhibits even better performance due to the availability of a larger training dataset.
A.3 Additional Ablation Studies
In this section, we have included two additional ablations. The first ablation replaces the Cluster Centers Initialization method in Section 3.3.1 by manually setting the number of clusters to match the number of predefined relation types during the training process. The second ablation replaces ϕj in Equation (6) with a fixed value, which is equivalent to τ in Equation (4). Table 7 presents the results of these two ablations. Note that in some cases CCI has no effect on the performance of U-CORE, as the number of clusters generated by CCI may align with the number of predefined relation types. CCI (Cluster Centers Initialization) and ϕj were introduced to avoid the artificial setting of two hyperparameters: the number of clusters and the temperature of cluster-wise contrastive loss.
Dataset | Model | F1 | NMI | ARI |
---|---|---|---|---|
SemEval | w/o CCI | 78.25 | 66.34 | 68.70 |
SemEval | w/o ϕj | 77.82 | 64.55 | 69.72 |
SemEval | U-CORE | 78.83 | 66.79 | 70.88 |
TACRED | w/o CCI | 61.97 | 74.08 | 60.51 |
TACRED | w/o ϕj | 60.52 | 73.27 | 58.53 |
TACRED | U-CORE | 63.74 | 75.77 | 61.40 |
Author notes
Action Editor: Kristina Toutanova