Abstract
Detecting fake news early is challenging due to the absence of labeled articles for emerging events in training data. To address this, we propose a Disentangled Event-Agnostic Representation (DEAR) learning approach. Our method begins with a BERT-based adaptive multi-grained semantic encoder that captures hierarchical and comprehensive textual representations of the input news content. To effectively separate latent authenticity-related and event-specific knowledge within the news content, we employ a disentanglement architecture. To further enhance the decoupling effect, we introduce a cross-perturbation mechanism that perturbs authenticity-related representation with the event-specific one, and vice versa, deriving a robust and discerning authenticity-related signal. Additionally, we implement a refinement learning scheme to minimize potential interactions between two decoupled representations, ensuring that the authenticity signal remains strong and unaffected by event-specific details. Experimental results demonstrate that our approach effectively mitigates the impact of event-specific influence, outperforming state-of-the-art methods. In particular, it achieves a 6.0% improvement in accuracy on the PHEME dataset over MDDA, a similar approach that decouples latent content and style knowledge, in scenarios involving articles from unseen events different from the topics of the training set.
1 Introduction
Social media platforms provide a convenient way for the public to create, distribute, and absorb a wide range of information. However, the authenticity of information shared on these platforms has become a growing concern. When users cannot verify the authenticity of the content they share, there is a significant risk that fake information spreads widely across platforms, with serious consequences.
Advances in natural language processing (NLP) have facilitated the development of fake news detection methods (as described in Section 4.1), which perform well in controlled scenarios, as illustrated in Figure 1(b). However, on social media, sudden explosive events often lead to a flood of related news. In the early stage of news dissemination, resources related to newly emerging events are usually limited, and authenticity annotations for those news articles are often unavailable. This makes it challenging to quickly train new fake news detection models. Meanwhile, detection models trained on existing annotated data typically perform poorly on emerging “unseen” events during inference, as shown in Figure 1(c). This highlights the challenge of quickly detecting and blocking fake news articles about newly emerging events before they spread widely.
Illustration of different fake news detection scenarios: (a) The training set, composed of news articles from two existing events. (b) This represents a type of testing set including articles related to the same events as the training set, referred to as “seen” events. In contrast, (c) illustrates another type of testing set containing articles about new events not present in training, showcasing a detection scenario for “unseen” events.
Recent methods of fake news detection address this challenge by treating distinct events as various domains and leveraging domain adaptation techniques (as described in Section 4.2). These methods aim to bridge the semantic gaps between events by enhancing their correlations. However, they require a portion of data from the target domain during training, making them inapplicable for the early fake news detection scenario with “unseen” events. Moreover, since newly emerging events on social media appear frequently, it is impractical to adapt the model each time a new event occurs. Therefore, a model is needed to identify fake news immediately from most newly emerging “unseen” events.
Based on this analysis, we assume that a news article contains two key latent attributes: authenticity-related knowledge, which determines whether it is fake or real, and event-specific knowledge, which varies significantly among different events. We explore the performance of fake news detection for new “unseen” events by utilizing the latent authenticity-related knowledge and mitigating the impact of the event-specific knowledge. Building on the insights from this exploration, we propose a Disentangled Event-Agnostic Representation (DEAR) learning approach to address the challenges of early fake news detection for new “unseen” events (explained in Section 2). It begins with an adaptive multi-grained semantic encoder for contextual representation extraction. Next, a disentanglement architecture is utilized to decouple the latent authenticity-related and event-specific representations of news content. To enhance the disentanglement effect, we introduce a cross-perturbation mechanism that perturbs the generated authenticity-related representation with the event-specific one, and vice versa. This directs each discriminator to focus on its corresponding knowledge, thereby improving robustness. Finally, a refinement learning scheme is employed to further reduce potential interactions between the two decoupled representations, ensuring the authenticity-related signal remains strong and unaffected by event-specific details. We demonstrate the effectiveness of our approach through comprehensive experiments conducted on four public datasets, showcasing its superiority over established methods (outlined in Section 3).
2 The DEAR Methodology
2.1 Preliminary Analysis
Existing fake news detection methods struggle with identifying fake news related to events not encountered during training. To analyze this challenge, we conduct a preliminary detection experiment using a multi-event configuration. In this setup, we train the model on news articles related to two distinct events, namely, the U.S. Election and COVID-19, and test it on articles about the “Ferguson Unrest” event. For the U.S. Election, we extract approximately 600 articles containing keywords such as “election” and “president” from the PolitiFact dataset (Shu et al., 2018). For COVID-19, we randomly select around 600 articles with balanced authenticity labels from the COVID dataset (Du et al., 2021), which contains articles related to COVID-19. For the Ferguson Unrest event, we randomly select around 600 articles related to this event from the PHEME dataset (Kochkina et al., 2018), which contains articles from multiple events with both event and authenticity labels.
In this experiment, we fine-tune BERT (Devlin et al., 2019) using articles from the U.S. Election and COVID-19 events for training. We then visualize the semantic representations learned by BERT for news articles from both training and testing events with t-SNE, as shown in Figure 2(a). The representations of various events exhibit notable disparities, indicating significant semantic differences between distinct events. Consequently, the authenticity detection performance on the unseen Ferguson Unrest event is notably compromised.
t-SNE visualization on distributions generated by various models: (a) and (b) show the embeddings of news articles learned from different encoders, while (ii)-(iv) illustrate the representations of news articles learned from event and authenticity generators across three distinct models. For the training set, we select news articles from two distinct events: U.S. Election event (yellow) and COVID-19 event (blue). The testing set comprises news from Ferguson Unrest event (purple). In the visualization, darker shades indicate real news articles while lighter shades represent fake ones.
To uncover the latent factors influencing such distribution disparities, we introduce two separate branches over BERT’s embeddings, aiming to extract the authenticity-related and event-specific knowledge separately. Each branch comprises a single-layer MLP network as a generator, denoted as G, and a two-layer MLP network as a discriminator, denoted as D. To decouple event-specific and authenticity-related information from the BERT semantic embeddings, the generator Ge in the event branch is guided by the corresponding discriminator De using event labels (“COVID-19” or “U.S. Election”), while the generator Gc along with the discriminator Dc in the authenticity branch is trained according to the fake/real labels. The representations obtained from the event and authenticity generators are visualized via t-SNE in Figure 2 (a1) and (a2).
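The two-branch layout described above can be sketched as follows. This is a minimal, untrained illustration in plain Python rather than the authors' implementation: the dimensions, the random initialization, the ReLU activation, and the names `Branch`, `linear`, and `apply_linear` are all assumptions introduced here, and a random vector stands in for a BERT embedding.

```python
import random

def linear(in_dim, out_dim, rng):
    """Create a (weights, bias) pair with small random weights (assumed init)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b

def apply_linear(layer, x):
    """Compute W x + b for one input vector."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

class Branch:
    """One branch: a single-layer MLP generator G and a two-layer MLP discriminator D."""

    def __init__(self, in_dim, rep_dim, hidden_dim, num_classes, rng):
        self.g = linear(in_dim, rep_dim, rng)           # generator G (one layer)
        self.d1 = linear(rep_dim, hidden_dim, rng)      # discriminator D, layer 1
        self.d2 = linear(hidden_dim, num_classes, rng)  # discriminator D, layer 2

    def generate(self, o):
        return relu(apply_linear(self.g, o))

    def discriminate(self, rep):
        return apply_linear(self.d2, relu(apply_linear(self.d1, rep)))

rng = random.Random(0)
embedding = [rng.gauss(0, 1) for _ in range(8)]  # stand-in for a BERT embedding

event_branch = Branch(8, 4, 4, num_classes=2, rng=rng)  # Ge/De: "COVID-19" vs. "U.S. Election"
auth_branch = Branch(8, 4, 4, num_classes=2, rng=rng)   # Gc/Dc: fake vs. real

e = event_branch.generate(embedding)          # event-specific representation
c = auth_branch.generate(embedding)           # authenticity-related representation
event_logits = event_branch.discriminate(e)   # would be trained with event labels
auth_logits = auth_branch.discriminate(c)     # would be trained with fake/real labels
```

Both branches read the same embedding; only the supervision signal applied to each discriminator differs.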
Compared with the original embeddings from BERT in Figure 2(a), the representations learned from the event generator in Figure 2(a1) cluster the events to some extent and slightly reduce the fake/real distance between events. Conversely, the representations learned from the authenticity generator shown in Figure 2(a2) marginally reduce event gaps and slightly increase the fake/real distances.
Based on these observations, we hypothesize that latent authenticity-related knowledge, independent of specific events, can reduce the semantic distance among events and enhance the performance of detecting fake news in unseen events. To improve the detection performance on unseen events, we prioritize learning authenticity-related knowledge while mitigating the impact of event-specific knowledge.
2.2 Proposed Method
The proposed method is outlined in Figure 3. It begins with an adaptive multi-grained semantic encoder to extract comprehensive representations of each news article. Then, a cross-perturbation decoupling mechanism is employed to extract authenticity-related and event-specific knowledge. The authenticity-related knowledge serves as an indicator of news authenticity, facilitating more effective detection of fake news across various events and scenarios, including those from unseen events. Additionally, a refinement step is included to further enhance the decoupled authenticity-related knowledge by filtering out any residual event-specific information.
The overview of the DEAR methodology, composed of three principal modules: the adaptive multi-grained semantic encoder Et, the authenticity/event generators Gc/e (each constructed with a single-layer MLP network), and the authenticity/event discriminators Dc/e (each designed with a two-layer MLP network). Dual-phase training is utilized to learn the authenticity-related representation that serves as a reliable and robust signal for fake news detection.
2.2.1 Adaptive Multi-Grained Semantic Encoder
In the field of NLP, one of the fundamental and crucial challenges is to learn a comprehensive representation of a given text, and fake news detection is no exception. Conventional representation learning mechanisms often involve either fine-grained word-level learning or abstracted document-level learning. Local word-level representations, which express the embedding of each word in the document as a semantic matrix, may capture detailed information but lack a broad view, potentially leading the model to focus on specific keywords. In contrast, global document-level representations summarize the semantic knowledge of the entire document as a single vector, providing a summarized understanding but potentially overlooking important details.
To capture latent authenticity-related knowledge from given real and fake articles, we propose an adaptive multi-grained semantic encoder Et, as shown in Figure 4. This encoder aggregates both fine-grained and coarse information, encompassing both global (document-level) and local (word-level) representations, as well as their interactions within a given news article. In Et, we first extract the semantic embedding of the [CLS] token from BERT as the global-based representation Tg ∈ℝ1×D. We also obtain the overall semantic embedding from BERT as the local-based representation Tl ∈ℝL×D. These embeddings serve as the initial representations of the input text.
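As a rough illustration of the two initial representations, the sketch below splits an L × D matrix of token embeddings into Tg (the [CLS] row, shape 1 × D) and Tl (all rows, shape L × D). The `attend` function is purely an assumption added here to show one common way a global vector can interact with local ones (scaled dot-product attention pooling); it is not taken from the paper, and random vectors stand in for BERT outputs.

```python
import math
import random

def split_global_local(token_embeddings):
    """Return (Tg, Tl): Tg is the [CLS] row as a 1 x D matrix, Tl the full L x D matrix."""
    t_g = [list(token_embeddings[0])]
    t_l = [list(row) for row in token_embeddings]
    return t_g, t_l

def attend(t_g, t_l):
    """Assumed interaction: softmax-weighted sum of local rows, scored against Tg."""
    q = t_g[0]
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, row)) / math.sqrt(d) for row in t_l]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [x / z for x in exps]
    pooled = [sum(w * row[j] for w, row in zip(weights, t_l)) for j in range(d)]
    return weights, pooled

rng = random.Random(0)
L, D = 5, 4  # toy sequence length and hidden size
tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]  # row 0 plays the role of [CLS]

t_g, t_l = split_global_local(tokens)
weights, pooled = attend(t_g, t_l)
```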
The architecture of the proposed adaptive multi-grained semantic encoder.
Based on the preliminary detection configuration described in Section 2.1, we replace the BERT encoder shown in Figure 2(a) with our encoder and visualize the distribution of our encoder’s output representations illustrated in Figure 2(b). It can be seen that the proposed encoder Et enlarges the overall distance between fake and real news articles by capturing the hierarchical differences between fake and real samples. Additionally, we embed Et into the primary disentanglement framework and extract the decoupled event-specific (b1) and authenticity-related knowledge (b2) for comparison. The comparison shows that the generated event-specific representations based on our encoder tighten the news distribution for each event, resulting in more apparent clustering (b1 vs. a1 in Figure 2). For authenticity-related knowledge comparison (b2 vs. a2 in Figure 2), our encoder’s generated features show more overlapping across various events.
2.2.2 Cross-Perturbation Decoupling
In Section 2.1, we introduced two branches that generate and discriminate authenticity-related and event-specific representations over the semantic embeddings from BERT. Our analysis of the experimental results illustrated in Figure 2(a)–(a2) led us to conclude that latent authenticity-related knowledge that is independent of specific events shows promise for detecting fake news in unseen events. Based on this conclusion, we propose a novel cross-perturbation (CP) mechanism that randomly mixes authenticity-related and event-specific knowledge from two distinct news articles. The CP mechanism can enhance the learning process of authenticity-related features by introducing event-specific perturbations, and vice versa.
Given an input news article x, we extract o by the proposed encoder Et. This representation is then fed into the authenticity generator Gc and event generator Ge, each constructed with a single-layer MLP network. From these generators, we obtain c and e, representing the authenticity-related and event-specific aspects for o, respectively. To introduce perturbations, we also extract authenticity-related and event-specific representations (c′ and e′) from a randomly sampled news article x′.
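The pairing step can be sketched as below. The way the two representations are combined is an assumption made only for illustration: an additive perturbation with a hypothetical weight `alpha`, which also requires c and e to share a dimension. The function name `cross_perturb` and the toy vectors are likewise introduced here.

```python
import random

def cross_perturb(c_batch, e_batch, alpha=0.5, rng=None):
    """Perturb each c with another article's e', and each e with that article's c'.

    The additive form and the weight `alpha` are assumptions, not the paper's formula.
    """
    rng = rng or random.Random()
    n = len(c_batch)
    if n < 2:
        raise ValueError("need at least two articles to sample a partner")
    partners, c_tilde, e_tilde = [], [], []
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])  # randomly sampled article x'
        partners.append(j)
        c_tilde.append([ci + alpha * ej for ci, ej in zip(c_batch[i], e_batch[j])])
        e_tilde.append([ei + alpha * cj for ei, cj in zip(e_batch[i], c_batch[j])])
    return c_tilde, e_tilde, partners

# Toy authenticity-related (c) and event-specific (e) representations for three articles.
c_batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e_batch = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
c_tilde, e_tilde, partners = cross_perturb(c_batch, e_batch, alpha=0.5, rng=random.Random(0))
```

In such a setup, the perturbed c would keep the fake/real label of its own article and the perturbed e would keep its own event label, so each discriminator has to rely on the knowledge of its own branch.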
To understand the effectiveness of our CP mechanism within the disentanglement framework, we visualize the t-SNE distribution of generated event-specific and authenticity-related representations learned from the corresponding generators during the inference process in Figure 2(b3) and (b4).
Our model, which includes the CP mechanism, exhibits two clearly decoupled representations, as shown in (b3) and (b4) of Figure 2, compared to the primary disentanglement framework shown in Figure 2(b1) and (b2). The authenticity-related representation captures the latent invariant knowledge across various events that determines authenticity. The authenticity-related distribution in Figure 2(b4) effectively bridges the gap between articles from distinct events in both fake and real clusters while maintaining the distance between the fake and real clusters. On the other hand, the event-specific representation clearly categorizes the three distinct events, as shown in Figure 2(b3). In contrast, the primary disentanglement framework shows significant overlap between the two decoupled representations, with the event-specific representation (b1) inadequately separating true and false samples. This indicates a failure to properly decouple the authenticity-related and event-specific knowledge.
These comparisons highlight how the proposed CP mechanism effectively minimizes the overlap between the two decoupled representations, enhancing the distinction of each representation.
2.2.3 Refinement Learning
To further enhance the disentanglement capability, we introduce a refinement learning strategy in the second phase of training, inspired by adversarial learning (Goodfellow et al., 2014). This strategy aims to ensure that the decoupled authenticity-related representation does not include any event-specific knowledge identifiable by a proficiently trained event discriminator, and vice versa. In this phase, we freeze both the authenticity and event discriminators while refining the two generators.
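One way to express "not identifiable by a frozen discriminator" is to push that discriminator's prediction toward a uniform distribution. The sketch below computes such a penalty as KL(uniform ‖ softmax(logits)); this particular loss and the name `confusion_loss` are assumptions chosen for illustration, not the objective used in the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def confusion_loss(logits):
    """KL(uniform || softmax(logits)): zero when the frozen discriminator is maximally unsure."""
    k = len(logits)
    logp = log_softmax(logits)
    return -sum(logp) / k - math.log(k)

# Hypothetical event-discriminator logits for an authenticity-related representation.
unsure = confusion_loss([0.0, 0.0, 0.0])     # discriminator cannot tell the event
confident = confusion_loss([5.0, 0.0, 0.0])  # event information still leaks through
```

In a refinement phase of this kind, only the generator parameters would receive gradients from the penalty, while the discriminator weights stay fixed.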
We present the t-SNE distributions of the authenticity-related and event-specific representations learned from the CP decoupling model with refinement learning, as illustrated in Figure 2(b6) and (b5). Compared with the distributions in (iii), which lack the second phase strategy, the event-specific features from the refinement learning process (b5) are tightly clustered around independent event clusters. Meanwhile, the authenticity-related representations, as shown in Figure 2(b6), exhibit more overlaps compared to the model without refinement learning, shown in Figure 2(b4). This comparison demonstrates that the refinement learning strategy enhances the independence of each decoupled representation and reduces the potential inter-correlation between the two.
Upon completing the dual-phase training, only the pair of authenticity generator and discriminator is employed during the testing stage of fake news detection. This setup outputs the binary detection label based on the disentangled authenticity-related representation, effectively mitigating event-specific information. The CP mechanism and refinement learning strategy are involved solely during the training process and are excluded during inference.
3 Experiments
3.1 Experimental Settings
Datasets
We use four datasets, listed in Table 1, to simulate two types of unseen-event detection scenarios:
In-Topic Detection (ITD): In this scenario, we utilize the PHEME dataset, which includes thousands of claims related to four different events centered around a similar topic of social unrest. We use three of the events as the resource for training and the remaining event as the target for testing, resulting in four different combinations.
Cross-Topic Detection (CTD): To further challenge our proposed approach, we incorporate news articles from three other datasets, i.e., PolitiFact, GossipCop, and COVID, which cover distinctly different events. In this scenario, we use two of these datasets as the resources for training and evaluate the trained model on the remaining dataset, resulting in three different combinations.
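Both scenarios follow a leave-one-out protocol over their sources, which can be sketched as below. The helper name `leave_one_out_splits` is introduced here for illustration; the event and dataset names are those listed in Table 1.

```python
def leave_one_out_splits(sources):
    """Return (train_sources, held_out) pairs, one for each choice of unseen source."""
    return [([s for s in sources if s != held_out], held_out) for held_out in sources]

pheme_events = ["Charlie Hebdo", "Sydney Siege", "Ferguson Unrest", "Ottawa Shooting"]
ctd_datasets = ["PolitiFact", "GossipCop", "COVID"]

itd_splits = leave_one_out_splits(pheme_events)  # four ITD combinations
ctd_splits = leave_one_out_splits(ctd_datasets)  # three CTD combinations

for train, test in itd_splits + ctd_splits:
    print(f"train on {train} -> test on unseen {test!r}")
```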
Statistics of selected public datasets.
Scenario | Dataset | Event | Description | Topic | All | Fake | Real |
---|---|---|---|---|---|---|---|
In-Topic Detection (ITD) | PHEME (Kochkina et al., 2018) | Charlie Hebdo | Terrorist attack on the French satirical magazine in Paris, resulting in 12 deaths. | Social Unrest | 2,079 | 458 | 1,621 |
In-Topic Detection (ITD) | PHEME (Kochkina et al., 2018) | Sydney Siege | A 16-hour hostage crisis at the Lindt Café in Sydney, resulting in three deaths, including the gunman. | Social Unrest | 1,221 | 522 | 699 |
In-Topic Detection (ITD) | PHEME (Kochkina et al., 2018) | Ferguson Unrest | Protests and riots in Ferguson, Missouri, after the fatal shooting of Michael Brown, an unarmed Black teenager, by a police officer in 2014. | Social Unrest | 1,143 | 284 | 859 |
In-Topic Detection (ITD) | PHEME (Kochkina et al., 2018) | Ottawa Shooting | A gunman killed a soldier at the National War Memorial before being shot dead after storming the Canadian Parliament. | Social Unrest | 890 | 470 | 420 |
Cross-Topic Detection (CTD) | PolitiFact (Shu et al., 2018) | (multiple) | Comprised of multiple events related to politics, such as U.S. Election and policy debates. | Politics | 948 | 420 | 528 |
Cross-Topic Detection (CTD) | GossipCop (Shu et al., 2018) | (multiple) | Comprised of multiple events related to the gossip topic, such as Celebrity Death Hoaxes and entertainment stories. | Gossip | 9,947 | 4,947 | 5,000 |
Cross-Topic Detection (CTD) | COVID (Du et al., 2021) | (multiple) | Comprised of tweets related to COVID-19. | Health | 6,067 | 1,317 | 4,750 |
Selection of Comparison Methods
We compare our method with two types of baselines, namely, content-centric and domain-adaptive approaches. The content-centric baselines include TextCNN (Kim, 2014) and RoBERTa (Liu et al., 2019), both of which leverage content knowledge for text classification. TextCNN is a convolutional network that is robust across various text classification tasks, while RoBERTa, a variant of the pre-trained transformer, generates embeddings of the [CLS] token for detection. The domain-adaptive baselines include EANN (Wang et al., 2018), MDDA (Zhang et al., 2021), Fish (Shi et al., 2022), and MetaAdapt (Yue et al., 2023). EANN learns event-agnostic features using a TextCNN for text representation and an event discriminator for adversarial learning. MDDA disentangles the representation into content- and style-based branches, utilizing only style knowledge for detection. Since MDDA is a multi-modal framework, we consider only its textual branch for comparison. Fish introduces a gradient-based framework for domain generalization, featuring an adaptive mechanism for handling different domains. MetaAdapt employs meta-training to explore optimal parameters, enabling rapid adaptation to unseen tasks without examples. To ensure a fair comparison, we train these frameworks using the same datasets as ours.
Implementation Details
In our experiments, we use the Adam optimizer with a learning rate of 2e-5 and a weight decay of 0.01. The training batch size is set to 32. Training takes around 40 epochs to converge in both the ITD and CTD scenarios. All experiments are conducted on a single NVIDIA GeForce RTX 3090 GPU.
Evaluation Metrics
We select Acc and F1 as the major evaluation metrics to measure the performance of various approaches, which are commonly utilized in the context of fake news detection. Furthermore, Wasserstein distance is also used as a metric to measure the distance between two distributions, such as real vs. fake news. We also perform t-SNE visualization as a visual evaluation mechanism in our experiments.
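For reference, accuracy and F1 for binary labels can be computed as in the sketch below. Which class is treated as positive is not stated in the text, so the `positive=1` default is an assumption, and the label vectors are made-up toy values.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 0, 0, 1]  # toy ground-truth labels
y_pred = [1, 0, 0, 1, 1]  # toy predictions

acc = accuracy(y_true, y_pred)  # 3 of 5 correct
f1 = f1_score(y_true, y_pred)   # precision = recall = 2/3
```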
3.2 Discussion of the Proposed Components
Before presenting the final detection performance, we first validate the effectiveness of each proposed component.
Effectiveness of Et
We evaluate the encoder’s ability to differentiate between fake and real news in both CTD and ITD scenarios. Specifically, for both BERT and our encoder Et, we calculate the Wasserstein distances between the centers of fake and real representations for both detection scenarios, as shown in Table 2. It is clear that the proposed Et effectively enlarges the distance between fake and real news, indicating its effectiveness in understanding fine-grained and comprehensive distinctions and correlations among news instances. This comparison substantiates the network’s capability to discern and leverage the contextual information, contributing to improved detection capability.
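To make the distance measure concrete, the sketch below computes the Wasserstein-1 distance between two one-dimensional empirical samples of equal size, which reduces to the mean absolute difference of the sorted values. This is a simplified illustration only: how the distance is computed over high-dimensional embeddings in the experiments is not specified here, and the sample values are invented.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

real_scores = [0.0, 1.0, 2.0]  # toy 1-D projections of real-news embeddings
fake_scores = [3.0, 1.0, 2.0]  # toy 1-D projections of fake-news embeddings

distance = wasserstein_1d(real_scores, fake_scores)
```

A larger value means the two samples are further apart, which is how the increase from BERT to Et is read.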
Wasserstein distance between the embeddings of real vs. fake news is examined in both CTD and ITD scenarios, comparing between the original BERT and our proposed encoder Et. The Et increases the separation between distributions by a large margin on all datasets.
Encoder | Cross-Topic Detection (CTD) | | |
---|---|---|---|
BERT | 0.0192 | 0.0164 | 0.0126 |
Et | 0.7256 | 1.0128 | 0.9069 |
Encoder | In-Topic Detection (ITD) | | |
BERT | 0.0271 | 0.0198 | 0.0173 |
Et | 0.6184 | 0.7183 | 0.7294 |
Effectiveness of CP Decoupling Mechanism
To gain insights about the effectiveness of the proposed CP decoupling mechanism, we inspect the development of representation distributions before and after feature disentanglement. We compare the t-SNE distributions of the original representation o, the decoupled authenticity-related c, and the decoupled event-specific e in ITD scenario using the PHEME dataset, as illustrated in Figure 5.
t-SNE visualization of the representation distributions for different event combinations on the PHEME corpus, where the model is trained on three events and tasked with detecting fake news on the remaining one as an unseen event. Different colors represent different events, while dark and light shades distinguish between real and fake samples.
As shown in the first column of Figure 5, the original representations o corresponding to real and fake samples in training set are well separated. However, this separation is not so clear for the testing data, which features an unseen event (yellow color). With the introduction of CP decoupling, represented in the second column (b) of Figure 5, the gap between real and fake articles in the decoupled authenticity-related representations c is further enlarged, while the gap between different events is significantly reduced. Simultaneously, the decoupled event-specific representations e show clear isolation between distinct events, precisely capturing the unique knowledge characteristics associated with each event. This analysis confirms that the proposed CP decoupling mechanism effectively mitigates the event-specific knowledge from the original semantic representation, enhancing the model’s capability to discern real from fake samples by utilizing the authenticity-related information exclusively.
Hyper-Parameter Selection
To confirm the optimal hyper-parameter values, we conduct experiments using one dataset combination from each of the ITD and CTD scenarios. Among the parameters, the scaling factors λ1 and λ2 play crucial roles in determining the relative weight of the event branch and the refinement learning component, respectively. Specifically, λ1 influences the trade-off between authenticity detection and event classification in objective (1), while λ2 impacts the trade-off between the two loss terms in loss function (2). We test all combinations of the two hyper-parameters, each taking values from {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}. The detection accuracies for the selected ITD and CTD combinations are shown in Figure 6. In both cases, the combination of λ1 = 0.4 and λ2 = 0.6 yields the best detection performance. Hence, these two values are used throughout our experiments.
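The search over the two scaling factors amounts to a 6 × 6 grid, sketched below. The `toy_accuracy` function is a hypothetical stand-in for a full train-and-evaluate run; it is constructed to peak at (0.4, 0.6) so the example has a known answer, and it is not evidence for that setting.

```python
import itertools

GRID = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def grid_search(evaluate, grid=GRID):
    """Return the (lambda1, lambda2) pair with the highest score over the full grid."""
    return max(itertools.product(grid, grid), key=lambda pair: evaluate(*pair))

def toy_accuracy(lam1, lam2):
    """Hypothetical scorer; a real run would train the model and return test accuracy."""
    return 1.0 - (lam1 - 0.4) ** 2 - (lam2 - 0.6) ** 2

num_settings = len(list(itertools.product(GRID, GRID)))  # 36 combinations
best_lam1, best_lam2 = grid_search(toy_accuracy)
```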
Comparison of detection accuracy results on multiple datasets with different λ1 and λ2.
3.3 Quantitative Detection Evaluation
Evaluation of the ITD Scenario
To evaluate the proposed approach against comparative methods, we first present in Table 3 the accuracy and F1 for the ITD scenario, where the target events are different from the source events but share a similar topic. The proposed approach outperforms all competing approaches on every combination. Specifically, for one combination we achieve relative gains of 6.0% and 5.5% in accuracy and F1, respectively, over the recent MDDA, a representative method that uses a similar disentanglement mechanism to manage style and content knowledge discrepancies.
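The relative gains can be reproduced from Table 3 under the assumption that they refer to the second Acc./F1 column pair, where DEAR reaches 81.46/84.97 and MDDA 76.81/80.54. The helper name `relative_gain` is introduced here for illustration.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

acc_gain = relative_gain(81.46, 76.81)  # DEAR vs. MDDA accuracy, second column pair
f1_gain = relative_gain(84.97, 80.54)   # DEAR vs. MDDA F1, second column pair

print(f"accuracy gain: {acc_gain:.2f}%  F1 gain: {f1_gain:.2f}%")
```

The results are roughly 6.05% and 5.50%, in line with the reported 6.0% and 5.5%; the other three column pairs give noticeably smaller gains over MDDA.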
Performance comparison between DEAR and other recent approaches in the ITD scenario using the PHEME corpus, which contains articles from four events with similar topics. The evaluation is conducted by combining data from three events and predicting the remaining one, which is unseen, for the fake news detection task. Our method achieves the highest accuracy and F1 score, demonstrating its superior effectiveness in detecting fake news across different but related events.
Method | Acc. ↑ | F1 ↑ | Acc. ↑ | F1 ↑ | Acc. ↑ | F1 ↑ | Acc. ↑ | F1 ↑ |
---|---|---|---|---|---|---|---|---|
TextCNN (Kim, 2014) | 74.98 | 85.67 | 74.61 | 79.58 | 80.95 | 88.54 | 69.78 | 68.43 |
EANN (Wang et al., 2018) | 75.15 | 85.81 | 75.43 | 80.21 | 78.64 | 87.80 | 70.11 | 69.83 |
RoBERTa (Liu et al., 2019) | 76.29 | 86.07 | 76.49 | 80.38 | 83.16 | 88.87 | 77.53 | 77.97 |
MDDA (Zhang et al., 2021) | 76.10 | 85.92 | 76.81 | 80.54 | 82.42 | 88.21 | 79.51 | 78.68 |
Fish (Shi et al., 2022) | 76.30 | 85.97 | 77.87 | 81.19 | 82.88 | 88.23 | 78.08 | 78.03 |
MetaAdapt (Yue et al., 2023) | 76.81 | 86.19 | 78.78 | 81.59 | 83.74 | 88.99 | 78.65 | 78.32 |
DEAR (ours) | 79.27 | 87.75 | 81.46 | 84.97 | 84.82 | 89.57 | 80.62 | 80.03 |
Evaluation of the CTD Scenario
To further challenge our approach, we evaluate it in the CTD scenario, where the topics of the unseen target events are distinct from those of the source events; results are shown in Table 4. The proposed approach consistently outperforms the competing approaches by a clear margin. In particular, for the second Acc./F1 column pair in Table 4, we achieve a relative accuracy gain of 7.71% (63.43 vs. 58.89) over MetaAdapt (Yue et al., 2023), which uses a meta-training strategy for rapid adaptation to target data.
Table 4: Performance comparison between DEAR and other recent approaches in the CTD scenario using three datasets corresponding to three distinct topics. Each evaluation combines data from two datasets for training and predicts the remaining, unseen one. Each Acc./F1 column pair corresponds to one held-out dataset. Our method achieves the highest accuracy and F1 score in every setting, demonstrating its effectiveness in detecting fake news across events with different topics.
Method | Acc. ↑ | F1 ↑ | Acc. ↑ | F1 ↑ | Acc. ↑ | F1 ↑ |
---|---|---|---|---|---|---|
TextCNN (Kim, 2014) | 62.53 | 64.26 | 51.08 | 21.07 | 59.23 | 53.97 |
RoBERTa (Liu et al., 2019) | 64.90 | 63.26 | 52.69 | 25.03 | 60.82 | 56.30 |
EANN (Wang et al., 2018) | 63.45 | 64.02 | 50.64 | 23.99 | 62.15 | 61.43 |
MDDA (Zhang et al., 2021) | 63.82 | 64.89 | 52.32 | 34.18 | 65.82 | 66.43 |
Fish (Shi et al., 2022) | 63.98 | 65.01 | 54.58 | 37.74 | 66.59 | 68.49 |
MetaAdapt (Yue et al., 2023) | 64.21 | 67.84 | 58.89 | 41.19 | 67.06 | 66.12 |
DEAR (ours) | 66.34 | 68.93 | 63.43 | 48.97 | 70.04 | 69.80 |
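The leave-one-out protocol described in the caption of Table 4 (train on two datasets, test on the held-out one) can be sketched as follows. This is an illustrative sketch only: the function `leave_one_out_splits`, the placeholder dataset names, and the toy items are hypothetical and do not come from the paper or its released code.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_out_splits(
    datasets: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (target_name, source_items, target_items) for each held-out dataset."""
    for target_name, target_items in datasets.items():
        # Source data: every article from all datasets except the held-out one.
        source_items = [
            article
            for name, articles in datasets.items()
            if name != target_name
            for article in articles
        ]
        yield target_name, source_items, list(target_items)


# Placeholder corpora standing in for three datasets with distinct topics.
toy_datasets = {
    "topic_A": ["a1", "a2"],
    "topic_B": ["b1"],
    "topic_C": ["c1", "c2", "c3"],
}

splits = list(leave_one_out_splits(toy_datasets))
for target_name, source_items, target_items in splits:
    print(f"target={target_name}: {len(source_items)} source, {len(target_items)} target")
```

The same construction applies to the ITD setting, where the groups are the four PHEME events and three of them are combined as source data for each held-out event.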
3.4 Ablation Study
To validate the contribution of each component of the proposed approach, we conduct ablation experiments in both detection scenarios, using different configurations of our model. As shown in Figure 7, the proposed encoder Et consistently outperforms BERT, as it extracts more hierarchical knowledge and thereby improves comprehension of authenticity-related information. Detection performance improves further when the CP decoupling mechanism is included (labeled as “+CP decouple”), which mitigates the influence of event-specific noise by preserving only the authenticity-related knowledge for real/fake detection. Adding refinement learning on top of the CP decoupling mechanism (labeled as “+refine”) significantly increases detection accuracy by enhancing the robustness of both discriminators. This component strengthens the disentanglement by further filtering out residual interactions between the authenticity-related and event-specific representations. This analysis confirms the contribution of each key component of our approach, with the cross-perturbation decoupling mechanism yielding the largest improvement in detection performance.
Figure 7: Results of the ablation study, reported as accuracy on datasets from both the ITD and CTD scenarios. Statistical significance is indicated by ‡ for p < 0.005.
4 Related Work
4.1 Fake News Detection
Content-aware fake news detection approaches analyze the input claims themselves. For example, pre-trained transformer models are used to extract semantic or syntactic properties that improve fake news detection (Ma et al., 2016; Chen et al., 2018; Das et al., 2021; Yue et al., 2022; Jiang et al., 2022; Li et al., 2024). In addition, multi-modal input that combines text and image features has been explored to further enhance detection performance (Santhosh et al., 2022; Shang et al., 2022b; Hu et al., 2024b).
Beyond content-based methods, there are approaches that leverage user interactions to assess the credibility of online posts (Jin et al., 2016). Likewise, analyzing patterns in propagation paths proves effective in detecting fake news on social media platforms (Shu et al., 2020). The incorporation of social attributes, such as user dynamics, enriches fake news detection by introducing contextual information (Shu et al., 2019; Nan et al., 2024). When integrated with a content-based module, fake news detection systems exhibit enhanced accuracy (Mosallanezhad et al., 2022; Lin et al., 2022).
There are also approaches that use external knowledge as augmentative features to support fact verification and the identification of fake news (Brand et al., 2021). Approaches based on knowledge graphs or crowd-sourcing can extract supplementary information for fake news detection (Wu et al., 2024b; Shang et al., 2022a), but they usually require extra human annotations. More recently, large language models (LLMs) have shown promising results in enhancing the performance of fake news detection (Hu et al., 2024a; Wu et al., 2024a).
Many current fake news detection approaches concentrate on in-event scenarios, where news articles contain event-specific characteristics. This raises concerns about their effectiveness on unobserved events marked by event shifts. We focus on the systematic exploration of news articles from unseen events in the setting of early fake news detection. This setting is particularly relevant in the early stage, when the news has not yet been widely propagated and is primarily available as news content.
4.2 Domain Adaptation on Fake News Detection
Domain-adaptive fake news detection approaches aim to classify news from unseen domains, addressing the challenges posed by domain shifts. Several approaches (Li et al., 2021; Yue et al., 2022; Lin et al., 2022; Silva et al., 2024) focus on domain adaptation, assuming access to a portion of target domain data during training. For instance, Silva et al. (2021) introduce an unsupervised technique that selects unlabeled news records to maximize domain coverage and preserves both domain-specific and cross-domain knowledge through a disentanglement mechanism. Mosallanezhad et al. (2022) propose a domain-adaptive detection framework that uses reinforcement learning and incorporates auxiliary information. Yue et al. (2023) propose a meta-learning-based method for few-shot domain-adaptive misinformation detection, which exploits source domain knowledge under the guidance of a few target examples.
Incorporating cross-event scenarios into fake news detection has received less attention. Most approaches treat different events as distinct domains and use domain adaptation techniques to tackle the event-generalized challenge. For instance, Wang et al. (2018) propose a multi-modal fake news detection framework using event adversarial networks, aiming to learn shared features across events by mitigating event-specific knowledge that is not shared among different events. Zhang et al. (2021) propose a disentangled domain adaptation mechanism for fake news detection, particularly for unseen events. Liu et al. (2024) argue that large-scale datasets might not generalize well to unseen events due to domain shifts and introduce inter-domain and cross-modality alignment modules that reduce domain shift and the modality gap.
However, these fake news detection methods operate within the domain adaptation framework, assuming access to some target domain data or correlated extra knowledge during training. This assumption can be problematic given the dynamic nature of fake news generation and propagation, especially when target domain data is not accessible during the training phase.
5 Conclusion
In this paper, we introduce DEAR, an early fake news detection approach that leverages a disentanglement architecture to separate authenticity-related and event-specific knowledge. Our approach employs interactive cross-perturbation and refinement learning techniques to enhance the disentanglement effect, minimizing interactions between the decoupled representations. An adaptive multi-grained semantic encoder, based on BERT, generates hierarchical and fine-grained textual representations. Experimental results across multiple datasets demonstrate the effectiveness of DEAR in mitigating event-specific knowledge for fake news detection, outperforming state-of-the-art methods. As future work, we plan to extend the proposed disentangled methodology to address multi-modal fake news detection, exploring the possibility of mitigating event-specific knowledge in the multi-modality context.
Acknowledgments
We would like to thank Prof. Andrei Popescu-Belis for his insightful suggestions on revising the paper. We also extend our gratitude to the editor of TACL and the anonymous reviewers for their valuable feedback. This work was supported partly by the National Natural Science Foundation of China (62402073 and 62172067), the National Social Science Foundation of China (24XMZ092), the Natural Science Foundation of Chongqing (CSTB2022NSCQ-MSX1342), and the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202300619).
Notes
Our source code will be publicly available at https://github.com/PuXiao06/DEAR.
References
Author notes
Action Editor: Tim Baldwin