Table 1 

Examples of studies using k-fold cross-validation for evaluating classification methods on Twitter data.

Reference | Study | Data | Dependence Concern
--- | --- | --- | ---
Sriram et al. (2010) | Classification of tweets into five categories of events, news, deals, opinions, and private messages | 5,407 tweets, no mention of removing retweets | Events, Textual Links, Twinning
Culotta (2010) | Classification of tweets using regression models into flu or not flu | 500,000 tweets from a 10-week period | Events, Textual Links, Twinning
Verma et al. (2011) | Classification of tweets to specific disaster events, and identification of tweets that contained situational awareness content | 1,965 tweets collected from specific disasters in the U.S. by keyword search | Twinning, Textual Links
Jiang et al. (2011) | Sentiment classification of tweets; hashtags were mentioned as one of the features, retweets were considered to share the same sentiment | Tweets found using keyword search, without removing retweets | Twinning, Textual Links
Uysal and Croft (2011) | Personalized tweet ranking using retweet behavior. Decision tree classifiers were used based on features from the tweet content, user behavior, and tweet author | 24,200 tweets, from which 2,547 were retweeted by the seed users | Twinning, Textual Links
Takemura and Tajima (2012) | Classification of tweets into three categories based on whether they should be read now, later, or outdated | 9,890 tweets from a fixed period of time, annotated for time-(in)dependency | Events, Textual Links, Twinning
Kumar, Jiang, and Fang (2014) | Classification of tweets into two categories of road hazard and non-hazard | 30,876 tweets, retweets were not removed | Events, Textual Links, Twinning