In recent years, many studies have been published on data collected from social media, especially microblogs such as Twitter. However, few of these studies adopt evaluation methodologies that account for the statistical dependence inherent in such data, which violates the theoretical conditions for applying cross-validation. Despite long-standing concerns about using cross-validation on data with similar characteristics, such as time series, some of these studies still evaluate their work with standard k-fold cross-validation. Through experiments on Twitter data collected over a two-year period that includes disaster events, we show that ignoring the statistical dependence among messages published in social media can lead standard cross-validation to misleading conclusions in a machine learning task. We explore alternative evaluation methods that explicitly handle statistical dependence in text. Our work also raises concerns for any other data for which similar conditions may hold.
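As a minimal sketch of the evaluation issue described above (assuming samples are indexed in chronological order, e.g. by tweet timestamp; the function names and fold counts here are illustrative, not the paper's actual protocol), the following contrasts standard shuffled k-fold, where a training fold can contain samples published after the test fold, with a simple forward-chaining split that keeps training data strictly earlier in time than test data:

```python
import random

def shuffled_kfold(n, k, seed=0):
    """Standard k-fold: indices are shuffled, so a training fold
    can contain samples that postdate the test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold = n // k
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, test

def forward_chaining(n, k):
    """Rolling-origin evaluation: every training sample precedes
    every test sample in time, so no future data leaks in."""
    fold = n // (k + 1)
    for i in range(1, k + 1):
        train = list(range(i * fold))
        test = list(range(i * fold, (i + 1) * fold))
        yield train, test

n, k = 12, 3
# A fold "leaks" if some training index comes after the earliest test index.
leaky = any(max(tr) > min(te) for tr, te in shuffled_kfold(n, k))
clean = all(max(tr) < min(te) for tr, te in forward_chaining(n, k))
print(leaky, clean)  # → True True
```

The point of the sketch is only that temporal ordering matters: when adjacent messages are statistically dependent, the leakage exhibited by the shuffled split can inflate performance estimates, which motivates evaluation schemes that respect time order.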