Tracking Brand-Associated Polarity-Bearing Topics in User Reviews

Monitoring online customer reviews is important for business organizations to measure customer satisfaction and better manage their reputations. In this paper, we propose a novel dynamic Brand-Topic Model (dBTM) which is able to automatically detect and track brand-associated sentiment scores and polarity-bearing topics from product reviews organized in temporally ordered time intervals. dBTM models the evolution of the latent brand polarity scores and the topic-word distributions over time by Gaussian state space models. It also incorporates a meta learning strategy to control the update of the topic-word distribution in each time interval in order to ensure smooth topic transitions and better brand score predictions. It has been evaluated on a dataset constructed from MakeupAlley reviews and a hotel review dataset. Experimental results show that dBTM outperforms a number of competitive baselines in brand ranking, achieving a good balance of topic coherence and uniqueness, and extracting well-separated polarity-bearing topics across time intervals.


Introduction
With the increasing popularity of social media platforms, customers tend to share their personal experience towards products online. Tracking customer reviews online could help business organisations to measure customer satisfaction and better manage their reputations. Monitoring brand-associated topic changes in reviews can be done through the use of dynamic topic models (Blei and Lafferty, 2006; Wang et al., 2008; Dieng et al., 2019). Approaches such as the dynamic Joint Sentiment-Topic (dJST) model (He et al., 2014) are able to extract polarity-bearing topics evolved over time by assuming the dependency of the sentiment-topic-word distributions across time slices. They however require the incorporation of word prior polarity information and assume topics are associated with discrete polarity categories. Furthermore, they are not able to infer brand polarity scores directly.

1 Data and code are available at https://github.com/BLPXSPG/dBTM.
A recently proposed Brand-Topic Model (BTM) (Zhao et al., 2021) is able to automatically infer real-valued brand-associated sentiment scores from reviews and generate a set of sentiment-topics by gradually varying its associated sentiment scores from negative to positive.This allows users to detect, for example, strongly positive topics or slightly negative topics.BTM however assumes all documents are available prior to model learning and cannot track topic evolution and brand polarity changes over time.
In this paper, we propose a novel framework inspired by Meta-Learning, which is widely used for distribution adaptation tasks (Suo et al., 2020). When training the model on temporally-ordered documents divided into time slices, we assume that extracting polarity-bearing topics and inferring brand polarity scores in each time slice can be treated as a new sub-task, and the goal of model learning is to learn to adapt the topic-word distributions associated with different brand polarity scores in a new time slice. We use BTM as the base model and store the learned parameters in a memory. At each time slice, we gauge model performance on a validation set based on the model-generated brand ranking results. The evaluation results are used for early stopping and for dynamically initialising model parameters in the next time slice with meta learning. The resulting model is called the dynamic Brand Topic Model (dBTM).
The final outcome from dBTM is illustrated in Figure 1, in which it can simultaneously track topic evolution and infer latent brand polarity score changes over time. Moreover, it also enables the generation of fine-grained polarity-bearing topics in each time slice by gradually varying brand polarity scores. In essence, we can observe topic transitions in two dimensions, either along a discrete time dimension, or along a continuous brand polarity score dimension.

Figure 1: Brand-associated polarity-bearing topics tracking by our proposed model. We show top words from an example topic extracted in time slices 1, 4 and 8 along the horizontal axis. In each time slice, we can see a set of topics generated by gradually varying their associated sentiment scores from -1 (negative) to 1 (positive) along the vertical axis. For easy inspection, positive words are highlighted in blue while negative ones in red. We can observe that in Time 1, negative topics are mainly centred on complaints about the chemical smell of a perfume, while positive topics are about praise for the look of a product. From Time 1 to Time 8, we can also see the evolving aspects in negative topics moving from complaints about the strong chemical smell of a perfume to an overpowering sweet scent. In the lower part of the figure, we show the inferred polarity scores of three brands. For example, Chanel is generally ranked higher than Lancôme, which in turn scores higher than The Body Shop.
We have evaluated dBTM on a review dataset constructed from MakeupAlley 2 , consisting of over 611K reviews spanning over 9 years, and a hotel review dataset sampled from HotelRec (Antognini and Faltings, 2020), containing reviews of the 25 most popular hotels over 7 years. We compare its performance with a number of competitive baselines and observe that it generates better brand ranking results, predicts more accurate brand score time series, and produces well-separated polarity-bearing topics with more balanced topic coherence and diversity. More interestingly, we have evaluated dBTM in a more difficult setup, where the supervised label information, i.e., review ratings, is only supplied in the first time slice and afterwards, dBTM is trained in an unsupervised way without the use of review ratings. dBTM under such a setting can still produce brand ranking results across time slices more accurately compared to baselines trained under the supervised setting. This is a desirable property as dBTM, initially trained on a small set of labelled data, can self-adapt its parameters with streaming data in an unsupervised way.

2 https://www.makeupalley.com/
Our contributions are three-fold:
• We propose a new model, called dBTM, built on the Gaussian state space model with meta learning for dynamic brand topic and polarity score tracking;
• We develop a novel meta learning strategy to dynamically initialise the model parameters at each time slice in order to better capture rating score changes, which in turn generates topics with a better overall quality;
• Our experimental results show that dBTM, trained with the supervision of review ratings at the initial time slice, can self-adapt its parameters with streaming data in an unsupervised way and yet still achieve better brand ranking results compared to supervised baselines.

Related Work
Our work is related to the following research:

Dynamic Topic models
Topic models such as the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003) are among the most successful approaches for the statistical analysis of document collections. Dynamic topic models aim to analyse the temporal evolution of topics in large document collections over time. Early approaches built on LDA include the dynamic topic model (DTM) (Blei and Lafferty, 2006), which uses the Kalman filter to model the transition of topics across time, and the continuous time dynamic topic model (Wang et al., 2008), which replaced the discrete state space model of the DTM with its continuous generalisation. More recently, DTM has been combined with word embeddings in order to generate more diverse and coherent topics in document streams (Dieng et al., 2019).
Apart from the commonly used LDA, Poisson factorisation can also be used for topic modelling, in which a document-word count matrix is factorised into a product of a document-topic matrix and a topic-word matrix. It can be extended to analyse sequential count vectors, such as a document corpus represented as a single word count matrix with one column per time interval, by capturing dependence among time steps with a Kalman filter (Charlin et al., 2015) or neural networks (Gong and Huang, 2017), or by extending the Poisson distribution on the document-word counts to a non-homogeneous Poisson process over time (Hosseini et al., 2018).
While the aforementioned models are typically used in the unsupervised setting, the Joint Sentiment-Topic (JST) model (Lin and He, 2009; Lin et al., 2012) incorporated polarity word priors into model learning, which enables the extraction of topics grouped under different sentiment categories. JST was later extended into a dynamic counterpart, called dJST, which tracks both topic and sentiment shifts over time (He et al., 2014) by assuming that the sentiment-topic word distribution at the current time is generated from the Dirichlet distribution parameterised by the sentiment-topic word distributions at previous time intervals.

Market/Brand Topic Analysis
LDA and its variants have been explored for marketing research. Examples include user interest detection by analysing consumer purchase behaviour (Gao et al., 2017; Sun et al., 2021), tracking competitors in the luxury market among given brands by mining Twitter data (Zhang et al., 2015), and identifying emerging app issues from user reviews (Yang et al., 2021). Matrix factorisation, which is able to extract global information, has also been applied to product recommendation (Zhou et al., 2020) and review summarisation (Cui and Hu, 2021). The interaction between topics and polarities can be modelled by sampling-based approximations (Lin and He, 2009) with sentiment prior knowledge such as a sentiment lexicon (Lin et al., 2012), but such prior knowledge tends to be highly domain-specific. Seed words with known polarities, or seed words generated from morphological information (Brody and Elhadad, 2010), are another common way to obtain topic polarity, but these methods focus on analysing the polarity of existing topics. More recently, the Brand-Topic Model (BTM) built on Poisson factorisation was proposed (Zhao et al., 2021), which can infer brand polarity scores and generate fine-grained polarity-bearing topics. A detailed description of BTM can be found in Section 3.

Meta Learning
Meta learning, or learning to learn, can be broadly categorised into metric-based learning and optimisation-based learning. Metric-based learning aims to learn a distance function between training instances so that a test instance can be classified by comparing it with the training instances in the learned embedding space (Sung et al., 2018). Optimisation-based learning usually splits the labelled samples into training and validation sets. The basic idea is to fine-tune the parameters on the training set to obtain the updated parameters, which are then evaluated on the validation set to get the error, which is in turn converted into a loss value for optimising the original parameters (Finn et al., 2017; Jamal and Qi, 2019). Meta learning has been explored in many tasks, including text classification (Geng et al., 2020), topic modelling (Song et al., 2020), knowledge representation (Zheng et al., 2021), recommender systems (Neupane et al., 2021; Dong et al., 2020; Lu et al., 2020) and event detection (Deng et al., 2020). In particular, meta learning based methods have achieved significant success in distribution adaptation (Suo et al., 2020; Yu et al., 2021). We propose a meta learning strategy here to learn how to automatically initialise model parameters in each time slice.

Preliminary: Brand Topic Model
The Brand-Topic Model (BTM) (Zhao et al., 2021), as shown in the middle part of Figure 2, is trained on review documents paired with their document-level sentiment class labels (e.g., 'Positive', 'Negative' and 'Neutral'). It can automatically infer real-valued brand-associated polarity scores and generate fine-grained sentiment-topics in which a continuous change of words under a certain topic can be observed with a gradual change of its associated sentiment. It was partly inspired by the Text-Based Ideal Point (TBIP) model (Vafa et al., 2020), which aims to model the generation of text via Poisson factorisation. In particular, for the input bag-of-words data, the count of term v in document d is formulated as the term count c_dv, which is assumed to be sampled from a Poisson distribution, c_dv ∼ Poisson(λ_dv), where the rate parameter λ_dv can be factorised as:

λ_dv = Σ_k θ_dk β_kv,  (1)

Here, θ_dk denotes the per-document topic intensity and β_kv represents the topic-word distribution. We have θ ∈ R_+^{D×K} and β ∈ R_+^{K×V}, where D is the total number of documents in a corpus, K is the topic number, and V is the vocabulary size. Then, the brand polarity score x_{b_d} and the topic-word offset η_kv are added to the model:

λ_dv = Σ_k θ_dk β_kv exp(x_{b_d} η_kv),  (2)

where x_{b_d} is the brand polarity score for document d of brand b, and we have η ∈ R_+^{K×V}, x ∈ R. The model normalises the brand polarity assignment to [−1, 1] in its output for demonstration purposes.
The intuition behind the above formulation is that the latent variable x_{b_d}, which captures the brand polarity score, can be either positive or negative. If a word tends to occur frequently in reviews with positive polarities, but the polarity score of the current brand is negative, then the occurrence count of such a word would be reduced by making x_{b_d} and η_kv have opposite signs.
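This modulation can be sketched numerically. The snippet below is a toy NumPy illustration of the factorised Poisson rate in Eq. (2); all sizes, variable names and prior draws are our own choices, not values from the released code:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, V = 4, 3, 6          # documents, topics, vocabulary (toy sizes)

theta = rng.gamma(1.0, 1.0, size=(D, K))   # per-document topic intensities
beta = rng.gamma(1.0, 1.0, size=(K, V))    # topic-word intensities
eta = rng.gamma(1.0, 1.0, size=(K, V))     # positive topic-word offsets
x = np.array([0.8, -0.8, 0.8, -0.8])       # per-document brand polarity scores

# Poisson rate with the brand-polarity modulation: a word whose offset
# eta_kv is large is up-weighted when the brand score is positive (x > 0)
# and down-weighted when it is negative (x < 0), i.e. opposite signs of
# x and eta shrink the expected count.
lam = np.einsum('dk,kv,dkv->dv', theta, beta,
                np.exp(x[:, None, None] * eta[None, :, :]))
counts = rng.poisson(lam)                   # c_dv ~ Poisson(lam_dv)
```

Since eta is positive, exp(x·eta) is greater than 1 exactly when x and eta agree in sign, which is the intuition described above.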
A Gamma prior is placed on θ and β, with a, b, c, d being hyper-parameters, while a normal prior is placed over the brand polarity score x and the topic-word count offset η.
BTM makes use of Gumbel-Softmax (Jang et al., 2017) to construct document features for sentiment classification. This is because directly sampling word counts from the Poisson distribution is not differentiable. Gumbel-Softmax, a gradient estimator with the reparameterisation trick, is used to enable back-propagation of gradients. More details can be found in (Zhao et al., 2021).

Dynamic Brand Topic Model (dBTM)
To track brand-associated topic dynamics in customer reviews, we split the documents into time slices, where the time period of each slice can be set arbitrarily at, e.g., a week, a month, or a year. In each time slice, we have a stream of M documents {d_1, · · · , d_M} ordered by their publication timestamps. A document d at time slice t is input as a Bag-of-Words (BoW) representation. We extend BTM to deal with streaming documents by assuming that documents at the current time slice are influenced by documents in the past. The resulting model is called the dynamic Brand-Topic Model (dBTM), with its architecture illustrated in Figure 2.

Initialisation
In the original BTM model, the latent variables to be inferred include the document-topic distribution θ, the topic-word distribution β, the brand-associated polarity score x, and the polarity-associated topic-word offset η. At time slice 0, we represent all documents in this slice as a document-word count matrix. We then perform Poisson factorisation with coordinate-ascent variational inference (Gopalan et al., 2015) to derive θ and β (see Eq. (1)). The topic-word count offset η and the brand polarity score x are sampled from a standard normal distribution.
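As a rough stand-in for the coordinate-ascent variational inference of Gopalan et al. (2015), the sketch below factorises a document-word count matrix with multiplicative KL-NMF updates. This is a simpler algorithm than the one used in the paper, shown here only to illustrate what the initialisation step computes; the function name and iteration budget are ours:

```python
import numpy as np

def poisson_factorise(C, K, iters=200, eps=1e-10, seed=0):
    """Approximate a D x V count matrix C as Theta @ Beta with
    nonnegative factors, using multiplicative KL-NMF updates
    (a simple stand-in for Gamma-Poisson variational inference)."""
    rng = np.random.default_rng(seed)
    D, V = C.shape
    Theta = rng.gamma(1.0, 1.0, (D, K))   # document-topic intensities
    Beta = rng.gamma(1.0, 1.0, (K, V))    # topic-word intensities
    for _ in range(iters):
        R = C / (Theta @ Beta + eps)      # elementwise ratio C / reconstruction
        Theta *= (R @ Beta.T) / (Beta.sum(1) + eps)
        R = C / (Theta @ Beta + eps)
        Beta *= (Theta.T @ R) / (Theta.sum(0)[:, None] + eps)
    return Theta, Beta
```

The multiplicative updates keep both factors nonnegative, mirroring the Gamma priors placed on θ and β in the model.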

State-Space Model
At time slice t, we can model the evolution of the latent brand-associated polarity scores x_t and the polarity-associated topic-word offset η_t over time by a Gaussian state space model:

x_t ∼ N(x_{t−1}, σ_x² I),  (3)
η_t ∼ N(η_{t−1}, σ_η² I),  (4)

For the topic-word distribution β, a similar Gaussian state-space model is adopted except that a log-normal distribution is used:

β_t ∼ LN(log β_{t−1}, σ_β² I),  (5)

While the topic-word distribution can be inherited from the previous time slice, the document-topic distribution θ_t needs to be re-initialised at the start of each time slice since there is a different set of documents at each time slice. We propose to run a simple Poisson factorisation to derive the initial values of θ_{t(p)} before we do the model adaptation at each time slice:

c_dv ∼ Poisson(Σ_k θ_{t(p),dk} β_{t(p),kv}),  (6)

Here, the topic-word distribution in the previous time slice, β_{t−1(p)}, becomes the prior of the topic-word distribution in the current time slice, β_{t(p)}, as defined in Eq. (5). We use the subscript (p) to denote that the parameters are derived in the Poisson factorisation initialisation stage at the start of each time slice.
Essentially, at each time slice t, we initialise the document-topic distribution θ_t of the BTM model as θ_{t(p)}, which is obtained by performing Poisson factorisation on the document-word count matrix in t. For the topic-word distribution within BTM, we can set β_t to be inherited from β_{t−1} as defined in Eq. (5), but additionally we also have β_{t(p)}, which is obtained by directly performing Poisson factorisation of the document-word count matrix in the current time slice. In what follows, we will present how we initialise the value of β_t through meta learning.
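The state-space transitions above can be simulated directly. In this sketch the transition variances are placeholder values we chose; the paper does not state them in this excerpt:

```python
import numpy as np

rng = np.random.default_rng(0)
B, K, V, T = 3, 2, 5, 4                  # brands, topics, vocab, time slices
sigma_x, sigma_eta, sigma_beta = 0.1, 0.1, 0.1   # assumed transition std devs

x = [rng.normal(0.0, 1.0, B)]            # brand polarity scores at t = 0
eta = [rng.normal(0.0, 1.0, (K, V))]     # topic-word offsets at t = 0
beta = [rng.gamma(1.0, 1.0, (K, V))]     # topic-word intensities at t = 0

for t in range(1, T):
    # Gaussian random-walk transitions for x_t and eta_t.
    x.append(rng.normal(x[-1], sigma_x))
    eta.append(rng.normal(eta[-1], sigma_eta))
    # beta must stay positive, hence the log-normal transition
    # centred on the previous slice's value.
    beta.append(rng.lognormal(np.log(beta[-1]), sigma_beta))
```

The log-normal step is what lets β_t inherit from β_{t−1} while remaining a valid (positive) intensity.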

Meta Learning
We notice that although parameters in each time interval are linked with parameters in the previous time interval by Gaussian state-space models, the results generated at each time interval are not stable. Inspired by meta learning, we consider latent brand score prediction and sentiment topic extraction at each time interval as a new subtask, and propose a learning strategy to dynamically initialise model parameters in each interval based on the brand rating prediction performance on the validation set of the previous interval. In particular, we set aside 10% of the training data in each time interval as the validation set and compare the model-inferred brand ranking result with the gold-standard one using the Spearman's rank correlation coefficient. By default, the topic-word distribution in the current interval, β_t, would have its prior set to β_{t−1} learned in the previous time interval. However, if the brand ranking result in the previous interval is poor, then β_t would be initialised as a weighted interpolation of β_{t−1} and the topic-word distribution obtained from the Poisson factorisation initialisation stage in the current interval, β_{t(p)}. The rationale is that if the model performs poorly in the previous interval, then its learned topic-word distribution should have less impact on the parameters in the current interval. More concretely, we first evaluate the brand ranking result returned by the model at time slice t − 1 on the validation set at t − 1:

ρ_{t−1} = SpearmanRank(r̂_{t−1}, r_{t−1}),  (7)

where r̂_{t−1} denotes the derived brand ranking result based on the model-predicted latent brand polarity scores, x̂_{t−1}, at time slice t − 1, r_{t−1} is the gold-standard brand ranking, and ρ_{t−1} is the Spearman's rank correlation coefficient. To check whether the brand ranking result gets worse, we compare it with the brand ranking evaluation result, ρ_{t−2}, in the earlier interval. In particular, we first take Fisher's z-transformation z_{ρ_{t−1}} of ρ_{t−1}, which is assumed to follow a Gaussian distribution with variance 1/(B − 3), where B denotes the total number of brands. Then we compute the Cumulative Distribution Function (CDF) of the above normal distribution, denoted as Φ_{z_{ρ_{t−1}}}, and calculate Φ_{z_{ρ_{t−1}}}(ρ_{t−2}), which essentially returns Pr(z_{ρ_{t−1}} ≤ ρ_{t−2}). A lower value of Φ_{z_{ρ_{t−1}}}(ρ_{t−2}) indicates that the model at t − 1 generates a better brand ranking result than that in the previous time slice t − 2. This is equivalent to performing a hypothesis test in which we compare the rank evaluation result ρ_{t−1} with ρ_{t−2} to test if the model at t − 1 performs better than that at t − 2.

Algorithm 1: Training procedure of dBTM. Initialisation at time slice 0: update parameters by minimising the loss defined in Eq. (12); derive the brand ranking r̂_0 on the validation set based on the inferred brand polarity scores {x̂_0^b}_{b=1}^B; calculate the Spearman's rank correlation coefficient ρ_0 = SpearmanRank(r̂_0, r_0); set the weight γ_1 = max(0.05, Φ_{z_{ρ_0}}(0)). Training: per-epoch initialisation follows the meta learning strategy described above.
The hypothesis testing result can be used to set the weight γ_t which determines how the topic-word distribution at t, β_t, is initialised:

γ_t = max(0.05, Φ_{z_{ρ_{t−1}}}(ρ_{t−2})),  (8)
β_t ← (1 − γ_t) β_{t−1} + γ_t β_{t(p)},  (9)

The above equations state that if the model trained at t − 1 generates a significantly better brand ranking result than that in the previous time slice (p-value > 0.05), then we are more confident to initialise β_t largely based on β_{t−1}, according to the estimated probability Pr(z_{ρ_{t−1}} > ρ_{t−2}) = 1 − γ_t. Otherwise, we will have to re-initialise β_t mostly based on the topic-word distribution obtained from the Poisson factorisation initialisation stage in the current interval, β_{t(p)}.
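A stdlib-only sketch of this weighting step, under our reading of the procedure (the function names, the no-ties Spearman implementation, and the closed-form normal CDF via `erf` are all ours):

```python
import math

def spearman(a, b):
    """Spearman rank correlation (no tied values assumed in this toy sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((p - q) ** 2 for p, q in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def gamma_weight(rho_prev, rho_prev2, n_brands, floor=0.05):
    """Interpolation weight gamma_t as we read Eq. (8): the CDF of a normal
    centred at Fisher's z(rho_{t-1}) with variance 1/(B-3), evaluated at
    rho_{t-2}, floored at 0.05."""
    z = 0.5 * math.log((1 + rho_prev) / (1 - rho_prev))  # Fisher z-transform
    sd = 1.0 / math.sqrt(n_brands - 3)                    # classic Fisher s.e.
    cdf = 0.5 * (1 + math.erf((rho_prev2 - z) / (sd * math.sqrt(2))))
    return max(floor, cdf)
```

A small γ_t (good ranking in the previous slice) keeps β_t close to β_{t−1}; a γ_t near 1 re-initialises it from the Poisson factorisation stage.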

Parameter Inference
We use the mean-field variational distribution to approximate the posterior distribution of the latent variables, θ, β, η, x, given the observed document-word count data c, by maximising the Evidence Lower-Bound (ELBO):

ELBO = E_q[log p(c, θ, β, η, x)] − E_q[log q(θ, β, η, x)],

where q denotes the mean-field variational distribution. In addition, for each document d, we construct its representation z_d by sampling word counts using Gumbel-Softmax from the aforementioned learned parameters, which is fed to a sentiment classifier to predict a class distribution ŷ_d. We also perform adversarial learning by inverting the sign of the inferred polarity score of the brand associated with document d and produce the adversarial representation z̄_d. This is also fed to the same sentiment classifier, which generates another predicted class distribution ỹ_d. We train the model by minimising the Wasserstein distance between the predicted and the actual class distributions. The final loss function is the combination of the ELBO and the Wasserstein distance losses:

L = −ELBO + L_WD(ŷ_d, y_d) + L_WD(ỹ_d, ȳ_d),  (12)

where L_WD(·) denotes the Wasserstein distance, y_d is the gold-standard class distribution and ȳ_d is the class distribution derived from the inverted document rating. By inverting the document rating, we essentially balance the document rating distributions, so that for each positive document we also create a synthetic negative document, and vice versa.
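The Gumbel-Softmax step can be sketched as follows. This is a generic NumPy version of the estimator of Jang et al. (2017), not the paper's implementation:

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Relaxed, differentiable sample from a categorical distribution:
    softmax((logits + Gumbel noise) / tau). Lower tau -> closer to one-hot."""
    rng = rng or np.random.default_rng()
    # Gumbel(0, 1) noise via the inverse-CDF trick: -log(-log(U)).
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=np.shape(logits))))
    y = (np.asarray(logits, dtype=float) + gumbel) / tau
    y = np.exp(y - y.max())          # numerically stable softmax
    return y / y.sum()
```

In dBTM this relaxation is what makes the sampled document representation z_d differentiable, so the sentiment-classification loss can back-propagate into the topic parameters.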

Experimental Setup
Datasets Popular datasets such as Yelp, Amazon products (Ni et al., 2019) and the Multi-Domain Sentiment dataset (Blitzer et al., 2007) are constructed by randomly selecting reviews from Amazon or Yelp without considering their distributions over various brands and across different time periods. Therefore, we construct our own dataset by crawling reviews of the top 25 brands from MakeupAlley, a review website on beauty products. Each review is accompanied by a rating score, product type, brand and post time. We consider reviews with ratings of 1 and 2 as the negative class, those with a rating of 3 as the neutral class, and the remaining ones with ratings of 4 and 5 as the positive class, following the label setting in BTM. The entire dataset contains 611,128 reviews spanning over 9 years (2005 to 2013). We treat each year as a time slice and split the reviews into 9 time slices. The average review length is 123 words. Besides MakeupAlley-Beauty, we also run our experiments on HotelRec (Antognini and Faltings, 2020).
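The rating-to-label mapping described above is a direct transcription of the paper's scheme:

```python
def rating_to_class(rating: int) -> str:
    """Label scheme from the paper: ratings 1-2 negative, 3 neutral, 4-5 positive."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"
```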

Models for Comparison
We conduct experiments using the following models:
• Dynamic Joint Sentiment-Topic (dJST) model (He et al., 2014), built on LDA, can detect and track polarity-bearing topics from text with word prior sentiment knowledge incorporated. In our experiments, the MPQA subjectivity lexicon is used to derive the word prior sentiment information.
• Brand Topic Model (BTM) (Zhao et al., 2021), a supervised Poisson factorisation model extended from TBIP with the incorporation of document-level sentiment labels.
• dBTM, our proposed dynamic Brand Topic model in which the model is trained with the document-level sentiment labels at each time slice.
• O-dBTM, a variant of our model that is only trained with the supervised review-level sentiment labels in the first time slice (denoted as the 0-th time slice). In the subsequent time slices, it is trained under the unsupervised setting. In such a case, we no longer have a gold-standard brand ranking in time slices other than the 0-th one. Instead of directly calculating the Spearman's rank correlation coefficient, we measure the difference of the brand ranking results in neighbouring time slices and use it to set the weight γ_t in Eq. (8).
Parameter setting Frequent bigrams and trigrams are added as features in addition to unigrams for document representations. In our experiments, we train the models using the data from the current time slice and test model performance on the full data from the next time slice. During training, we set aside 10% of the data in each time slice as the validation set. For hyperparameters, we set the batch size to 256, the maximum number of training steps to 50,000, and the topic number to 50. It is worth noting that since topic dynamics are not explicitly modelled in static models such as TBIP and BTM, their topics extracted in different time slices are not directly linked.
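Frequent n-gram extraction of the kind described here can be done with the standard library alone. The `min_count` threshold below is our placeholder; the frequency cut-off used in the paper is not specified in this excerpt:

```python
from collections import Counter
from itertools import islice

def frequent_ngrams(token_docs, n=2, min_count=2):
    """Collect n-grams occurring at least min_count times across a corpus of
    tokenised documents, for use as extra features alongside unigrams."""
    counts = Counter()
    for tokens in token_docs:
        # Slide an n-wide window over the token list.
        counts.update(zip(*(islice(tokens, i, None) for i in range(n))))
    return {gram for gram, c in counts.items() if c >= min_count}
```

The surviving n-grams can then be appended to each document's Bag-of-Words vector as additional vocabulary entries.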

Experimental Results
In this section, we present the experimental results in comparison with the baseline models in brand rating, topic coherence/uniqueness measures, and qualitative evaluation of generated topics. For a fair comparison, baselines are trained on all previous time slices and predict on the current time slice.

Brand Rating
TBIP, BTM and dBTM can infer each brand's associated polarity score automatically. For dJST, we derive the brand rating by aggregating the label distribution of its associated review documents and then normalising over the total number of brand-related reviews. The average of the document-level ratings of a brand b at a time slice t is used as the ground truth of the brand rating x_t^b. We evaluate two aspects of the brand ratings.

Brand Ranking Results We report in Table 2 the brand ranking results measured by the Spearman's correlation coefficient, showing the correlation between the predicted brand rating and the ground truth, along with the associated two-sided p-values of the Spearman's correlations. Topic model variants, such as dJST, TBIP and BTM, produced brand ranking results either positively or negatively correlated with the true ranking results. In summary, in dBTM, the brand rating score is treated as a latent variable (i.e., x_{b_d} in Eq. (2)) and is directly inferred from the data. On the contrary, models such as dJST, which require post-processing to derive brand rating scores by aggregating the document-level sentiment labels, are inferior to dBTM. This shows the advantage of our proposed dBTM over traditional dynamic topic models in brand ranking.
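The ground-truth brand rating used in this evaluation is just a per-brand average of document-level ratings, e.g.:

```python
from collections import defaultdict

def brand_ground_truth(reviews):
    """Ground-truth brand rating at a time slice: the mean of the
    document-level ratings of each brand.

    reviews: iterable of (brand, rating) pairs for one time slice."""
    sums = defaultdict(lambda: [0.0, 0])
    for brand, rating in reviews:
        sums[brand][0] += rating
        sums[brand][1] += 1
    return {b: total / n for b, (total, n) in sums.items()}
```

Ranking brands by these averages gives the gold-standard ranking against which the model-inferred scores are compared.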
Brand Rating Time Series The brand rating time series aims to compare the ability of models to track the trend of brand rating. For easy comparison, we normalise the ratings produced by each model, so that the plot only reflects the fluctuation of ratings over time. Figure 3 shows the brand rating for the brand 'Maybelline New York' generated on the test set of MakeupAlley-Beauty by various models across time slices. It can be observed that the brand ratings generated by TBIP and BTM do not correlate well with the actual rating scores. dJST shows a better aligned rating trend, but its prediction missed some short-term changes such as the peak of brand rating at time slice 7. By contrast, dBTM correctly predicts the general trend of the brand rating. The weakly-supervised O-dBTM is able to follow the general trend but misses some short-term changes, such as the upward trend from time slice 1 to 2, and from slice 6 to 7.

Topic Evaluation Results
We use the top 10 words of each topic to calculate the context-vector-based topic coherence scores (Röder et al., 2015) as well as topic uniqueness (Nan et al., 2019), which measures the ratio of word overlap across topics. We want to achieve balanced topic coherence and diversity. As such, topic coherence and topic diversity are combined to give an overall quality measure of topics (Dieng et al., 2020). Since the results for topic coherence are negative in our experiment, i.e., smaller absolute values are better, we define the overall quality of a topic as q = topic uniqueness / |topic coherence|. Table 3 shows the topic evaluation results. In general, there is a trade-off between topic coherence and topic diversity. On average, dJST has the highest coherence but the lowest uniqueness scores, while TBIP has quite high uniqueness but the lowest coherence values. Both O-dBTM and dBTM achieve a good balance between coherence and uniqueness and outperform other models in overall quality.

Table 3: Topic coherence (coh) and uniqueness (uni) measures of the results generated by various models. We also combine the two scores to derive the overall quality of the extracted topics.
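The two measures combine as follows. Our implementation of topic uniqueness averages the inverse occurrence count of each top word across topics, which is one common reading of the measure of Nan et al. (2019):

```python
from collections import Counter

def topic_uniqueness(topics):
    """Average inverse occurrence count of each top word across topics:
    1.0 when no top word is shared, lower as topics overlap more."""
    counts = Counter(w for topic in topics for w in topic)
    total = sum(len(topic) for topic in topics)
    return sum(1.0 / counts[w] for topic in topics for w in topic) / total

def overall_quality(coherence, uniqueness):
    # Coherence scores are negative in these experiments, so divide by
    # the absolute value: q = uniqueness / |coherence|.
    return uniqueness / abs(coherence)
```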

Example Topics across Time Periods
We illustrate some representative topics generated by dBTM in various time slices. For easy inspection, we retrieve a representative sentence from the corpus for each topic. For a sentence, we derive its representation by averaging the GloVe embeddings of its constituent words. For a topic, we also average the GloVe embeddings of its associated top words, but weighted by the topic-word probabilities. The sentence with the highest cosine similarity is selected.
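This retrieval step can be sketched as below. The embedding table can be any word-vector lookup (the paper uses GloVe); the function name and the toy vectors used in the test are ours:

```python
import numpy as np

def most_representative(sentences, topic_words, topic_probs, emb):
    """Pick the tokenised sentence whose mean word embedding is closest
    (by cosine similarity) to the probability-weighted mean embedding of
    the topic's top words.  `emb` maps word -> 1-D numpy vector."""
    def unit(v):
        return v / (np.linalg.norm(v) + 1e-12)

    # Topic vector: top-word embeddings weighted by topic-word probability.
    topic_vec = unit(sum(p * emb[w] for w, p in zip(topic_words, topic_probs)))
    # Sentence vectors: unweighted mean of constituent-word embeddings.
    scored = [(float(unit(np.mean([emb[w] for w in s], axis=0)) @ topic_vec), s)
              for s in sentences]
    return max(scored)[1]
```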
Examples of generated topics relating to 'Eye Products' and 'Skin Care' from MakeupAlley-Beauty are shown in Figure 4. We can observe that for the topic 'Eye Products', the top words of negative comments on 'eye cleanser' evolve from skin reactions (e.g., 'sting', 'burned') to cleaning ability (e.g., 'remove', 'residue'). We can also see that the positive topics gradually change from praising the product's effect on 'dark circle' in time slice 1 to the quality of eye shadow in time slice 4 and eye primer in time slice 8. Moreover, we observe the brand name M.A.C. in the positive topic in time slice 4, which aligns with its ground truth rating. For the topic 'Skin Care', it can be observed that negative topics gradually move from complaints about a skin cleanser to the thickness of a sunscreen, while positive topics praise the coverage of the M.A.C. foundation more consistently over time. The results show that dBTM can generate well-separated polarity-bearing topics and also allows the tracking of topic changes over time.
Examples of generated topics relating to 'Room Condition' and 'Food' from HotelRec are shown in Figure 5. We can see that for the topic 'Room Condition', top words gradually shift from expressions of cleanliness (e.g., 'clean' in positive and 'dirty' in negative comments) to descriptions of the type and size of the rooms (e.g., 'executive' and 'villa' in positive reviews, and the concern over 'small' room size in negative comments). For the topic 'Food', the food being discussed changes across time from drinks (e.g., 'coffee', 'tea') to meals (e.g., 'eggs', 'toast'). Negative reviews mainly focus on concerns over food quality (e.g., 'cold'), while positive reviews contain general praise of food and services (e.g., 'like', 'nice').

Ablation Study
We investigate the contribution of the meta learning component (i.e., Eq. (8) and (9)) by conducting an ablation study; the results are shown in Table 4. We can observe that, in general, removing meta learning leads to a significant reduction in brand ranking correlations across all time slices for the MakeupAlley-Beauty dataset. In terms of topic quality, we observe reduced coherence scores but slightly increased uniqueness scores without meta learning, leading to an overall reduction of topic quality scores in most time slices.
For HotelRec, we can see that removing meta learning also leads to a reduction in brand ranking results, but the impact is smaller compared to MakeupAlley-Beauty. For topic quality, we observe increased coherence but worse uniqueness, resulting in slightly worse topic quality results without meta learning in most time slices. One main reason is that, unlike makeup brands, where new products are introduced over time and thus change the topics discussed in reviews, the topic-word distribution does not change much across different time slices for hotel reviews. Therefore, the results are less impacted with or without meta learning.

Training Time Complexity
All experiments were run on a single GeForce 1080 GPU with 11GB memory. The training time for each model across time slices is shown in the accompanying figure.

Conclusion
We have presented dBTM, which is able to automatically detect and track brand-associated topics and sentiment scores. Experimental evaluation based on the reviews from MakeupAlley and HotelRec demonstrates the superiority of dBTM over previous models in brand ranking and dynamic topic extraction. The variant of dBTM, O-dBTM, trained with document-level sentiment labels in the first time slice only, outperforms baselines in brand ranking and achieves the best overall result in topic quality evaluation. This shows the effectiveness of the proposed architecture in modelling the evolution of brand scores and topics across time intervals. Our model currently only considers review ratings, but real-world applications potentially involve additional factors (e.g., user preference). A possible solution is to explore simultaneous modelling of user preferences to extract personalised brand polarity topics.

Figure 2 :
Figure 2: The overall architecture of the dynamic Brand-Topic Model (dBTM), which extends the Brand-Topic Model (BTM), shown in the upper box, to deal with streaming documents. In particular, at time slice t, the document-topic distribution θ_t is initialised by a vanilla Poisson factorisation model, and the evolution of the latent brand-associated polarity scores x_t and the polarity-associated topic-word offset η_t is modelled by two separate Gaussian state space models. The topic-word distribution β_t has its prior set based on the trend of the model's brand ranking performance in the previous two time slices. Lines coloured in grey indicate parameters linked by Gaussian state space models, while those coloured in green indicate forward calculations.
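The Gaussian state-space evolution described in the caption can be illustrated with a minimal random-walk sketch. This is a forward-sampling illustration only (the actual model performs posterior inference rather than sampling), and the variance parameter `sigma` is a hypothetical hyperparameter:

```python
import random

def gaussian_random_walk(x0, n_steps, sigma):
    """Evolve a latent variable x_t ~ N(x_{t-1}, sigma^2) across time
    slices, mirroring the Gaussian state-space priors placed on the
    brand polarity scores x_t and topic-word offsets eta_t."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(random.gauss(xs[-1], sigma))
    return xs

random.seed(0)  # deterministic trajectory for illustration
trajectory = gaussian_random_walk(x0=0.0, n_steps=6, sigma=0.1)
```

The key property this prior encodes is smoothness: each time slice's latent value is anchored near the previous slice's value, so brand scores and topic offsets drift gradually rather than jumping arbitrarily.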

Figure 3 :
Figure 3: The rating time series for 'Maybelline New York'. The rating scores are normalised to the range [−1, 1], with positive values denoting positive sentiment and negative values denoting negative sentiment. In each subfigure, the dashed curve shows the actual rating scores.
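Assuming the raw ratings are on a 1-to-5 scale (the scale is not stated in the caption), the normalisation to [−1, 1] is a simple linear map:

```python
def normalise_rating(r, lo=1.0, hi=5.0):
    """Linearly map a rating in [lo, hi] to [-1, 1]: lo maps to -1
    (negative sentiment), the midpoint to 0 (neutral), hi to +1
    (positive sentiment)."""
    return 2.0 * (r - lo) / (hi - lo) - 1.0
```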

Figure 4 :
Figure 4: Example of generated topics shown as a list of top associated words (underlined) in different time slices from the MakeupAlley dataset. For easy inspection, we also show the most representative sentence under each topic. The negative, neutral and positive topics in each time slice are generated by varying the brand polarity score from −1 to 0 and to 1. Positive words/phrases are highlighted in blue, negative words/phrases are in red, and brand names are in bold.
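Generating negative/neutral/positive word lists by varying the brand polarity score can be sketched with an ideal-point-style scoring rule, where a word's weight under polarity x is its neutral intensity scaled by exp(x · offset). The vocabulary and the β/η values below are purely illustrative, not taken from the trained model:

```python
import math

def top_words(beta_k, eta_k, vocab, x, n=3):
    """Rank words under topic k for brand polarity score x using
    beta_kv * exp(x * eta_kv): a positive offset eta_kv boosts a word
    when x > 0; a negative offset boosts it when x < 0."""
    scores = {v: b * math.exp(x * e) for v, b, e in zip(vocab, beta_k, eta_k)}
    return sorted(scores, key=scores.get, reverse=True)[:n]

vocab = ["clean", "dirty", "room", "nice", "cold"]
beta  = [0.5, 0.5, 1.0, 0.4, 0.4]    # neutral topic-word intensities (illustrative)
eta   = [1.0, -1.0, 0.0, 1.5, -1.5]  # polarity offsets (illustrative)

neg = top_words(beta, eta, vocab, x=-1.0)  # polarity-bearing words for x = -1
pos = top_words(beta, eta, vocab, x=+1.0)  # polarity-bearing words for x = +1
```

Sweeping x from −1 through 0 to 1 smoothly interpolates between the negative and positive word lists, which is how the three columns per time slice in the figure are produced.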

Figure 5 :
Figure 5: Example of generated topics shown as a list of top associated words (underlined) in different time slices from the HotelRec dataset. The representative sentence for each topic is also shown for easy inspection.

It can be observed that as the number of time slices increases, the training time of dJST and BTM grows quickly. Both TBIP and dBTM take significantly less time to train. However, TBIP simply performs Poisson factorisation independently in each time slice and fails to track topic/sentiment changes over time. By contrast, our proposed dBTM and O-dBTM are able to monitor topic/sentiment evolution and yet take even less time to train than TBIP. One main reason is that dBTM and O-dBTM can automatically adjust the number of iterations via the proposed meta learning strategy and hence can be trained more efficiently.
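The iteration-adjustment behaviour is not specified in detail in this section; one hedged sketch consistent with the trend-based control described for the architecture (cut the iteration budget when the validation ranking correlation improved over the previous two time slices) is below. The rule, threshold, and budget values are hypothetical, not the paper's exact Eqs. 8 and 9:

```python
def adjust_iterations(corr_history, base_iters=100, factor=0.5):
    """Hypothetical control rule: if the validation brand-ranking
    correlation improved from the second-to-last to the last time
    slice, halve the iteration budget; otherwise keep the full budget."""
    if len(corr_history) >= 2 and corr_history[-1] > corr_history[-2]:
        return max(1, int(base_iters * factor))
    return base_iters
```

A rule of this shape would explain the observed efficiency gain: once topic and score estimates transfer well across slices, fewer update steps are spent per slice.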

Figure 6 :
Figure 6: Training time of models across time slices.
Derive the brand ranking r̂_t on the validation set based on the inferred brand polarity scores {x_t^b}_{b=1}^B; calculate Spearman's rank correlation coefficient ρ_t = SpearmanRank(r̂_t, r_t).
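The ranking-evaluation step above can be sketched in plain Python; the brand scores and gold ratings below are illustrative toy values, not from the datasets:

```python
def rankdata(xs):
    """Assign average ranks (1-based) to values, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank over the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(pred, gold):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rp, rg = rankdata(pred), rankdata(gold)
    n = len(pred)
    mp, mg = sum(rp) / n, sum(rg) / n
    cov = sum((a - mp) * (b - mg) for a, b in zip(rp, rg))
    vp = sum((a - mp) ** 2 for a in rp) ** 0.5
    vg = sum((b - mg) ** 2 for b in rg) ** 0.5
    return cov / (vp * vg)

# Toy example: 5 brands, inferred polarity scores vs. gold average ratings.
pred = [0.8, -0.2, 0.5, 0.1, -0.6]
gold = [4.5, 2.1, 4.0, 3.2, 1.5]
rho = spearman(pred, gold)  # identical orderings give rho = 1.0
```

In practice `scipy.stats.spearmanr` computes the same coefficient along with the two-sided p-value reported in Table 2.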

Table 1 :
Dataset statistics of the reviews. The HotelRec subset was constructed by selecting reviews from the top 25 hotels over 7 years (2012 to 2018). It can be observed that the dataset is imbalanced, with positive reviews being over triple the number of negative ones for MakeupAlley-Beauty and nearly 10 times for HotelRec.

Table 2 :
Brand ranking results generated by various models trained on time slice t and tested on time slice t + 1. We report the correlation coefficient corr and its associated two-sided p-value.

Table 4 :
Results of dBTM with and without the meta learning component.