
All composite language models are first trained with the N-best-list approximate EM algorithm until convergence, followed by a second stage of EM parameter re-estimation for the WORD-PREDICTOR and the SEMANTIZER, again until convergence. In the experiments we fix the number of PLSA topics at 200 and then prune each document's topic mixture to 5 topics; the 5 retained topics generally account for about 70% of the probability mass of p(g|d). Table 5 reports perplexity results for a range of models, including the composite n-gram/m-SLM, n-gram/PLSA, and m-SLM/PLSA models and their linear combinations, where online EM with a fixed learning rate is used to re-estimate the SEMANTIZER parameters for each test document. The m-SLM performs competitively with its n-gram counterpart (n = m + 1) on the large-scale corpora. Table 6 lists the number of types in the predictor of the m-SLMs on the three corpora; for the 230 million and 1.3 billion token corpora, we discard fractional expected counts below a predefined threshold of 0.005, which reduces the number of predictor types by about 70%.
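
The topic pruning and the online-EM adaptation of the SEMANTIZER described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' implementation: the function names, the learning-rate value, and the exact form of the fixed-rate update (interpolating the document's topic mixture toward the per-word posterior) are assumptions.

import numpy as np

def prune_topics(p_g_d, keep=5):
    """Keep the `keep` most probable topics of p(g|d) and renormalize."""
    top = np.argsort(p_g_d)[::-1][:keep]      # indices of the most probable topics
    pruned = np.zeros_like(p_g_d)
    pruned[top] = p_g_d[top]
    return pruned / pruned.sum()              # renormalize to a distribution

def online_em_update(p_g_d, p_w_g, word_id, lr=0.1):
    """One online-EM step on a single test-document word.

    E-step: posterior over topics for this word, p(g|w,d) proportional to p(w|g) p(g|d).
    M-step: move the old mixture toward the posterior with a fixed learning
    rate `lr` (a simplified update; the rate value is an assumption).
    """
    post = p_w_g[:, word_id] * p_g_d          # unnormalized topic posterior
    post /= post.sum()
    return (1.0 - lr) * p_g_d + lr * post     # fixed-rate interpolation

# Toy usage: 200 topics, vocabulary of 10 words.
rng = np.random.default_rng(0)
p_w_g = rng.dirichlet(np.ones(10), size=200)  # p(w|g), one row per topic
p_g_d = rng.dirichlet(np.ones(200))           # initial p(g|d) for a test document

p_g_d = prune_topics(p_g_d, keep=5)
for w in [3, 7, 1]:                           # word ids of the test document
    p_g_d = online_em_update(p_g_d, p_w_g, w)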

Table 5 

Perplexity results for various language models on the test corpora, where + denotes linear combination and / denotes a composite model; n is the order of the n-gram and m is the order of the SLM; the number of PLSA topic nodes is pruned from 200 to 5.

language model                             44M (n=3, m=2)  reduction   230M (n=4, m=3)  reduction   1.3B (n=5, m=4)  reduction
baseline n-gram (linear)                   262             —           200              —           138              —
n-gram (Kneser-Ney)                        244             6.9%        183              8.5%        —                —
m-SLM                                      279             −6.5%       190              5.0%        137              0.0%
PLSA                                       825             −214.9%     812              −306.0%     773              −460.0%
n-gram + m-SLM                             247             5.7%        184              8.0%        129              6.5%
n-gram + PLSA                              235             10.3%       179              10.5%       128              7.2%
n-gram + m-SLM + PLSA                      222             15.3%       175              12.5%       123              10.9%
n-gram/m-SLM                               243             7.3%        171              14.5%       (125)            9.4%
n-gram/PLSA                                196             25.2%       146              27.0%       102              26.1%
m-SLM/PLSA                                 198             24.4%       140              30.0%       (103)            25.4%
n-gram/PLSA + m-SLM/PLSA                   183             30.2%       140              30.0%       (93)             32.6%
n-gram/m-SLM + m-SLM/PLSA                  183             30.2%       139              30.5%       (94)             31.9%
n-gram/m-SLM + n-gram/PLSA                 184             29.8%       137              31.5%       (91)             34.1%
n-gram/m-SLM + n-gram/PLSA + m-SLM/PLSA    180             31.3%       130              35.0%       —                —
n-gram/m-SLM/PLSA                          176             32.8%       —                —           —                —

Table 6 

Number of types in the predictor of the m-SLMs (m = 2, 3, 4) on the 44 million, 230 million, and 1.3 billion token corpora. For the 230 million and 1.3 billion token corpora, fractional expected counts below the 0.005 threshold are pruned, reducing the number of m-SLM (m = 3, 4) predictor types by about 70%.

corpus    m = 2          m = 3            m = 4
44M       189,002,525    269,685,833      318,174,025
230M      267,507,672    1,154,020,346    1,417,977,184
1.3B      946,683,807    1,342,323,444    1,849,882,215
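
The count cutoff behind Table 6 is a simple threshold filter over the WORD-PREDICTOR's fractional expected counts accumulated in the E-step. Below is a hedged sketch; the dictionary layout and the helper name are illustrative, not the authors' data structures.

THRESHOLD = 0.005  # cutoff stated in the text

def prune_expected_counts(expected_counts, threshold=THRESHOLD):
    """Drop predictor types whose fractional expected count falls below `threshold`.

    `expected_counts` maps a predictor type (e.g., a context/word pair) to the
    fractional expected count accumulated during the E-step.
    """
    return {t: c for t, c in expected_counts.items() if c >= threshold}

# Toy usage: the entry with count 0.0021 is removed.
counts = {("the", "cat"): 12.3, ("a", "quokka"): 0.0021, ("of", "the"): 87.0}
print(prune_expected_counts(counts))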