
Some of the word/function pairs in Table 2 are skewed toward contributions from a few speakers. For example, for backchannel (BC) uh-huh, as many as 65 instances (44%) come from a single speaker, and the remaining 83 come from seven other speakers. In cases like this, using the whole sample would risk drawing false conclusions about the usage of ACWs, since the results could be influenced by the stylistic properties of individual speakers. Therefore, we downsampled the tokens of ACWs in the Games Corpus to obtain a balanced data set, with instances of each word and function coming in similar proportions from as many speakers as possible. Specifically, we downsampled our data using the following procedure: First, we discarded all word/function pairs with tokens from fewer than four different speakers; second, for each of the remaining word/function pairs, we discarded tokens (at random) from speakers who contributed more than 25% of its tokens. In other words, the resulting data set meets two conditions: For each word/function pair, (a) tokens come from at least four different speakers, and (b) no single speaker contributes more than 25% of the tokens. The two thresholds were found via a grid search, and were chosen as a trade-off between the size and the representativeness of the data set. With this procedure we discarded 506 tokens of ACWs, or 9.3% of such words in the corpus. Table 3 shows the resulting distribution of discourse/pragmatic functions over ACWs in the whole corpus after downsampling the data. The κ measure of inter-labeler reliability was practically identical for the downsampled data, at 0.751.
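The two-step procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `downsample`, the `(word, function, speaker)` tuple layout, and the fixed random seed are assumptions made for illustration. The text also does not say how the 25% cap is enforced as the pair's token total shrinks; the sketch assumes tokens are removed one at a time from the current top contributor until condition (b) holds.

```python
import random
from collections import defaultdict


def downsample(tokens, min_speakers=4, max_share=0.25, seed=0):
    """Return a speaker-balanced subset of `tokens`.

    `tokens` is a list of (word, function, speaker) tuples (hypothetical layout).
    """
    rng = random.Random(seed)

    # Group tokens by word/function pair, then by speaker.
    groups = defaultdict(lambda: defaultdict(list))
    for tok in tokens:
        word, function, speaker = tok
        groups[(word, function)][speaker].append(tok)

    kept = []
    for by_speaker in groups.values():
        # Step 1: discard pairs with tokens from fewer than `min_speakers` speakers.
        if len(by_speaker) < min_speakers:
            continue

        # Shuffle each speaker's tokens so that pop() discards a random one.
        for toks in by_speaker.values():
            rng.shuffle(toks)
        total = sum(len(toks) for toks in by_speaker.values())

        # Step 2 (assumed iterative reading): trim the largest contributor
        # until no speaker exceeds `max_share` of the pair's remaining tokens.
        while True:
            top = max(by_speaker, key=lambda s: len(by_speaker[s]))
            if len(by_speaker[top]) <= max_share * total:
                break
            by_speaker[top].pop()
            total -= 1

        for toks in by_speaker.values():
            kept.extend(toks)
    return kept
```

Under this reading, a pair with exactly four speakers ends up with each speaker trimmed to the smallest contributor's count, which shows how strongly the two thresholds trade data set size against balance.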

Table 3

Distribution of function over ACW, after downsampling. Rest = {gotcha, huh, yep, yes, yup}.


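| | alright | mm-hm | okay | right | uh-huh | yeah | Rest | Total |
|---|---|---|---|---|---|---|---|---|
| Agr | 76 | 58 | 1,092 | 74 | 16 | 754 | 87 | 2,157 |
| BC | | 395 | 120 | | 101 | 58 | | 674 |
| CBeg | 61 | | 543 | | | | | 604 |
| CEnd | | | | | | | | |
| PBeg | | | 64 | | | | | 64 |
| PEnd | 10 | | 218 | | | 18 | | 250 |
| Mod | | | 18 | 1,069 | | | | 1,091 |
| BTsk | | | 28 | | | | | 33 |
| Chk | | | | 49 | | | | 58 |
| Stl | | | 15 | | | | | 15 |
| Total | 156 | 457 | 2,107 | 1,192 | 117 | 830 | 91 | 4,950 |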

