
For our phonetic features (listed in Table 8), we trained an automatic phone recognizer based on the Hidden Markov Model Toolkit (HTK) (Young et al. 2006), using three corpora as training data: the TIMIT Acoustic-Phonetic Continuous Speech Corpus (Garofolo et al. 1993), the Boston Directions Corpus (Hirschberg and Nakatani 1996), and the Columbia Games Corpus. With this recognizer, we obtained automatic time-aligned phonetic transcriptions of each instance of alright, mm-hm, okay, right, uh-huh, and yeah in the corpus. To improve accuracy, we restricted the recognizer's grammar to accept only the most frequent variations of each word, as shown in Table 9. We extracted our phonetic features, such as phone and syllable durations, from the resulting time-aligned phonetic transcriptions. The remaining five ACWs in our corpus (gotcha, huh, yep, yes, and yup) occurred too infrequently to exhibit meaningful phonetic variation, so we did not compute phonetic features for those words.
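
As an illustration of how duration features can be read off a time-aligned phonetic transcription, the following is a minimal sketch rather than the pipeline actually used. It assumes the alignment is available in the HTK label format (one `start end label` line per phone, with times in units of 100 ns); the example alignment, the syllable grouping, and all function names are hypothetical.

```python
HTK_TIME_UNIT = 1e-7  # HTK label times are integers in units of 100 ns


def parse_alignment(text):
    """Parse 'start end label' lines into (label, start_s, end_s) tuples."""
    phones = []
    for line in text.strip().splitlines():
        start, end, label = line.split()[:3]
        phones.append((label, int(start) * HTK_TIME_UNIT, int(end) * HTK_TIME_UNIT))
    return phones


def phone_durations(phones):
    """Duration in seconds of each aligned phone."""
    return [(label, end - start) for label, start, end in phones]


def syllable_durations(phones, syllable_sizes):
    """Duration of each syllable, given how many phones each syllable spans.

    The syllabification is supplied by the caller; it is an assumption of
    this sketch, not something the recognizer is known to output.
    """
    durations, i = [], 0
    for size in syllable_sizes:
        group = phones[i:i + size]
        durations.append(group[-1][2] - group[0][1])
        i += size
    return durations


# Hypothetical alignment of one token of "okay" realized as ow k ey.
EXAMPLE = """
0 900000 ow
900000 1600000 k
1600000 3500000 ey
"""

phones = parse_alignment(EXAMPLE)
durations = phone_durations(phones)
syllables = syllable_durations(phones, [1, 2])  # assumed split: ow | k ey
word_duration = phones[-1][2] - phones[0][1]
```

Here the word duration is the span from the start of the first phone to the end of the last, so any silence outside the aligned phones is excluded.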

Table 9

Restricted grammars for the automatic speech recognizer. Phones in square brackets are optional.

ACW       ARPAbet Grammar
alright   (aa∣ao∣ax) r (ay∣eh) [t]
mm-hm     m hh m
okay      [aa∣ao∣ax∣m∣ow] k (ax∣eh∣ey)
right     r (ay∣eh) [t]
uh-huh    (aa∣ax) hh (aa∣ax)
yeah      y (aa∣ae∣ah∣ax∣ea∣eh)
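
To make the notation of Table 9 concrete, the sketch below enumerates the phone sequences that each restricted grammar accepts. It is an illustration only, not the recognizer's actual grammar file: the grammar strings are copied from the table with the alternation sign written as an ASCII `|`, and the names `TABLE_9`, `TOKEN`, and `expand` are hypothetical.

```python
import re
from itertools import product

# Grammars copied from Table 9; "(a|b)" is a required choice and
# "[a|b]" is an optional choice, as stated in the table caption.
TABLE_9 = {
    "alright": "(aa|ao|ax) r (ay|eh) [t]",
    "mm-hm": "m hh m",
    "okay": "[aa|ao|ax|m|ow] k (ax|eh|ey)",
    "right": "r (ay|eh) [t]",
    "uh-huh": "(aa|ax) hh (aa|ax)",
    "yeah": "y (aa|ae|ah|ax|ea|eh)",
}

# One token is a parenthesized group, a bracketed group, or a bare phone.
TOKEN = re.compile(r"\(([^)]*)\)|\[([^\]]*)\]|(\S+)")


def expand(grammar):
    """Return every phone sequence accepted by a Table 9 style grammar."""
    grammar = grammar.replace("\u2223", "|")  # accept the table's own sign too
    slots = []
    for group, optional, phone in TOKEN.findall(grammar):
        if group:
            slots.append(group.split("|"))
        elif optional:
            slots.append(optional.split("|") + [None])  # None = phone omitted
        else:
            slots.append([phone])
    return [tuple(p for p in combo if p is not None) for combo in product(*slots)]


variant_counts = {word: len(expand(g)) for word, g in TABLE_9.items()}
```

Under this reading, the grammar for right accepts four sequences (r ay, r ay t, r eh, r eh t), while mm-hm accepts only m hh m.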
