Table 4: 
Manual classification of 100 model errors on the SMCalFlow dataset. The largest categories are underprediction (omitting steps from agent programs), entity linking (errors in extracting entities from user utterances), fencing (classifying a user request as out-of-scope), and ambiguity (user utterances with multiple possible interpretations). See §7 for discussion.
Error category                        Count
Underprediction                       21
Entity linking                        21
  Hallucinated
  Wrong type
  Wrong field
  Boundary mismatch
Fencing                               22
  Should have fenced
  Shouldn’t have fenced
  Wrong message
Ambiguity                             23
  Wrong in context
  Acceptable (same semantics)
  Acceptable (different semantics)
Miscellaneous                         10
  Used wrong function
  Other / Multiple
Error in gold                         3