Input: T training iterations; n training examples associating each input xi with a reference output set Yi (i.e., semantic stack or realization sequences). GEN(xi) returns the n-best output sequences for input xi based on a Viterbi search using the corresponding FLM, where n depends on a pruning beam and a maximum value. Φ(xi, y) is a sparse feature vector of dimensionality d counting the occurrences of specific combinations of realization phrases and/or semantic stacks in (xi, y), with an entry for each instantiation in the training data of each node of the backoff graph of the large-context FLM in Figure 6.

Output: a collection V of feature vectors in ℝd and their respective weights α in ℝ|V|. With a linear kernel, the algorithm simplifies because the weighted feature vectors can be represented as a single weight vector w in ℝd.

Linear kernel algorithm:
For t = 1…T, i = 1…n
    For z in GEN(xi) \ Yi
        If w · Φ(xi, z) ≥ 0 then w ← w − Φ(xi, z)    // incorrect positive prediction
    For y in Yi
        If w · Φ(xi, y) < 0 then w ← w + Φ(xi, y)    // incorrect negative prediction

Kernelized algorithm with kernel function K: ℝd × ℝd → ℝ:
For t = 1…T, i = 1…n
    For z in GEN(xi) \ Yi
        If Σj αj K(Vj, Φ(xi, z)) ≥ 0 then    // incorrect positive prediction
            append Φ(xi, z) to V
            append −1 to α    // weigh instance negatively
    For y in Yi
        If Σj αj K(Vj, Φ(xi, y)) < 0 then    // incorrect negative prediction
            append Φ(xi, y) to V
            append 1 to α    // weigh instance positively
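The two variants above can be sketched in Python. This is a minimal illustration, not the paper's implementation: `gen` and `phi` are toy stand-ins for GEN and Φ, the reference sets Yi are hand-coded, and the feature vectors are dense NumPy arrays rather than sparse counts. With a linear kernel K(u, v) = u · v, the kernelized scorer Σj αj K(Vj, v) equals (Σj αj Vj) · v, which is why the instance collection collapses to the single weight vector used by the first variant.

```python
import numpy as np

def train_linear(examples, gen, phi, d, T=5):
    """Linear-kernel variant: the weighted instances collapse to one vector w."""
    w = np.zeros(d)
    for _ in range(T):
        for x, refs in examples:
            for z in (c for c in gen(x) if c not in refs):
                if w @ phi(x, z) >= 0:        # incorrect positive prediction
                    w -= phi(x, z)
            for y in refs:
                if w @ phi(x, y) < 0:         # incorrect negative prediction
                    w += phi(x, y)
    return w

def train_kernel(examples, gen, phi, K, T=5):
    """Kernelized variant: store misclassified instances in V with weights alpha."""
    V, alpha = [], []

    def score(v):
        return sum(a * K(u, v) for u, a in zip(V, alpha))

    for _ in range(T):
        for x, refs in examples:
            for z in (c for c in gen(x) if c not in refs):
                if score(phi(x, z)) >= 0:     # incorrect positive prediction
                    V.append(phi(x, z))
                    alpha.append(-1)          # weigh instance negatively
            for y in refs:
                if score(phi(x, y)) < 0:      # incorrect negative prediction
                    V.append(phi(x, y))
                    alpha.append(1)           # weigh instance positively
    return V, alpha

# Toy data (hypothetical): one input with one reference and one spurious candidate.
feats = {
    ("x1", "good"): np.array([1.0, 0.0, 1.0]),
    ("x1", "bad"):  np.array([0.0, 1.0, 0.0]),
}
phi = lambda x, y: feats[(x, y)]
gen = lambda x: ["good", "bad"]               # stand-in for the n-best GEN(x)
examples = [("x1", {"good"})]

w = train_linear(examples, gen, phi, d=3, T=3)
V, alpha = train_kernel(examples, gen, phi, K=lambda u, v: u @ v, T=3)
```

On this toy data both variants end up with the same scorer: the reference candidate outranks the spurious one, and the kernel model's implicit weight vector Σj αj Vj matches `w` exactly.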