Table 3 

The generic kernelized perceptron training algorithm for structured prediction, as well as the simplified version using a linear kernel. Both algorithms were adapted to the NLG reranking task.

Input: T training iterations; n training examples, each associating an input xi with a set of correct outputs (semantic stack or realization sequences). GEN(xi) returns the n-best output sequences for input xi from a Viterbi search using the corresponding FLM; the number of candidates depends on a pruning beam and a maximum value. Φ(xi, y) is a sparse feature vector of dimensionality d counting occurrences of specific combinations of realization phrases and/or semantic stacks in (xi, y), with an entry for each instantiation in the training data of each node of the backoff graph of the large-context FLM in Figure 6. 
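The feature map Φ can be sketched as counting combinations of semantic stacks and realization phrases along a candidate sequence. The templates below (stack unigrams, stack bigrams, joint stack/phrase pairs) are illustrative stand-ins for the backoff-graph nodes of the large-context FLM, not the paper's exact feature set:

```python
from collections import Counter

def phi(stacks, phrases):
    """Sparse count features over one candidate (illustrative sketch).

    stacks:  sequence of semantic stacks, one per step.
    phrases: aligned sequence of realization phrases.
    The templates are stand-ins for the backoff-graph nodes of the FLM.
    """
    feats = Counter()
    for i, (s, p) in enumerate(zip(stacks, phrases)):
        feats[("stack", s)] += 1                        # stack unigram
        feats[("stack+phrase", s, p)] += 1              # joint stack/phrase pair
        if i > 0:
            feats[("stack-bigram", stacks[i - 1], s)] += 1
    return feats
```

Returning a Counter keeps the vector sparse: only feature combinations actually observed in the candidate get entries, matching the "entry per instantiation in the training data" description.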
 
Output: a collection V of feature vectors in ℝd and their respective weights α in ℝ|V|. With a linear kernel, the algorithm simplifies: the weighted feature vectors can be collapsed into a single weight vector in ℝd.
 
Linear kernel algorithm: 
 
For t = 1…T, i = 1…n 
 For z in GEN(xi) not in the output set of xi 
  If w · Φ(xi, z) ≥ 0 then w ← w − Φ(xi, z) // incorrect positive prediction 
 For y in the output set of xi 
  If w · Φ(xi, y) < 0 then w ← w + Φ(xi, y) // incorrect negative prediction 
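The linear-kernel algorithm above can be sketched as follows, with sparse vectors represented as Counters. The data layout (examples as (x, reference set) pairs, gen and phi as callables) is an assumed interface, not the paper's:

```python
from collections import Counter

def dot(w, f):
    """Sparse dot product over the support of f."""
    return sum(w[k] * v for k, v in f.items())

def train_linear_perceptron(examples, gen, phi, T):
    """Linear-kernel perceptron reranker training (sketch).

    examples: list of (x, refs) pairs, refs being the set of
              correct output sequences for input x (assumed layout).
    gen:      gen(x) returns the n-best candidate sequences for x.
    phi:      phi(x, y) returns a sparse feature vector as a Counter.
    """
    w = Counter()  # sparse weight vector in R^d
    for _ in range(T):
        for x, refs in examples:
            # Demote incorrect candidates that score non-negatively.
            for z in gen(x):
                if z in refs:
                    continue
                f = phi(x, z)
                if dot(w, f) >= 0:   # incorrect positive prediction
                    w.subtract(f)    # w <- w - Phi(x, z)
            # Promote correct outputs that score negatively.
            for y in refs:
                f = phi(x, y)
                if dot(w, f) < 0:    # incorrect negative prediction
                    w.update(f)      # w <- w + Phi(x, y)
    return w
```

At reranking time, the trained w scores each candidate in GEN(x) by dot(w, phi(x, y)), and the highest-scoring candidate is selected.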
 
Kernelized algorithm with kernel function K: ℝd ×ℝd →ℝ: 
 
For t = 1…T, i = 1…n 
 For z in GEN(xi) not in the output set of xi 
  If Σj αj K(Vj, Φ(xi, z)) ≥ 0 then  // incorrect positive prediction 
   append Φ(xi, z) to V    // weigh instance negatively 
   append −1 to α 
 For y in the output set of xi 
  If Σj αj K(Vj, Φ(xi, y)) < 0 then  // incorrect negative prediction 
   append Φ(xi, y) to V    // weigh instance positively 
   append 1 to α 
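The kernelized version keeps the support vectors V and their signs α instead of a weight vector, scoring a feature vector f as Σj αj K(Vj, f). A sketch under the same assumed interface as above:

```python
from collections import Counter

def train_kernel_perceptron(examples, gen, phi, K, T):
    """Kernelized perceptron reranker training (sketch).

    K: kernel function over pairs of sparse feature vectors.
    Returns the support vectors V and their weights alpha;
    a candidate y is scored as sum_j alpha[j] * K(V[j], phi(x, y)).
    """
    V, alpha = [], []

    def score(f):
        return sum(a * K(v, f) for v, a in zip(V, alpha))

    for _ in range(T):
        for x, refs in examples:
            for z in gen(x):
                if z in refs:
                    continue
                f = phi(x, z)
                if score(f) >= 0:    # incorrect positive prediction
                    V.append(f)      # weigh instance negatively
                    alpha.append(-1)
            for y in refs:
                f = phi(x, y)
                if score(f) < 0:     # incorrect negative prediction
                    V.append(f)      # weigh instance positively
                    alpha.append(1)
    return V, alpha
```

With a linear kernel K(u, v) = u · v, the weighted sum Σj αj Vj collapses into a single weight vector, recovering the simplified algorithm above; a nonlinear kernel lets the reranker use implicit feature combinations without materializing them.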