When considering our approach of approximating a probabilistic grammar by raising its parameter probabilities above a certain threshold, it becomes clear why we must limit the grammar to have only two rules, and why we must use the normal form from Section 3.2 with grammars of degree 2. Consider the PCFG rules in Table 1. There are several ways to move probability mass to the rule with small probability. This leads to a problem with the identifiability of the approximation: How does one decide how to reallocate probability to the low-probability rules? By binarizing the grammar in advance, we arrive at a single way to reallocate mass when required (i.e., move mass from the high-probability rule to the low-probability rule). This leads to a simpler proof of the sample complexity bounds and a single bound (rather than different bounds depending on different smoothing operators). We note, however, that the choices made in binarizing the grammar imply a particular way of smoothing the probability across the original rules.
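The uniqueness of the reallocation in the two-rule case can be seen directly: a minimal sketch, where the function name is ours and not the paper's, and the two rule probabilities are assumed to sum to one.

```python
def truncate_two_rule(p_low, gamma):
    """Raise the low-probability rule of a two-rule (binarized)
    PCFG nonterminal to the threshold gamma.

    With exactly two rules there is only one way to reallocate
    mass: whatever is added to the low-probability rule must come
    out of the high-probability one.
    (Hypothetical helper for illustration; not from the paper.)
    """
    p_high = 1.0 - p_low
    if p_low >= gamma:
        return p_low, p_high  # already above the threshold
    # the unique reallocation: move (gamma - p_low) from high to low
    return gamma, p_high - (gamma - p_low)

# e.g., truncate_two_rule(0.05, 0.1) yields (0.1, 0.9) up to
# floating-point rounding; the result still sums to one
```

With more than two rules, as in Table 1, the deficit can be split among the remaining rules in infinitely many ways, which is exactly the identifiability problem described above.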

Table 1

Example of a PCFG where there is more than a single way to approximate it by truncation with γ = 0.1, because it has more than two rules. Any value of η ∈ [0, 0.01] will lead to a different approximation.

Rule        θ     General          η = 0   η = 0.01   η = 0.005
S → NP VP   0.09  0.1              0.1     0.1        0.1
S → NP      0.11  0.11 − η         0.11    0.1        0.105
S → VP      0.8   0.8 − 0.01 + η   0.79    0.8        0.795
