Table 1 

Feature templates and instances for the character classification-based word segmentation model. C0 denotes the current character, and C−i/Ci denote the i-th character to the left/right of C0. Suppose we are considering the third character “” in “”.

Type       Templates   Instances
n-gram     C−2         C−2 =
           C−1         C−1 =
           C0          C0 =
           C1          C1 =
           C2          C2 =
           C−2C−1      C−2C−1 =
           C−1C0       C−1C0 =
           C0C1        C0C1 =
           C1C2        C1C2 =
           C−1C1       C−1C1 =
function   Pu(C0)      Pu(C0) = true
           T(C−2:2)    T(C−2:2) = 44444
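The templates in Table 1 can be illustrated with a small feature extractor. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `<PAD>` boundary symbol, the punctuation test used for Pu (Unicode category P*), and the character-type classes used for T (1 digit, 2 one of the date characters 年/月/日, 3 Latin letter, 4 other, 0 padding) are all hypothetical choices, and the example sentence is illustrative only.

```python
# Hypothetical sketch of the Table 1 feature templates. The padding symbol,
# the punctuation test and the character-type classes are assumptions.
import unicodedata

PAD = "<PAD>"  # assumed boundary symbol for positions outside the sentence


def is_punct(ch: str) -> bool:
    """Assumed Pu(): true if the character's Unicode category is P*."""
    return ch != PAD and unicodedata.category(ch).startswith("P")


def char_type(ch: str) -> str:
    """Assumed T() classes: 0 pad, 1 digit, 2 date char, 3 Latin letter, 4 other."""
    if ch == PAD:
        return "0"
    if ch.isdigit():
        return "1"
    if ch in "年月日":
        return "2"
    if ch.isascii() and ch.isalpha():
        return "3"
    return "4"


def extract_features(sentence: str, i: int) -> list[str]:
    """Instantiate every template for the character at index i (C0)."""

    def c(k: int) -> str:
        # C_k: the character k positions from C0 (negative = left), else PAD.
        j = i + k
        return sentence[j] if 0 <= j < len(sentence) else PAD

    feats = [f"C{k}={c(k)}" for k in range(-2, 3)]  # unigrams C-2 .. C2
    feats += [f"C{k}C{k + 1}={c(k)}{c(k + 1)}" for k in range(-2, 2)]  # bigrams
    feats.append(f"C-1C1={c(-1)}{c(1)}")  # bigram that skips C0
    feats.append(f"Pu(C0)={is_punct(c(0))}")  # punctuation feature
    feats.append("T(C-2:2)=" + "".join(char_type(c(k)) for k in range(-2, 3)))
    return feats
```

For instance, `extract_features("2008年北京", 4)` takes 年 as C0 and yields twelve features, among them `C-1C1=8北` and `T(C-2:2)=11244`. Padding the window with a fixed symbol keeps the number of features constant at sentence boundaries.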