Table 5

Neural (S_token, S_type), lexical (L_token, L_type), and entropic (E_token, E_type) logography measures computed on the Bible corpora. Recall that lower values of the S and L measures correspond to a lower degree of logography (the inverse is true for E_type). The unexpectedly low values of the L measures for some of the Chinese and Japanese encodings, in contrast to the expected higher values of the S measures for the same encodings, suggest that on balance the neural attention-based metric is better at capturing the notion of logography. The fourth column gives the per-token spelling accuracy of the neural S model on the test data. For Russian and Swedish, the † marker in the columns for the E measures indicates that the number is suspect because only a very small number of training and testing verses (3,289 and 3,823, respectively) were left after removing verses that contained null pronunciations.

Language                        Neural                       Lexical           Entropic
                                S_token  S_type  Accuracy    L_token  L_type   E_token  E_type
Chinese                         1.00     1.00    0.85        4.46     2.96     −0.12    7.86
Chinese (Cangjie)               0.74     0.71    0.87        4.45     2.96     −0.12    7.85
Chinese (tokenized)             0.55     0.37    0.89        2.10     1.05     −0.02    9.43
Chinese (tokenized, Cangjie)    0.51     0.32    0.78        2.10     1.05     −0.02    9.42
English                         0.40     0.32    0.95        2.08     1.15     0.02     8.05
Finnish                         0.19     0.12    0.96        1.43     1.05     0.02     10.10
French                          0.57     0.36    0.89        3.10     1.68     0.14     8.24
Hebrew (Biblical)               0.65     0.50    0.94        1.06     1.04     0.06     9.18
Hebrew (Modern)                 0.72     0.56    0.87        1.19     1.06     0.05     9.14
Japanese                        0.97     0.88    0.94        7.19     1.25     −0.05    7.38
Japanese (Cangjie)              0.88     0.65    0.92        7.19     1.25     −0.06    7.38
Korean (jamo)                   0.26     0.21    0.96        1.06     1.01     0.00     12.21
Russian                         0.46     0.29    0.89        1.58     1.10     †0.12    †8.87
Swedish                         0.35     0.20    0.90        1.13     1.01     †0.01    †8.95
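
The comparison drawn in the caption between the S and L measures can be checked by ranking the languages under each measure. The following Python sketch is illustrative only and is not part of the original analysis: the values are copied from the table above, and the names `TABLE5`, `FIELDS`, `ranking`, and `rank_of` are hypothetical helpers introduced here. Only the four S and L columns are included.

```python
# Values copied from Table 5: (S_token, S_type, L_token, L_type).
# The accuracy and entropic columns are omitted in this sketch.
TABLE5 = {
    "Chinese": (1.00, 1.00, 4.46, 2.96),
    "Chinese (Cangjie)": (0.74, 0.71, 4.45, 2.96),
    "Chinese (tokenized)": (0.55, 0.37, 2.10, 1.05),
    "Chinese (tokenized, Cangjie)": (0.51, 0.32, 2.10, 1.05),
    "English": (0.40, 0.32, 2.08, 1.15),
    "Finnish": (0.19, 0.12, 1.43, 1.05),
    "French": (0.57, 0.36, 3.10, 1.68),
    "Hebrew (Biblical)": (0.65, 0.50, 1.06, 1.04),
    "Hebrew (Modern)": (0.72, 0.56, 1.19, 1.06),
    "Japanese": (0.97, 0.88, 7.19, 1.25),
    "Japanese (Cangjie)": (0.88, 0.65, 7.19, 1.25),
    "Korean (jamo)": (0.26, 0.21, 1.06, 1.01),
    "Russian": (0.46, 0.29, 1.58, 1.10),
    "Swedish": (0.35, 0.20, 1.13, 1.01),
}

# Hypothetical field names, in the same order as the tuples above.
FIELDS = ("s_token", "s_type", "l_token", "l_type")


def ranking(measure):
    """Return language names sorted from highest to lowest `measure`."""
    idx = FIELDS.index(measure)
    return sorted(TABLE5, key=lambda name: TABLE5[name][idx], reverse=True)


def rank_of(name, measure):
    """Return the 1-based position of `name` in the ranking for `measure`.

    Ties are broken by the order of the rows in TABLE5.
    """
    return ranking(measure).index(name) + 1


if __name__ == "__main__":
    # Print each system's rank under the type-level neural and lexical measures.
    for name in TABLE5:
        print(f"{name:30s} S_type rank {rank_of(name, 's_type'):2d}   "
              f"L_type rank {rank_of(name, 'l_type'):2d}")
```

Under L_type, for example, French (1.68) ranks above Japanese (1.25), whereas under S_type Japanese (0.88) ranks well above French (0.36). This is one reading of the pattern the caption describes.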