Neural (Stoken, Stype), lexical (Ltoken, Ltype), and entropic (Etoken, Etype) logography measures computed on the Bible corpora. Recall that lower values of the S and L measures correspond to a lower degree of logography (the inverse is true for Etype). Some of the Chinese and Japanese encodings receive unexpectedly low L values but the expected higher S values, which suggests that, on balance, the neural attention-based metric is better at capturing the notion of logography. The fourth column gives the per-token spelling accuracy of the neural S model on the test data. For Russian and Swedish, the † marker in the E columns indicates that the number is suspect because only a very small number of training and testing verses (3,289 and 3,823, respectively) remained after removing verses that contained null pronunciations.
Language | Neural: Stoken | Neural: Stype | Neural: Accuracy | Lexical: Ltoken | Lexical: Ltype | Entropic: Etoken | Entropic: Etype |
---|---|---|---|---|---|---|---|
Chinese | 1.00 | 1.00 | 0.85 | 4.46 | 2.96 | −0.12 | 7.86 |
Chinese (Cangjie) | 0.74 | 0.71 | 0.87 | 4.45 | 2.96 | −0.12 | 7.85 |
Chinese (tokenized) | 0.55 | 0.37 | 0.89 | 2.10 | 1.05 | −0.02 | 9.43 |
Chinese (tokenized, Cangjie) | 0.51 | 0.32 | 0.78 | 2.10 | 1.05 | −0.02 | 9.42 |
English | 0.40 | 0.32 | 0.95 | 2.08 | 1.15 | 0.02 | 8.05 |
Finnish | 0.19 | 0.12 | 0.96 | 1.43 | 1.05 | 0.02 | 10.10 |
French | 0.57 | 0.36 | 0.89 | 3.10 | 1.68 | 0.14 | 8.24 |
Hebrew (Biblical) | 0.65 | 0.50 | 0.94 | 1.06 | 1.04 | 0.06 | 9.18 |
Hebrew (Modern) | 0.72 | 0.56 | 0.87 | 1.19 | 1.06 | 0.05 | 9.14 |
Japanese | 0.97 | 0.88 | 0.94 | 7.19 | 1.25 | −0.05 | 7.38 |
Japanese (Cangjie) | 0.88 | 0.65 | 0.92 | 7.19 | 1.25 | −0.06 | 7.38 |
Korean (jamo) | 0.26 | 0.21 | 0.96 | 1.06 | 1.01 | 0.00 | 12.21 |
Russian | 0.46 | 0.29 | 0.89 | 1.58 | 1.10 | †0.12 | †8.87 |
Swedish | 0.35 | 0.20 | 0.90 | 1.13 | 1.01 | †0.01 | †8.95 |
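
The contrast drawn in the caption can be checked by ranking the rows under each measure. The following is a minimal sketch, not part of the original analysis: it copies only the type-level Stype and Ltype values from the table, and the names `TABLE` and `rank_by` are hypothetical helpers introduced for illustration.

```python
# Type-level values copied from the table: language -> (S_type, L_type).
# TABLE and rank_by are illustrative names, not taken from the paper.
TABLE = {
    "Chinese": (1.00, 2.96),
    "Chinese (Cangjie)": (0.71, 2.96),
    "Chinese (tokenized)": (0.37, 1.05),
    "Chinese (tokenized, Cangjie)": (0.32, 1.05),
    "English": (0.32, 1.15),
    "Finnish": (0.12, 1.05),
    "French": (0.36, 1.68),
    "Hebrew (Biblical)": (0.50, 1.04),
    "Hebrew (Modern)": (0.56, 1.06),
    "Japanese": (0.88, 1.25),
    "Japanese (Cangjie)": (0.65, 1.25),
    "Korean (jamo)": (0.21, 1.01),
    "Russian": (0.29, 1.10),
    "Swedish": (0.20, 1.01),
}


def rank_by(measure):
    """Return language names ordered from the highest to the lowest value."""
    column = {"s_type": 0, "l_type": 1}[measure]
    return sorted(TABLE, key=lambda name: TABLE[name][column], reverse=True)


# Print the five highest-scoring rows under each measure.
for measure in ("s_type", "l_type"):
    print(measure, rank_by(measure)[:5])
```

Under Stype the four untokenized Chinese and Japanese encodings occupy the top four positions, whereas under Ltype French ranks above both Japanese encodings. This is one way to read the pattern the caption describes, under the assumption that a simple rank comparison is a fair summary.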