Parameter counts and evaluation perplexities for the trained language models. For reference, the pre-trained BERT base model from Huggingface reached a perplexity of 9.4 on our evaluation set. Additional perplexity comparisons with comparable models are included in Appendix A.1.
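Because BERT is a masked language model, its perplexity on an evaluation set is typically reported as a pseudo-perplexity, obtained by masking each token in turn and scoring it. The sketch below illustrates one way such a number could be computed with the Hugging Face `transformers` library; the model name, evaluation sentences, and scoring loop are illustrative assumptions and not the evaluation code used for the table.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical evaluation sentences; the actual evaluation set is not reproduced here.
eval_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Language models assign probabilities to token sequences.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

total_nll, total_tokens = 0.0, 0

with torch.no_grad():
    for text in eval_texts:
        input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
        # Mask each non-special token in turn and accumulate its negative
        # log-likelihood (pseudo-log-likelihood scoring).
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            total_nll += -log_probs[input_ids[i]].item()
            total_tokens += 1

# Exponentiate the mean token NLL to obtain pseudo-perplexity.
pseudo_perplexity = math.exp(total_nll / total_tokens)
print(f"Pseudo-perplexity: {pseudo_perplexity:.2f}")
```

Masking one position per forward pass is slow but simple; in practice the per-token scores can be batched. The resulting number is comparable across models evaluated on the same set, which is how the reference figure of 9.4 for BERT base should be read.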