Table B1

Hyperparameter settings. Note that the typo. pred. mask ratio and the loss scaling factor are used only in the joint typology prediction model 4.

| Hyperparameter | Value | Hyperparameter | Value |
| --- | --- | --- | --- |
| Dependency tag dimension | 256 | Typo. pred. mask ratio | 0.2 |
| Dependency arc dimension | 768 | Typo. pred. loss scaling factor (λ) | 0.8 |
| POS layer dimension | 768 | Optimizer | Adam |
| Morph. tagging layer dimension | 768 | β₁, β₂ | 0.9, 0.99 |
| NER layer dimension | 768 | Weight decay | 0.01 |
| Batch size | 32 | Label smoothing | 0.03 |
| Dependency parsing epochs | 80 | Dropout | 0.5 |
| POS tagging epochs | 30 | BERT dropout | 0.2 |
| Morphological tagging epochs | 30 | Mask probability | 0.2 |
| NER epochs | 30 | Base learning rate | 1e−3 |
| LR warm up ratio | 0.125 | BERT learning rate | 5e−5 |
| Language embedding size | 32 | Adapter size | 256 |
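
For readers who want to reuse these settings, the values in Table B1 can be collected in a single configuration object. The following is a minimal sketch, not the authors' code: the class name `HParams`, all field names, and the helper `warmup_steps` are hypothetical. It also assumes that the LR warm up ratio is a fraction of the total number of optimizer steps, which the table does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HParams:
    """Values copied from Table B1; field names are hypothetical."""
    dep_tag_dim: int = 256
    dep_arc_dim: int = 768
    pos_dim: int = 768
    morph_dim: int = 768
    ner_dim: int = 768
    batch_size: int = 32
    epochs_dep: int = 80
    epochs_pos: int = 30
    epochs_morph: int = 30
    epochs_ner: int = 30
    warmup_ratio: float = 0.125
    lang_emb_size: int = 32
    typo_mask_ratio: float = 0.2    # joint typology prediction model only
    typo_loss_lambda: float = 0.8   # joint typology prediction model only
    adam_betas: tuple = (0.9, 0.99)
    weight_decay: float = 0.01
    label_smoothing: float = 0.03
    dropout: float = 0.5
    bert_dropout: float = 0.2
    mask_prob: float = 0.2
    base_lr: float = 1e-3
    bert_lr: float = 5e-5
    adapter_size: int = 256


def warmup_steps(hp: HParams, num_examples: int, epochs: int) -> int:
    """Number of warm-up steps, ASSUMING the ratio applies to total steps."""
    steps_per_epoch = math.ceil(num_examples / hp.batch_size)
    return int(hp.warmup_ratio * steps_per_epoch * epochs)


hp = HParams()
# Hypothetical treebank of 3,200 training sentences, dependency parsing run.
print(warmup_steps(hp, num_examples=3200, epochs=hp.epochs_dep))  # 1000
```

Keeping separate `base_lr` and `bert_lr` fields mirrors the table, which lists a larger learning rate for the task-specific layers than for the BERT encoder. How the two rates are assigned to parameter groups is left to the training code.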