Table 7:
Hyper-parameters for training on all treebanks. We stop training if the current epoch brings no improvement and the current epoch number exceeds the epoch of the last checkpoint plus the "Patience" value.
| Component              | Specification | Component                 | Specification |
|------------------------|---------------|---------------------------|---------------|
| Optimiser              | BertAdam      | Feed-Forward layers (arc) |               |
| Base Learning rate     | 2e-3          | No. Layers                |               |
| BERT Learning rate     | 1e-5          | Hidden size               | 500           |
| Adam Betas (b1, b2)    | (0.9, 0.999)  | Drop-out                  | 0.33          |
| Adam Epsilon           | 1e-5          | Negative Slope            | 0.1           |
| Weight Decay           | 0.01          | Feed-Forward layers (rel) |               |
| Max-Grad-Norm          |               | No. Layers                |               |
| Warm-up                | 0.01          | Hidden size               | 100           |
|                        |               | Drop-out                  | 0.33          |
| Self-Attention         |               | Negative Slope            | 0.1           |
| No. Layers             | 12            |                           |               |
| No. Heads              | 12            | Epoch                     | 200           |
| Embedding size         | 768           | Patience                  | 100           |
| Max Position Embedding | 512           |                           |               |
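As an illustration only, the following is a minimal sketch (not the authors' code) of how the optimiser and early-stopping settings in Table 7 could be wired up, assuming the BertAdam implementation from the pytorch-pretrained-bert package. The names `model`, `train_one_epoch`, `evaluate`, and `num_train_steps` are hypothetical placeholders, not identifiers from the paper.

```python
from pytorch_pretrained_bert.optimization import BertAdam

# Separate BERT parameters from the task-specific (parser) parameters so that
# each group can receive its own learning rate, as in Table 7.
bert_params = list(model.bert.parameters())  # assumes the encoder lives at model.bert
bert_ids = {id(p) for p in bert_params}
task_params = [p for p in model.parameters() if id(p) not in bert_ids]

optimizer = BertAdam(
    [
        {"params": task_params, "lr": 2e-3},  # base learning rate
        {"params": bert_params, "lr": 1e-5},  # BERT learning rate
    ],
    lr=2e-3,                  # default; overridden per group above
    warmup=0.01,              # warm-up proportion
    t_total=num_train_steps,  # total optimisation steps (placeholder value)
    b1=0.9,                   # Adam beta1
    b2=0.999,                 # Adam beta2
    e=1e-5,                   # Adam epsilon
    weight_decay=0.01,
)

# Early stopping as described in the caption: stop once the current epoch
# shows no improvement and exceeds the best ("checkpoint") epoch plus the
# patience of 100, up to a maximum of 200 epochs.
best_score, best_epoch = float("-inf"), 0
for epoch in range(1, 201):            # "Epoch" = 200
    train_one_epoch(model, optimizer)  # hypothetical training step
    score = evaluate(model)            # hypothetical dev-set metric
    if score > best_score:
        best_score, best_epoch = score, epoch
    elif epoch > best_epoch + 100:     # "Patience" = 100
        break
```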