Summary of hyper-parameter tuning. An asterisk (*) marks a value that diverges from the setup and empirical findings proposed for NCRF++ (Yang and Zhang, 2018). LR = learning rate.
Parameter | Value | Parameter | Value |
---|---|---|---|
Optimizer | SGD | *LR (token-single) | 0.01 |
*Batch Size | 8 | *LR (token-multi) | 0.005 |
LR decay | 0.05 | *LR (morpheme) | 0.01 |
Epochs | 200 | Dropout | 0.5 |
Bi-LSTM layers | 2 | *CharCNN window | 7 |
*Word Emb Dim | 300 | Char Emb Dim | 30 |
Word Hidden Dim | 200 | *Char Hidden Dim | 70 |
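
As a rough illustration, the settings in the table could be collected into a single configuration object, with the learning rate selected per input setting. This is only a sketch: the key names and the helpers `config_for` and `decayed_lr` are hypothetical and are not NCRF++ option names, and the inverse-time decay schedule is an assumption about how the "LR decay" value is applied, not something the table states.

```python
# Values are copied from the table; key names are hypothetical,
# chosen for illustration and not taken from NCRF++.
HPARAMS = {
    "optimizer": "SGD",
    "batch_size": 8,
    "lr_decay": 0.05,
    "epochs": 200,
    "bilstm_layers": 2,
    "word_emb_dim": 300,
    "word_hidden_dim": 200,
    "dropout": 0.5,
    "char_cnn_window": 7,
    "char_emb_dim": 30,
    "char_hidden_dim": 70,
}

# Initial learning rate for each input setting listed in the table.
LEARNING_RATES = {
    "token-single": 0.01,
    "token-multi": 0.005,
    "morpheme": 0.01,
}


def config_for(setting):
    """Return a copy of the shared settings plus the LR for one input setting."""
    cfg = dict(HPARAMS)
    cfg["learning_rate"] = LEARNING_RATES[setting]
    return cfg


def decayed_lr(initial_lr, decay, epoch):
    """Assumed inverse-time schedule: initial_lr / (1 + decay * epoch)."""
    return initial_lr / (1.0 + decay * epoch)


if __name__ == "__main__":
    cfg = config_for("token-multi")
    for epoch in (0, 20, 100):
        lr = decayed_lr(cfg["learning_rate"], cfg["lr_decay"], epoch)
        print(epoch, lr)
```

Under this assumed schedule, a decay of 0.05 halves the initial learning rate after 20 epochs, so most of the 200 epochs would run at a small fraction of the starting value.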