Model | BERT+CRF-1 | BERT+CRF-2 | BERT+Bi-LSTM+CRF |
---|---|---|---|
max_seq_length | 320 | 300 | 300 |
label_num | 9 | 9 | 9 |
batch_size | 4 | 4 | 4 |
dropout_rate | 0.4 | 0.5 | 0.3 |
Bi-LSTM units | n/a | n/a | 128 |
hidden_size (BERT) | 1024 | 1024 | 1024 |
learning_rate | 5e-5, 3e-5, 2e-5, 1e-5, 5e-6, 1e-6 (stepped every 10 epochs) | 5e-5, 3e-5, 2e-5, 1e-5, 5e-6, 1e-6 (stepped every 10 epochs) | 5e-5, 3e-5, 2e-5, 1e-5, 5e-6, 1e-6 (stepped every 10 epochs) |
crf_lr_multiplier | 100 (× BERT-layer learning rate) | 100 (× BERT-layer learning rate) | 100 (× BERT-layer learning rate) |
optimizer | Adam | Adam | Adam |
epochs | 60 | 60 | 60 |
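
The learning-rate and CRF-multiplier rows can be read as a simple step schedule. The following is a minimal sketch of that reading, not the authors' training code: the function names `bert_lr` and `crf_lr` are hypothetical, and it assumes 0-indexed epochs with the first listed rate applying to the first 10 epochs and each later rate to the next block of 10.

```python
# Values copied from the table; the epoch indexing is an assumption.
LR_STAGES = [5e-5, 3e-5, 2e-5, 1e-5, 5e-6, 1e-6]
EPOCHS_PER_STAGE = 10      # rate changes every 10 epochs
TOTAL_EPOCHS = 60          # 6 stages x 10 epochs
CRF_LR_MULTIPLIER = 100    # CRF layer trains at 100x the BERT-layer rate


def bert_lr(epoch: int) -> float:
    """Return the BERT-layer learning rate for a 0-indexed epoch (hypothetical helper)."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in [0, {TOTAL_EPOCHS}), got {epoch}")
    return LR_STAGES[epoch // EPOCHS_PER_STAGE]


def crf_lr(epoch: int) -> float:
    """Return the CRF-layer learning rate for the same epoch (hypothetical helper)."""
    return CRF_LR_MULTIPLIER * bert_lr(epoch)


if __name__ == "__main__":
    # Print the rate pair at the start of each 10-epoch stage.
    for start in range(0, TOTAL_EPOCHS, EPOCHS_PER_STAGE):
        print(start, bert_lr(start), crf_lr(start))
```

In a real training loop these values would typically be passed to the optimizer through the framework's own scheduler or per-layer parameter groups; which mechanism the original experiments used is not stated in the table.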