Hyperparameter | Value |
---|---|
dw (English word embeddings) | 100 |
dw (other languages word embeddings) | 300 |
dc (character embeddings) | 300 |
dpos (POS embeddings) | 16 |
dl (lemma embeddings) | 100 |
dh (LSTM hidden states) | 300 |
dhidden (hidden layer representation) | 200 |
doutput (output label embeddings) | 32 |
dr (role representation) | 128 |
dl′ (output lemma representation) | 128 |
K (BiLSTM depth) | 4 |
J (BiLSTM depth) | 2 |
batch size | 30 |
input layer dropout rate | 0.3 |
hidden layer dropout rate | 0.3 |
learning rate | 0.001 |
auxiliary tasks loss weight α | 0.5 |
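
For readers who want to reuse these settings, the table above can be captured in a single configuration object. The following is only a minimal sketch: the class name `HyperParams`, every field name, the `word_dim` helper, and the use of `"en"` as the English language code are hypothetical choices made for illustration, not identifiers from the original implementation. Only the numeric values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HyperParams:
    """Values copied from the hyperparameter table; field names are hypothetical."""

    d_word_english: int = 100   # dw, English word embeddings
    d_word_other: int = 300     # dw, word embeddings for other languages
    d_char: int = 300           # dc, character embeddings
    d_pos: int = 16             # dpos, POS embeddings
    d_lemma: int = 100          # dl, lemma embeddings
    d_lstm_hidden: int = 300    # dh, LSTM hidden states
    d_hidden: int = 200         # dhidden, hidden layer representation
    d_output_label: int = 32    # doutput, output label embeddings
    d_role: int = 128           # dr, role representation
    d_output_lemma: int = 128   # dl', output lemma representation
    bilstm_depth_k: int = 4     # K, BiLSTM depth
    bilstm_depth_j: int = 2     # J, BiLSTM depth
    batch_size: int = 30
    input_dropout: float = 0.3
    hidden_dropout: float = 0.3
    learning_rate: float = 0.001
    aux_loss_weight: float = 0.5  # alpha, auxiliary tasks loss weight

    def word_dim(self, language: str) -> int:
        # Assumption: "en" denotes English; any other code uses the 300-d setting.
        return self.d_word_english if language == "en" else self.d_word_other


config = HyperParams()
```

Freezing the dataclass keeps the settings immutable during a run, and `asdict(config)` gives a plain dictionary that can be logged alongside experiment results.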