Table 8 
The BERT-L-T model hyperparameters. H is the pre-trained BERT model’s hidden vector size (H_base = 768, H_large = 1,024).
| Layer | Input Dimensions | Output Dimensions |
|---|---|---|
| BERT Pretrained Encoder | Interview text | H × # Q-A pairs |
| LSTM forward, LSTM backward | H × Max # Q-A pairs in batch | 2H |
| Linear Output | 2H | |
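
As a rough illustration of how the dimensions in Table 8 chain together, the sketch below traces shapes through the three layers for one batch of interviews. It is not the authors' implementation. The names `trace_shapes`, `LayerShape`, and `num_outputs` are hypothetical, the output size of the linear layer is not given in the table and is treated here as an assumed parameter, and padding every interview to the longest one in the batch is an inference from the "Max # Q-A pairs in batch" entry.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hidden sizes stated in the table caption.
H_BASE = 768
H_LARGE = 1024


class LayerShape(NamedTuple):
    layer: str
    input_shape: Optional[Tuple[int, ...]]  # None when the input is raw text
    output_shape: Tuple[int, ...]


def trace_shapes(hidden_size: int,
                 qa_pairs_per_interview: List[int],
                 num_outputs: int = 1) -> List[LayerShape]:
    """Trace per-interview shapes through BERT -> BiLSTM -> Linear.

    num_outputs is an assumption: the table does not state the
    output dimension of the final linear layer.
    """
    if not qa_pairs_per_interview:
        raise ValueError("batch must contain at least one interview")

    # Assumed: interviews are padded to the longest one in the batch.
    max_pairs = max(qa_pairs_per_interview)

    return [
        # One H-sized vector per Q-A pair.
        LayerShape("BERT Pretrained Encoder", None, (hidden_size, max_pairs)),
        # Forward and backward LSTM outputs are concatenated: H + H = 2H.
        LayerShape("LSTM forward, LSTM backward",
                   (hidden_size, max_pairs), (2 * hidden_size,)),
        # The linear layer maps the 2H vector to the (assumed) output size.
        LayerShape("Linear Output", (2 * hidden_size,), (num_outputs,)),
    ]


if __name__ == "__main__":
    # Three interviews with 3, 5, and 2 Q-A pairs, using the base model.
    for row in trace_shapes(H_BASE, [3, 5, 2]):
        print(row)
```

With `H_LARGE` in place of `H_BASE`, the same trace gives a 2,048-dimensional input to the linear layer.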