Model Details 

Model Name: DREAM (single model)

Model Description: We first use Elasticsearch to build an index over Wiki documents and retrieve the top-10 sentences with BM25, using the question + answer choice as the query. We then concatenate them as <context tokens> <sep> <question tokens> <answer tokens> <sep> <cls> and fine-tune the XLNet-large cased model on the result.
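
For illustration, the retrieval step can be sketched with the elasticsearch-py client as below; the host, index name (wiki_sentences), and field name (text) are placeholders rather than our exact setup.

    # Retrieval sketch: query an Elasticsearch index of Wiki sentences with BM25.
    # Host, index name, and field name are placeholders (assumptions).
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def retrieve_context(question, answer_choice, top_k=10):
        """Return the top-k sentences ranked by BM25 for question + answer choice."""
        query = question + " " + answer_choice
        resp = es.search(
            index="wiki_sentences",
            query={"match": {"text": query}},  # Elasticsearch scores matches with BM25 by default
            size=top_k,
        )
        return [hit["_source"]["text"] for hit in resp["hits"]["hits"]]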

Experiment Details: In our experiments, we used the pre-trained XLNet-large cased model from https://github.com/huggingface/pytorch-transformers. The accuracy is 73.0%/66.9% on the dev/test sets. The parameters are listed below (a fine-tuning sketch follows the list):
  • maximum sequence length: 256
  • batch size: 4
  • learning rate: 5e-6
  • adam epsilon: 1e-6
  • training steps: 2600 (about 1 epoch)
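
As a rough sketch, the fine-tuning loop with these parameters could look like the following (written against the current transformers API rather than the older pytorch-transformers package; the dataloader and its batch format are assumptions for illustration):

    # Fine-tuning sketch for XLNet-large on multiple-choice batches.
    import torch
    from torch.optim import AdamW
    from transformers import XLNetForMultipleChoice

    model = XLNetForMultipleChoice.from_pretrained("xlnet-large-cased").cuda()
    optimizer = AdamW(model.parameters(), lr=5e-6, eps=1e-6)
    MAX_STEPS = 2600  # roughly one epoch at batch size 4

    def train(dataloader):
        """Assumes each batch holds input_ids of shape (batch, n_choices, 256)."""
        model.train()
        for step, batch in enumerate(dataloader):
            outputs = model(
                input_ids=batch["input_ids"].cuda(),
                attention_mask=batch["attention_mask"].cuda(),
                labels=batch["labels"].cuda(),
            )
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            if step + 1 >= MAX_STEPS:
                break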

Model Name: DREAM (ensemble model)

Model Description: We first use Elasticsearch to build an index over Wiki documents and retrieve the top-10 sentences with BM25, and we additionally collect natural language snippets from search-engine results. We concatenate them as <context tokens> <sep> <question tokens> <answer tokens> <sep> <cls> and fine-tune the XLNet-large cased and RoBERTa-large models.
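
For reference, one way to pack an example into this format for XLNet (whose tokenizer appends <sep> and <cls> at the end) is sketched below; the truncation strategy and helper are illustrative assumptions, not necessarily our exact preprocessing.

    # Encoding sketch: build one input per answer choice in the
    # <context> <sep> <question + answer> <sep> <cls> layout produced by XLNet's tokenizer.
    from transformers import XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained("xlnet-large-cased")

    def encode_example(context, question, choices, max_len=256):
        """Encode one question against each of its answer choices."""
        encoded = []
        for choice in choices:
            enc = tokenizer(
                context,                  # segment A: retrieved sentences / snippets
                question + " " + choice,  # segment B: question + answer choice
                truncation="only_first",  # trim the long context, keep the question intact
                max_length=max_len,
                padding="max_length",
            )
            encoded.append(enc)
        return encoded  # one dict of input_ids / attention_mask / token_type_ids per choice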

Experiment Details: 
For both models, we first pre-train on the RACE dataset and then fine-tune on the CommonsenseQA dataset. The accuracy is 81.6%/73.3% on the dev/test sets.
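
A rough sketch of this two-stage procedure is given below; the checkpoint paths are placeholders, and the training loops themselves follow the single-model sketch above.

    # Two-stage sketch: warm-start CommonsenseQA fine-tuning from a RACE checkpoint.
    # The multiple-choice head scores each choice independently, so moving from
    # RACE (4 options) to CommonsenseQA (5 options) needs no architectural change.
    from transformers import XLNetForMultipleChoice

    # Stage 1: fine-tune xlnet-large-cased on RACE (loop omitted), then save it.
    model = XLNetForMultipleChoice.from_pretrained("xlnet-large-cased")
    # ... RACE fine-tuning ...
    model.save_pretrained("xlnet-large-race")

    # Stage 2: reload the RACE weights and continue fine-tuning on CommonsenseQA.
    model = XLNetForMultipleChoice.from_pretrained("xlnet-large-race")
    # ... CommonsenseQA fine-tuning with the parameters listed below ...
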
The parameters are listed below (an ensemble-prediction sketch follows the list):
  • maximum sequence length: 256
  • batch size: 4
  • learning rate: 5e-6
  • adam epsilon: 1e-6
  • training steps: 10000 (Early Stopping)
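
As an illustrative sketch (simple probability averaging is an assumption here, not necessarily the rule we used), the two fine-tuned models can be ensembled by averaging their per-choice probabilities:

    # Ensemble sketch: average softmax probabilities over answer choices.
    # Simple averaging is an assumption for illustration, not a confirmed rule.
    import torch

    @torch.no_grad()
    def ensemble_predict(xlnet_model, roberta_model, xlnet_batch, roberta_batch):
        """Return averaged per-choice probabilities; argmax gives the prediction."""
        p_xlnet = torch.softmax(xlnet_model(**xlnet_batch).logits, dim=-1)
        p_roberta = torch.softmax(roberta_model(**roberta_batch).logits, dim=-1)
        return (p_xlnet + p_roberta) / 2.0   # shape: (batch, n_choices)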