Paper Reading and Discussion #7
Please make sure to sign in to Dropbox to edit this document!


Date: June 13, 2020

Title: When BERT Plays the Lottery, All Tickets Are Winning

Authors: Sai Prasanna, Anna Rogers, Anna Rumshisky

Abstract: Much of the recent success in NLP is due to the large Transformer-based models such as BERT (Devlin et al., 2019). However, these models have been shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis. For fine-tuned BERT, we show that (a) it is possible to find a subnetwork of elements that achieves performance comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. However, the "bad" subnetworks can be fine-tuned separately to achieve only slightly worse performance than the "good" ones, indicating that most weights in the pre-trained BERT are potentially useful. We also show that the "good" subnetworks vary considerably across GLUE tasks, opening up the possibilities to learn what knowledge BERT actually uses at inference time.
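
For a concrete sense of the pruning idea the paper studies, here is a minimal sketch of gradient-based attention-head importance scoring (in the spirit of Michel et al., 2019, which this line of work builds on). The toy model, tensors, and mean-importance threshold below are illustrative assumptions for discussion, not the authors' actual code or setup:

```python
# Toy sketch: score each "head" by |dL/d(mask_h)| at mask = 1, then keep
# only the high-importance heads. All names and data here are illustrative.
import torch
import torch.nn as nn

class ToyMultiHeadLayer(nn.Module):
    """One self-attention-like layer with a learnable mask scalar per head."""
    def __init__(self, n_heads=4, dim=8):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_heads)])
        # One mask scalar per head; gradients w.r.t. these act as importance scores.
        self.head_mask = nn.Parameter(torch.ones(n_heads))

    def forward(self, x):
        # Sum of masked head outputs stands in for the usual head concat + projection.
        return sum(m * h(x) for m, h in zip(self.head_mask, self.heads))

torch.manual_seed(0)
layer = ToyMultiHeadLayer()
x = torch.randn(16, 8)  # fake input batch
y = torch.randn(16, 8)  # fake targets

loss = nn.functional.mse_loss(layer(x), y)
loss.backward()

# Importance of head h is the absolute gradient of the loss w.r.t. its mask.
importance = layer.head_mask.grad.abs()
print("head importance:", importance)

# Keep the "good" subnetwork: zero out heads below the mean importance.
with torch.no_grad():
    layer.head_mask.copy_((importance >= importance.mean()).float())
print("surviving heads:", layer.head_mask)
```

In the paper this kind of mask is applied to real BERT self-attention heads and MLP layers per GLUE task; the sketch only shows the scoring-and-masking mechanic.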


Slack channel: #paper_reading_7



Website: https://dair.ai/
Newsletter: https://dair.ai/newsletter/

Why Paper Reading/Discussion?
  • Provide a fun and open place to discuss the latest research in NLP and ML
  • Keep up with the fast pace of ML and NLP research
  • Create a community where you can feel free to bounce ideas around, start conversations, and always know that you are welcome to do so
  • Connect and engage with academics and industry practitioners

Agenda/Housekeeping

  • Introductions in chat
      • Where are you from?
      • One thing you would love to get from today's session?
  • Format of the paper reading sessions
      • Silent reading (~90 minutes)
      • Open Discussion (30 minutes)
          • Introduction (5 minutes)
          • Method/Results (15 minutes)
          • Practicality (5 minutes)
          • Questions (5 minutes)
  • Note taking, especially during discussions (Volunteers! 🙏)
  • GitHub repo to upload notes and track discussions

While reading the paper, we encourage you to post notes, comments, and summaries of what you understood (use the sections below to determine where they should go). You can also include your questions below.

Discussion 🤓 

Introduction