Extractive summarisation is essentially a binary classification task: for each sentence, predict whether it belongs in the summary.
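A toy sketch of that selection step (the scorer `score_fn` is a hypothetical stand-in for a trained sentence classifier, not anything from PreSumm): each sentence gets a probability of being in the summary, and the positively-labelled sentences are kept in document order.

```python
# Minimal sketch of extractive summarisation as per-sentence binary classification.
from typing import Callable, List

def extract_summary(sentences: List[str],
                    score_fn: Callable[[str], float],
                    threshold: float = 0.5,
                    max_sentences: int = 3) -> List[str]:
    scored = [(score_fn(s), i, s) for i, s in enumerate(sentences)]
    # Keep the highest-scoring sentences above the threshold...
    chosen = sorted((t for t in scored if t[0] >= threshold), reverse=True)[:max_sentences]
    # ...then restore original document order for readability.
    return [s for _, _, s in sorted(chosen, key=lambda t: t[1])]

# Example with a dummy scorer that simply favours longer sentences.
doc = ["BERT is a pretrained encoder.", "It was released in 2018.", "Cats are nice."]
print(extract_summary(doc, score_fn=lambda s: min(len(s) / 40, 1.0)))
```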
Encoder-Decoder architecture - the encoder captures document-level features and domain understanding and builds word- or sentence-level representations (in BERT's case, the sum of the token, position and segment embeddings), and these vectors are then fed to the decoder.
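A rough PyTorch sketch of how those three embeddings combine into the encoder input. Sizes are BERT-base defaults; the class name is mine, and PreSumm's trick of alternating segment IDs per sentence is not shown here.

```python
import torch
import torch.nn as nn

class BertStyleEmbeddings(nn.Module):
    """Sum of token, position and segment embeddings, as in BERT's input layer."""
    def __init__(self, vocab_size=30522, hidden=768, max_len=512, n_segments=2):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.pos = nn.Embedding(max_len, hidden)
        self.seg = nn.Embedding(n_segments, hidden)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        x = self.tok(token_ids) + self.pos(positions) + self.seg(segment_ids)
        return self.norm(x)  # this is what the Transformer encoder layers consume
```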
PreSumm decoder - stacked Transformer layers over the sentence vectors to build document-level understanding, followed by a logistic (sigmoid) classifier for the per-sentence label (see the sketch below the BERTSUMEXT note).
Decoder used by PreSumm - a randomly initialised Transformer decoder (only the encoder is pretrained).
The encoder and decoder use separate optimisers because the encoder is pretrained while the decoder is trained from scratch; the encoder gets a smaller learning rate and a longer warmup so fine-tuning does not destabilise it.
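A minimal sketch of that two-optimiser setup in PyTorch. The encoder/decoder modules below are toy stand-ins, and the learning rates and warmup steps are illustrative placeholders rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

# Toy stand-ins: in PreSumm the encoder is pretrained BERT, the decoder starts from random weights.
encoder = nn.Linear(768, 768)
decoder = nn.Linear(768, 768)

# Separate Adam optimisers so each part gets its own learning rate and warmup:
# gentler for the pretrained encoder, more aggressive for the fresh decoder.
enc_opt = torch.optim.Adam(encoder.parameters(), lr=2e-3)
dec_opt = torch.optim.Adam(decoder.parameters(), lr=0.1)

def noam_lr(base_lr: float, step: int, warmup: int) -> float:
    # Linear warmup, then inverse square-root decay.
    return base_lr * min(step ** -0.5, step * warmup ** -1.5)

for step in range(1, 101):
    loss = decoder(encoder(torch.randn(8, 768))).pow(2).mean()  # dummy loss to drive updates
    loss.backward()
    for opt, base_lr, warmup in [(enc_opt, 2e-3, 20_000), (dec_opt, 0.1, 10_000)]:
        for group in opt.param_groups:
            group["lr"] = noam_lr(base_lr, step, warmup)
        opt.step()
        opt.zero_grad()
```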
Stacking two Transformer layers works best; this configuration is called BERTSUMEXT.
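A minimal sketch of those summarisation layers, assuming the encoder has already produced one vector per sentence (the [CLS] positions). The class name and sizes are mine; `n_layers=2` mirrors the BERTSUMEXT setting.

```python
import torch
import torch.nn as nn

class ExtractiveHead(nn.Module):
    """Inter-sentence Transformer layers over sentence vectors, then a sigmoid per sentence."""
    def __init__(self, hidden=768, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads, batch_first=True)
        self.inter_sentence = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, sent_vecs):                      # (batch, n_sentences, hidden)
        h = self.inter_sentence(sent_vecs)             # mixes information across sentences
        return torch.sigmoid(self.classifier(h)).squeeze(-1)  # (batch, n_sentences) in [0, 1]

# Example: 1 document with 5 sentence vectors -> 5 "include in summary?" probabilities.
probs = ExtractiveHead()(torch.randn(1, 5, 768))
print(probs.shape)  # torch.Size([1, 5])
```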
Math that I don't think I completely understand.
Some Data:
ROUGE Scores:
Datasets:
PreSumm -