Deep Learning for NLP - Part 3

Part 3: Sentence Embeddings, Generative Transformer Models

Ratings 4.71 / 5.00

What You Will Learn!

Deep Learning for Natural Language Processing
Sentence Embeddings: Bag of words, Doc2Vec, SkipThought, InferSent, DSSM, USE, MT-DNN, SentenceBERT
Generative Transformer Models: UniLM, Transformer-XL and XLNet, MASS, BART, CTRL, T5, ProphetNet
DL for NLP

Description

This course is part of the "Deep Learning for NLP" series. In this course, I will introduce sentence embeddings and generative Transformer models. These concepts form the basis for a good understanding of advanced deep learning models for modern natural language generation.

The course consists of two main sections as follows.

In the first section, I will talk about sentence embeddings. We will start with basic bag-of-words methods, where sentence embeddings are obtained by aggregating the word embeddings of the constituent words. We will talk about averaged bag of words, word mover's distance, SIF and the power means method. Then we will discuss two unsupervised methods: Doc2Vec and SkipThought. Further, we will discuss supervised sentence embedding methods like recursive neural networks, deep averaging networks and InferSent. CNNs can also be used for computing semantic similarity between two text strings; we will talk about DSSMs for this purpose. We will also discuss three multi-task learning methods, including the Universal Sentence Encoder (USE) and MT-DNN. Lastly, I will talk about SentenceBERT.
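To make the averaged bag-of-words idea concrete, here is a minimal Python sketch. The three-dimensional word vectors, the `WORD_VECTORS` table and the helper names are hypothetical and chosen only for illustration; real word embeddings are learned from data and have many more dimensions.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
WORD_VECTORS = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.9, 0.1, 0.0],
    "chase": [0.0, 1.0, 0.0],
    "mice": [0.0, 0.0, 1.0],
}


def average_embedding(sentence, vectors=WORD_VECTORS):
    """Average the vectors of the known words in a whitespace-tokenized sentence."""
    tokens = [t for t in sentence.lower().split() if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not tokens:
        # No known words: fall back to the zero vector.
        return [0.0] * dim
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


emb_a = average_embedding("cats chase mice")
emb_b = average_embedding("dogs chase mice")
similarity = cosine_similarity(emb_a, emb_b)
print(emb_a, emb_b, similarity)
```

Because the two sentences differ only in one word with a similar vector, their averaged embeddings end up close to each other. Plain averaging ignores word order and weights every word equally, which is the limitation the weighted and learned methods in this section try to address.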

In the second section, I will talk about multiple generative Transformer models. We will start with UniLM. Then we will talk about segment recurrence and relative position embeddings in Transformer-XL. Then we will get to XLNet, which uses Transformer-XL along with permutation language modeling. Next, we will understand span masking in MASS and also discuss the various noising methods used in BART. We will then discuss controlled natural language generation using CTRL. We will discuss how T5 models every learning task as a text-to-text task. Finally, we will discuss how ProphetNet extends 2-stream attention modeling from XLNet to n-stream attention modeling, thereby enabling n-gram predictions.
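As a rough illustration of span masking, the Python sketch below replaces a contiguous span of tokens with mask tokens on the encoder side and keeps the original span as the text the decoder should generate. The function name `mask_span`, the `[MASK]` string and the fixed `start` and `length` arguments are assumptions made for this example; it is a simplified sketch of the idea, not the MASS training code, and it works on whole words rather than subword tokens.

```python
MASK = "[MASK]"  # hypothetical mask symbol used only in this sketch


def mask_span(tokens, start, length, mask_token=MASK):
    """Return (encoder_input, decoder_target) for one contiguous masked span."""
    if start < 0 or length <= 0 or start + length > len(tokens):
        raise ValueError("span must lie inside the token sequence")
    end = start + length
    # Encoder sees the sentence with the span replaced by mask tokens.
    encoder_input = tokens[:start] + [mask_token] * length + tokens[end:]
    # Decoder is asked to reconstruct exactly the tokens that were hidden.
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target


tokens = "the quick brown fox jumps over the lazy dog".split()
encoder_input, decoder_target = mask_span(tokens, start=2, length=3)
print(encoder_input)
print(decoder_target)
```

The span position is fixed here so that the output is reproducible; in training, the span would normally be sampled at random for each sentence.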