02-Machine Learning in Language
Machine learning models within the context of language can operate quite differently from those used in image processing and other applications. For starters, because language follows such specific rules, when processing sequences of words our ML models need some concept of what has come before a word, what comes after it, and in what sort of context it is operating. For this reason, recurrent neural networks are often used rather than convolutional neural networks or other architectures.

SCHEDULE


We will be looking to understand how machine learning works with respect to language, as well as examining some practical examples of ML in language, including:

Word Vectors
Finding similar words, word analogies, averages between two different kinds of words, etc.

Generative Text
Generating words in the style of a trained corpus, such as J.K. Rowling’s Harry Potter. Training a model on our own corpus of text.

Current State of ML in Language
Other applications of ML in language processing, including a predictive writer keyboard to which you can upload your own corpus of text.

RNNs


Recurrent neural networks (RNNs): networks with loops that create persistence. These networks base their output on what they have seen earlier in a sequence, rather than focusing only on the data they are processing at the present moment, as feed-forward networks do. This makes them effective for sequential data, but slow. RNNs can also be difficult to train, since gradients tend to vanish or explode over long sequences; improved architectures and optimization algorithms have helped with this.

How’s it different from other neural networks?

Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs.
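To make that "internal state" concrete, here is a minimal sketch in Python (using NumPy; the dimensions and random weights are made up purely for illustration) of a single vanilla RNN step, where the new hidden state depends on both the current input and the previous hidden state:

import numpy as np

# Toy dimensions, chosen arbitrarily for illustration
input_size, hidden_size = 4, 8

# Randomly initialized weights: one matrix for the current input,
# one for the previous hidden state (this second matrix is the "loop")
W_xh = np.random.randn(hidden_size, input_size) * 0.01
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    # The new hidden state mixes the current input x with the
    # previous hidden state h_prev -- that carry-over is the "memory"
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Process a sequence of 3 inputs, carrying the hidden state forward
h = np.zeros(hidden_size)
for x in np.random.randn(3, input_size):
    h = rnn_step(x, h)

The hidden state h that gets passed from step to step is what lets the network remember earlier items in the sequence.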

WORD2VEC


A group of related models trained to reconstruct the linguistic contexts of words by generating word vectors. Essentially, each word is assigned a vector of numerical values based on the contexts it appears in, so that words used in similar ways end up grouped together in the vector space.

Basic Premise:

Words that appear in the same contexts / proximity share semantic meaning.

"the quick brown fox jumped over the lazy dog”

We can organize this sentence into (context, target) pairs, where each target word is evaluated against the words that appear around it:

([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...

We then train the model with something like Stochastic Gradient Descent (SGD), and can visualize the resulting vectors by reducing them to two dimensions with something like t-SNE.
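In practice you rarely implement this from scratch. Here is a rough sketch using the gensim library (assuming gensim 4.x; the toy corpus below is made up) that trains word vectors and looks up similar words:

from gensim.models import Word2Vec

# Tiny made-up corpus: each sentence is a list of tokens
sentences = [
    ["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"],
    ["the", "lazy", "dog", "slept"],
    ["the", "quick", "fox", "ran"],
]

# Train word vectors; window controls how much surrounding
# text counts as "the same context", sg=1 selects skip-gram
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Words that appeared in similar contexts get similar vectors
print(model.wv.most_similar("fox", topn=3))

On a corpus this small the similarities are essentially noise; the interesting results come from training on millions of words.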

SGD

Different from standard gradient descent in that the model is updated from randomly selected samples, one (or a small batch) at a time, instead of from the entire dataset as a single group. Remember walking down that mountain? Now imagine taking quick, noisy hops around until you find the bottom.
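A minimal sketch of that idea in Python (fitting a line to made-up data with NumPy, updating from one randomly chosen sample at a time):

import numpy as np

rng = np.random.default_rng(0)

# Made-up data: y is roughly 3x + 1, plus a little noise
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 1 + rng.normal(0, 0.1, size=100)

w, b, lr = 0.0, 0.0, 0.1
for step in range(1000):
    i = rng.integers(len(X))        # pick ONE sample at random
    err = (w * X[i] + b) - y[i]     # prediction error on that sample
    w -= lr * err * X[i]            # gradient step for the weight
    b -= lr * err                   # gradient step for the bias

print(w, b)  # should land near 3 and 1

Each individual hop is noisy, but on average the steps still point downhill, and the randomness is what makes SGD fast on large datasets.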

T-SNE

t-SNE (t-distributed Stochastic Neighbor Embedding): a method of reducing a dataset with many dimensions down to just two (or three), typically so it can be plotted, while trying to keep similar points close together.
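With scikit-learn, squashing a set of vectors down to two dimensions looks roughly like this (a sketch; the random input array here is a stand-in for real word vectors):

import numpy as np
from sklearn.manifold import TSNE

# Pretend these are 100 word vectors with 50 dimensions each
vectors = np.random.randn(100, 50)

# Reduce to 2 dimensions for plotting; similar vectors should
# end up near each other in the 2-D layout
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(vectors)
print(coords.shape)  # (100, 2)

Plotting those 2-D coordinates with the words as labels is how the familiar word2vec "maps" of clustered vocabulary are made.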