Word2Vec

Source: Word2Vec Tutorial - The Skip-Gram Model

Consider a neural network: given an input word, it outputs, for every other word in the vocabulary, the probability that that word occurs near the input word:

NN: word -> P(word occurs nearby) ∈ [0, 1]
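
A minimal NumPy sketch of that network, assuming the 10,000-word vocabulary and 300 hidden units described below; the random weights and the word index 42 are placeholders for illustration:

import numpy as np

VOCAB_SIZE = 10_000
EMBED_DIM = 300

rng = np.random.default_rng(0)
M1 = rng.normal(0, 0.01, (VOCAB_SIZE, EMBED_DIM))  # hidden-layer weights
M2 = rng.normal(0, 0.01, (EMBED_DIM, VOCAB_SIZE))  # output-layer weights

def forward(word_index: int) -> np.ndarray:
    """Return P(each vocabulary word occurs near the input word)."""
    hidden = M1[word_index]              # one-hot input selects a row: (300,)
    logits = hidden @ M2                 # one score per vocabulary word: (10000,)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

probs = forward(42)  # hypothetical word index
assert probs.shape == (VOCAB_SIZE,) and abs(probs.sum() - 1.0) < 1e-6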

Training data: (input word, nearby word) pairs taken from a sliding window over the text.

The hidden-layer weight matrix M1 holds the "features" of the 10,000 vocabulary words: one row per word, 300 features each (10,000 × 300).
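
Because the input is one-hot, multiplying it by M1 simply selects one row, so M1 doubles as a lookup table of word vectors. A sketch with placeholder values:

import numpy as np

rng = np.random.default_rng(0)
M1 = rng.normal(size=(10_000, 300))  # 10,000 words x 300 features

one_hot = np.zeros(10_000)
one_hot[42] = 1.0                    # hypothetical word index

# Matrix multiplication and direct row lookup give the same 300-d word vector.
assert np.allclose(one_hot @ M1, M1[42])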

Word Embedding

One-hot encoding of the sentence "I eat apple":

I eat apple
I     -> [1, 0, 0]
eat   -> [0, 1, 0]
apple -> [0, 0, 1]
--> [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1]
]
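
The same encoding built programmatically (a sketch; the vocabulary order comes straight from the sentence):

import numpy as np

sentence = "I eat apple".split()
vocab = {word: i for i, word in enumerate(sentence)}  # I=0, eat=1, apple=2

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

print(one_hot("eat"))  # [0. 1. 0.]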

Attention

Good article: Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
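
The core mechanism from that article can be sketched as dot-product attention: score each encoder state against the current decoder state, softmax the scores, and take the weighted sum of encoder states. The shapes and values below are made up for illustration:

import numpy as np

def attention(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (seq_len, d)."""
    scores = encoder_states @ decoder_state  # one score per source position
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()                # softmax over source positions
    context = weights @ encoder_states       # weighted sum of encoder states: (d,)
    return context, weights

rng = np.random.default_rng(0)
ctx, w = attention(rng.normal(size=4), rng.normal(size=(3, 4)))
print(w)  # one weight per source position, summing to 1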

Changelog

Just observe 👀