Oleg Zabluda's blog
Wednesday, February 15, 2017
 
king - man + woman is queen; but why?
"""
word2vec is an algorithm that transforms words into vectors, so that words with similar meanings end up lying close to each other. Moreover, it allows us to use vector arithmetic to work with analogies, for example the famous king - man + woman = queen. I will try to explain how it works.
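(As a minimal sketch of this arithmetic with gensim - assuming a pretrained embedding file such as the commonly used GoogleNews-vectors-negative300.bin has been downloaded; the filename and the printed output are assumptions, not part of the original post:)

```python
from gensim.models import KeyedVectors

# Load a pretrained word2vec model from disk; the file below is assumed
# to have been downloaded separately (it does not ship with gensim).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman: positive words are added, negative ones subtracted
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=1))
# Typically prints something like [('queen', 0.71...)]
```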
[...]
we claim that two words a and b are similar if P(w|a)=P(w|b) for every word w. [...] Often, instead of working with conditional probabilities, we use the pointwise mutual information (PMI), defined as:

PMI(a,b) = log[P(a,b)/(P(a)P(b))] = log[P(a|b)/P(a)].

Its direct interpretation is how much more likely a pair of words co-occurs than it would if the words were independent. The logarithm makes it easier to work with words appearing at frequencies of different orders of magnitude. We can approximate PMI as a scalar product:

PMI(a,b) = v⃗_a ⋅ v⃗_b,
where the v⃗_i are vectors, typically of 50-300 dimensions.
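(To make the definition concrete, here is a small sketch - plain numpy, with toy co-occurrence counts invented purely for illustration - that computes PMI from a word-word count matrix:)

```python
import numpy as np

# Toy symmetric co-occurrence counts; rows/columns are words, values invented.
words = ["dog", "cat", "bone"]
counts = np.array([[0, 10, 8],
                   [10, 0, 1],
                   [8, 1, 0]], dtype=float)

total = counts.sum()
p_joint = counts / total               # P(a, b)
p_word = counts.sum(axis=1) / total    # P(a), the marginal

# PMI(a, b) = log[ P(a, b) / (P(a) P(b)) ]; zero counts yield -inf here
with np.errstate(divide="ignore"):
    pmi = np.log(p_joint / np.outer(p_word, p_word))

# Positive value: "dog" and "bone" co-occur more often than chance
print(pmi[words.index("dog"), words.index("bone")])
```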
[...]
The condition that P(w|a)=P(w|b) is equivalent to PMI(w,a)=PMI(w,b). After expressing PMI with vector products, we get

v⃗_w ⋅ v⃗_a = v⃗_w ⋅ v⃗_b
v⃗_w ⋅ (v⃗_a − v⃗_b) = 0
If this needs to hold for every v⃗_w, then v⃗_a = v⃗_b.
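(The step from "orthogonal to every v⃗_w" to "equal vectors" is just linear algebra: if the context vectors span the whole space, the only vector orthogonal to all of them is zero. A toy numerical check with random vectors, purely for illustration:)

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_context = 50, 10_000
W = rng.normal(size=(n_context, dim))  # rows play the role of context vectors v_w

# If v_w . d = 0 for every row v_w, then d lies in the null space of W.
# Full column rank means that null space contains only the zero vector,
# so d = v_a - v_b = 0, i.e. v_a = v_b.
print(np.linalg.matrix_rank(W) == dim)  # True with overwhelming probability
```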
[...]
Analogies and linear space

If we want to make word analogies (a is to b as A is to B), one may argue that it can be expressed as an equality of conditional probability ratios

P(w|a)/P(w|b)=P(w|A)/P(w|B) for every word w.
[...]
For example, if we want to say dog is to puppy as cat is to kitten, we expect that if a word such as nice co-occurs with both dog and cat (likely with different frequencies), then it co-occurs with puppy and kitten by the same factor. This appears to be true, with a factor of two favoring the young animals - compare the word pairs in the Google Books Ngram Viewer (while n-grams look only at adjacent words, they serve as a rough approximation).

By proposing ratios for word analogies, we implicitly assume that the conditional probabilities of words can be factorized with respect to different traits of a word. For the case discussed it would be:

P(w|dog)=f(w|species=dog)×f(w|age=adult)×P(w|is_a_pet)
P(w|puppy)=f(w|species=dog)×f(w|age=cub)×P(w|is_a_pet)
P(w|cat)=f(w|species=cat)×f(w|age=adult)×P(w|is_a_pet)
P(w|kitten)=f(w|species=cat)×f(w|age=cub)×P(w|is_a_pet)

So, in particular:

P(w|dog)/P(w|puppy) = f(w|age=adult)/f(w|age=cub) = P(w|cat)/P(w|kitten).
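(A quick numerical check of this factorization - all factor values below are invented purely for illustration:)

```python
# Toy check that factorized probabilities satisfy the ratio equality.
f_species = {"dog": 0.30, "cat": 0.20}   # f(w | species=...)
f_age = {"adult": 0.50, "cub": 0.25}     # f(w | age=...)
p_pet = 0.10                             # P(w | is_a_pet)

p = {
    "dog":    f_species["dog"] * f_age["adult"] * p_pet,
    "puppy":  f_species["dog"] * f_age["cub"]   * p_pet,
    "cat":    f_species["cat"] * f_age["adult"] * p_pet,
    "kitten": f_species["cat"] * f_age["cub"]   * p_pet,
}

# Both ratios reduce to f(w|age=adult) / f(w|age=cub) = 2.0
print(p["dog"] / p["puppy"], p["cat"] / p["kitten"])  # 2.0 2.0
```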

How does the equality of conditional probability ratios translate to word vectors? Expressing it via pointwise mutual information (the P(w) factors cancel out, and the logarithm turns ratios into differences), we get

v⃗_w ⋅ v⃗_a − v⃗_w ⋅ v⃗_b = v⃗_w ⋅ v⃗_A − v⃗_w ⋅ v⃗_B,

which is the same as

v⃗_w ⋅ (v⃗_a − v⃗_b − v⃗_A + v⃗_B) = 0.

Again, if we want this to hold for every word w, the vector difference needs to be zero, i.e. v⃗_a − v⃗_b = v⃗_A − v⃗_B.
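(This is exactly why v⃗_king − v⃗_man + v⃗_woman lands near v⃗_queen. A raw-numpy version of the lookup, with hypothetical 2-dimensional embeddings invented so the analogy holds exactly:)

```python
import numpy as np

# Hypothetical embeddings; axes roughly (maleness, royalty), invented values.
vocab = {
    "man":   np.array([ 1.0,  0.0]),
    "woman": np.array([-1.0,  0.0]),
    "king":  np.array([ 1.0,  1.0]),
    "queen": np.array([-1.0,  1.0]),
    "apple": np.array([ 0.0, -1.0]),
}

target = vocab["king"] - vocab["man"] + vocab["woman"]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Nearest word by cosine similarity, excluding the query words themselves
best = max((w for w in vocab if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vocab[w], target))
print(best)  # queen
```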

Differences of word vectors, like

v⃗_she − v⃗_he,

are not word vectors themselves. However, it is interesting to project a word onto this axis. We can see that the projection

v⃗_w ⋅ (v⃗_a − v⃗_b) = log[P(w|a)] − log[P(w|b)]

is exactly the relative log-frequency of a word across the two contexts.
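(For instance, one can score words by this projection - a sketch reusing the same assumed pretrained gensim vectors as in the earlier example:)

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumes the same pretrained file as before; the filename is an assumption.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

axis = vectors["she"] - vectors["he"]  # the she-he direction

for word in ["nurse", "engineer", "teacher", "pilot"]:
    # Positive projection: the word co-occurs relatively more with 'she'
    print(word, float(np.dot(vectors[word], axis)))
```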

Bear in mind that when we want to look at common aspects of two words, it is more natural to average their vectors rather than take the sum. While people use the two interchangeably, this works only because cosine distance ignores the absolute vector length. So, for a gender-neutral pronoun, use (v⃗_she + v⃗_he)/2 rather than the sum.
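(A two-line check of why sum and average are interchangeable under cosine similarity - numpy, random vectors purely for illustration:)

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(1)
v_she, v_he, v_w = rng.normal(size=(3, 300))

s, avg = v_she + v_he, (v_she + v_he) / 2
print(np.isclose(cosine(v_w, s), cosine(v_w, avg)))  # True: same direction
print(np.linalg.norm(s), np.linalg.norm(avg))        # but different lengths
```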
"""
http://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html
