Natural Language Processing

Original: 02/01/20
Revised: no

$$Rome + France - Italy = \space ?$$

What do you think the answer should be? But moreover, what are we doing here!? Adding and subtracting words as if they were numbers!? The fact that this equation makes sense is one of the most remarkable successes with using neural networks for Natural Language Processing (NLP). This success of AI has had the most visible impact in our lives.

Recall that in general (apart from the DRL, which we have seen above), most of AI applications have been with supervised learning, i.e., the system is trained on large sets of human-labeled data and it learns from that data how to label other data which it has not seen before (case in point, we show it 60,000 pictures of handwritten digits and it will learn to recognize handwritten digits that it has not seen before). One of the largest data sets that humanity has at its disposal is in text form. How can this data be seen as being labeled? That answer turns out to be of great importance.

The main working assumption in modern NLP is that of distributional semantics, namely that words only have meaning relative to the context in which they are used. The popular expression of this way of looking at words is You know a word by the company it keeps. Writers are fully aware of the fact that the meaning of words is mainly contextual. When we look at textual data, we can treat each word as being labeled by the contexts in which it appears. So the learning we do on text is supervised learning.

The modern treatment of this denotational semantics approach is based on machine learning with neural nets. A neural network is used to learn a representation of each word as a vector. The learning process does it in such a way that two words are semantically close if the vectors representing them are similar (similarity is a precise linear algebra operation, but we do not need to worry about that now).

So questions about words become questions in linear algebra about their representing vectors. $$Rome + France - Italy = Paris$$

The easiest way to use these Use pre-trained models that you can download online (easiest). Most NLP libraries that developers use for their applications contain such pre-trained models for English and other languages. Google makes a few available, depending on the corpus of data that makes most sense for you application: Wikipedia, Google News, Freebase

The data that is at the center of our attention is graph data. We will see in the next article that graph processing can borrow from the embedding technique we discussed above. Word embeddings are based on the idea that words live in neighborhoods (their context) and these neighborhoods determine their meaning. But a node in a graph also has neighborhoods, the totality of the other nodes that can be reached by traversing the graph starting from the node. So one could also look at a graph as labeled data as well and we could also embed a graph in a vector space, in such a way that the similarity between the vectors corresponds to how close the nodes are semantically.