In the previous chapter, we saw how to apply ConvNets to images. In this chapter, we will apply similar ideas to text.
What do a text and an image have in common? At first glance, very little. However, if we represent a sentence or a document as a matrix, then this matrix is no different from an image matrix in which each cell is a pixel. So the next question is: how can we represent a text as a matrix? Well, it is pretty simple: each row of the matrix is a vector that represents a basic unit of the text. Of course, now we need to define what a basic unit is. A simple choice is to say that the basic unit is a character. Another choice is to say that the basic unit is a word. Yet another choice is to aggregate similar words together and then denote each aggregation (sometimes called a cluster or an embedding) with a representative symbol.
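As a minimal sketch of this idea, the following Python snippet turns a sentence into a matrix with one row per basic unit. It assumes the simplest possible row vector, a one-hot encoding over the units seen in the sentence; the helper name `one_hot_matrix` and the sample sentence are illustrative choices, not something prescribed by this chapter.

```python
def one_hot_matrix(units):
    """Return (vocab, matrix); each matrix row is the one-hot vector of one unit."""
    # Assumption: the vocabulary is just the sorted set of units in this text.
    vocab = sorted(set(units))
    index = {unit: i for i, unit in enumerate(vocab)}
    matrix = []
    for unit in units:
        row = [0] * len(vocab)
        row[index[unit]] = 1  # mark the column that corresponds to this unit
        matrix.append(row)
    return vocab, matrix


sentence = "the cat sat on the mat"

# Basic unit = word: one row per word.
word_vocab, word_matrix = one_hot_matrix(sentence.split())

# Basic unit = character: one row per character (spaces included).
char_vocab, char_matrix = one_hot_matrix(list(sentence))

print(len(word_matrix), "x", len(word_vocab))  # rows x columns at word level
print(len(char_matrix), "x", len(char_vocab))  # rows x columns at character level
```

The same sentence yields a short, narrow matrix at the word level and a longer one at the character level, so the choice of basic unit directly determines the shape of the "image" a ConvNet would see.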
Now you might wonder: I understand that we can represent the text as a vector but, in doing so, we lose the position of the words, and this position should be important, shouldn't it?
Well, it turns out that in many real applications, simply knowing whether or not a sentence contains a particular basic unit (a character, a word, or an aggregate) is already quite informative, even if we don't record where exactly in the sentence this basic unit is located.
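To make the point about position concrete, here is a small sketch of a presence-only representation, again in Python. The function name `presence_vector`, the toy vocabulary, and the two sentences are assumptions made for illustration only.

```python
def presence_vector(units, vocab):
    """Return a 0/1 vector saying which vocabulary entries occur in the units."""
    present = set(units)  # positions and repetitions are deliberately discarded
    return [1 if entry in present else 0 for entry in vocab]


# Hypothetical toy vocabulary, with the word as the basic unit.
toy_vocab = ["cat", "dog", "mat", "on", "sat", "the"]

original = presence_vector("the cat sat on the mat".split(), toy_vocab)
shuffled = presence_vector("on the mat the cat sat".split(), toy_vocab)

print(original)
print(shuffled)
print(original == shuffled)
```

Both sentences map to the same vector because they contain the same words, which shows exactly what information is kept (which units occur) and what is dropped (where they occur).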