fastText is a well-known library for text classification and text representation learning, released by Facebook in 2016. I recently read the fastText papers and worked through the source code. It is an excellent piece of work, and still a very practical tool for text classification and representation learning.
Hierarchical softmax is a great way to organize the output layer of the neural network. It arranges the classes as a binary tree: each leaf node represents one class of the classification task, and the path from the root to a leaf encodes that class label. This reduces the complexity of the output layer from O(N) to O(log N), where N is the number of classes.
The probability of a class is then the product of the binary branching probabilities at the internal nodes on the path from the root to that class's leaf.
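A minimal sketch of that computation, assuming each internal node carries a parameter vector and each branching decision is a sigmoid over the dot product with the hidden representation (the names `path`, `codes`, and `node_vectors` are illustrative, not fastText's actual API):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hs_probability(hidden, path, codes, node_vectors):
    """Probability of one leaf: product of binary decisions along its path.

    hidden       -- hidden-layer vector for the input text
    path         -- internal-node ids from the root down to the leaf's parent
    codes        -- branch taken at each node (1 = right, 0 = left)
    node_vectors -- one parameter vector per internal node (hypothetical layout)
    """
    p = 1.0
    for node, code in zip(path, codes):
        score = sum(h * w for h, w in zip(hidden, node_vectors[node]))
        s = sigmoid(score)
        p *= s if code == 1 else (1.0 - s)
    return p

# With all-zero parameters every branch is 50/50, so a depth-2 leaf gets 0.25.
p = hs_probability([0.0, 0.0], [0, 1], [1, 0], {0: [0.0, 0.0], 1: [0.0, 0.0]})
```

Note that only the O(log N) nodes on one path are touched, which is exactly where the speedup over a flat softmax comes from.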
We can construct the hierarchical softmax on top of a Huffman tree, a binary tree built from the class frequencies: classes with higher frequency end up closer to the root (short paths), while rarer classes sit deeper in the tree. This minimizes the expected path length, and therefore the expected cost of evaluating a label.
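A small sketch of Huffman construction over class frequencies, using a heap of partial trees (here we only track each class's resulting code length, which is what matters for the cost argument):

```python
import heapq
import itertools

def huffman_depths(freqs):
    """Build a Huffman tree over class frequencies.

    freqs -- {class_label: frequency}
    Returns {class_label: code_length}; frequent classes get shorter codes.
    """
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(counter), {label: 0}) for label, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, d2 = heapq.heappop(heap)
        # merging pushes every class in both subtrees one level deeper
        merged = {k: v + 1 for k, v in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

depths = huffman_depths({"sports": 50, "tech": 30, "food": 15, "travel": 5})
# the most frequent class gets the shortest path from the root
```

Running this on the toy frequencies above gives "sports" a shorter code than "travel", matching the intuition that frequent labels should be cheap to score.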
fastText uses character-level n-gram features to represent text. The motivation comes mainly from morphologically rich languages (MRLs), where a single lemma takes many surface forms: most French verbs have around 40 inflected forms, and Finnish has 15 cases for nouns. Word-level features struggle to capture this kind of information, and stemming discards the morphological variation entirely. Character-level n-grams are an effective way to capture it.
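Concretely, fastText wraps each word in the boundary markers `<` and `>` and extracts all n-grams between a minimum and maximum length, so that prefixes and suffixes are distinguishable from word-internal sequences. A sketch of that extraction:

```python
def char_ngrams(word, minn=3, maxn=6):
    """Extract character n-grams from a word, fastText-style.

    The word is wrapped in '<' and '>' so that '<wh' (a prefix of 'where')
    is a different feature from 'wh' occurring inside another word.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams

trigrams = char_ngrams("where", minn=3, maxn=3)
# ['<wh', 'whe', 'her', 'ere', 're>']
```

The word's representation is then the sum of the vectors of its n-grams, so morphological variants like "cats" and "cat" share most of their features.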
Character-level n-grams also have the advantage of bounding the size of the vocabulary. Instead of one vector per word in a potentially unbounded word vocabulary |V|, fastText represents words through their n-grams, which in practice are hashed into a fixed number of buckets K, so the n-gram table has a constant size regardless of how many distinct words appear.
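A sketch of that hashing trick, using the 32-bit FNV-1a hash (which is, to the best of my reading of the source, the scheme fastText uses to map n-grams into its bucket table; the default bucket count of 2,000,000 is fastText's `-bucket` default):

```python
def fnv1a_hash(s):
    """32-bit FNV-1a hash of a string's UTF-8 bytes."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h

def ngram_bucket(ngram, num_buckets=2_000_000):
    # every n-gram, seen or unseen, maps into a fixed-size embedding table
    return fnv1a_hash(ngram) % num_buckets
```

Collisions are possible, but with millions of buckets they are rare enough in practice, and the payoff is a model whose memory is independent of the n-gram vocabulary size.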