Knowledge distillation is a method for compressing a large model into a smaller one.
An introduction to the Expectation Maximization algorithm.
word2vec is a classic model for generating word embeddings, and it is important to understand how it works.