This course introduces methods for analyzing text data and using it as input for regression and classification. Lectures are complemented by hands-on exercises with text data in Python.
Duration
15 to 25 h
Syllabus
- Text preprocessing
- Dictionary methods
- TF-IDF models
- Latent Semantic Analysis
- Word Embeddings (Word2Vec, GloVe)
- Attention and Transformers (BERT)
- Latent Dirichlet Allocation
- Dynamic topic models
- Structural topic models
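As an illustration of the kind of hands-on exercise the syllabus points to, here is a minimal sketch of one common TF-IDF weighting variant (relative term frequency times the log of inverse document frequency), using only the Python standard library. The toy documents, the whitespace tokenizer, and the function names `tokenize` and `tfidf` are assumptions made for illustration, not course material.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "the market rallied as inflation fell",
    "inflation rose and the market fell",
    "the team won the final match",
]

def tokenize(text):
    # Naive preprocessing: lowercase and split on whitespace.
    return text.lower().split()

def tfidf(corpus):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

weights = tfidf(docs)
```

With this weighting, a term that occurs in every document (here "the") gets weight zero, while a term confined to one document (such as "team") gets a positive weight. The resulting vectors could then serve as features for regression or classification.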
Prerequisites
Assumes knowledge at the level of “Foundations of Data Science”
Credential
Parts of this course have been taught in Summer School Week II, in the MSc Data Science as part of a course on text mining, and in executive (ad hoc) courses.