Relevant statistical methods for data with high dimensions
Duration
10 to 20 h
Syllabus
Regression models with big data: motivation for variable selection
(e.g. text regression, treatment effects with many controls, Bayesian
networks and high-dimensional time series), regularization, sparsity and a new
theory for Statistics
Variable selection: best subset, Bayesian variable selection and
computational challenges
Penalized likelihood methods: lasso and variants
Post-selection inference: an overview of valid statistical methods
for estimation and testing with big data
Network models and high-dimensional covariance estimation
Prerequisites
“Foundations of Data Science” and Part I of “Introduction to Causal inference”
Credential
Taught as part of Summer School Week II, PhD courses, Statistical Modelling and Inference MSc in Data Science, and executive (ad hoc) courses