Introduction to distributed computing using Spark
Duration
5 to 10 hours
Syllabus
Basic concepts behind distributed machine learning (ML) with Spark: resilient distributed datasets (RDDs), transformations, and the runtime architecture
MLlib, Spark's library of machine learning functions
Implementation of supervised and unsupervised algorithms in Spark
Prerequisites
“Foundations of Data Science”
Credential
Taught in DSC workshops and ad hoc executive training sessions