What You Will Learn!
- Do machine learning in R
- Process data for modelling
Description
This course explores several modern machine learning and data science techniques in R. As you probably know, R is one of the most widely used tools among data scientists. We showcase a wide array of statistical and machine learning techniques. In particular:
- Use R's statistical functions to draw random numbers, compute densities, plot histograms, etc.
- Solve supervised ML problems using the caret package
- Process data using sqldf, caret, etc.
- Apply unsupervised techniques such as PCA, DBSCAN, and k-means
- Call deep learning models in Keras (Python) from R
- Use the powerful XGBoost method for both regression and classification
- Create interesting plots, such as geo-heatmaps and interactive plots
- Tune hyperparameters for several ML methods using caret
- Do linear regression in R, build log-log models, and run ANOVA
- Estimate mixed effects models to explicitly model the covariances between observations
- Train outlier-robust models using robust regression and quantile regression
- Identify outliers and novel observations
- Estimate ARIMA (time series) models to predict temporal variables
Most of the examples presented in this course use real datasets collected from web sources such as Kaggle, the US Census Bureau, etc. All the lectures can be downloaded and come with the corresponding material. The teaching approach is to briefly introduce each technique and then focus on its computational aspects. Mathematical formulas are avoided as much as possible in order to concentrate on practical implementation.
This course covers most of what you would need to work as a data scientist, or compete in Kaggle competitions. It is assumed that you already have some exposure to data science / statistics.
Who Should Attend!
- Students aiming to do serious data science in R who have some knowledge of statistics