This course explores a variety of machine learning and data science techniques using real life datasets/images/audio collected from several sources. These realistic situations are much better than dummy examples, because they force the student to better think the problem, pre-process the data in a better way, and evaluate the performance of the prediction in different ways.
The datasets used here are from different sources such as Kaggle, US Data.gov, CrowdFlower, etc. And each lecture shows how to preprocess the data, model it using an appropriate technique, and compute how well each technique is working on that specific problem. Certain lectures contain also multiple techniques, and we discuss which technique is outperforming the other. Naturally, all the code is shared here, and you can contact me if you have any questions. Every lecture can also be downloaded, so you can enjoy them while travelling.
The student should already be familiar with Python and some data science techniques. In each lecture, we do discuss some technical details on each method, but we do not invest much time in explaining the underlying mathematical principles behind each method
Some of the techniques presented here are:
The modules/libraries used here are:
Some of the real examples used here:
The motivation for this course is that many students willing to learn data science/machine learning are usually suck with dummy datasets that are not challenging enough. This course aims to ease that transition between knowing machine learning, and doing real machine learning on real situations.