This is a hands-on, project-based course designed to help you master the foundations for classification modeling in Python.
We’ll start by reviewing the data science workflow, discussing the primary goals & types of classification algorithms, and taking a deep dive into the classification modeling steps we’ll be using throughout the course.
You’ll learn to perform exploratory data analysis, leverage feature engineering techniques like scaling, dummy variables, and binning, and prepare data for modeling by splitting it into train, test, and validation datasets.
From there, we’ll fit K-Nearest Neighbors & Logistic Regression models, build an intuition for interpreting logistic regression coefficients, and evaluate model performance using tools like confusion matrices and metrics like accuracy, precision, and recall. We’ll also cover techniques for modeling imbalanced data, including threshold tuning, sampling methods like oversampling & SMOTE, and adjusting class weights in the model cost function.
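To give a rough feel for how threshold tuning trades precision against recall, here is a minimal sketch in plain Python. The labels and predicted probabilities are made up for illustration, and the helper names (confusion_counts, classification_metrics, predict_with_threshold) are hypothetical, not taken from the course materials.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion matrix counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def predict_with_threshold(probabilities, threshold=0.5):
    """Turn predicted probabilities into class labels using a decision threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up, imbalanced example: 7 negatives and 3 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.55, 0.90]

default_metrics = classification_metrics(y_true, predict_with_threshold(probs, 0.5))
lowered_metrics = classification_metrics(y_true, predict_with_threshold(probs, 0.35))

print(default_metrics)
print(lowered_metrics)
```

On this toy data, lowering the threshold from 0.5 to 0.35 catches every positive case (recall rises to 1.0) at the cost of more false positives (precision drops to 0.5), which is the kind of trade-off threshold tuning is meant to expose.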
Throughout the course, you'll play the role of Data Scientist for the risk management department at Maven National Bank. Applying the skills you learn along the way, you'll use Python to explore their data and build classification models that accurately determine which customers have high, medium, and low credit risk based on their profiles.
Last but not least, you'll learn to build and evaluate decision tree models for classification. You’ll fit, visualize, and fine-tune these models using Python, then apply your knowledge to more advanced ensemble models like random forests and gradient boosted machines.
COURSE OUTLINE:
Intro to Data Science
Introduce the fields of data science and machine learning, review essential skills, and walk through each phase of the data science workflow
Classification 101
Review the basics of classification, including key terms, the types and goals of classification modeling, and the modeling workflow
Pre-Modeling Data Prep & EDA
Recap the data prep & EDA steps required to perform modeling, including key techniques to explore the target, features, and their relationships
K-Nearest Neighbors
Learn how the k-nearest neighbors (KNN) algorithm classifies data points and practice building KNN models in Python
Logistic Regression
Introduce logistic regression, learn the math behind the model, and practice fitting logistic regression models and tuning regularization strength
Classification Metrics
Learn how and when to use several important metrics for evaluating classification models, such as precision, recall, F1 score, and ROC-AUC
Imbalanced Data
Understand the challenges of modeling imbalanced data and learn strategies for improving model performance in these scenarios
Decision Trees
Build and evaluate decision tree models, algorithms that look for the splits in your data that best separate your classes
Ensemble Models
Get familiar with the basics of ensemble models, then dive into specific models like random forests and gradient boosted machines
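As a rough illustration of the decision tree idea in the outline above (looking for the split that best separates the classes), here is a minimal sketch in plain Python. It assumes Gini impurity as the split criterion and a single numeric feature; the function names (gini, best_split) and the debt_ratio and risk values are hypothetical and made up for this example, not taken from the course.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(values, labels):
    """Find the threshold on one numeric feature with the lowest weighted Gini.

    Returns (threshold, weighted_impurity). The threshold is None if no
    candidate split improves on the impurity of the unsplit data.
    """
    n = len(values)
    best_threshold, best_impurity = None, gini(labels)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Made-up customer data: a debt ratio feature and a credit risk label.
debt_ratio = [0.10, 0.15, 0.20, 0.45, 0.50, 0.60]
risk = ["low", "low", "low", "high", "high", "high"]

threshold, impurity = best_split(debt_ratio, risk)
print(threshold, impurity)
```

A full decision tree would repeat this search across every feature and then recurse on each side of the chosen split; ensemble models such as random forests combine many such trees.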
__________
Ready to dive in? Join today and get immediate, LIFETIME access to the following:
9.5 hours of high-quality video
18 homework assignments
9 quizzes
2 projects
Data Science in Python: Classification ebook (250+ pages)
Downloadable project files & solutions
Expert support and Q&A forum
30-day Udemy satisfaction guarantee
If you're an aspiring data scientist looking for an introduction to the world of classification modeling with Python, this is the course for you.
Happy learning!
-Chris Bruehl (Data Science Expert & Lead Python Instructor, Maven Analytics)