Our modern world runs on software and data. With Git, a version control tool, we track and manage the changes and versions of our software. Git is useful in every programmer's work and is a must-have tool in any software-related field, from data science to machine learning.
What about the data and the ML models we build? How do we track and manage them?
How do data scientists, machine learning engineers and AI developers track and manage the data and models they spend hours and days building?
In this course, we will explore Git and DVC, two essential version control tools that every data scientist, ML engineer and AI developer needs when working on data science projects.
This is a relatively new field, so there is not much material on using Git and DVC for data science projects. The goal of this exciting and unscripted course is to introduce you to Git and DVC for data science.
We will also explore data version control: how to track your models and datasets using DVC and Git.
By the end of the course, you will have a comprehensive overview of the fundamentals of Git and DVC and of how to use these tools to manage and track your ML models and datasets across the entire machine learning project life cycle.
This course is unscripted, fun and exciting, but at the same time we will dive deep into DVC and Git for data science.
Specifically, you will learn:
Git Essentials
How Git works
Git Branching for Data Science Projects
Building our own custom version control tools from scratch
Data Version Control - The What, Why and How
DVC Essentials
How to track and version your ML Models
DVC pipelines
How to use DAGsHub and GitHub
Label Studio
Best practices in using Git and DVC
Machine Learning Experiment Tracking
And more