Deep Learning for Computer Vision

From Pixels to Semantics

Ratings 4.71 / 5.00

What You Will Learn!

Build a solid understanding of computer vision foundations, using traditional and deep learning methods
Deep understanding of Convolutional Neural Networks and their usage in computer vision
Build practical projects with ConvNets, like image classification, multi-object detection and semantic segmentation
Understand and practice the concepts of Transfer Learning in practical problems
Learn how to visualize and debug ConvNets and understand their underlying dynamics in a practical way
Learn how to use and apply data augmentation and how to deal with large and small datasets using ConvNets
Understand the basics of dealing with time and video data using Spatio-temporal models
Understand the basics of 3D Deep Learning and how to deal with 3D data sets

Description

Welcome to our course, Deep Learning for Computer Vision: From Pixels to Semantics. The course is divided into three main parts.

The first part covers the essentials of the traditional computer vision pipeline and how to work with images using the OpenCV and Pillow libraries, including image pre-processing steps such as thresholding, denoising, blurring, filtering, edge detection, and contours. We will build simple apps like Car License Plate Detection (LPD) and activity recognition. This leads us to the revolution that deep learning brought to computer vision: turning traditional filters into learnable parameters using Convolutional Neural Networks. We will cover all the basics of ConvNets, including the details of the vanilla architecture for image classification, hyperparameters such as kernels, strides, and max pooling, and feature map size calculations. Beyond the vanilla architecture, we also cover state-of-the-art ConvNet meta-architectures and design patterns, such as skip connections, Inception, and DenseNet.

In the second part, we will learn how to use ConvNets to solve practical problems in different situations, including when only a small amount of data is available. We will also cover transfer learning and its different scenarios, and finally how to debug and visualize the learned kernels in ConvNets.

In the last part, we will learn about different computer vision applications using ConvNets. We will learn about the Encoder-Decoder design pattern. We start with the task of semantic segmentation, where we will build a U-Net architecture from scratch for the Cambridge Video (CAMVID) dataset. Then we will learn about object detection, covering both two-stage architectures and one-shot architectures such as SSD and YOLO. Next, we will learn how to deal with video data using spatio-temporal ConvNet architectures. Finally, we will introduce 3D deep learning to extend the use of ConvNets to 3D data, such as LiDAR data.
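As a small taste of the feature map size calculations mentioned above, here is a minimal sketch in plain Python. It uses the standard formula for the output size of a convolution or pooling layer along one axis. The function name `conv_output_size` and the four-layer stack are hypothetical, chosen only for illustration; they are not taken from the course material.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output size along one spatial axis for a conv or pooling layer.

    Standard formula: floor((size + 2 * padding - kernel) / stride) + 1
    """
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical vanilla stack applied to a 32x32 input (illustrative only).
# Each entry is (name, kernel, stride, padding).
layers = [
    ("conv 3x3", 3, 1, 1),     # padding 1 keeps the spatial size
    ("maxpool 2x2", 2, 2, 0),  # halves the spatial size
    ("conv 5x5", 5, 1, 0),     # no padding, so the map shrinks by 4
    ("maxpool 2x2", 2, 2, 0),
]

size = 32
sizes = []
for name, kernel, stride, padding in layers:
    size = conv_output_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name}: {size}x{size}")
```

Tracing the sizes by hand like this is a quick way to check that a stack of kernels, strides and pooling layers produces the feature map size expected by the next layer.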