Azure Databricks - Build data engineering and AI/ML pipeline

Learn anomaly detection, Azure Data Factory, Azure DevOps, Azure Web Apps, Spark, Delta Lake, Kafka, and explainable AI

Ratings 2.70 / 5.00

What You Will Learn!

What is Anomaly detection?
How to apply unsupervised learning algorithms (Isolation Forest, KNN, and clustering-based approaches) to detect anomalies
Step-by-step guide to performing ETL operations using Azure Databricks
Understand data lakehouse architecture
Build Data Pipeline using Azure Tech stack
Interpret machine learning models using Shapley values
Spark Structured Streaming with Kafka
Spark Structured streaming with Azure Event Hub
Use MLflow for managing the end-to-end machine learning lifecycle
Anomaly detection on Time series data
Building CI/CD Pipeline using Azure DevOps
Building Data Pipeline using Azure Data Factory
Productionizing models using Azure Functions and Docker

Description

This course is designed to help you develop the skills necessary to perform ETL operations in Databricks, build unsupervised anomaly detection models, learn MLOps, perform CI/CD operations in Databricks, and deploy machine learning models into production.

Big Data engineering:

Big data engineers interact with massive data processing systems and databases in large-scale computing environments. Big data engineers provide organizations with analyses that help them assess their performance, identify market demographics, and predict upcoming changes and market trends.

Azure Databricks:

Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers three environments for developing data intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.

Anomaly detection:

Anomaly detection (aka outlier analysis) is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset’s normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, for instance a change in consumer behavior. Machine learning is progressively being used to automate anomaly detection.
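
To make the idea of "deviating from normal behavior" concrete, here is a minimal sketch of a KNN-style anomaly score in plain Python. It is not taken from the course material: the function names, the sample readings, and the "3 times the median score" threshold are all assumptions chosen for illustration. Each point is scored by its mean distance to its k nearest neighbours, so isolated points receive high scores.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k=2, threshold=3.0):
    """Flag points whose score exceeds `threshold` times the median score.

    The median-based threshold is an illustrative assumption, not a standard rule.
    """
    scores = knn_anomaly_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [s > threshold * median for s in scores]


# Hypothetical sensor readings: four clustered points and one far away.
readings = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 1.0), (8.0, 8.5)]
scores = knn_anomaly_scores(readings)
flags = flag_anomalies(readings)
print(flags)
```

With these sample readings only the last point is flagged. On real data, the choice of k and of the threshold usually needs tuning, and a library implementation would be preferable to this brute-force loop.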

Data Lake House:

A data lakehouse is a data solution concept that combines elements of the data warehouse with those of the data lake. Data lakehouses implement data warehouses' data structures and management features for data lakes, which are typically more cost-effective for data storage.

Explainable AI:

Explainable AI is artificial intelligence in which the results of the solution can be understood by humans. It contrasts with the concept of the "black box" in machine learning where even its designers cannot explain why an AI arrived at a specific decision.
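
One common way to explain a single prediction is with Shapley values, which split the difference between a prediction and a baseline prediction across the input features. The sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over every feature ordering. The model, the instance, and the all-zero baseline are assumptions made up for illustration; practical tools approximate this computation, because the exact version grows factorially with the number of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start from the baseline input
        previous = predict(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its actual value
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(x):
    # Hypothetical model: two linear terms plus an interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The contributions sum to the gap between the model's output on the instance and on the baseline, and the interaction term is shared equally between the two features involved in it.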

Spark Structured Streaming:

Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. In short, Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.

CI/CD Operations:

CI and CD stand for continuous integration and continuous delivery/continuous deployment. In very simple terms, CI is a modern software development practice in which incremental code changes are made frequently and reliably.