This course is designed to help you develop the skill necessary to perform ETL operations in Databricks, build unsupervised anomaly detection models, learn MLOPS, perform CI/CD operations in databricks and Deploy machine learning models into production.
Big Data engineering:
Big data engineers interact with massive data processing systems and databases in large-scale computing environments. Big data engineers provide organizations with analyses that help them assess their performance, identify market demographics, and predict upcoming changes and market trends.
Azure Databricks:
Azure Databricks is a data analytics platform optimized for the Microsoft Azure cloud services platform. Azure Databricks offers three environments for developing data intensive applications: Databricks SQL, Databricks Data Science & Engineering, and Databricks Machine Learning.
Anomlay detection:
Anomaly detection (aka outlier analysis) is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset’s normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, for instance a change in consumer behavior. Machine learning is progressively being used to automate anomaly detection.
Data Lake House:
A data lakehouse is a data solution concept that combines elements of the data warehouse with those of the data lake. Data lakehouses implement data warehouses' data structures and management features for data lakes, which are typically more cost-effective for data storage .
Explainable AI:
Explainable AI is artificial intelligence in which the results of the solution can be understood by humans. It contrasts with the concept of the "black box" in machine learning where even its designers cannot explain why an AI arrived at a specific decision.
Spark structured streaming:
Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. .In short, Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.
CI/CD Operation :
CI and CD stand for continuous integration and continuous delivery/continuous deployment. In very simple terms, CI is a modern software development practice in which incremental code changes are made frequently and reliably.