Data Engineering, Serverless ETL & BI on Amazon Cloud

Data warehousing & ETL on AWS Cloud

Ratings 4.75 / 5.00

What You Will Learn!

  • Setting up a Data Warehouse on Amazon Cloud using Redshift from scratch
  • Learn and understand AWS Athena and when to make use of it
  • Learn how to store data in S3 data lakes using the Parquet columnar file format and optimize Athena data scans (see the sketch after this list)
  • Learn and automate ETL processes using different serverless components such as AWS Glue, Data Pipeline and Lambda functions
  • Data Centralization using Redshift Spectrum
  • Trigger and automate Glue jobs using Lambda functions
  • Understand how to pull data into QuickSight, the BI reporting/visualization offering from AWS
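
As a rough, illustrative sketch (not taken from the course materials), an ad-hoc Athena query over Parquet data in S3 can be run from Python with boto3 roughly as follows; the database, table, bucket and column names are made-up placeholders:

    # Sketch: ad-hoc Athena query over Parquet data in S3 via boto3.
    # All names (database, table, bucket, columns) are assumed placeholders.
    import time

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Assumes a catalog table "sales" already points at Parquet files in S3;
    # selecting only the needed columns keeps the data Athena has to scan small.
    response = athena.start_query_execution(
        QueryString="SELECT order_id, amount FROM sales WHERE year = '2023' LIMIT 10",
        QueryExecutionContext={"Database": "my_datalake_db"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/queries/"},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query finishes, then print the result rows.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        results = athena.get_query_results(QueryExecutionId=query_id)
        for row in results["ResultSet"]["Rows"]:
            print([col.get("VarCharValue") for col in row["Data"]])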

Description

AWS Cloud can seem intimidating and overwhelming due to its vast ecosystem, but this course makes it easier for anyone who wants hands-on experience setting up a data warehouse in Redshift or building a BI infrastructure from scratch.

Data Scientists, Analysts and Business Analysts will soon be expected (if they are not already) to become all-rounders and handle the technical aspects of data ingestion, engineering and warehousing.

Anyone with a basic understanding of how the cloud works can benefit from this course because:

- This course is designed around the end-to-end life cycle of a typical data engineering project

- It provides practical solutions to real-world use cases

This course covers:

  • Setting up a data warehouse in AWS Redshift from scratch

  • Basic Data Warehousing Concepts

  • Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing (see the sketch after this list)

  • AWS Athena for ad-hoc analysis (when to use Athena)

  • AWS Data Pipeline to sync incremental data

  • Lambda functions to trigger and automate ETL/data-syncing processes (see the sketch after this list)

  • QuickSight setup, analyses and dashboards
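
For orientation only, a minimal serverless Glue PySpark job might look like the sketch below: it reads a table from the Glue Data Catalog and writes it back to S3 as Parquet. The database, table and bucket names are assumed placeholders, not the ones used in the course labs:

    # Sketch of a minimal Glue PySpark ETL job (all names are placeholders).
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])

    sc = SparkContext()
    glue_context = GlueContext(sc)
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the source table registered in the Glue Data Catalog (e.g. by a crawler).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="my_datalake_db", table_name="raw_sales"
    )

    # Write it to S3 as Parquet so Athena or Redshift Spectrum can scan it cheaply.
    glue_context.write_dynamic_frame.from_options(
        frame=source,
        connection_type="s3",
        connection_options={"path": "s3://my-curated-bucket/sales/"},
        format="parquet",
    )

    job.commit()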
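
And a Lambda function that triggers such a Glue job (for example from an S3 event or an EventBridge schedule) can be sketched roughly as below; the job name and argument are assumed placeholders:

    # Sketch of a Lambda handler that starts a Glue job run (names are placeholders).
    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Start the Glue job; Arguments are passed through as job parameters.
        run = glue.start_job_run(
            JobName="my-etl-job",
            Arguments={"--load_date": "2023-01-01"},
        )
        return {"JobRunId": run["JobRunId"]}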

Prerequisites for this course:

  • Python / SQL (absolute must)

  • PySpark (you should know how to write some basic PySpark scripts)

  • Willingness to explore, learn and put in the extra effort to succeed

  • An active AWS Account

Important Note - This course makes use of the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free-tier limits, which should be more than enough to get plenty of practice from this course.

Also, this course uses the AWS console in the browser for creating clusters and setting up jobs; there is no bash scripting involved. You can use any operating system to perform the lab sessions in this course.

This course is not code-intense or code-heavy; only about 35% of it involves coding, and the rest is execution, understanding and chaining different components together. The whole purpose of this course is to make everyone aware of, and comfortable with, all the tools and features used in this course.

Some tips:

  • Try to watch the videos at 1.2X speed

  • Every time you work on a new component or feature, do some research on the other tools that are meant for the same purpose and see how they differ and in what aspects, e.g. Redshift/Athena vs Snowflake or BigQuery, QuickSight vs Power BI vs MicroStrategy


Who Should Attend!

  • Data Scientists/Analysts who need hands-on implementation experience with AWS ETL tools
  • Software developers who are curious to learn data engineering
  • Anyone with coding experience who wants to get into the field of Data Engineering, Analytics or Data Science


Tags

  • Amazon Redshift
  • Data Warehouse
  • ETL
  • Data Engineering

Subscribers

6572

Lectures

51



