Mastering Presto: Hands-On Learning

Learn Presto, a distributed SQL query engine for big data! Query Parquet files in AWS S3 and join them with PostgreSQL data.

Ratings 3.84 / 5.00

What You Will Learn!

Query Parquet files in AWS S3
Deploy Presto to a Kubernetes cluster in AWS
Join Parquet files in S3 with data from a PostgreSQL table
Gain deep knowledge of Presto's internal architecture
Learn more about Cassandra, Kafka, Redshift, Hive and PostgreSQL
Run a development environment in a local Kubernetes cluster using minikube
Understand Presto configurations
Get to know Presto service providers
Build Docker images for Presto and Hive
Create a Presto Helm chart

Description

Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics. It approaches the speed of commercial data warehouses while scaling to the size of organisations like Facebook.

In the first part of the course, I will cover the theory behind Presto, including its architecture and components: coordinator, worker, connector, query execution model, and so on. I will also explain how Kafka, Cassandra, Hive, PostgreSQL and Redshift work before going into the specifics of their connectors.

The second part of the course consists of many practical lectures in which I will help you build a development environment, including Docker images for Hive and Presto and a Helm chart for the whole Presto infrastructure, and then deploy the chart to a local Kubernetes cluster.

Later, you will learn how to run a real-world example by joining Parquet files in S3 with PostgreSQL data in a single SQL query. Once you have learned how to run and use Presto's features locally, I will show you how to set up your AWS account and how to deploy your Presto cluster to a managed Kubernetes cluster in AWS (Amazon EKS), where you will be able to analyse terabytes or even petabytes of data at scale.
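
To give a feel for what a single query across S3 and PostgreSQL can look like, here is a minimal sketch in Python that only builds the SQL text. It relies on Presto addressing tables as catalog.schema.table, so one statement can reference two different data sources. The catalog names (hive, postgresql), the schemas, the tables and the columns (user_id, country) are hypothetical placeholders and are not taken from the course material. The sketch also assumes the Parquet files in S3 are already exposed as a table through the Hive connector.

```python
def build_federated_join(
    s3_table: str = "hive.web.page_views",        # hypothetical Parquet-backed table in S3
    pg_table: str = "postgresql.public.users",    # hypothetical PostgreSQL table
    join_key: str = "user_id",                    # hypothetical column present in both tables
) -> str:
    """Return one SQL statement that joins tables from two Presto catalogs."""
    return (
        "SELECT u.country, count(*) AS views\n"
        f"FROM {s3_table} AS v\n"
        f"JOIN {pg_table} AS u ON v.{join_key} = u.{join_key}\n"
        "GROUP BY u.country\n"
        "ORDER BY views DESC"
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into any Presto client.
    print(build_federated_join())
```

The function only produces text. Running the statement still requires a Presto cluster whose catalogs are configured under the names used above.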

Finally, I will go through the managed and non-managed Presto services available on the market and describe the pros and cons of each.

Who Should Attend!

Software Engineers curious about Presto
Software Engineers looking for a distributed SQL query engine
Data Engineers looking for a way to optimize their queries over a huge data lake
Data Engineers looking to improve their Data Platforms

Subscribers

352

Lectures