Mastering Presto: Hands-On Learning

Learn Presto, a distributed SQL query engine for big data! Query Parquet files in AWS S3 and join them with PostgreSQL data.

Ratings 3.82 / 5.00

What You Will Learn!

  • Query Parquet files in AWS S3
  • Deploy Presto to a Kubernetes cluster in AWS
  • Join Parquet files in S3 with data from a PostgreSQL table
  • Gain deep knowledge of Presto's internal architecture
  • Learn more about Cassandra, Kafka, Redshift, Hive and PostgreSQL
  • Run a development environment in a local Kubernetes cluster using minikube
  • Understand Presto configurations
  • Compare Presto service providers
  • Build Docker images for Presto and Hive
  • Create a Presto Helm chart

Description

Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organisations like Facebook.

In the first part of the course, I will cover the theory behind Presto, including its architecture and components: coordinator, worker, connector, query execution model, etc. Additionally, I will explain how Kafka, Cassandra, Hive, PostgreSQL and Redshift work before covering the specifics of their connectors.

In the second part of the course, you will have many practical lectures in which I will help you build a development environment, including Docker images for Hive and Presto and a Helm chart for the whole Presto infrastructure, and then deploy the chart to a local Kubernetes cluster.

Later, you will learn how to run a real-world example by joining Parquet files in S3 with PostgreSQL data in a single SQL query. Once you have learned how to run and use Presto's features locally, I will show you how to set up your AWS account and how to deploy your Presto cluster to a managed Kubernetes (EKS) cluster in Amazon, where you will be able to analyse terabytes or even petabytes of data at scale.
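
To give a rough idea of what such a single query can look like, here is a minimal sketch in Python that only builds the SQL text. Presto addresses tables as catalog.schema.table, so one statement can reference two catalogs at once. The catalog, schema, table and column names below (hive.sales.orders, postgresql.public.customers, customer_id) are hypothetical placeholders, not names used in the course, and the helper function is purely illustrative.

```python
# Hypothetical names: adjust to the catalogs configured in your own Presto cluster.
HIVE_ORDERS = "hive.sales.orders"              # Parquet files in S3, exposed via a Hive catalog
PG_CUSTOMERS = "postgresql.public.customers"   # regular table in PostgreSQL


def build_federated_join(parquet_table: str, postgres_table: str, key: str) -> str:
    """Return one SQL statement that joins tables from two Presto catalogs."""
    return (
        f"SELECT c.{key}, count(*) AS order_count\n"
        f"FROM {parquet_table} AS o\n"
        f"JOIN {postgres_table} AS c ON o.{key} = c.{key}\n"
        f"GROUP BY c.{key}"
    )


query = build_federated_join(HIVE_ORDERS, PG_CUSTOMERS, "customer_id")
print(query)
```

The resulting text could then be submitted through the Presto CLI or any Presto client; the sketch deliberately stops at building the statement so that it runs without a cluster.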

Finally, I will go through the managed and non-managed Presto services available on the market, describing the pros and cons of each.

Who Should Attend!

  • Software Engineers curious about Presto
  • Software Engineers looking for a distributed SQL query engine
  • Data Engineers looking for a way to optimize their queries over a huge data lake
  • Data Engineers looking to improve their Data Platforms

Tags

  • Kubernetes
  • Software Engineering
  • Presto

Subscribers

347

Lectures

47
