Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organisations like Facebook.
In the first part of the course, I will cover the theory behind Presto, including its architecture and components: coordinator, worker, connector, query execution model, etc. I will also explain how Kafka, Cassandra, Hive, PostgreSQL and Redshift work before going into the specifics of their connectors.
The second part of the course consists of many practical lectures in which I will help you build a development environment: Docker images for Hive and Presto, and a Helm chart for the whole Presto infrastructure. You will then deploy the chart to a local Kubernetes cluster.
Later, you will learn how to run a real-world example by joining Parquet files in S3 with PostgreSQL data in a single SQL query. Once you have learned how to run Presto and use its features locally, I will show you how to set up your AWS account and deploy your Presto cluster to a managed Kubernetes cluster in Amazon (EKS), where you will be able to analyse terabytes or even petabytes of data at scale.
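To give a flavour of what such a single query can look like, here is a minimal sketch in Python that only builds the SQL text. It assumes two catalogs named `hive` (pointing at the Parquet files in S3) and `postgresql`; the schema, table and column names are hypothetical and are not taken from the course material.

```python
def federated_join_query(hive_table: str, postgres_table: str, key: str) -> str:
    """Build a Presto query that joins a Hive table with a PostgreSQL table.

    Both tables are given as catalog.schema.table. Presto routes each
    catalog to its own connector, so one statement can span both sources.
    """
    return (
        f"SELECT o.{key}, count(*) AS order_count\n"
        f"FROM {hive_table} AS o\n"
        f"JOIN {postgres_table} AS c ON o.{key} = c.{key}\n"
        f"GROUP BY o.{key}"
    )


# Hypothetical names: catalog "hive" for Parquet data in S3,
# catalog "postgresql" for the relational data.
query = federated_join_query(
    "hive.sales.orders",
    "postgresql.public.customers",
    "customer_id",
)
print(query)
```

The resulting text can be pasted into the Presto CLI or sent through any Presto client; the catalog names depend on how the connectors are configured in your cluster.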
Finally, I will go through the managed and non-managed Presto services available on the market and describe the pros and cons of each.