How does Presto work with Hadoop?

At a high level, a query engine is software that sits on top of a database or other data store and executes queries against the data in it, returning answers to users or applications.

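Presto is a distributed SQL query engine that can run on top of Hadoop and query data stored in HDFS. Its architecture consists of one coordinator node, which parses and plans each query, working together with multiple worker nodes, which execute it in parallel. Presto does not have its own storage system, so it is a good complement to Hadoop/HDFS: Hadoop stores the data and Presto queries it.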
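To make the coordinator/worker split concrete, here is a minimal Python sketch of the idea, not Presto's actual implementation. It mimics a query such as SELECT country, SUM(amount) ... GROUP BY country. The names SPLITS, worker_partial_sum, and coordinator_run are hypothetical, the in-memory lists stand in for chunks of a file in HDFS, and threads stand in for worker nodes. For brevity the final merge happens in the coordinator function; a real engine distributes that step as well.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for data stored in HDFS: each "split" is one chunk
# of (country, amount) rows that a single worker can process on its own.
SPLITS = [
    [("us", 10), ("de", 5)],
    [("us", 7), ("fr", 3)],
    [("de", 1), ("fr", 4)],
]


def worker_partial_sum(split):
    """Worker task: compute SUM(amount) per country for one split only."""
    totals = {}
    for country, amount in split:
        totals[country] = totals.get(country, 0) + amount
    return totals


def coordinator_run(splits, num_workers=3):
    """Coordinator: hand splits to workers, then merge their partial results."""
    # Threads play the role of worker nodes in this toy model.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))

    # Combine the per-split partial sums into the final answer.
    merged = {}
    for partial in partials:
        for country, amount in partial.items():
            merged[country] = merged.get(country, 0) + amount
    return merged


if __name__ == "__main__":
    print(sorted(coordinator_run(SPLITS).items()))
```

Note that the sketch holds no data of its own beyond the input splits, which mirrors the point above: the engine only computes, while storage lives elsewhere.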

How is Hadoop related to Hive?

Apache Hive, built on top of Hadoop, was the original SQL query engine developed at Facebook. With Hive you can query data stored in the various databases and file systems that integrate with Hadoop, and it makes it easy to run batch SQL queries over large amounts of unstructured data. Compared to Presto, Hive is optimized for query throughput, while Presto is optimized for latency. Hive is also better suited to queries that require a large amount of memory, since it can write intermediate results to disk instead of holding them all in memory.
