JD.com is one of the largest e-commerce corporations. Its big data platform comprises tens of thousands of nodes and tens of petabytes of offline data, which require millions of Spark and MapReduce jobs to process every day. Thousands of machines run as Presto nodes, and as the main query engine Presto plays an important role in in-place analysis and BI tools. Alluxio is deployed alongside it to improve Presto's performance. The practice of Presto & Alluxio at JD.com benefits many engineers and analysts.
Presto & Alluxio on AWS: How we build an Up-To-Date Data-Platform at Ryte.
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Comcast, GrubHub, FINRA, LinkedIn, Lyft, Netflix, Slack, and Zalando, Presto has seen unprecedented growth in popularity over the last few years, in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
For many latency-sensitive SQL workloads, Presto is often bound by the cost of retrieving distant data. In this talk, Rohit Jain from Facebook will introduce his team’s collaboration with Alluxio on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to improve queries with unsatisfactory latency.
This is an open source community conference focused on the key data engineering challenges and solutions around building cloud-native data and AI platforms using the latest technologies, such as Alluxio, Apache Spark, Apache Airflow, Presto, TensorFlow, and Kubernetes.
Electronic Arts (EA) is a leading company in the gaming industry, providing over a thousand games to serve billions of users worldwide. The EA Data & AI Department builds hundreds of platforms to manage the petabytes of data generated by games and users every day. These platforms cover a wide range of data analytics workloads, from real-time data ingestion to ETL pipelines. The formatted data our department produces is widely used by executives, producers, product managers, game engineers, and designers for marketing and monetization, game design, customer engagement, player retention, and end-user experience.
How T3Go’s high-performance data lake, built on Apache Hudi and Alluxio, shortened the time for data ingestion into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together to query data on the lake saw queries speed up by a factor of 10.
In this talk, we describe an architecture for incrementally migrating analytics workloads to any public cloud (AWS, Google Cloud Platform, or Microsoft Azure), running them directly on on-prem data without copying the data to cloud storage.
In this presentation, Haoyuan Li shares an overview of PAX (Presto Alluxio Stack), related industry trends, and how PAX solves challenges and brings value to its hundreds of users in the cloud.