Speed Up Uber’s Presto with Alluxio | A collaboration between Uber and Alluxio – part 1
This article shares how Uber and Alluxio collaborated to design and implement Presto local cache to reduce HDFS latency.
Alluxio 2.8 focuses on the S3 API, enterprise-grade security, and scalability and observability in data migration. The enhanced S3 API makes managing Alluxio easier than ever, and features such as encryption at rest and policy-driven data management further strengthen Alluxio’s support for enterprise customers.
How T3Go’s high-performance data lake, built on Apache Hudi and Alluxio, shortened data ingestion time into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together to query data in the lake saw queries run 10 times faster.
When applications are only reading and writing through Alluxio, the Alluxio file system provides strong consistency. However, when clients are writing data across both Alluxio and under storage, the consistency depends on the Alluxio write type and under storage type. This article discusses what to expect in each scenario.
In this talk, we describe an architecture for incrementally migrating analytics workloads to any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) while running directly on on-prem data, without copying that data to cloud storage.
Tags: cloud, data analytics, data lake, hdfs, hybrid, on-prem, presto, spark, storage
As the third largest e-commerce site in China, Vipshop processes large amounts of data collected daily to generate targeted advertisements for its consumers. In this article, Gang Deng from Vipshop describes how to meet SLAs by speeding up struggling Spark jobs on HDFS by up to 30x, and how to optimize hot data access with Alluxio to create …
In Alluxio, an Under File System is a plugin that connects Alluxio to a file system or object store, so users can mount different storage systems, such as AWS S3 or HDFS, into the Alluxio namespace. The Under File System framework is modular, so users can easily extend it with their own Under File System implementation to connect to a new or customized storage system.
Tags: apache ozone, aws s3, hdfs, meetup, object stores, storage, under filesystem
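The pluggable design described above (a common storage interface plus mount points in a single namespace) can be illustrated with a minimal sketch. The names `StorageBackend`, `InMemoryBackend`, and `MountTable` are hypothetical and are not Alluxio’s actual Under File System API; the sketch only shows the general idea of routing namespace paths to interchangeable storage implementations.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical plugin interface; NOT Alluxio's real Under File System API."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def list_paths(self, prefix: str) -> list[str]: ...


class InMemoryBackend(StorageBackend):
    """Toy backend standing in for a real store such as S3 or HDFS."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._objects[path]

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def list_paths(self, prefix: str) -> list[str]:
        return sorted(p for p in self._objects if p.startswith(prefix))


class MountTable:
    """Maps namespace mount points to backends and forwards calls to them."""

    def __init__(self) -> None:
        self._mounts: dict[str, StorageBackend] = {}

    def mount(self, mount_point: str, backend: StorageBackend) -> None:
        # Normalize "/s3/" to "/s3"; mounting "/" becomes "" and acts as a catch-all.
        self._mounts[mount_point.rstrip("/")] = backend

    def _resolve(self, path: str) -> tuple[StorageBackend, str]:
        # The longest matching mount point wins, so nested mounts resolve correctly.
        for point in sorted(self._mounts, key=len, reverse=True):
            if path == point or path.startswith(point + "/"):
                return self._mounts[point], path[len(point):] or "/"
        raise FileNotFoundError(f"no mount point covers {path}")

    def read(self, path: str) -> bytes:
        backend, inner = self._resolve(path)
        return backend.read(inner)

    def write(self, path: str, data: bytes) -> None:
        backend, inner = self._resolve(path)
        backend.write(inner, data)
```

Usage: after `ns.mount("/s3", s3_backend)`, a call such as `ns.write("/s3/logs/a.txt", b"hello")` is stored by `s3_backend` under `/logs/a.txt`. Supporting a new storage system in this sketch only requires another `StorageBackend` subclass; the namespace code does not change.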
In this office hour, we demonstrate how a “zero-copy burst” solution speeds up Spark and Presto queries in the public cloud while eliminating the need to manually copy and synchronize data from the on-premises data lake to cloud storage. This approach lets compute frameworks decouple from on-premises data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.
Tags: aws, cloud storage, compute, hdfs, hybrid cloud, office hour, performance, presto, spark, zero copy bursting
This article describes how engineers in the Data Service Center at Tencent PCG leverage Alluxio to improve analytics performance by 200% and minimize operating costs in building Tencent Beacon Growing, a real-time data analytics platform.