Cloud analytics Caching

featured use case

Is data in the public cloud slowing your compute down? With Alluxio, get in-memory data access for Spark, Presto, Hive and other analytics frameworks on AWS S3, Google Cloud, or Microsoft Azure.

are analytics workloads in the cloud getting slow and expensive?

Running compute on top of S3/GCS/Azure comes with its own set of data engineering problems to solve.

Performance is variable and consistent query SLAs are hard to achieve

Metadata operations like list and rename are expensive, so workloads run longer

Egress costs, particularly cross-region, can add up, making the solution expensive

Eventual consistency on writes makes it hard to predict query results

alluxio accelerates analytic workloads in the cloud and saves costs

Co-locate your data with your compute for in-memory data access so you can cache data in the cloud in the same instance as your compute. Run in the cloud of your choosing or use AWS EMR with Alluxio.

Reading S3/GCS/Azure data into Spark, Presto, Hive, or any other compute framework and enabling data sharing is automated and transparent with Alluxio, which serves data to compute to improve the end-to-end model development efficiency. Alluxio can be deployed colocated with your compute cluster, exposing the data through Alluxio POSIX or HDFS compatible interfaces and backed by a mounted remote storage like S3.

Want help getting your analytics workloads faster and less expensive? Schedule a meeting with an Alluxio solutions engineer.

alluxio on aws

Alluxio on S3

Alluxio supports access to different storage systems through its unified namespace. Configure S3 as Alluxio’s under storage system in six steps.

Alluxio on EMR + S3

Run clusters on-demand for compute workloads with AWS EMR. Alluxio on EMR and S3 provides more functionality than EMRFS.

alluxio on google cloud

Alluxio on GCS

Alluxio supports access to different storage systems through its unified namespace. Configure GCS as Alluxio’s under storage system in four steps.

Alluxio on Dataproc + GCS

Deploy Alluxio on GCP with the Dataproc initialization action. To run an Alluxio cluster on Dataproc, you’ll need to sign up for a Google Cloud account first.

alluxio on azure

Alluxio on Azure Blob Store

Alluxio supports access to different storage systems through its unified namespace. Configure Azure Blob Store as Alluxio’s under storage system in three steps.

alluxio + presto and azure

Run Presto to query Alluxio as a distributed cache layer with Azure Blob Store. Alluxio will allow Presto tp access data from Azure and transparently cache the data frequently accessed.


Data Consistency Model in Alluxio

When applications are only reading and writing through Alluxio, the Alluxio file system provides strong consistency. However, when clients are writing data across both Alluxio and under storage, the consistency depends on the Alluxio write type and under storage type. This article discusses what to expect in each scenario.

Building a high-performance platform on AWS to support real-time gaming services using Presto and Alluxio

This blog explores an innovative platform with Presto as the computing engine and Alluxio as a data orchestration layer between Presto and S3 storage, to support online services with instantaneous response within the gaming industry. The preliminary results show that Presto with Alluxio outperforms S3 significantly in all cases.Alluxio with metadata caching shows up to 5.9x performance gain when handling large numbers of small files.

Reducing Large S3 API Costs Using Alluxio

This article described how engineers at datasapiens brought down S3 API costs by 200x by implementing Alluxio as a data orchestration layer between S3 and Presto.