This blog post discusses the synergy between Trino and Alluxio, and how to deploy Alluxio as the caching layer for Trino. You will learn: why you should choose Alluxio as a cache for Trino; how Trino and Alluxio work together; how to configure Alluxio to point to S3 storage like MinIO; how to query Alluxio with query … Continued
ACHIEVE DATA OUTCOMES WITH ALLUXIO
Are you managing a complex set of tools and infrastructure to serve your data-driven applications? Alluxio’s data orchestration platform connects all your compute and storage systems, enabling you to access your data anywhere with a 50% average reduction in cloud egress costs.

Watch the on-demand webinar to learn about the latest features and enhancements in Alluxio 2.9.0, including multi-cluster synchronization, Kubernetes operator, and flexible S3 API.
Learn how Expedia implemented Alluxio for their cross-region data lake federation practice and achieved a modern, scalable data platform with a 50% reduction in S3 egress costs.
powered by alluxio

Alluxio enables compute

Data locality
Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.

Data Accessibility
Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.

Data On-Demand
Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.
“zero-copy” burst user spotlight: walmart

Why Walmart chose Alluxio’s “Zero-Copy” burst solution:
- No requirement to persist data into the cloud
- Improved query performance and no network hops on recurrent queries
- Lower costs without the need for creating data copies
See more on how Alluxio powers Walmart’s “zero-copy” burst solution in their presentation >
Featured Use Cases and Deployments
Managing data copies/app changes when bursting compute to cloud?
Zero-copy hybrid bursting with no app changes to intelligently make remote data accessible in the public cloud.
Expanding compute capacity across geo-distributed data centers?

Zero-copy bursting across data centers for Presto, Spark, and Hive with no app changes on data stored in HDFS.
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
-- Pointing schema location to Alluxio
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/');
// Using Alluxio as input and output for RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")
// Using Alluxio as input and output for Dataframe
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://master:port/table_data';
# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
ALLUXIO
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>
$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/
$ ./bin/alluxio fs mount \
--option fs.azure.account.key.<AZURE_ACCOUNT>.blob.core.windows.net=<AZURE_ACCESS_KEY> \
alluxio://master:port/azure \
wasb://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.blob.core.windows.net/<AZURE_DIRECTORY>/
$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>
$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs
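The S3-compatible mount above passes a custom endpoint to Alluxio, and the same pattern may carry over to other S3-compatible stores such as MinIO. The following is a minimal sketch, not a confirmed recipe: it only prints a mount command so it can be reviewed before running. The function name `build_s3_mount`, the host `minio-host:9000`, the bucket path, and the `$ACCESS_KEY` / `$SECRET_KEY` variable names are hypothetical, and reusing the Ceph options for MinIO is an assumption.

```shell
#!/bin/sh
# Sketch: print (do not run) an `alluxio fs mount` command for an
# S3-compatible endpoint, reusing the option names from the Ceph example.
# Credentials are emitted as literal $ACCESS_KEY / $SECRET_KEY references
# (hypothetical variable names) so secrets never appear in the output.
build_s3_mount() {
  mount_point=$1   # Alluxio path, e.g. alluxio://master:port/minio
  endpoint=$2      # S3-compatible endpoint URL (hypothetical host)
  ufs_uri=$3       # under-storage URI, e.g. s3a://<bucket>/<dir>

  printf '%s' "./bin/alluxio fs mount"
  # printf repeats its format string once per remaining argument.
  printf ' --option %s' \
    'aws.accessKeyId=$ACCESS_KEY' \
    'aws.secretKey=$SECRET_KEY' \
    "alluxio.underfs.s3.endpoint=${endpoint}" \
    'alluxio.underfs.s3.disable.dns.buckets=true'
  printf ' %s %s\n' "$mount_point" "$ufs_uri"
}

# Example call with placeholder values; review the printed command before use.
build_s3_mount "alluxio://master:port/minio" "http://minio-host:9000" "s3a://my-bucket/my-dir"
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; export the two credential variables and paste the output into a shell on the Alluxio master to apply it.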
What’s Happening
Today, we are thrilled to announce that Alluxio 2.9 is generally available (GA) for both the free open source Alluxio Community Edition and Alluxio Enterprise Edition! With GA, you can expect stability, support, and enterprise-readiness from Alluxio. In this blog post, we explore how Alluxio is enabling growth and agility for analytics and AI applications … Continued
This article introduces how to read and write Delta Lake tables on Alluxio. You can build a multi-cloud data lake using Delta Lake and Alluxio, reducing your data storage costs and increasing flexibility. 1. Overview 1.1 About Delta Lake Delta Lake is an open source storage framework that enables building a Lakehouse architecture and brings reliability … Continued
This blog was originally published in Razorpay Engineering Blog: https://engineering.razorpay.com/how-trino-and-alluxio-power-analytics-at-razorpay-803d3386daaf Razorpay is a large fintech company in India. Razorpay provides a payment solution that offers a fast, affordable, and secure way to accept and disburse payments online. On the engineering side, the availability and scalability of analytics infrastructure are crucial to providing seamless experiences to … Continued
A Fortune 50 technology company has successfully implemented Alluxio to achieve a hybrid-cloud strategy, become multi-cloud ready, cut costs, and boost agility. … Continued
This article shares the data platform practice at Expedia to federate cross-region data lakes spanning multiple geographic regions in the cloud. 1. Background Expedia Group (NASDAQ: EXPE) is an American online travel shopping company for consumer and small business travel. Expedia powers travel for everyone, everywhere through our global platform, with industry-leading technology solutions to … Continued
Today, many organizations are running a multitude of data-driven applications and data platforms that span multiple geographic regions and across heterogeneous environments – public, … Continued