Announcing Alluxio 2.5 – advanced interfaces and acceleration for analytics and AI/ML pipelines

Now with an accelerated POSIX API for unified storage access, performance, and ease of management

Join us for the Alluxio Day virtual event on April 27th and hear from Nvidia, Microsoft, Alibaba & more!

Register >

Read about how to evaluate Alluxio’s data orchestration platform as a distributed cache for Apache Spark in a public cloud or on-premises

See benchmark white paper >

We’re hiring! Join our team and build the future of data orchestration. See open positions >

Alluxio enables compute

Data locality

Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.

Data Accessibility

Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.

Data On-Demand

Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.

“Zero-Copy” Burst User Spotlight: Walmart

Why Walmart chose Alluxio’s “Zero-Copy” burst solution:

  • No requirement to persist data into the cloud
  • Improved query performance and no network hops on recurrent queries 
  • Lower costs without the need for creating data copies

See more on how Alluxio powers Walmart’s “zero-copy” burst solution in their presentation >

Featured Use Cases and Deployments

Managing data copies/app changes when bursting compute to cloud?

Zero-copy hybrid bursting with no app changes to intelligently make remote data accessible in the public cloud.

Expanding compute capacity across geo-distributed data centers?

Zero-copy bursting across data centers for Presto, Spark, and Hive with no app changes on data stored in HDFS.

Interact with Alluxio in any stack

Pick a compute. Pick a storage. Alluxio just works.

Tutorial –> Full Docs –>

-- Pointing Table location to Alluxio
WITH (location = 'alluxio://master:port/my-table/')

// Using Alluxio as input and output for an RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")

// Using Alluxio as input and output for a DataFrame
scala> val df = spark.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
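
The snippets above all address data with URIs of the form alluxio://<master>:<port>/<path>. As a rough illustration of that shape, the Python sketch below splits such a URI into its parts using only the standard library. The function name split_alluxio_uri and the DEFAULT_MASTER_PORT constant are hypothetical helpers invented for this example; 19998 is simply the port that appears in the snippets above and is assumed here as a fallback.

```python
from urllib.parse import urlsplit

# Port used in the snippets above; treated here as an assumed fallback.
DEFAULT_MASTER_PORT = 19998


def split_alluxio_uri(uri):
    """Split an alluxio:// URI into (master_host, master_port, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "alluxio":
        raise ValueError(f"not an alluxio:// URI: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing master host in {uri!r}")
    # parts.port is None when the URI carries no explicit port.
    port = parts.port if parts.port is not None else DEFAULT_MASTER_PORT
    return parts.hostname, port, parts.path or "/"


print(split_alluxio_uri("alluxio://master:19998/Input.parquet"))
# prints ('master', 19998, '/Input.parquet')
```

A literal placeholder such as master:port is not a valid URI for this helper: the port must be replaced with a number before parsing.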

-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
LOCATION 'alluxio://master:port/table_data';

# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'

# Running a wordcount using Alluxio as input and output
$ bin/hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -libjars /<ALLUXIO_HOME>/client/alluxio-<VERSION>-client.jar \
  alluxio://master:19998/wordcount/input.txt \ 

# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
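
Because the mounted directory behaves like a local file system, ordinary file I/O should work against it without an Alluxio-specific client. The Python sketch below illustrates this with the standard library only. The helper name head is hypothetical, and the mount point and file name are the ones from the commands above; nothing here is specific to Alluxio, which is the point of the POSIX interface.

```python
from pathlib import Path


def head(mount_point, relative_path, max_lines=5):
    """Return up to max_lines lines of a file under a mounted directory."""
    target = Path(mount_point) / relative_path
    lines = []
    with target.open("r", encoding="utf-8") as handle:
        for line in handle:
            if len(lines) >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines


if __name__ == "__main__":
    # Mount point taken from the shell commands above; skipped if absent.
    mount = Path("/mnt/alluxio_mount")
    if mount.is_dir():
        for entry in sorted(mount.iterdir()):  # rough equivalent of `ls`
            print(entry.name)
        if (mount / "mydata.txt").is_file():
            print("\n".join(head(mount, "mydata.txt")))
```

The same function works unchanged on any local directory, which makes it easy to try before a mount is available.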

# Mounting an S3 bucket into the Alluxio namespace
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>

$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/

$ ./bin/alluxio fs mount \

$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>

$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>
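
The mount commands shown here share one shape: zero or more --option key=value pairs, then the Alluxio path, then the under-storage URI. As a sketch of that pattern, the Python helper below assembles the argument list from a dictionary of options. The function name build_mount_command is hypothetical, and the bucket, directory, and endpoint values are made-up placeholders; the port 19998 is assumed from the Spark snippets on this page. The helper only builds and prints the command; it does not run it.

```python
import shlex


def build_mount_command(alluxio_path, ufs_uri, options=None, cli="./bin/alluxio"):
    """Build the argv for `alluxio fs mount` in the shape shown above."""
    argv = [cli, "fs", "mount"]
    # Each option becomes its own `--option key=value` pair, in insertion order.
    for key, value in (options or {}).items():
        argv += ["--option", f"{key}={value}"]
    argv += [alluxio_path, ufs_uri]
    return argv


# Placeholder values for illustration only (hypothetical bucket and endpoint).
command = build_mount_command(
    "alluxio://master:19998/ceph",
    "s3a://example-bucket/example-dir",
    {
        "alluxio.underfs.s3.endpoint": "http://rgw.example.com:7480",
        "alluxio.underfs.s3.disable.dns.buckets": "true",
    },
)
print(shlex.join(command))  # shell-quoted form, suitable for copy and paste
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to subprocess.run. Credentials are deliberately left out of the example so they do not end up in printed output.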

$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs

Announcing Alluxio 2.0! Learn more about the release >

Powered by Alluxio

What’s Happening

Alluxio video presentations
Introducing what’s new in Alluxio 2.5

Alluxio 2.5 focuses on improving interface support to broaden the set of data-driven applications that can benefit from data orchestration. The POSIX and … Continued

Accelerating Analytics and AI with Alluxio and NVIDIA GPUs

Data processing is increasingly making use of NVIDIA computing for massive parallelism. Advancements in accelerated compute mean that access to storage must also be quicker, whether in analytics, artificial intelligence (AI), or machine learning (ML) pipelines.

Bursting Your On-Premises Data Lake Analytics and AI Workloads on AWS

This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).