Bring your data to compute with open source

Data orchestration for analytics and machine learning in the cloud

Looking for more compute capacity with remote data?
See how “Zero-Copy” hybrid bursting helps

Get the whitepaper >

How the Development Bank of Singapore solves on-prem compute capacity challenges with cloud bursting

Watch the tech talk >

Alluxio enables compute

Data Locality

Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.

Data Accessibility

Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.

Data On-Demand

Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.

Interact with Alluxio in any stack

Pick a compute. Pick a storage. Alluxio just works.

Tutorial –> Full Docs –>

-- Pointing Table location to Alluxio 
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/')

Full Docs

// Using Alluxio as input and output for RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")

// Using Alluxio as input and output for Dataframe
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")

Full Docs
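
The Presto and Spark snippets above address data through URIs of the form alluxio://master:port/path. As a minimal sketch of that pattern, the Python helpers below build and split such URIs. The names alluxio_uri and split_alluxio_uri are hypothetical and not part of any Alluxio client, and the default port 19998 is simply the one used in the Spark snippet.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: 19998 is taken from the Spark snippet above.
DEFAULT_MASTER_PORT = 19998


def alluxio_uri(master: str, path: str, port: int = DEFAULT_MASTER_PORT) -> str:
    """Build an alluxio:// URI (hypothetical helper, for illustration only)."""
    if not master:
        raise ValueError("master host must not be empty")
    # Normalize the path so it has exactly one leading slash.
    clean_path = "/" + path.lstrip("/")
    return f"alluxio://{master}:{port}{clean_path}"


def split_alluxio_uri(uri: str) -> Tuple[str, int, str]:
    """Split an alluxio:// URI into (host, port, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "alluxio":
        raise ValueError(f"not an alluxio URI: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing master host in: {uri!r}")
    # Fall back to the assumed default port when the URI omits one.
    port = parts.port if parts.port is not None else DEFAULT_MASTER_PORT
    return parts.hostname, port, parts.path


if __name__ == "__main__":
    uri = alluxio_uri("master", "Input.parquet")
    print(uri)
    print(split_alluxio_uri(uri))
```

Keeping the master address in one place like this makes it easier to point the same job at a different Alluxio cluster without editing every path.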

-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://master:port/table_data';

Full Docs

# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'

Full Docs

# Running a wordcount using Alluxio as input and output
$ bin/hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -libjars /<ALLUXIO_HOME>/client/alluxio-<VERSION>-client.jar \
  alluxio://master:19998/wordcount/input.txt \
  alluxio://master:19998/wordcount/output

Full Docs

# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
ALLUXIO

# Mounting an S3 bucket into the Alluxio namespace
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>

Full Docs

$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/

Full Docs

$ ./bin/alluxio fs mount \
--option fs.azure.account.key.<AZURE_ACCOUNT>.blob.core.windows.net=<AZURE_ACCESS_KEY> \
alluxio://master:port/azure \
wasb://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.blob.core.windows.net/<AZURE_DIRECTORY>/

Full Docs

$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>

Full Docs

$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>

Full Docs

$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs

Full Docs
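
The mount snippets above all follow one shape: ./bin/alluxio fs mount, zero or more --option key=value pairs, then the Alluxio path and the under-storage URI. The Python sketch below assembles that argument list. It only builds the command and does not run it. The function name build_mount_command is hypothetical, and the host names and ports in the usage example are placeholders rather than values from a real deployment.

```python
import shlex
from typing import Dict, List, Optional


def build_mount_command(
    alluxio_path: str,
    ufs_uri: str,
    options: Optional[Dict[str, str]] = None,
    alluxio_bin: str = "./bin/alluxio",
) -> List[str]:
    """Assemble an `alluxio fs mount` argument list (illustrative helper)."""
    cmd = [alluxio_bin, "fs", "mount"]
    # Each option becomes its own `--option key=value` pair, in insertion order.
    for key, value in (options or {}).items():
        cmd += ["--option", f"{key}={value}"]
    # The Alluxio mount point comes first, then the under-storage location.
    cmd += [alluxio_path, ufs_uri]
    return cmd


if __name__ == "__main__":
    # Placeholder addresses, assumed for illustration only.
    cmd = build_mount_command(
        "alluxio://master:19998/hdfs",
        "hdfs://namenode:8020/dir/",
    )
    # Print a shell-quoted form that could be pasted into a terminal.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning a list rather than a single string keeps credentials and paths as separate arguments, so the result can be passed to subprocess.run without shell quoting problems.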

Featured Use Cases and Deployments

Data in the public cloud slowing your compute down?

Get in-memory access by caching Spark and Presto data stored on AWS S3, Google Cloud Platform, or Microsoft Azure.

Managing data copies/app changes when bursting compute to cloud?

Use zero-copy hybrid bursting to intelligently burst processing to the cloud with no app changes.

Data in on-premise object stores not fast enough?

Accelerate your Spark, Presto, and TensorFlow workloads for object stores on-premise or in the cloud.

Alluxio for data engineers

Are your Presto/Spark queries slow on S3?

Do your Presto/Spark queries have inconsistent performance?

Are your metadata operations slow on S3?

Are your egress costs too high?

See how Alluxio helps >

Alluxio for data architects

Can you share data across your app framework?

Do you have problems running remote/multiple storage systems?

Is running HDFS in the cloud for temporary storage expensive?

Do you have the directive to use cloud for analytics?

See how Alluxio helps >

Announcing Alluxio 2.0! Learn more about the release >

Powered by Alluxio

What’s Happening

Event
Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems rely on to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to cover the highly distributed environments in which enterprises run our software is a complicated (and expensive) task!

Community Online Office Hour

Event
Optimizing Query Performance by Decoupling Presto and Hive Data Warehouse

Alluxio, as a data orchestration layer, provides physical data independence so that Presto can interact with data more efficiently. In addition to caching for I/O acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in a compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.

Community Online Office Hour

News
10 Hot Big Data Companies To Watch In 2020

Data is not only growing in volume, it’s increasingly scattered across on-premises and cloud-based systems, complicating data management and governance tasks. Here are 10 companies with next-generation data management, data science and machine learning technology that solution providers should keep an eye on in 2020.

CRN

Blog
Data Orchestration Summit Recap and Highlights!

We are delighted by the success of the inaugural Data Orchestration Summit on Nov. 7, 2019! Organized by Alluxio, this one-day event sold out with nearly 400 attendees! Data engineers, cloud engineers, and data scientists joined the talks of 24 industry leaders from all over the globe to share their experiences building cloud-native data and … Continued