Bring your data to compute with open source

Data orchestration for analytics and machine learning in the cloud

Presto + Alluxio = Better Together

Alluxio and Starburst Data – the Presto company – announce strategic partnership at Presto Summit to accelerate modern cloud analytics

Community Office Hour, June 25th
Accelerating Spark in Kubernetes using Alluxio

Featured Blog
Scalable metadata service in Alluxio: storing billions of files

Webinar, June 27th
Accelerate Spark workloads on S3 with Alluxio

Featured Use Cases and Deployments

Data in the public cloud slowing your compute down?

Get in-memory data access for Spark and Presto on AWS S3, Google Cloud Platform, or Microsoft Azure.

Can’t burst HDFS in your hybrid cloud environment?

Simplify Hadoop for the hybrid cloud by making on-prem HDFS accessible to any compute in the cloud.

Data in on-premises object stores not fast enough?

Accelerate your Spark, Presto, and TensorFlow workloads for object stores on-premises or in the cloud.

Interact with Alluxio in any stack

Pick a compute. Pick a storage. Alluxio just works.

Full Docs

// Using Alluxio as input and output for an RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")

// Using Alluxio as input and output for a DataFrame
scala> val df = spark.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")

-- Pointing Table location to Alluxio 
WITH (location = 'alluxio://master:port/my-table/')

# Create and list a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'

-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
LOCATION 'alluxio://master:port/table_data';

# Running a wordcount using Alluxio as input and output
$ bin/hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -libjars /<ALLUXIO_HOME>/client/alluxio-<VERSION>-client.jar \
  alluxio://master:19998/wordcount/input.txt \
  alluxio://master:19998/wordcount/output

# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt

$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>

$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/

$ ./bin/alluxio fs mount \

$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>

$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>

$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs
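
The mount commands above all share one shape: zero or more `--option key=value` pairs, then the Alluxio path, then the under-storage URI. As a rough sketch only, the Python helper below assembles that argument list so it can be reused across storage systems. The function name `alluxio_mount_command` is hypothetical and not part of Alluxio; the `master:port` and `namenode:port` values are the same placeholders used in the snippets above.

```python
import shlex


def alluxio_mount_command(alluxio_path, ufs_uri, options=None):
    """Build the argument list for `./bin/alluxio fs mount` (hypothetical helper)."""
    argv = ["./bin/alluxio", "fs", "mount"]
    # Each mount option becomes its own `--option key=value` pair.
    for key, value in (options or {}).items():
        argv += ["--option", f"{key}={value}"]
    # The Alluxio path comes first, followed by the under-storage URI.
    argv += [alluxio_path, ufs_uri]
    return argv


if __name__ == "__main__":
    cmd = alluxio_mount_command(
        "alluxio://master:port/hdfs",
        "hdfs://namenode:port/dir/",
    )
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell line.
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting problems when option values contain special characters.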

Powered by Alluxio

What’s Happening

Accelerate Spark workloads on S3

Register for this webinar to learn how to run EMR Spark on Alluxio as a distributed file system cache for S3.

Alluxio Webinar

Starburst Presto and Alluxio announce strategic OEM partnership

Announcing the OEM partnership between Alluxio and Starburst Data, the company behind Presto, the fastest-growing SQL query engine in a disaggregated world. We’ll be offering a bundled solution that brings the two open source technologies together to provide exceptional performance and multi-cloud capabilities for interactive analytic workloads.

Data Orchestration – The Missing Piece in the Data World

We are in the early stages of the data revolution. Organizations are racing to build data-driven cultures and innovate on data-driven applications. These applications impact many facets of our lives, from the way we get to work to how we are medically diagnosed. However, the value of the data is far from being fully utilized and the speed of innovation can be dramatically improved. We believe the critical missing piece is a data orchestration layer.

The current pace of innovation is hindered by the necessity of reinventing the wheel in order for applications to efficiently access data. When an engineer or scientist wants to write an application to solve a problem, he or she needs to spend significant effort on getting the application to access the data efficiently
