Bring your data to compute with open source

Data orchestration for analytics and machine learning in the cloud

Presto + Alluxio = Better Together

Alluxio and Starburst Data – the Presto company – announce strategic partnership at Presto Summit to accelerate modern cloud analytics

Community Office Hour, July 30
Building Fast SQL Analytics with Presto, Alluxio, and S3

Featured Blog
Scalable metadata service in Alluxio: storing billions of files

Webinar, June 27th
Accelerate Spark workloads on S3 with Alluxio

Featured Use Cases and Deployments

Data in the public cloud slowing your compute down?

Get in-memory data access for Spark and Presto on AWS S3, Google Cloud Platform, or Microsoft Azure.

Can’t burst HDFS in your hybrid cloud environment?

Simplify Hadoop for the hybrid cloud by making on-prem HDFS accessible to any compute in the cloud.

Data in on-premises object stores not fast enough?

Accelerate your Spark, Presto, and TensorFlow workloads for object stores on-premises or in the cloud.

Interact with Alluxio in any stack

Pick a compute. Pick a storage. Alluxio just works.

Full Docs

// Using Alluxio as input and output for RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")

// Using Alluxio as input and output for DataFrame
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")


-- Pointing a schema location to Alluxio
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/')


# Create and list a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'


-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://master:port/table_data';


# Running a wordcount using Alluxio as input and output
$ bin/hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -libjars /<ALLUXIO_HOME>/client/alluxio-<VERSION>-client.jar \
  alluxio://master:19998/wordcount/input.txt \
  alluxio://master:19998/wordcount/output


# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>


$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/


$ ./bin/alluxio fs mount \
--option fs.azure.account.key.<AZURE_ACCOUNT>.blob.core.windows.net=<AZURE_ACCESS_KEY> \
alluxio://master:port/azure \
wasb://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.blob.core.windows.net/<AZURE_DIRECTORY>/


$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>


$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>


$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs

Full Docs
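
The storage mount commands above all share one shape: zero or more --option key=value pairs, then the Alluxio path, then the under-storage URI. As a minimal sketch of that shape, the helper below only prints the command for review and never runs it. The function name build_mount_cmd is hypothetical and not part of the Alluxio CLI, and values containing spaces would need extra quoting.

```shell
# Dry-run helper: prints an `alluxio fs mount` command instead of running it.
# build_mount_cmd is a hypothetical name for illustration only.
build_mount_cmd() {
  mount_alluxio_path="$1"   # e.g. alluxio://master:port/s3
  mount_ufs_uri="$2"        # e.g. s3a://<S3_BUCKET>/<S3_DIRECTORY>
  shift 2
  mount_cmd="./bin/alluxio fs mount"
  # Every remaining argument is treated as one key=value mount option.
  for mount_opt in "$@"; do
    mount_cmd="$mount_cmd --option $mount_opt"
  done
  printf '%s %s %s\n' "$mount_cmd" "$mount_alluxio_path" "$mount_ufs_uri"
}

# The HDFS mount takes no options; the S3 mount takes two.
build_mount_cmd alluxio://master:port/hdfs hdfs://namenode:port/dir/
build_mount_cmd alluxio://master:port/s3 's3a://<S3_BUCKET>/<S3_DIRECTORY>' \
  'aws.accessKeyId=<AWS_ACCESS_KEY_ID>' 'aws.secretKey=<AWS_SECRET_KEY_ID>'
```

Printing the command first makes it easier to check placeholders such as master:port before anything touches a running cluster.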


What’s Happening

Event
Accelerate Spark workloads on S3

Register for this webinar to learn how to run Spark on EMR with Alluxio as a distributed file system cache for S3.

Alluxio Webinar
Event
Building Fast SQL Analytics with Presto, Alluxio, and S3

Many organizations want to run big data analytics with frameworks such as Presto on public clouds. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.
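
The caching idea described here can be illustrated with a toy read-through cache. This is only a sketch under loud assumptions: two local temporary directories stand in for the S3 bucket and the caching layer, and cached_read, REMOTE_DIR, CACHE_DIR, and REMOTE_READS are hypothetical names. It does not reflect how Alluxio is implemented.

```shell
# Toy read-through cache: local directories stand in for S3 and for the cache.
# Illustration only; this is not how Alluxio works internally.
REMOTE_DIR="$(mktemp -d)"   # stand-in for the S3 bucket (assumption)
CACHE_DIR="$(mktemp -d)"    # stand-in for the caching layer (assumption)
REMOTE_READS=0              # counts how often the "remote" store is read

cached_read() {
  cached_name="$1"
  if [ ! -f "$CACHE_DIR/$cached_name" ]; then
    # Cache miss: fetch from the remote store once and keep a local copy.
    cp "$REMOTE_DIR/$cached_name" "$CACHE_DIR/$cached_name"
    REMOTE_READS=$((REMOTE_READS + 1))
  fi
  # Hit or miss, the data is served from the cache directory.
  cat "$CACHE_DIR/$cached_name"
}

printf 'hello\n' > "$REMOTE_DIR/data.txt"
cached_read data.txt > /dev/null   # first read: miss, goes to the remote store
cached_read data.txt > /dev/null   # second read: hit, served from the cache
echo "remote reads: $REMOTE_READS"
```

Only the first read touches the remote directory; repeated reads are served locally, which is where the reduced network traffic in the description comes from.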

Alluxio Community Office Hour
