Bring your data to compute with open source

Data orchestration for analytics and machine learning in the cloud

Announcing Alluxio 2.0

Alluxio 2.0 adds major capabilities to simplify and accelerate multi-cloud data analytics and AI

Community Office Hour, July 30
Building Fast SQL Analytics with Presto, Alluxio, and S3

Featured Blog
Entering into Alluxio 2.0 & Marching to Data Orchestration

On-Demand Webinar
Accelerate Spark workloads on S3 with Alluxio

Featured Use Cases and Deployments

Data in the public cloud slowing your compute down?

Get in-memory data access for Spark and Presto on AWS S3, Google Cloud Platform, or Microsoft Azure.

Can’t burst HDFS in your hybrid cloud environment?

Simplify Hadoop for the hybrid cloud by making on-prem HDFS accessible to any compute in the cloud.

Data in on-premises object stores not fast enough?

Accelerate your Spark, Presto, and TensorFlow workloads for object stores on-premises or in the cloud.

Interact with Alluxio in any stack

Pick a compute. Pick a storage. Alluxio just works.

Full Docs

// Using Alluxio as input and output for an RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")

// Using Alluxio as input and output for a DataFrame
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")

-- Pointing Table location to Alluxio
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/');

# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'

-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://master:port/table_data';

# Running a wordcount using Alluxio as input and output
$ bin/hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -libjars /<ALLUXIO_HOME>/client/alluxio-<VERSION>-client.jar \
  alluxio://master:19998/wordcount/input.txt \
  alluxio://master:19998/wordcount/output

# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt

ALLUXIO

$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>

$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/

$ ./bin/alluxio fs mount \
--option fs.azure.account.key.<AZURE_ACCOUNT>.blob.core.windows.net=<AZURE_ACCESS_KEY> \
alluxio://master:port/azure \
wasb://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.blob.core.windows.net/<AZURE_DIRECTORY>/

$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>

$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>

$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs
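
The storage examples above all follow one shape: zero or more --option key=value pairs, then the Alluxio path, then the under-storage URI. A minimal sketch of that pattern as a dry-run helper follows. The function name print_mount_cmd is hypothetical and not part of the Alluxio CLI; it only prints the command so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical dry-run helper (an illustration, not an Alluxio tool).
# Usage: print_mount_cmd <alluxio-path> <ufs-uri> [key=value ...]
# The body runs in a subshell so its variables do not leak to the caller.
print_mount_cmd() (
  alluxio_path="$1"
  ufs_uri="$2"
  shift 2
  cmd="./bin/alluxio fs mount"
  # Each remaining argument becomes one "--option key=value" pair.
  for opt in "$@"; do
    cmd="$cmd --option $opt"
  done
  printf '%s %s %s\n' "$cmd" "$alluxio_path" "$ufs_uri"
)

# Example: the HDFS mount shown earlier takes no options.
print_mount_cmd alluxio://master:port/hdfs hdfs://namenode:port/dir/
```

Passing credentials as extra arguments, for example 'aws.accessKeyId=<AWS_ACCESS_KEY_ID>', yields the same layout as the S3 and Ceph commands above.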

Powered by Alluxio

What’s Happening

Event
Building Fast SQL Analytics with Presto, Alluxio, and S3

Learn how to set up Presto with Alluxio so that Presto jobs can seamlessly read from and write to S3.
Compare the performance of Presto on S3 with that of Presto with Alluxio on S3.

Alluxio Community Office Hour

News
New platform simplifies handling multi-cloud environments

A new platform launched today by Alluxio provides improved orchestration for data engineers managing and deploying analytical and AI workloads in the cloud, particularly for hybrid and multi-cloud environments.

BetaNews

Blog
2.0 is here! Embrace silos, orchestrate data, accelerate innovation!

Here in New York, at the AWS Summit, we are super excited to announce that Alluxio 2.0 is here, our most significant release since the Alluxio launch. A couple of months ago, we released 2.0 Preview, which included some of the capabilities, but 2.0 now includes even more, to continue building on to our data … Continued

News
The 10 Top Big Data Startups Of 2019 (So Far)

Sales of big data and business analytics solutions are expected to reach $189 billion this year, and that industry growth is spurring a steady stream of startups developing innovative big data products.

CRN

Slides from our latest talks
Accelerate Spark Workloads on S3

This webinar highlights a simple solution: running Spark on Alluxio as a distributed cache for S3. Alluxio stores data in memory close … Continued

Subscribe to the Alluxio Community Newsletter