Register for this webinar to learn how to run EMR Spark on Alluxio as a distributed file system cache for S3.
Bring your data to compute with open source
Data orchestration for analytics and machine learning in the cloud
Community Office Hour, July 30
Building Fast SQL Analytics with Presto, Alluxio, and S3
Scalable metadata service in Alluxio: storing billions of files
Webinar, June 27th
Accelerate Spark workloads on S3 with Alluxio
Featured Use Cases and Deployments
Data in the public cloud slowing your compute down?
Get in-memory data access for Spark and Presto on AWS S3, Google Cloud Platform, or Microsoft Azure.
Can’t burst HDFS in your hybrid cloud environment?
Simplify Hadoop for the hybrid cloud by making on-prem HDFS accessible to any compute in the cloud.
Data in on-premises object stores not fast enough?
Accelerate your Spark, Presto, and TensorFlow workloads on object stores, on-premises or in the cloud.
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
// Using Alluxio as input and output for a Spark RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")
// Using Alluxio as input and output for a DataFrame
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Presto: pointing the schema's default table location to Alluxio
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/');
# Create a table stored in Alluxio and list it
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
-- Pointing table location to Alluxio
hive> CREATE TABLE u_user (
        userid INT,
        age INT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|'
      LOCATION 'alluxio://master:port/table_data';
# Accessing Alluxio after mounting the Alluxio service to the local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
Many organizations want to run big data analytics with frameworks such as Presto on public clouds. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.
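To make the caching idea concrete, here is a minimal Scala sketch of a read-through cache sitting in front of a slow object store. Everything in it is hypothetical: `UnderStore`, `FakeObjectStore`, and `ReadThroughCache` are illustrative names, not Alluxio or AWS SDK APIs, and the LRU eviction policy is an assumption chosen for simplicity rather than a description of how Alluxio manages its cache. What it shows is that repeated reads of the same data are served from memory, so only the first pass touches the remote store.

```scala
import scala.collection.mutable

// Hypothetical interface for a slow remote object store such as S3.
// Not the Alluxio or AWS SDK API; it only models "fetch bytes by key".
trait UnderStore {
  def read(key: String): Array[Byte]
}

// In-memory stand-in for S3 that counts how many remote reads were issued.
final class FakeObjectStore(data: Map[String, Array[Byte]]) extends UnderStore {
  var remoteReads: Int = 0
  def read(key: String): Array[Byte] = {
    remoteReads += 1
    data.getOrElse(key, throw new NoSuchElementException(key))
  }
}

// Read-through cache with an assumed LRU policy: hits are served from memory,
// misses are fetched from the under store, and the least recently used entry
// is evicted when the cache holds `capacity` entries.
final class ReadThroughCache(under: UnderStore, capacity: Int) {
  require(capacity > 0, "capacity must be positive")
  // LinkedHashMap keeps insertion order; re-inserting on access tracks recency.
  private val entries = mutable.LinkedHashMap.empty[String, Array[Byte]]
  var hits: Int = 0
  var misses: Int = 0

  def read(key: String): Array[Byte] =
    entries.remove(key) match {
      case Some(bytes) =>
        hits += 1
        entries.put(key, bytes) // move to the most-recently-used position
        bytes
      case None =>
        misses += 1
        val bytes = under.read(key)
        if (entries.size >= capacity) entries.remove(entries.head._1) // evict LRU entry
        entries.put(key, bytes)
        bytes
    }
}

// Usage: two passes over the same (made-up) input files.
val s3 = new FakeObjectStore(Map(
  "input/part-0" -> "a".getBytes("UTF-8"),
  "input/part-1" -> "b".getBytes("UTF-8")
))
val cache = new ReadThroughCache(s3, capacity = 2)
for (_ <- 1 to 2; key <- Seq("input/part-0", "input/part-1")) cache.read(key)
```

After the two passes, `s3.remoteReads` is 2 rather than 4, because the second pass is served entirely from the cache. A production cache would also need to handle concurrency, byte-based capacity rather than entry counts, and writes.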
Announcing the OEM partnership between Alluxio and Starburst Data, the company behind Presto, the fastest-growing SQL query engine in a disaggregated world.
At Alluxio, we believe that in order to fundamentally solve the data access challenges, the world needs a new layer – a data orchestration platform – between computation frameworks and storage systems.
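One way to picture a layer between compute and storage is a mount table that maps paths in a single namespace onto different storage systems, so a job can address on-prem HDFS and cloud object storage through one path scheme. The Scala sketch below is a hypothetical illustration of that idea only: `MountTable`, the bucket name, and the HDFS address are made up, and this is not Alluxio's implementation or API. It resolves a path by the longest matching mount prefix, respecting path-component boundaries.

```scala
// Hypothetical mount table: maps a path prefix in a unified namespace to an
// under-storage URI. The longest matching prefix wins.
final case class MountTable(mounts: Map[String, String]) {
  // Returns the under-storage URI backing `path`, or None if nothing is mounted there.
  def resolve(path: String): Option[String] = {
    // A prefix matches only on whole path components ("/onprem" must not match "/onpremise").
    val matching = mounts.keys.filter { prefix =>
      prefix == "/" || path == prefix || path.startsWith(prefix + "/")
    }
    if (matching.isEmpty) None
    else {
      val prefix = matching.maxBy(_.length)
      val rest = path.stripPrefix(prefix).stripPrefix("/")
      val base = mounts(prefix).stripSuffix("/")
      Some(if (rest.isEmpty) base else s"$base/$rest")
    }
  }
}

// Usage with made-up storage locations: cloud object storage at the root,
// an on-prem HDFS warehouse nested under /onprem.
val table = MountTable(Map(
  "/"       -> "s3://example-bucket/",
  "/onprem" -> "hdfs://namenode:8020/warehouse"
))
println(table.resolve("/onprem/sales/2019.parquet")) // resolves to hdfs://namenode:8020/warehouse/sales/2019.parquet
println(table.resolve("/logs/app.log"))              // resolves to s3://example-bucket/logs/app.log
```

Matching on whole path components is what keeps a path such as `/onpremise/x` from being routed to the `/onprem` mount; it falls through to the root mount instead.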