Register for this webinar to learn how to run EMR Spark on Alluxio as a distributed file system cache for S3.
Bring your data to compute with open source
Data orchestration for analytics and machine learning in the cloud
Community Office Hour, June 25th
Accelerating Spark in Kubernetes using Alluxio
Scalable metadata service in Alluxio: storing billions of files
Webinar, June 27th
Accelerate Spark workloads on S3 with Alluxio
Featured Use Cases and Deployments
Data in the public cloud slowing your compute down?
Get in-memory data access for Spark and Presto on AWS S3, Google Cloud Platform, or Microsoft Azure.
Can’t burst HDFS in your hybrid cloud environment?
Simplify Hadoop for the hybrid cloud by making on-prem HDFS accessible to any compute in the cloud.
Data in on-premises object stores not fast enough?
Accelerate your Spark, Presto, and TensorFlow workloads against object stores on-premises or in the cloud.
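Each of these use cases rests on the same mechanism: mounting an existing storage system into the Alluxio namespace so any compute framework can reach it through one path. A minimal sketch using the `alluxio fs mount` CLI — the bucket name, HDFS address, and credentials below are placeholders, not values from this page:

```shell
# Mount an S3 bucket into the Alluxio namespace (hypothetical bucket/keys)
./bin/alluxio fs mount \
  --option s3a.accessKeyId=<ACCESS_KEY> \
  --option s3a.secretKey=<SECRET_KEY> \
  /s3 s3://my-bucket/my-data

# Mount an on-prem HDFS directory so cloud compute can burst against it
./bin/alluxio fs mount /hdfs hdfs://namenode:8020/warehouse

# Both storage systems now appear under a single Alluxio namespace
./bin/alluxio fs ls /
```

Once mounted, Spark, Presto, or TensorFlow jobs address data by `alluxio://` URI regardless of where the bytes actually live.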
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
// Using Alluxio as input and output for RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")

// Using Alluxio as input and output for DataFrame
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Pointing table location to Alluxio
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/')
# Creating and listing a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
-- Pointing table location to Alluxio
hive> CREATE TABLE u_user (
        userid INT,
        age INT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|'
      LOCATION 'alluxio://master:port/table_data';
# Accessing Alluxio after mounting the Alluxio service to the local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
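The local mount used above is provided by the Alluxio FUSE integration. A hedged sketch of setting it up — the mount point and the Alluxio path being exposed are placeholders chosen for this example:

```shell
# Expose the root of the Alluxio namespace as a local directory via FUSE
$ integration/fuse/bin/alluxio-fuse mount /mnt/alluxio_mount /

# Unmount when finished
$ integration/fuse/bin/alluxio-fuse unmount /mnt/alluxio_mount
```

After mounting, any application that speaks POSIX file I/O can read and write Alluxio-managed data with no code changes.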
Alluxio enables compute
Powered by Alluxio
Announcing the OEM partnership between Alluxio and Starburst Data, the company behind Presto, the fastest-growing SQL query engine in a disaggregated world. We will offer a bundled solution that brings the two open source technologies together to deliver exceptional performance and multi-cloud capabilities for interactive analytic workloads.
We are in the early stages of the data revolution. Organizations are racing to build data-driven cultures and to innovate on data-driven applications. These applications touch many facets of our lives, from how we get to work to how we are medically diagnosed. Yet the value of data is far from fully realized, and the pace of innovation can be dramatically improved. We believe the critical missing piece is a data orchestration layer.
The current pace of innovation is hindered by the need to reinvent the wheel every time an application must access data efficiently. When an engineer or scientist writes an application to solve a problem, they must spend significant effort just getting that application to access the data efficiently.