Learn how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. See how to run Dataproc Spark against a remote HDFS cluster.
Bring your data to compute with open source
Data orchestration for analytics and machine learning in the cloud
Featured Tech Talk: How the Development Bank of Singapore solves on-prem compute capacity challenges with cloud bursting
Alluxio enables compute
Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.
Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.
Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.
Alluxio for data engineers
Are your Presto/Spark queries slow on S3?
Do your Presto/Spark queries have inconsistent performance?
Are your metadata operations slow on S3?
Are your egress costs too high?
Alluxio for data architects
Can you share data across your app framework?
Do you have problems running remote/multiple storage systems?
Is running HDFS in the cloud for temporary storage expensive?
Do you have a directive to use the cloud for analytics?
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
// Using Alluxio as input and output for an RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")
// Using Alluxio as input and output for a DataFrame
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Pointing the table location to Alluxio
hive> CREATE TABLE u_user (
        userid INT,
        age INT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '|'
      LOCATION 'alluxio://master:port/table_data';
# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
# Accessing Alluxio after mounting the Alluxio service to the local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
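The Spark and Hive snippets above all address data through an `alluxio://host:port/path` URI. As a rough illustration of that addressing scheme only, here is a minimal Python sketch that builds and splits such URIs with the standard library. The helper names `alluxio_uri` and `split_alluxio_uri` are hypothetical and not part of any Alluxio client, and the port 19998 is simply the one shown in the examples above, assumed here as a default.

```python
from urllib.parse import urlsplit

# Port taken from the examples above; treated as an assumed default here.
ASSUMED_MASTER_PORT = 19998


def alluxio_uri(master_host, path, port=ASSUMED_MASTER_PORT):
    """Build an alluxio:// URI string for a path (hypothetical helper)."""
    if not path.startswith("/"):
        path = "/" + path
    return f"alluxio://{master_host}:{port}{path}"


def split_alluxio_uri(uri):
    """Split an alluxio:// URI into (host, port, path) (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "alluxio":
        raise ValueError(f"not an alluxio:// URI: {uri!r}")
    return parts.hostname, parts.port, parts.path


if __name__ == "__main__":
    uri = alluxio_uri("master", "Input.parquet")
    print(uri)
    print(split_alluxio_uri(uri))
```

A string built this way could be passed wherever the snippets above use a literal `alluxio://` location, for example as the argument to `sc.textFile` or in a Hive `LOCATION` clause.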
Featured Use Cases and Deployments
Get in-memory data access by caching Spark and Presto data on AWS S3, Google Cloud Platform, or Microsoft Azure.
Zero-copy hybrid bursting with no app changes to intelligently burst processing to the cloud.
Accelerate your Spark, Presto, and TensorFlow workloads for object stores on-premises or in the cloud.
Powered by Alluxio
Alluxio has made available a range of cloud offerings and integrations with the latest Alluxio version 2.1. At the first Data Orchestration Summit at the Computer History Museum, the company also announced the strengthening of partnerships with Amazon AWS and Google Cloud.
The selected companies come from our massive data set of vendors and industry metrics. Yes, we use machine learning to analyze the industry in a detailed manner to determine a ranking for this list.
This online meetup shows why and how we solve some challenging technical issues, improve the speed, and reduce the costs of our AWS EMR …
This tutorial describes steps to set up an EMR cluster with Alluxio as a distributed caching layer for Hive, and run sample queries to access data in S3 through Alluxio.