Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems rely on to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to cover the highly distributed environments in which enterprises run our software is a complicated (and expensive!) task.
Bring your data to compute with open source
Data orchestration for analytics and machine learning in the cloud
Alluxio enables compute
Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.
Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.
Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
// Using Alluxio as input and output for RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")
// Using Alluxio as input and output for Dataframe
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (userid INT, age INT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
      LOCATION 'alluxio://master:port/table_data';
# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
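The Spark snippets above address data through URIs of the form alluxio://master:19998/path. As a rough illustration only, the Python sketch below builds and splits such URIs using the standard library. The helper names `alluxio_uri` and `split_alluxio_uri` are hypothetical and not part of any Alluxio client, and treating `master` and `19998` as defaults is an assumption taken from the examples rather than a documented guarantee.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: host and port mirror the values used in the snippets above.
DEFAULT_MASTER_HOST = "master"
DEFAULT_MASTER_PORT = 19998


def alluxio_uri(path: str,
                host: str = DEFAULT_MASTER_HOST,
                port: int = DEFAULT_MASTER_PORT) -> str:
    """Build an alluxio:// URI for a path (hypothetical helper)."""
    if not path.startswith("/"):
        path = "/" + path
    return f"alluxio://{host}:{port}{path}"


def split_alluxio_uri(uri: str) -> Tuple[str, int, str]:
    """Split an alluxio:// URI into (host, port, path) (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "alluxio":
        raise ValueError(f"not an alluxio:// URI: {uri!r}")
    # Fall back to the assumed default port when the URI omits one.
    return parts.hostname, parts.port or DEFAULT_MASTER_PORT, parts.path or "/"
```

A string produced this way could then be passed wherever the snippets use a literal URI, for example as the argument to `sc.textFile` in Spark.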
Featured Use Cases and Deployments
Get in-memory access by caching Spark and Presto data on AWS S3, Google Cloud Platform, or Microsoft Azure.
Zero-copy hybrid bursting with no app changes to intelligently burst processing to the cloud.
Accelerate your Spark, Presto, and TensorFlow workloads for object stores on-premises or in the cloud.
Alluxio for data engineers
Are your Presto/Spark queries slow on S3?
Do your Presto/Spark queries have inconsistent performance?
Are your metadata operations slow on S3?
Are your egress costs too high?
Alluxio for data architects
Can you share data across your app framework?
Do you have problems running remote/multiple storage systems?
Is running HDFS in the cloud for temporary storage expensive?
Do you have the directive to use cloud for analytics?
Powered by Alluxio
Alluxio, as a data orchestration layer, provides physical data independence so that Presto can interact with the data more efficiently. In addition to caching for IO acceleration, Alluxio also provides a catalog service to abstract the metadata in the Hive Metastore, and transformations to expose the data in a compute-optimized way. In this talk, we describe some of the challenges of using Presto with Hive, and introduce Alluxio data orchestration for solving those challenges.
The Alluxio 2.0 release was the biggest update since the birth of the project “Tachyon” at UC Berkeley’s AMPLab. Gathering feedback from our Open Source … Continued
Join us for this tech talk where we’ll introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly concurrent and low-latency … Continued
Data is not only growing in volume, it’s increasingly scattered across on-premises and cloud-based systems, complicating data management and governance tasks. Here are 10 companies with next-generation data management, data science and machine learning technology that solution providers should keep an eye on in 2020.
We are delighted by the success of the inaugural Data Orchestration Summit on Nov. 7, 2019! Organized by Alluxio, this one-day event was sold out with nearly 400 attendees! Data engineers, cloud engineers, and data scientists joined the talks of 24 industry leaders from all over the globe to share their experiences building cloud-native data and … Continued