Announcing Alluxio 2.5 – advanced interfaces and acceleration for analytics and AI/ML pipelines
Now with an accelerated POSIX API for unified storage access, performance, and ease of management
Alluxio enables compute
Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.
Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.
Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.
“zero-copy” burst user spotlight: walmart
Why Walmart chose Alluxio’s “Zero-Copy” burst solution:
- No requirement to persist data into the cloud
- Improved query performance and no network hops on recurrent queries
- Lower costs without the need for creating data copies
Featured Use Cases and Deployments
Zero-copy hybrid bursting with no app changes to intelligently make remote data accessible in the public cloud.
Zero-copy bursting across data centers for Presto, Spark, and Hive with no app changes on data stored in HDFS.
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
// Using Alluxio as input and output for RDD scala> sc.textFile("alluxio://master:19998/Input") scala> rdd.saveAsTextFile("alluxio://master:19998/Output") // Using Alluxio as input and output for Dataframe scala> df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet") scala> df.write.parquet("alluxio://master:19998/Output.parquet”)
-- Pointing Table location to Alluxio hive> CREATE TABLE u_user ( userid INT, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION 'alluxio://master:port/table_data';
Create and Query table stored in Alluxio hbase(main):001:0> create 'test', 'cf' hbase(main):002:0> list ‘test'
# Accessing Alluxio after mounting Alluxio service to local file system $ ls /mnt/alluxio_mount $ cat /mnt/alluxio_mount/mydata.txt
powered by alluxio
Alluxio, a developer of open source cloud data orchestration software, today announced a go-to-market solution in collaboration with Intel to offer an in-memory acceleration layer with 3rd Gen Intel Xeon Scalable processors and Intel Optane persistent memory (PMem) 200 series.
This whitepaper details how to evaluate Alluxio’s data orchestration platform as a distributed cache for Apache Spark in a public cloud or on-premises. We … Continued
Data processing is increasingly making use of NVIDIA computing for massive parallelism. Advancements in accelerated compute mean that access to storage must also be quicker, whether in analytics, artificial intelligence (AI), or machine learning (ML) pipelines.
This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).