This whitepaper details how to leverage a public cloud, such as Amazon AWS, Google GCP, or Microsoft Azure, to scale analytic workloads directly on …
ADVANCED ANALYTICS & AI ON REMOTE DATA FOR HYBRID AND MULTI-CLOUD
Open Source Data Orchestration for the Cloud
Alluxio enables compute
Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.
Make your data accessible.
Whether it sits on-prem or in the cloud, in HDFS or S3, make your files and objects accessible in many different ways.
Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.
“Zero-Copy” Burst User Spotlight: Walmart
Why Walmart chose Alluxio’s “Zero-Copy” burst solution:
- No requirement to persist data into the cloud
- Improved query performance and no network hops on recurrent queries
- Lower costs without the need for creating data copies
Featured Use Cases and Deployments
Zero-copy hybrid bursting with no app changes to intelligently make remote data accessible in the public cloud.
Zero-copy bursting across data centers for Presto, Spark, and Hive with no app changes on data stored in HDFS.
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
// Using Alluxio as input and output for an RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")
// Using Alluxio as input and output for a DataFrame
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Pointing the table location to Alluxio
hive> CREATE TABLE u_user (userid INT, age INT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    > LOCATION 'alluxio://master:port/table_data';
# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
# Accessing Alluxio after mounting the Alluxio service to the local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
Powered by Alluxio
Alluxio, the developer of open source cloud data orchestration software, today announced the availability of Alluxio Structured Data Service (SDS) featuring a data Catalog Service and Transformation Service, two new major architectural components of its Data Orchestration Platform. Data engineers, architects and developers can now spend fewer resources storing data and more time delivering data to analytical compute engines.
With this release comes the General Availability (GA) of Alluxio Structured Data Services (SDS), the subsystem of Alluxio responsible for managing and transforming structured data, such as databases, tables, and partitions.
Accessing data to run analytic workloads in Spark across data centers and/or clouds can be challenging. Additionally, network I/O can bottleneck Spark jobs that …