This article introduces the design and implementation of metadata storage in Alluxio Master, both on heap and off heap (based on RocksDB).
Product School
COMMUNITY VIRTUAL EVENT
Learn how Alluxio uses Apache Ranger’s centralized access policies to control access to virtual paths in the Alluxio virtual file system and enforce existing access policies for the HDFS under stores.
Check out the talks from our virtual community event, Alluxio Day XII, featuring presenters from Websec, Shopee, and Alluxio.
Alluxio 2.8 expands data access & security for data-driven applications in heterogeneous environments – Enhanced S3 API, data encryption & policy-driven data management, and more.
Alluxio enables compute

Data locality
Bring your data close to compute.
Make your data local to compute workloads for Spark caching, Presto caching, Hive caching and more.

Data Accessibility
Make your data accessible.
No matter if it sits on-prem or in the cloud, HDFS or S3, make your files and objects accessible in many different ways.

Data On-Demand
Make your data as elastic as compute.
Effortlessly orchestrate your data for compute in any cloud, even if data is spread across multiple clouds.
“Zero-Copy” Burst User Spotlight: Walmart

Why Walmart chose Alluxio’s “Zero-Copy” burst solution:
- No requirement to persist data into the cloud
- Improved query performance and no network hops on recurrent queries
- Lower costs without the need for creating data copies
See more on how Alluxio powers Walmart’s “zero-copy” burst solution in their presentation.
Featured Use Cases and Deployments
Managing data copies/app changes when bursting compute to cloud?
Zero-copy hybrid bursting with no app changes to intelligently make remote data accessible in the public cloud.
Expanding compute capacity across geo-distributed data centers?

Zero-copy bursting across data centers for Presto, Spark, and Hive with no app changes on data stored in HDFS.
Interact with Alluxio in any stack
Pick a compute. Pick a storage. Alluxio just works.
-- Pointing Table location to Alluxio
CREATE SCHEMA hive.web
WITH (location = 'alluxio://master:port/my-table/')
// Using Alluxio as input and output for RDD
scala> val rdd = sc.textFile("alluxio://master:19998/Input")
scala> rdd.saveAsTextFile("alluxio://master:19998/Output")
// Using Alluxio as input and output for Dataframe
scala> val df = sqlContext.read.parquet("alluxio://master:19998/Input.parquet")
scala> df.write.parquet("alluxio://master:19998/Output.parquet")
-- Pointing Table location to Alluxio
hive> CREATE TABLE u_user (
userid INT,
age INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://master:port/table_data';
# Create and query a table stored in Alluxio
hbase(main):001:0> create 'test', 'cf'
hbase(main):002:0> list 'test'
# Accessing Alluxio after mounting Alluxio service to local file system
$ ls /mnt/alluxio_mount
$ cat /mnt/alluxio_mount/mydata.txt
ALLUXIO
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
alluxio://master:port/s3 s3a://<S3_BUCKET>/<S3_DIRECTORY>
$ ./bin/alluxio fs mount \
alluxio://master:port/hdfs hdfs://namenode:port/dir/
$ ./bin/alluxio fs mount \
--option fs.azure.account.key.<AZURE_ACCOUNT>.blob.core.windows.net=<AZURE_ACCESS_KEY> \
alluxio://master:port/azure \
wasb://<AZURE_CONTAINER>@<AZURE_ACCOUNT>.blob.core.windows.net/<AZURE_DIRECTORY>/
$ ./bin/alluxio fs mount \
--option fs.gcs.accessKeyId=<GCS_ACCESS_KEY_ID> \
--option fs.gcs.secretAccessKey=<GCS_SECRET_ACCESS_KEY> \
alluxio://master:port/gcs gs://<GCS_BUCKET>/<GCS_DIRECTORY>
$ ./bin/alluxio fs mount \
--option aws.accessKeyId=<AWS_ACCESS_KEY_ID> \
--option aws.secretKey=<AWS_SECRET_KEY_ID> \
--option alluxio.underfs.s3.endpoint=http://<rgw-hostname>:<rgw-port> \
--option alluxio.underfs.s3.disable.dns.buckets=true \
alluxio://master:port/ceph s3a://<S3_BUCKET>/<S3_DIRECTORY>
$ ./bin/alluxio fs mount alluxio://master:port/nfs /mnt/nfs
What’s Happening
Alluxio, the developer of the open source data orchestration platform for data-driven workloads such as large-scale analytics and AI/ML, announced the immediate availability of version 2.8 of its Data Orchestration Platform.
The Alluxio 2.8 version focuses on the S3 API, enterprise-grade security, scalability and observability in data migration. Enhanced S3 API makes managing Alluxio easier than ever. Features such as encryption at rest and policy-driven data management further improve Alluxio’s functionality to support enterprise customers.
Today, many organizations are running a multitude of data-driven applications and data platforms that span multiple geographic regions and across heterogeneous environments – public, … Continued
Raft is an algorithm for state machine replication that ensures high availability (HA) and fault tolerance. This blog shares how Alluxio moved to a ZooKeeper-less, built-in Raft-based journal system as its HA implementation.
By bringing Alluxio together with Spark, you can modernize your data platform in a scalable, agile, and cost-effective way. In this post, we provide … Continued