Open source data orchestration for analytics and machine learning in any cloud
COMMON PROBLEMS WE SOLVE
Inconsistent performance on S3
S3 performance for analytic workloads is inconsistent and data egress is expensive.
Limited compute capacity on-prem
Making HDFS or object store data accessible to any compute in any cloud is complex.
Slow on-prem object store
Object storage performance, particularly for metadata operations, is unpredictable.
Alluxio can help
Accelerate analytics with a multi-tiered distributed caching layer close to compute
Access data in many ways with different APIs including HDFS, S3, POSIX, Java and REST
Scale data elastically on demand with a global namespace across many storage systems
Alluxio enables data orchestration for compute in any cloud. It unifies data silos on-premises and across clouds to give you the data locality, accessibility, and elasticity needed to reduce the complexity of orchestrating data for today’s big data and AI/ML workloads.
Scalable to over a billion files in a single cluster, Alluxio’s distributed architecture is built on three core components:
- Alluxio Master, which manages file and object metadata
- Alluxio Worker, which manages the node’s local storage space and the file and object blocks held there, and interfaces with the underlying storage systems
- Alluxio Client, which allows analytics and AI/ML applications to interface with Alluxio
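The division of labor among the three components can be illustrated with a toy model. This is only a sketch of the roles described above, not Alluxio's code or API: the class names, method names, and the dictionary standing in for an under store are all hypothetical.

```python
from typing import Dict, List


class Master:
    """Holds metadata only: which blocks make up a file and where they are cached."""

    def __init__(self) -> None:
        self.file_blocks: Dict[str, List[str]] = {}   # path -> ordered block ids
        self.block_locations: Dict[str, str] = {}     # block id -> worker name

    def register_file(self, path: str, block_ids: List[str]) -> None:
        self.file_blocks[path] = list(block_ids)

    def report_block(self, block_id: str, worker_name: str) -> None:
        self.block_locations[block_id] = worker_name


class Worker:
    """Manages local block storage and talks to the storage system underneath."""

    def __init__(self, name: str, under_store: Dict[str, bytes], master: Master) -> None:
        self.name = name
        self.under_store = under_store  # hypothetical stand-in for HDFS, S3, etc.
        self.master = master
        self.local: Dict[str, bytes] = {}

    def read_block(self, block_id: str) -> bytes:
        if block_id not in self.local:
            # Cache miss: fetch from the under store, keep a local copy,
            # and tell the master where the block now lives.
            self.local[block_id] = self.under_store[block_id]
            self.master.report_block(block_id, self.name)
        return self.local[block_id]


class Client:
    """What an application uses: asks the master for metadata, workers for data."""

    def __init__(self, master: Master, workers: Dict[str, Worker], default_worker: str) -> None:
        self.master = master
        self.workers = workers
        self.default_worker = default_worker

    def read_file(self, path: str) -> bytes:
        chunks = []
        for block_id in self.master.file_blocks[path]:
            # Prefer a worker that already caches the block, else the default one.
            name = self.master.block_locations.get(block_id, self.default_worker)
            chunks.append(self.workers[name].read_block(block_id))
        return b"".join(chunks)
```

In this sketch the master never touches file contents and the client never touches the under store directly, which mirrors the separation of metadata, block storage, and application access listed above.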
Key Technical Features
Support for hyperscale workloads
Supports a billion files and thousands of workers and clients, all with high availability.
Integrates out of the box with compute frameworks such as Spark, Presto, TensorFlow, and Hive through the HDFS, S3, Java, REST, or POSIX APIs.
Intelligent data caching and tiering
Automatically utilizes near-compute storage media for optimal data placement based on data topology and workload.
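To make the idea of tiering concrete, here is a minimal sketch of a multi-tier block cache. It assumes a simple least-recently-used policy with promotion on read and demotion on eviction; that policy, the tier names, and every identifier below are illustrative assumptions, not a description of how Alluxio actually places data.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class Tier:
    """One storage medium, e.g. memory or SSD, holding a bounded number of blocks."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first


class TieredCache:
    def __init__(self, tiers: List[Tier], fetch: Callable[[str], bytes]) -> None:
        self.tiers = tiers    # ordered fastest to slowest
        self.fetch = fetch    # reads a block from the under store on a full miss

    def read(self, block_id: str) -> bytes:
        for tier in self.tiers:
            if block_id in tier.blocks:
                # Hit: promote the block to the fastest tier.
                data = tier.blocks.pop(block_id)
                self._put(0, block_id, data)
                return data
        data = self.fetch(block_id)
        self._put(0, block_id, data)
        return data

    def location(self, block_id: str) -> Optional[str]:
        for tier in self.tiers:
            if block_id in tier.blocks:
                return tier.name
        return None

    def _put(self, level: int, block_id: str, data: bytes) -> None:
        if level >= len(self.tiers):
            return  # dropped from the cache; the under store still has it
        tier = self.tiers[level]
        tier.blocks[block_id] = data
        while len(tier.blocks) > tier.capacity:
            # Demote the least recently used block to the next slower tier.
            old_id, old_data = tier.blocks.popitem(last=False)
            self._put(level + 1, old_id, old_data)
```

With a small memory tier over a larger SSD tier, recently read blocks stay in memory while older ones move down to SSD instead of being refetched from the under store.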
Built-in data policies
Provides highly customizable data policies for persistence, cross storage data migration, and distributed load.
Plug and play under stores
Integrates with under store systems such as HDFS, S3, Azure Blob Storage, and Google Cloud Storage through a range of interfaces.
Transparent unified namespace for file system and object stores
Mounts multiple storage systems into a single consolidated namespace for both read and write workloads.
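One way to picture a consolidated namespace is as a mount table that maps path prefixes to under store locations and resolves each path by its longest matching prefix. The sketch below assumes exactly that; the `MountTable` class, its methods, and the example URIs are hypothetical and are not taken from Alluxio.

```python
from typing import Dict, Optional


class MountTable:
    """Toy mount table: maps namespace prefixes to under store URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        # "/logs/" and "logs" both become "/logs"; the root stays "/".
        return "/" + path.strip("/")

    def mount(self, namespace_path: str, ufs_uri: str) -> None:
        self._mounts[self._normalize(namespace_path)] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate a namespace path to the under store URI that backs it."""
        path = self._normalize(path)
        best: Optional[str] = None
        for prefix in self._mounts:
            # Match whole path components only, so "/logs" does not cover "/logsx".
            covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {path}")
        suffix = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{suffix}" if suffix else base
```

For example, after mounting `/` on an HDFS directory and `/logs` on an S3 prefix, an application addresses both through one path hierarchy and the table decides which storage system serves each request.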
Provides data protection on the wire and in the cloud with built-in auditing, role-based access control, LDAP, Active Directory, and encrypted communications.
Monitoring and management
Provides a user-friendly web interface and command line tools, allowing users to monitor and manage their cluster.
Enterprise high availability with tiered locality
Includes adaptive replication across regions and zones to maximize performance and availability.