Open source data orchestration for analytics and machine learning in any cloud
Data orchestration challenges for today’s data engineer
Analytics or ML in the cloud too slow?
S3 performance for analytics and ML workloads is inconsistent and data egress is expensive.
Hybrid cloud for data too hard to implement?
Making HDFS or object store data accessible to any compute in any cloud is complex.
Want to use object stores for big data workloads?
Object storage performance, particularly for metadata operations, is unpredictable.
Multi-cloud data access too complex?
Orchestrating data from multiple public clouds for big data workloads is complex and expensive.
Alluxio can help
Accelerate big data frameworks on the public cloud
Get in-memory data access for Spark and Presto in any cloud – AWS, Google Cloud Platform, or Microsoft Azure.
Run big data workloads in hybrid cloud environments
No matter where it sits – on-prem, in the cloud, or in HDFS – your data is accessible to your compute through HDFS, S3, Java, RESTful, or POSIX-based APIs.
Bring big data and AI workloads to any object store
Accelerate your Spark, Presto, and TensorFlow workloads for any object store, in any cloud.
Alluxio enables data orchestration for compute in any cloud. It unifies data silos on-premises and across any cloud, giving you the data locality, accessibility, and elasticity needed to reduce the complexity of orchestrating data for today’s big data and AI/ML workloads.
Scalable to over a billion files in a single cluster, Alluxio’s distributed architecture is built on three core components:
- Alluxio Master, which manages file and object metadata
- Alluxio Worker, which manages the node’s local storage space, stores file and object blocks, and interfaces with the underlying storage systems
- Alluxio Client, which allows analytics and AI/ML applications to interface with Alluxio
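The division of labor among these three components can be pictured with a toy in-memory model. This is an illustrative sketch only, not Alluxio's implementation or API: the class names `ToyMaster`, `ToyWorker`, and `ToyClient`, the 4-byte block size, and the round-robin block placement are all assumptions made for the example.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per block; deliberately tiny (assumption for the demo)


@dataclass
class ToyWorker:
    """Hypothetical worker: holds block data in its local space."""
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class ToyMaster:
    """Hypothetical master: tracks metadata only, never the data itself."""
    files: dict = field(default_factory=dict)  # path -> [(block_id, worker_index)]
    next_block_id: int = 0

    def allocate(self, path, worker_index):
        """Record that a new block of `path` lives on the given worker."""
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(path, []).append((block_id, worker_index))
        return block_id


class ToyClient:
    """Hypothetical client: asks the master for metadata, workers for data."""

    def __init__(self, master, workers):
        self.master = master
        self.workers = workers

    def write(self, path, data):
        # Split the payload into blocks and place them round-robin on workers.
        for offset in range(0, len(data), BLOCK_SIZE):
            worker_index = (offset // BLOCK_SIZE) % len(self.workers)
            block_id = self.master.allocate(path, worker_index)
            chunk = data[offset:offset + BLOCK_SIZE]
            self.workers[worker_index].blocks[block_id] = chunk

    def read(self, path):
        # Look up block locations on the master, then fetch from the workers.
        locations = self.master.files[path]
        return b"".join(self.workers[w].blocks[b] for b, w in locations)
```

In this sketch an application only ever touches `ToyClient`; the master never sees file contents, and the workers never see file names.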
Getting Started Tutorials For Alluxio And Presto
See how Alluxio speeds up Presto queries, even on remote data!
Key Technical Features
Support for hyperscale workloads
Supports a billion files and thousands of workers and clients, all with high availability.
Integrates out of the box with compute frameworks such as Spark, Presto, TensorFlow, and Hive through HDFS, S3, Java, RESTful, or POSIX-based APIs.
Intelligent data caching and tiering
Automatically utilizes near-compute storage media for optimal data placement based on data topology and workload.
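To make the idea of tiering concrete, here is a minimal sketch of a cache that keeps recently used blocks on the fastest medium and demotes older ones to slower media. It is a simplified illustration under stated assumptions, not Alluxio's actual placement logic: the `TieredCache` class, the least-recently-used eviction, and the promote-on-read behavior are all hypothetical choices for the example.

```python
from collections import OrderedDict


class TieredCache:
    """Toy tiered block cache (hypothetical; fastest tier listed first)."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), e.g. [("MEM", 2), ("SSD", 4)]
        self.names = [name for name, _ in tiers]
        self.capacities = [capacity for _, capacity in tiers]
        self.tiers = [OrderedDict() for _ in tiers]

    def _insert(self, block_id, data, level):
        if level >= len(self.tiers):
            return  # dropped from the cache; the under store still has it
        tier = self.tiers[level]
        tier[block_id] = data
        if len(tier) > self.capacities[level]:
            # Demote the least recently used block to the next slower tier.
            victim_id, victim_data = tier.popitem(last=False)
            self._insert(victim_id, victim_data, level + 1)

    def put(self, block_id, data):
        # Remove any stale copy, then place the block in the fastest tier.
        for tier in self.tiers:
            tier.pop(block_id, None)
        self._insert(block_id, data, 0)

    def get(self, block_id):
        # On a hit, promote the block back to the fastest tier.
        for tier in self.tiers:
            if block_id in tier:
                data = tier[block_id]
                self.put(block_id, data)
                return data
        return None  # miss: a real system would read from the under store

    def location(self, block_id):
        for name, tier in zip(self.names, self.tiers):
            if block_id in tier:
                return name
        return None
```

With two tiers of two blocks each, writing three blocks pushes the oldest one from the first tier to the second, and reading it again brings it back.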
Built-in data policies
Provides highly customizable data policies for persistence, cross-storage data migration, and distributed load.
Plug-and-play under stores
Integrates with under store systems such as HDFS, S3, Azure Blob Storage, and Google Cloud Storage through a range of interfaces.
Transparent unified namespace for file system and object stores
Mounts multiple storage systems into a single consolidated namespace for both read and write workloads.
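One way to picture a unified namespace is as a mount table that maps each path in the single namespace to a location in whichever storage system backs it. The sketch below is a conceptual illustration, not Alluxio's API: the `MountTable` class, the longest-prefix rule, and the example URIs (`hdfs://namenode:8020/warehouse`, `s3://example-bucket/logs`) are assumptions chosen for the example.

```python
class MountTable:
    """Toy mount table: resolves namespace paths to under-store URIs."""

    def __init__(self):
        self._mounts = {}  # mount point in the namespace -> under-store URI

    def mount(self, namespace_path, under_store_uri):
        mount_point = namespace_path.rstrip("/") or "/"
        self._mounts[mount_point] = under_store_uri.rstrip("/")

    def resolve(self, path):
        """Return the under-store URI for `path`; the longest mount point wins."""
        best = None
        for mount_point in self._mounts:
            prefix = mount_point if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        relative = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{relative}" if relative else base
```

An application sees a single tree of paths such as `/tables/orders.parquet` and `/logs/2024/app.log`, while the table decides which backing store each path actually lives in.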
Provides data protection on the wire and in the cloud with built-in auditing, role-based access control, LDAP, Active Directory, and encrypted communications.
Monitoring and management
Provides a user-friendly web interface and command line tools, allowing users to monitor and manage their cluster.
Enterprise high availability with tiered locality
Includes adaptive replication across regions and zones to maximize performance and availability.