ALLUXIO for data architects

How to solve common data architecture problems in the cloud

PROBLEMS AND ALLUXIO SOLUTIONS


  Problem:
  • Out of compute capacity: jobs take up half the cluster
  • Cannot burst analytics jobs to the cloud without also copying data from on-prem
  Alluxio solution:
  • Alluxio is the only platform that can cache data for compute nodes in the cloud, making it easy to add capacity for overloaded on-prem HDFS clusters.
  • Overcome the challenges of copying data to the cloud
  • Avoid more capital spend in data centers

  Problem:
  • Sharing data across application frameworks
  Alluxio solution:
  • Alluxio is the only data platform that easily shares data across compute nodes and frameworks.
  • Policy-based pinning of datasets

  Problem:
  • Problems running with remote or multiple storage systems
  Alluxio solution:
  • Alluxio is the only data platform that mounts data storage silos and provides data locality to your compute.
  • Abstraction allows compute to run on any storage
  • Abstraction makes it easy to add new storage technologies and avoid lock-in
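
The idea of mounting storage silos behind one abstraction can be sketched as a small mount table. This is a conceptual illustration only, not Alluxio's API: the `MountTable` class, the host name, and the bucket name are all hypothetical, and longest-prefix matching is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MountTable:
    """Toy unified namespace: maps logical path prefixes to storage URIs."""
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, logical_prefix: str, storage_uri: str) -> None:
        # Strip trailing slashes so "/data/" and "/data" name the same mount.
        self.mounts[logical_prefix.rstrip("/")] = storage_uri.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        # Longest matching prefix wins, so nested mounts take precedence.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                return self.mounts[prefix] + logical_path[len(prefix):]
        raise KeyError(f"no mount covers {logical_path}")


table = MountTable()
# Hypothetical storage locations, used only for illustration.
table.mount("/warehouse", "hdfs://onprem-namenode:8020/warehouse")
table.mount("/warehouse/archive", "s3://example-bucket/archive")

print(table.resolve("/warehouse/sales/2020.parquet"))
print(table.resolve("/warehouse/archive/2015.parquet"))
```

Compute jobs in this sketch only ever see the logical `/warehouse/...` paths, so a new storage system could be added by registering one more mount rather than by changing the jobs.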

  Problem:
  • High costs for running HDFS in the cloud for temporary storage
  • HDFS must run continuously on static machines, must keep three copies of the data, and cannot serve data from both HDFS and S3 simultaneously.
  Alluxio solution:
  • Alluxio replaces HDFS for temporary storage. S3 remains the backing store.
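
The pattern of a temporary cache layer in front of a durable backing store can be sketched as a read-through cache. This is a minimal illustration under stated assumptions, not Alluxio's implementation: `ReadThroughCache`, the in-memory `fake_s3` dictionary standing in for S3, and the least-recently-used eviction rule are all hypothetical choices made for the example.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Serve reads from a bounded cache; fall back to the backing store on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        self._fetch = fetch            # reads one object from the backing store
        self._capacity = capacity      # maximum number of cached objects
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.backing_store_reads = 0   # counts how often the slow path was taken

    def read(self, key: str) -> bytes:
        if key in self._entries:
            # Cache hit: mark as most recently used and return without remote I/O.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: read from the backing store, then keep a temporary copy.
        data = self._fetch(key)
        self.backing_store_reads += 1
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry (an assumption of this sketch).
            self._entries.popitem(last=False)
        return data


# Hypothetical stand-in for S3: the durable copy always lives here.
fake_s3 = {"s3://example-bucket/events/part-0": b"example rows"}
cache = ReadThroughCache(fake_s3.__getitem__, capacity=2)

cache.read("s3://example-bucket/events/part-0")  # miss: fetched from backing store
cache.read("s3://example-bucket/events/part-0")  # hit: served from the cache
print(cache.backing_store_reads)
```

Because the durable copy stays in the backing store, the cache layer in this sketch can be discarded or resized at any time without losing data.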

Common use cases: how data architects are using Alluxio today

Hybrid Cloud Analytics

Simplify Hadoop for the hybrid cloud by making on-prem HDFS accessible to any compute in the cloud.

Cloud Analytics Caching

Get in-memory data access for Spark, Presto, or any analytics framework on AWS, Google Cloud Platform, or Microsoft Azure.

Watch the on-demand tech talk
Accelerate and Scale Big Data Analytics and ML Pipelines with Disaggregated Compute and Storage

Read the whitepaper
Hybrid Cloud Analytics: Scaling analytics workloads on on-prem to public clouds with Alluxio