Unified Big Data Analytics: Any Stack, Any Cloud
January 22, 2019

The big data stack has evolved rapidly over the past few years, with an explosion of data frameworks that started with MapReduce and expanded to Apache Spark, Presto, and Hive for structured data, as well as TensorFlow and Caffe for AI and ML. The approach to managing and storing data has evolved too, starting with HDFS and now moving toward newer approaches such as object stores. With so many possible combinations of compute and storage, data engineering has become increasingly complex, particularly in hybrid and multi-cloud environments. As a result, users are increasingly adding a new layer to their data stack that unifies files and objects and provides data locality across separated compute and storage environments.

This is the fundamental problem Alluxio solves. Alluxio is an open-source virtual distributed file system that provides a unified data access layer for hybrid and multi-cloud deployments. It enables distributed compute engines such as Spark and Presto, and machine learning frameworks such as TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, and Azure) while actively using an in-memory cache to accelerate data access. Originally developed at the UC Berkeley AMPLab as the research project “Tachyon”, Alluxio has more than 900 contributors and is used by over 100 companies worldwide, with the largest production deployment exceeding 1,000 nodes.
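To make the idea of a unified data access layer more concrete, here is a minimal Python sketch of a mount table that maps one virtual namespace onto several storage backends. It is only an illustration of the concept, not Alluxio's actual API: the `MountTable` class, its methods, and the example host and bucket names are all hypothetical.

```python
from typing import Dict


class MountTable:
    """Toy longest-prefix mount table (hypothetical; not Alluxio's API)."""

    def __init__(self) -> None:
        # Maps a virtual path prefix to the URI of the backing storage.
        self._mounts: Dict[str, str] = {}

    def mount(self, virtual_prefix: str, backend_uri: str) -> None:
        """Attach a storage backend under a virtual path prefix."""
        self._mounts[virtual_prefix.rstrip("/")] = backend_uri.rstrip("/")

    def resolve(self, virtual_path: str) -> str:
        """Translate a virtual path into the backend URI that holds it."""
        best = None
        for prefix in self._mounts:
            matches = virtual_path == prefix or virtual_path.startswith(prefix + "/")
            # The longest matching prefix wins, so nested mounts are possible.
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {virtual_path}")
        return self._mounts[best] + virtual_path[len(best):]


# Example host and bucket names below are made up for illustration.
table = MountTable()
table.mount("/warehouse", "hdfs://namenode:8020/warehouse")
table.mount("/logs", "s3://example-bucket/logs")
```

With this table, an application asks for `/logs/2019/01/part-0` or `/warehouse/sales` without knowing which storage system sits behind each path; `resolve` performs the translation.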

This presentation focuses on how Alluxio helps the big data analytics stack become cloud-native. Cloud object storage systems, which are increasingly popular, offer more cost-effective and scalable storage than HDFS, but they also come with different semantics and performance implications. Applications such as Spark or Presto do not benefit from node-level locality or cross-job caching when they retrieve data directly from cloud object storage. Deploying Alluxio in front of cloud storage addresses these problems, because data is retrieved and cached in Alluxio rather than fetched repeatedly from the underlying cloud or object storage.
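The caching benefit described above can be sketched as a read-through cache placed in front of a slow store. This is a toy model under simplifying assumptions (a single process, no eviction, no invalidation), and the `SlowObjectStore` and `ReadThroughCache` classes are hypothetical names for illustration rather than anything Alluxio ships.

```python
from typing import Dict


class SlowObjectStore:
    """Stand-in for a remote object store; counts how often it is hit."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)
        self.fetch_count = 0

    def get(self, key: str) -> bytes:
        # Each call models one round trip to remote storage.
        self.fetch_count += 1
        return self._objects[key]


class ReadThroughCache:
    """Serves repeated reads from memory, fetching from the store on a miss."""

    def __init__(self, store: SlowObjectStore) -> None:
        self._store = store
        self._cache: Dict[str, bytes] = {}

    def read(self, key: str) -> bytes:
        if key not in self._cache:
            # Cache miss: go to the underlying store once and remember the result.
            self._cache[key] = self._store.get(key)
        return self._cache[key]


store = SlowObjectStore({"part-0": b"rows..."})
cache = ReadThroughCache(store)

# Two separate "jobs" read the same object; only the first reaches the store.
first_job = cache.read("part-0")
second_job = cache.read("part-0")
```

After both reads, `store.fetch_count` is 1: the second job is served from memory, which is the cross-job reuse that direct access to object storage does not provide.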
