Accelerating Hive with Alluxio on S3

Alluxio Community Office Hour *

Hear about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3, and learn how to set up Hive with Alluxio so that Hive jobs can seamlessly read from and write to S3.

Building Fast SQL Analytics with Presto, Alluxio, and S3

Alluxio Community Office Hour *

Learn how to set up Presto with Alluxio so that Presto jobs can seamlessly read from and write to S3.
Compare the performance of Presto on S3 with that of Presto with Alluxio on S3.

Running Spark & Alluxio in Kubernetes

Alluxio Community Office Hour *

The latest advances in container orchestration by Kubernetes bring cost savings and flexibility to compute workloads in public or hybrid cloud environments. They also introduce new challenges, such as how to move data to compute efficiently, how to unify data across multiple or remote clouds, and how to co-locate data with compute. Alluxio approaches these problems in a new way: it helps elastic compute workloads realize the benefits of the cloud while bringing data locality and data accessibility to workloads orchestrated by Kubernetes.

Running Presto with Alluxio on Amazon EMR

Alluxio Community Office Hour - May *

Many organizations are leveraging EMR to run big data analytics on public cloud. However, reading from and writing to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches S3 data, providing high and predictable performance as well as reduced network traffic.

Alluxio for Hybrid Cloud | HDFS and AWS S3 demo

Alluxio Community Office Hour *

Alluxio can help data scientists and data engineers interact with different storage systems in a hybrid cloud environment. Using Alluxio as a data access layer for Big Data and Machine Learning applications, data processing pipelines can improve efficiency without explicit data ETL steps and the resulting data duplication across storage systems.

Getting Started with Alluxio Open Source

Alluxio Community Office Hour *

Join us for our first monthly office hour. This month we will focus on:
Installing Alluxio using Docker and Homebrew on your local Linux/Mac machine and accessing data from S3 and HDFS
Understanding Alluxio’s architecture in the data ecosystem
Open session for discussion on any topics, such as solving the separation of compute and storage problem, unifying multiple storage systems, and more

Running Apache Spark with Alluxio for Fast Data Analytics

Alluxio Community Office Hour *

In this Office Hour you’ll learn about:
Using Alluxio as the input/output for Spark applications
Saving and loading Spark RDDs and DataFrames with Alluxio
Open session for discussion on any topics, such as solving the separation of compute and storage problem, unifying multiple storage systems, and more
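
As a rough illustration of the DataFrame topic above, here is a minimal PySpark sketch. It assumes the Alluxio client jar is already on the Spark classpath and that an Alluxio master is reachable on its default RPC port (19998). The helper names `alluxio_uri` and `save_and_reload`, the host, and the path are hypothetical and only serve the example; this is not the material presented in the session.

```python
def alluxio_uri(path, host="localhost", port=19998):
    """Build an alluxio:// URI for a path in the Alluxio namespace.

    The host and port defaults are assumptions for a local test setup.
    """
    return f"alluxio://{host}:{port}/{path.lstrip('/')}"


def save_and_reload(spark, df, path):
    """Write a DataFrame to Alluxio as Parquet, then read it back.

    `spark` is an existing SparkSession; `df` is any Spark DataFrame.
    """
    uri = alluxio_uri(path)
    df.write.mode("overwrite").parquet(uri)
    return spark.read.parquet(uri)
```

With a running session, a call such as `save_and_reload(spark, df, "/data/events")` would go through the same `DataFrameWriter` and `DataFrameReader` calls used for any other Hadoop-compatible file system; only the URI scheme changes.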

Running Machine Learning Workloads with Tensorflow + Alluxio + AWS S3

Alluxio Community Office Hour *

The Alluxio POSIX API enables data engineers to access any distributed file system or cloud storage as if accessing a local file system with an added performance improvement. This reduces the effort and complexity for data engineers to run their machine learning or legacy workloads on new data storage without data migration or data duplication.
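
To make the "as if accessing a local file system" point concrete, here is a small sketch that uses only ordinary Python file APIs. It assumes an Alluxio namespace has already been mounted at a local directory through the POSIX (FUSE) interface; the mount point `/mnt/alluxio-fuse` in the usage note and the helper names are hypothetical.

```python
from pathlib import Path


def total_bytes(mount_root, pattern="*.csv"):
    """Sum the sizes of all files matching `pattern` under the mount point."""
    root = Path(mount_root)
    return sum(p.stat().st_size for p in root.rglob(pattern) if p.is_file())


def read_head(file_path, n=5):
    """Return up to the first `n` lines of a text file, without newlines."""
    with open(file_path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for _, line in zip(range(n), f)]
```

Calling `total_bytes("/mnt/alluxio-fuse/training-data")` would then behave like it does on any local directory, which is why existing training scripts can often read the mounted data without code changes.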