Office Hour Archives

Testing Distributed Systems at Scale for the Cost of a Large Pizza on AWS

Community Online Office Hour * February 25, 2020

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems rely on to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to match how enterprises run our software in highly distributed environments is a complicated (and expensive!) task.

Running Presto with Alluxio on Amazon EMR

Community Online Office Hour * February 12, 2020

Many organizations are leveraging EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data from S3, ensuring high and predictable performance as well as reduced network traffic.

Community Office Hour: Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio

December 19, 2019

This office hour describes the concepts and dataflow involved in running the Spark/Alluxio stack on Kubernetes with enhanced data locality, even when the storage service is external or remote.

Tags: data locality, hdfs, kubernetes, office hour, spark

Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio

Alluxio Community Office Hour * December 17, 2019

One important performance optimization in Apache Spark is to schedule tasks on the nodes whose local HDFS DataNodes serve the task's input data. However, more users are running Apache Spark natively on Kubernetes, where HDFS is not an option. This office hour describes the concepts and dataflow involved in running the Spark/Alluxio stack on Kubernetes with enhanced data locality, even when the storage service is external or remote.

Community Office Hour: Improving Memory Utilization of Spark Jobs Using Alluxio

November 26, 2019

Many Spark users may not be aware of the differences in memory utilization between caching data directly in memory inside the Spark JVM and storing data off-heap via an in-memory storage service like Alluxio. In this office hour, I will highlight the two approaches with a demo and then open the floor for discussion.

Tags: caching, memory, office hour, spark

Improving Memory Utilization of Spark Jobs Using Alluxio

Alluxio Community Office Hour * November 26, 2019

This office hour shares a demo and compares two approaches: caching data directly in memory inside the Spark JVM versus storing data off-heap via an in-memory storage service like Alluxio.

Community Office Hour: Accelerating Hive with Alluxio on S3

October 3, 2019

Learn more about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3, along with how to set up Hive with Alluxio so that Hive jobs can seamlessly read from and write to S3.

Tags: alluxio engineering, aws s3, compute storage separation, hdfs, hive, office hour, spark

Accelerating Hive with Alluxio on S3

Alluxio Community Office Hour * October 1, 2019

Hear about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3, and learn how to set up Hive with Alluxio so that Hive jobs can seamlessly read from and write to S3.

Tag: office hour