Alluxio Blog

Effective Analytical Pipelines on AWS Using EMR, Alluxio, and S3

This article describes lessons from a previous project that moved a data pipeline, originally running on a Hadoop cluster managed by my team, to AWS using EMR and S3. The goal was to use the elasticity of EMR to offload operational work and to make S3 a data lake where different teams could easily share data across projects.

Running Alluxio On HashiCorp Nomad

I recently worked on a PoC evaluating Nomad for a client. Because constraints in the client environment limited what was possible, I put together something “quick” on my personal workstation to see what Alluxio needed to play nicely with Nomad.

Four Different Ways to Write to Alluxio

Alluxio is a new layer on top of under storage systems that not only improves raw I/O performance but also gives applications flexible options to read, write, and manage files. This article describes different ways to write files to Alluxio and explains the tradeoffs each makes in performance, consistency, and fault tolerance compared to HDFS.
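As a rough summary, assuming the four ways correspond to Alluxio's client write types (`MUST_CACHE`, `CACHE_THROUGH`, `THROUGH`, `ASYNC_THROUGH`, selectable through the `alluxio.user.file.writetype.default` property), the sketch below records where each type puts data and whether a durable under storage copy exists when the write returns. The `WriteType` class and the `durable_on_return` helper are hypothetical and do not call the Alluxio client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteType:
    """Hypothetical model of one Alluxio write type (illustration only)."""
    name: str
    cached_in_alluxio: bool    # data lands in Alluxio worker storage
    persisted_to_ufs: bool     # data eventually reaches the under storage (UFS)
    synchronous_persist: bool  # the write returns only after the UFS write completes


WRITE_TYPES = {
    wt.name: wt
    for wt in [
        WriteType("MUST_CACHE", True, False, False),    # Alluxio only, never persisted
        WriteType("CACHE_THROUGH", True, True, True),   # Alluxio and UFS, synchronously
        WriteType("THROUGH", False, True, True),        # UFS only, not cached
        WriteType("ASYNC_THROUGH", True, True, False),  # Alluxio now, UFS later
    ]
}


def durable_on_return(name: str) -> bool:
    """True if a UFS copy is guaranteed to exist as soon as the write call returns."""
    wt = WRITE_TYPES[name]
    return wt.persisted_to_ufs and wt.synchronous_persist
```

The tradeoff shows up directly: the two types that skip the synchronous UFS write (`MUST_CACHE`, `ASYNC_THROUGH`) are the ones that cannot promise a durable copy at return time.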

Creating Grafana Dashboards to Visualize Alluxio Metrics

Grafana, a comprehensive metrics visualization tool, fits into this process by pulling the metrics that systems like Alluxio export through a sink and presenting them as dashboards that are easier to interpret. This guide covers how to set up Grafana and Graphite, a sink supported by Alluxio that stores metrics in a time-series database, and explores some of the possibilities the combination offers.
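To make the Graphite side concrete, the sketch below shows Graphite's plaintext protocol, in which each data point is a `<metric path> <value> <unix timestamp>` line sent over TCP (port 2003 by default). Alluxio's own Graphite sink handles this for you; the helper names and the example metric path are hypothetical and only illustrate the wire format that ends up in the time-series database Grafana queries.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one data point in Graphite's plaintext protocol."""
    if " " in path:
        # Spaces separate the three fields, so they cannot appear in the path.
        raise ValueError("metric path must not contain spaces")
    return f"{path} {value} {timestamp}\n"


def send_metric(host: str, path: str, value: float,
                port: int = 2003, timestamp: Optional[int] = None) -> None:
    """Send a single data point to a Graphite (Carbon) plaintext listener."""
    ts = int(time.time()) if timestamp is None else timestamp
    line = graphite_line(path, value, ts)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `graphite_line("alluxio.example.BytesRead", 42.5, 1700000000)` produces the line `alluxio.example.BytesRead 42.5 1700000000` followed by a newline.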