Alluxio - Blog

Effective Analytical Pipelines on AWS Using EMR, Alluxio, and S3

This article describes the lessons I learned from a previous project that moved a data pipeline, originally running on a Hadoop cluster managed by my team, to AWS using EMR and S3. The goal was to leverage the elasticity of EMR to offload operational work and to make S3 a data lake where different teams can easily share data across projects.

Building a Large-scale Interactive SQL Query Engine using Presto and Alluxio in JD.com

This article describes how JD built an interactive OLAP platform by combining two open-source technologies: Presto and Alluxio.

Implementing a Secure Plug-and-play Distributed File System Service Using Alluxio in Baidu

In this article, you will learn how to use Alluxio to implement a unified distributed file system service, and how to add extensions on top of Alluxio, including customized authentication schemes and UDFs (user-defined functions) on Alluxio files.

Four Different Ways to Write to Alluxio

Alluxio is a new layer on top of under storage systems that not only improves raw I/O performance but also gives applications flexible options to read, write, and manage files. This article describes the different ways to write files to Alluxio and the tradeoffs among them in performance, consistency, and level of fault tolerance compared to HDFS.

Creating Grafana Dashboards to Visualize Alluxio Metrics

Monitoring metrics is essential for operating distributed systems in production. Alluxio uses the Codahale Metrics Library to collect metrics on I/O throughput, RPC throughput, and resource usage. These metrics are shown in the Alluxio web UI, and are also available through a REST endpoint or can be exported as time series to several third-party sinks (see docs).

Accelerating Write-intensive Data Workloads on AWS S3

Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write, which lets applications complete writes to the Alluxio file system and return quickly, while the data is persisted in the background to the chosen under storage, such as S3.

Recap AWS Summit New York

Alluxio was a proud sponsor and exhibitor at the AWS Summit in New York. If you weren't able to attend, here are the highlights.

The Practice of Alluxio in Ctrip Real-Time Computing Platform

Today, real-time computation platforms are becoming increasingly important in many organizations. In this article, we describe how ctrip.com applies Alluxio to accelerate Spark SQL real-time jobs and keep those jobs consistent during downtime of our internal data lake (HDFS). In addition, we use Alluxio as a caching layer to dramatically reduce the load on our HDFS NameNode.

2.0 is here! Embrace silos, orchestrate data, accelerate innovation

Here in New York, at the AWS Summit, we are super excited to announce that Alluxio 2.0 is here, our biggest release since Alluxio launched. A couple of months ago, we released the 2.0 Preview, which included some of these capabilities. The 2.0 release now includes even more, and continues to build on our data orchestration approach for the cloud.

Getting Started with the Alluxio+Presto Sandbox

Orchestrating Data for the Cloud World with Alluxio 2.0

Today, I’m thrilled to announce the GA of Alluxio 2.0.0, Alluxio’s biggest release to date (see our Release Notes & Release Blog) with over 900 commits.

Turn cloud storage or HDFS into your local file system for faster AI model training with TensorFlow

This article presents a different approach to making distributed file systems like HDFS, or cloud storage systems, look like a local file system to data processing frameworks: the Alluxio POSIX API. To explain the approach, we use the TensorFlow + Alluxio + AWS S3 stack as an example.
