Welcome to Alluxio.io!

Notice anything new about our website? That’s right – we are super excited to launch our new website, Alluxio.io!
As we continue our focus on our open source community, one important item on our minds was rebuilding our website to provide a better user experience for that community. To that end, you’ll see lots of changes in the Alluxio web experience.

Alluxio Developer Tip: Why am I seeing the error “User yarn is not configured for any impersonation. impersonationUser: foo”?

Impersonation is simply the ability for one user to act on behalf of another user. For example, say user ‘yarn’ has the credentials to connect to a service, but user ‘foo’ does not. On its own, user ‘foo’ would never be able to access the service. However, user ‘yarn’ can access the service and impersonate (act on behalf of) user ‘foo’, giving user ‘foo’ access. In this way, impersonation enables one user to access a service on behalf of another user.
The impersonation feature defines how users can act on behalf of other users. Therefore, it is important to know who the users are.
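
As a rough illustration of the idea, the check behind such an error can be modeled as a lookup from the connecting user to the set of users it may act for. This is a toy sketch, not Alluxio's implementation: the rule table, the `*` wildcard convention, and the function name `can_impersonate` are all assumptions made for this example.

```python
from typing import Dict, Set

WILDCARD = "*"  # assumed convention: "*" means "may impersonate anyone"

# Hypothetical rule table: connecting user -> users it may impersonate.
IMPERSONATION_RULES: Dict[str, Set[str]] = {
    "yarn": {"foo"},
}


def can_impersonate(connection_user: str,
                    impersonation_user: str,
                    rules: Dict[str, Set[str]]) -> bool:
    """Return True if connection_user may act on behalf of impersonation_user."""
    if connection_user == impersonation_user:
        # Acting as yourself is not impersonation, so no rule is needed.
        return True
    allowed = rules.get(connection_user)
    if allowed is None:
        # The connecting user has no impersonation rules at all.
        return False
    return WILDCARD in allowed or impersonation_user in allowed


if __name__ == "__main__":
    print(can_impersonate("yarn", "foo", IMPERSONATION_RULES))
    print(can_impersonate("yarn", "bar", IMPERSONATION_RULES))
```

In this model, an error like the one in the title corresponds to the case where the connecting user (here ‘yarn’) has no entry in the rule table.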

Top 10 Tips for Making the Spark + Alluxio Stack Blazing Fast

The Apache Spark + Alluxio stack is getting quite popular, particularly for unifying data access across S3 and HDFS. In addition, compute and storage are increasingly being separated, causing larger latencies for queries. Alluxio is leveraged as compute-side virtual storage to improve performance. But to get the best performance, as with any technology stack, you need to follow best practices. This article provides the top 10 tips for performance tuning for real-world workloads when running Spark on Alluxio, with data locality giving the most bang for the buck.

New York Meetup Recap – September 2018

We held our first New York City Alluxio Meetup! Work-Bench generously hosted the Alluxio meetup in Manhattan. This was the first US Alluxio meetup outside of the Bay Area, so it was extremely exciting to get to meet Alluxio enthusiasts on the East Coast!
The meetup focused on users of Alluxio with different applications from Hive and Presto. As an introduction, Haoyuan Li (creator and founder of Alluxio) and Bin Fan (founding engineer of Alluxio) gave an overview of Alluxio and the new features and enhancements of the new v1.8.0 release.

Effective caching for Spark RDDs with Alluxio

Recently, Qunar deployed Alluxio with Spark in production and found that Alluxio enables Spark streaming jobs to run 15x to 300x faster. In their case study, they described how Alluxio improved their system architecture, and mentioned that some existing Spark jobs would slow down or would never finish because they would run out of memory. After using Alluxio, those jobs were able to finish, because the data could be stored in Alluxio, instead of within Spark.
In this blog, we show that saving RDDs in Alluxio keeps larger data sets in memory for faster Spark applications and enables sharing of RDDs across separate Spark applications.

New Whitepaper: Structured Big Data Federation

Alluxio helps organizations handle their big data by providing a unified view of all of the data in their enterprise – on premises, in the cloud, or hybrid. Applications access data using a standard interface to a global virtual namespace. Alluxio also employs a memory-centric architecture to enable data access at memory speed. With the combined unification and performance benefits, Alluxio can effectively provide big data federation for organizations by acting as a virtual data lake.

Accelerating Cloud Pipelines with Alluxio and Fast Durable Writes

Using Alluxio, data can be shared between pipeline stages at memory speed. When one stage reads and writes its data in Alluxio, that data can stay in memory for the next stage of the pipeline, which can greatly increase performance. Alluxio Enterprise Edition (AEE) introduces Fast Durable Writes, a feature that enables low-latency, fault-tolerant writes. In this article, we describe the Fast Durable Writes feature and explore how Alluxio can be deployed and used with a data pipeline.

Announcing the Release of Alluxio Enterprise Edition and Community Edition v1.7.0

We are excited to announce the release of Alluxio Enterprise Edition (AEE) and Community Edition (ACE) v1.7.0. This release brings enhanced caching policies, further ecosystem integrations, and significant usability improvements. One highlight is the Alluxio FUSE API, which lets users interact with Alluxio through a local filesystem mount. Alluxio FUSE is particularly useful for integrating with deep learning frameworks such as TensorFlow.

What’s new in Alluxio 1.1 Release

The Alluxio 1.1 release includes many great features and improvements from the community. Alluxio would not be what it is today without the growing open source community, and we would like to thank everyone involved in this project.
This release brings many new features and improvements, and in this post, we will highlight a few of the developments: performance improvements, access control features, and usability and integration improvements.