Alluxio 1.4.0 has been released with a large number of new features and improvements. This blog highlights some standout aspects of the Alluxio 1.4.0 open source release: Improved Alluxio Under Storage API, Native File System REST Interface, and Packet Streaming.
Founding Engineer, Alluxio
Alluxio is the world’s first memory-speed virtual distributed storage system that bridges applications and underlying storage systems, providing unified data access orders of magnitude faster than existing solutions. The Hadoop Distributed File System (HDFS) is a distributed file system for storing large volumes of data. HDFS popularized the paradigm of bringing computation to data and the co-located compute and storage architecture.
In this blog, we highlight two key benefits Alluxio brings to a compute cluster co-located with HDFS.
This is an excerpt from the Accelerating On-Demand Data Analytics with Alluxio whitepaper, which includes a detailed implementation guide in addition to this high-level overview.
In the Big Data world, it is often the case that only a subset of the total data is relevant for answering the question at hand. As a result, the total cost of ownership for long-running analytics clusters is high while utilization is low, especially in an architecture that co-locates compute and storage.
Alluxio provides Spark with a reliable data sharing layer, enabling Spark to excel at performing application logic while Alluxio handles storage.