Calvin Jia introduces Alluxio, explains how Alluxio can help Spark be more effective, shows benchmark results with Spark RDDs and DataFrames, and describes production deployments with both Alluxio and Spark working together.
Haoyuan Li offers an overview of Alluxio (formerly Tachyon), a memory-speed virtual distributed storage system.
Alluxio Bay Area Meetup, hosted at Samsung. Talks include presentations on Unifying APIs, Accelerating ML, and Enabling Cloud Architectures.
Alluxio is the world’s first memory-speed virtual distributed storage system that bridges applications and underlying storage systems, providing unified data access orders of magnitude faster than existing solutions. The Hadoop Distributed File System (HDFS) is a distributed file system for storing large volumes of data. HDFS popularized the paradigm of bringing computation to data and the co-located compute and storage architecture.
In this blog, we highlight two key benefits Alluxio brings to a compute cluster co-located with HDFS.
Big Data Day LA 2016 – In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace. At the same time, the Alluxio ecosystem has expanded to include support for more under storage systems and computation frameworks.
Calvin Jia and Jiri Simsa explain how the current Alluxio tiered storage can be easily configured to use memory, SSDs, and hard drives in different tiers.
UC Berkeley AMPLab 2013 – Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks.