In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments of Alluxio and Spark working together.
This is an excerpt from the Accelerating Data Analytics on Ceph Object Storage with Alluxio whitepaper.
As the volume of data collected by enterprises has grown, there is a continual need to find efficient storage solutions. Owing to its simplicity, scalability, and cost-efficiency, object storage, including Ceph, has become an increasingly popular alternative to traditional file systems. In most cases the object storage system, on-premises or in the cloud, is decoupled from the compute nodes where analytics is run. There are several benefits of this separation.
Alluxio is used as a lightweight data access layer on the compute nodes to bring performance up to memory speeds without requiring a long-running cluster. This talk will summarize why Alluxio’s architecture makes it a perfect fit for completing the on-demand cluster puzzle.
Haoyuan Li offers an overview of Alluxio (formerly Tachyon), a memory-speed virtual distributed storage system.
Alluxio Bay Area Meetup hosted at Samsung. Talks to include presentations on Unifying APIs, Accelerating ML, & Enabling Cloud Architectures.
Alluxio presents at Strata + Hadoop World Beijing 2016 with two talks: a keynote from founder Haoyuan Li and a talk on Alluxio’s latest use cases.
This is an excerpt from the Accelerating On-Demand Data Analytics with Alluxio whitepaper, which includes a detailed implementation guide in addition to this high-level overview.
In the Big Data world, it is often the case that only a subset of the total data is relevant for answering the question at hand. As a result, the total cost of ownership for long-running analytics clusters is high while utilization is low, especially when adopting an architecture that co-locates compute and storage.
Strata+Hadoop World 2016 – Baidu deployed Alluxio to accelerate its big data analytics workload. Bin Fan and Haojun Wang explain why Baidu chose Alluxio, as well as the details of how they achieved a 30x speedup with Alluxio in their production environment with hundreds of machines. Based on the success of the big data analytics engine, Baidu is currently expanding the Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.
Tachyon presents two talks at Strata + Hadoop World Singapore: Interactive data analytics with Spark on Tachyon in Baidu, and Make Tachyon ready for next-gen data center platforms with NVM.