In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data with Alluxio memory for improved performance benefits, and how Alluxio can improves completion times and reduces performance variability for Spark pipelines in the cloud.
Tag: on-prem object storage
The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Bin Fan and Shaoshan Liu explain how PerceptIn designed and implemented a cloud architecture to support video streaming and online object recognition tasks and demonstrate how Alluxio supports these emerging cloud architectures.
Speeding Up Machine Learning in the Cloud with Alluxio
This presentation focuses on how Alluxio helps the big data analytics stack to be cloud-native. The trending Cloud object storage systems provide more cost-effective and scalable storage solutions but also different semantics and performance implications compared to HDFS. Applications like Spark or Presto will not benefit from the node-level locality or cross-job caching when retrieving data from the cloud object storage. Deploying Alluxio to access cloud solves these problems because data will be retrieved and cached in Alluxio instead of the underlying cloud or object storage repeatedly.
Cloud object storage systems provide different semantics and performance implications compared to HDFS. Applications like Presto cannot benefit from the node-level locality or cross-job caching when reading from the cloud. Deploying Alluxio with Presto to access cloud solves these problems because data will be retrieved and cached in Alluxio instead of the underlying cloud or object storage repeatedly. Bin will present the architecture to combine Presto with Alluxio with use cases from major internet companies like JD.com and NetEase.com, and their lessons learned to operate this architecture at scale.
Learn more about use cases with Alluxio leveraged in MOMO, JD.com, and TalkingData.
Joint webinar – Mesosphere DC/OS is a production-proven platform that powers both modern app components – containers and data services – so businesses can accelerate time to market with confidence, and save. We have seen tremendous interest from users to be able to run Alluxio via DC/OS.
Ceph Days 2017 – Adit Madan presents on enabling fast big data analytics on Ceph with Alluxio.
This is an excerpt from the Accelerating Data Analytics on Ceph Object Storage with Alluxio whitepaper.
As the volume of data collected by enterprises has grown, there is a continual need to find efficient storage solutions. Owing to its simplicity, scalability and cost-efficiency object storage, including Ceph, has increasingly become a popular alternative to traditional file systems. In most cases the object storage system, on-premise or in the cloud, is decoupled from compute nodes where analytics is run. There are several benefits of this separation.