A10 Big Data Application Summit
Alluxio meetups, conferences, events and more
The latest Alluxio meetups, webinars, conferences and more
Speeding Up Machine Learning in the Cloud with Alluxio
Alluxio has run in JD.com’s production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.
With the exponentially-growing deluge of data today, data lakes are pooling everywhere. So, how can you harness them for critical insights and is there an easy way to tap into the multitude of different storage systems that they”re stored in? Enter Alluxio, an agnostic and fast storage abstraction, which, when paired with deep learning and GPU-accelerated analytics yields a quick and easy way to harness the data. Join NVIDIA”s Applied Solutions Engineering (ASE) team as they walk through how to use Alluxio for fun and profit.
The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Bin Fan and Shaoshan Liu explain how PerceptIn designed and implemented a cloud architecture to support video streaming and online object recognition tasks and demonstrate how Alluxio supports these emerging cloud architectures.
With the development of online services and clusters, the HDFS NameNode becomes a performance bottleneck of the HDFS cluster, which is not conducive to the horizontal expansion of the cluster.
The community’s Federation + viewFs solution solves the problem of horizontal scaling of HDFS, but the configuration of this solution is implemented on the client side, which is not conducive to the operation and management of large-scale clusters. Using Alluxio as a unified portal for multiple HDFS clusters, operation and maintenance management is convenient, and distributed cache capability is provided.
In this issue, the Drip Technology Salon and the Alluxio community invited the core engineers of Didi Travel, Alluxio, Kyligence, JD.com, and Tencent to revolve around Alluxio’s position and design philosophy in the big data ecosystem, architectural features, latest developments, and well-known The company’s production-level environmental application exploration and practice, as well as the experience in the use of the process and other topics, and in-depth participants to share.
Using Alluxio, a memory speed virtual distributed storage system, deployed on Mesos enables connecting any compute framework, such as Apache Spark, to storage systems via a unified namespace. Alluxio enables applications to interact with any data at memory speed. Alluxio can eliminate the pains of ETL and data duplication, and enable new workloads across all data. Gene will discuss the architecture of Mesos, Spark and Alluxio to achieve an optimal architecture for enterprises.
Many organizations and deployments use Alluxio with Apache Spark, and some of them scale out to over PB’s of data. Alluxio can enable Spark to be even more effective, in both on-premise deployments and public cloud deployments. Alluxio bridges Spark applications with various storage systems and further accelerates data intensive applications. In this talk, we briefly introduce Alluxio, and present different ways how Alluxio can help Spark jobs. We discuss best practices of using Alluxio with Spark, including RDDs and DataFrames, as well as on-premise deployments and public cloud deployments.