Alluxio meetups, conferences, events and more
The latest Alluxio meetups, webinars, conferences and more
Speed is usually a key factor when analyzing large amounts of data. Alluxio enables analytics applications, such as Apache Spark, to retrieve stored data at memory speeds. DC/OS makes it easy to deploy distributed programs (such as Alluxio and Spark) and containers across large clusters.
In this talk, we will first discuss the development of the DC/OS Alluxio package, which deploys Alluxio on top of DC/OS, and then demo the deployment of a complete analytics stack, both with and without Alluxio, to show the benefits Alluxio provides.
As online services and clusters grow, the HDFS NameNode becomes a performance bottleneck that limits the horizontal scaling of the cluster.
The community's Federation + ViewFs solution addresses horizontal scaling of HDFS, but its configuration is implemented on the client side, which makes large-scale clusters harder to operate and manage. Using Alluxio as a unified entry point for multiple HDFS clusters simplifies operations and maintenance and also provides a distributed caching capability.
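To make the idea of a unified namespace over several HDFS clusters more concrete, here is a minimal Python sketch of a mount table that maps logical paths to cluster URIs by picking the deepest matching mount point. The `MountTable` class, the mount points, and the `hdfs://nn0:8020`-style NameNode addresses are hypothetical; this illustrates the general technique and is not Alluxio's actual implementation or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    ufs_uri: str      # physical URI in the underlying HDFS cluster
    mount_point: str  # logical mount point that matched


class MountTable:
    """Hypothetical server-side mount table: logical path -> HDFS cluster URI."""

    def __init__(self) -> None:
        self._mounts: dict[str, str] = {}

    def mount(self, mount_point: str, ufs_uri: str) -> None:
        if not mount_point.startswith("/"):
            raise ValueError("mount point must be an absolute path")
        # Normalize trailing slashes so "/warehouse/" and "/warehouse" are the same key.
        self._mounts[mount_point.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> Resolution:
        # Choose the deepest mount point that is a whole-component prefix of `path`,
        # so "/warehousefoo" does not match the "/warehouse" mount.
        best = None
        for mp in self._mounts:
            if mp == "/" or path == mp or path.startswith(mp + "/"):
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        # Append the remainder of the logical path to the cluster URI.
        suffix = path if best == "/" else path[len(best):]
        return Resolution(self._mounts[best] + suffix, best)


table = MountTable()
table.mount("/", "hdfs://nn0:8020")
table.mount("/warehouse", "hdfs://nn1:8020/data")
print(table.resolve("/warehouse/sales/2020").ufs_uri)
```

Because the table is kept in one place rather than copied into every client's configuration, adding or moving a cluster only requires changing the table, which is the operational advantage the abstract contrasts with client-side ViewFs configuration.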
In this session, the Drip Technology Salon and the Alluxio community invited core engineers from Didi Travel, Alluxio, Kyligence, JD.com, and Tencent to give in-depth talks on Alluxio's position and design philosophy in the big data ecosystem, its architectural features and latest developments, how well-known companies have explored and applied it in production environments, and their practical experience using it.
Alluxio, a memory-speed virtual distributed storage system, can be deployed on Mesos to connect any compute framework, such as Apache Spark, to storage systems through a unified namespace. Alluxio enables applications to interact with any data at memory speed. It can eliminate the pain of ETL and data duplication and enable new workloads across all data. Gene will discuss how to combine Mesos, Spark, and Alluxio into an optimal architecture for enterprises.
Many organizations and deployments use Alluxio with Apache Spark, and some of them scale out to petabytes of data. Alluxio can make Spark even more effective in both on-premises and public cloud deployments. Alluxio bridges Spark applications with various storage systems and further accelerates data-intensive applications. In this talk, we briefly introduce Alluxio and present different ways Alluxio can help Spark jobs. We discuss best practices for using Alluxio with Spark, including RDDs and DataFrames, in both on-premises and public cloud deployments.
In this talk, we discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud. We show how pipeline stages can share data through Alluxio memory to improve performance, and how Alluxio can improve completion times and reduce performance variability for Spark pipelines in the cloud.
The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Bin Fan and Shaoshan Liu explain how PerceptIn designed and implemented a cloud architecture to support video streaming and online object recognition tasks and demonstrate how Alluxio supports these emerging cloud architectures.
Speeding Up Machine Learning in the Cloud with Alluxio
Alluxio has run in JD.com's production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to support ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and treating Alluxio as a pluggable optimization component. For example, one framework, JDPresto, has seen a 10x average performance improvement. The team also extended Alluxio and improved synchronization between Alluxio and HDFS for consistency.