Many organizations are leveraging EMR to run big data analytics on the public cloud. However, reading data from and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud; in this use case it caches data from S3, delivering high and predictable performance as well as reduced network traffic.
In this talk, we will focus on Alluxio's design: its architecture, data flow and metadata flow. We will dive into the choices in its design space and share our experience implementing features like data tiering, storage options and cache eviction policies. We will also share the lessons we learned in design, implementation and operation while building an open source distributed storage system with 900 contributors over 5+ years.
Enterprises are increasingly looking to object stores to power their big data & machine learning workloads cost-effectively. Together, SwiftStack and Alluxio enable users to move seamlessly toward a disaggregated architecture.
Enterprises typically store large amounts of data in existing storage systems, which are often separate from their big data analytics systems. As a result, importing petabytes of data into a big data analytics system takes a long time and incurs large overheads and high costs. Even worse, transferring large amounts of data results in data silos and unnecessary duplication, which creates serious data management problems.
Alluxio is the world's first memory-speed virtual distributed storage system. It unifies the interface between various computing frameworks and underlying storage systems. Because of Alluxio's memory-centric architecture, data access can be several orders of magnitude faster. In addition, Alluxio's tiered storage, unified namespace, flexible file API, web UI, and command-line tools improve its usability across different application scenarios.
Qunar has been running Alluxio in production for over a year. Lei Xu explores how running stream processing on Alluxio has led to an average 16x performance improvement, and a 300x improvement at peak service times, on Qunar's workloads.
In this presentation, William Callaghan will focus on the challenges faced and lessons learned in building a human-in-the-loop cyber threat analytics pipeline. They will discuss analytics in cybersecurity and highlight the use of technologies such as Spark Streaming/SQL, Cassandra, Kafka and Alluxio in creating an analytics architecture with mission-critical response times.
In this session, the Didi Technology Salon and the Alluxio community invited core engineers from Didi Chuxing, Alluxio, Kyligence, JD.com, and Tencent to give in-depth talks on Alluxio's position and design philosophy in the big data ecosystem, its architectural features and latest developments, how well-known companies have explored and applied Alluxio in production environments, and the experience they gained along the way.
The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Bin Fan and Shaoshan Liu explain how PerceptIn designed and implemented a cloud architecture to support video streaming and online object recognition tasks and demonstrate how Alluxio supports these emerging cloud architectures.
This presentation focuses on how Alluxio helps make the big data analytics stack cloud-native. Increasingly popular cloud object storage systems provide more cost-effective and scalable storage, but they also have different semantics and performance implications compared to HDFS. Applications like Spark or Presto cannot benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio in front of cloud storage solves these problems: data is retrieved from the underlying cloud or object store and cached in Alluxio, rather than being fetched from the object store repeatedly.
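To illustrate the cross-job caching idea in the abstract above, here is a minimal, single-process sketch of a read-through cache with LRU eviction sitting in front of an object store. It is not Alluxio's API or implementation (Alluxio is a distributed system with tiered storage and configurable eviction policies); the `ObjectStore` and `ReadThroughCache` classes are hypothetical stand-ins chosen only to show why repeated reads stop hitting the remote store.

```python
from collections import OrderedDict


class ObjectStore:
    """Hypothetical stand-in for a remote object store such as S3."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.remote_reads = 0  # counts round trips to the "remote" store

    def get(self, key):
        self.remote_reads += 1
        return self._objects[key]


class ReadThroughCache:
    """Hypothetical read-through cache with LRU eviction (illustrative only)."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self._entries = OrderedDict()

    def read(self, key):
        if key in self._entries:
            # Cache hit: mark as most recently used and serve locally.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: fetch from the remote store once and keep a copy.
        value = self.store.get(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


if __name__ == "__main__":
    store = ObjectStore({"a.parquet": b"...", "b.parquet": b"..."})
    cache = ReadThroughCache(store, capacity=2)
    for _ in range(3):  # three "jobs" reading the same two files
        cache.read("a.parquet")
        cache.read("b.parquet")
    print(store.remote_reads)  # 2
```

Only the first job pays the cost of reading from the object store; later jobs are served from the cache. When the working set exceeds the cache capacity, the eviction policy decides which data must be refetched, which is why eviction is a real design choice in a caching layer.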