Events

Enabling Big Data and AI workloads on the Object Store at DBS Bank

Strata Data Conference New York *

In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio share how DBS Bank has built a modern big data analytics stack that uses an object store as persistent storage, even for data-intensive workloads, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. In addition, deploying Alluxio to access data solves many of the challenges that cloud deployments bring when compute and storage are separated.

Webinar: Accelerate Presto & Spark workloads on S3

Alluxio Webinar *

While running analytics workloads using EMR on S3 is a common deployment today, many organizations face performance and consistency issues. EMR can be bottlenecked when reading large amounts of data from S3, and sharing data across multiple stages of a pipeline can be difficult because S3 is eventually consistent for read-your-own-write scenarios.
A simple solution is to run Spark on Alluxio as a distributed file system cache for S3. Alluxio stores data in memory close to Spark, providing high performance, in addition to providing data accessibility and abstraction for deployments in both public and hybrid clouds.

Running Presto with Alluxio on Amazon EMR

Alluxio Community Office Hour - May *

Many organizations use EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data from S3, ensuring high and predictable performance as well as reduced network traffic.

Evolution of big data stacks under a compute-storage separation architecture

Shanghai *

A new generation of open source big data systems, represented by Alluxio, which was born at the University of California, Berkeley, looks at this issue. Unlike systems such as HDFS, which tightly couple storage with compute to achieve low-cost, reliable storage, Alluxio provides data applications with a software-defined virtual data storage layer that abstracts and integrates the underlying files and objects across multi-cloud, hybrid cloud, multi-data-center and other environments. Through intelligent workload analysis and data management, it brings data close to compute and provides data locality, so that big data and machine learning applications can achieve the same performance at lower cost.

Alluxio for Hybrid Cloud | HDFS and AWS S3 demo

Alluxio Community Office Hour *

Alluxio can help data scientists and data engineers interact with different storage systems in a hybrid cloud environment. Using Alluxio as a data access layer for Big Data and Machine Learning applications, data processing pipelines can improve efficiency without explicit data ETL steps and the resulting data duplication across storage systems.

New features and performance optimization of open source big data storage system Alluxio

Chengdu Meetup *

This technical salon will focus on big data, storage, databases and Alluxio in practice. Technical experts from Tencent and across the industry are invited to share topics such as the basic principles of the Alluxio system, big data system architecture, database application operation and maintenance, and AI computer vision technology and its practical deployment, bringing rich practical content and an exchange of experience.

Building a Distributed Data Access Layer for Analytics on Any Cloud

Data Council SF *

In this talk, we will focus on Alluxio's design: its architecture, data flow and metadata flow. We will dive into the choices in its design space and share our experience implementing features such as data tiering, storage options and cache eviction policies. We will also share lessons in design, implementation and operation learned while building an open source distributed storage system with 900 contributors over 5+ years.