Enabling Big Data and AI workloads on the Object Store at DBS Bank

Strata Data Conference New York

In this presentation, Vitaliy Baklikov from DBS Bank and Dipti Borkar from Alluxio will share how DBS Bank has built a modern big data analytics stack that uses an object store as persistent storage, even for data-intensive workloads, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads. In addition, deploying Alluxio for data access solves many of the challenges that cloud deployments bring when compute and storage are separated.

Evolution of big data stacks under a compute-storage separation architecture

Shanghai

A new generation of open source big data systems, represented by Alluxio, which was born at the University of California, Berkeley, addresses this issue. Unlike systems such as HDFS, which tightly couple storage with compute to achieve low-cost, reliable storage, Alluxio provides data applications with a software-defined virtual data storage layer that abstracts and integrates the underlying files and objects across multi-cloud, hybrid cloud, multi-data-center, and other environments. Through intelligent workload analysis and data management, it keeps data close to compute and provides data locality, so that big data and machine learning applications can achieve the same performance at lower cost.

Spark+AI Summit SF 2019

SAIS 2019

What’s Spark+AI Summit? It’s the world’s largest conference focused on Apache Spark, an open source project that is Alluxio’s older cousin from the same lab (UC Berkeley’s AMPLab, now RISELab).

Open Source Global Tech Leadership Meetup

Global Tech Leadership Conference

Open source software has always played a critical role in software development. From the Linux kernel to TensorFlow, it drives many outstanding projects that have set trends and led the direction of technology.
We are pleased to have several experts, including Reynold Xin, Dongxu Huang, Qing Han, Bin Fan, and Amelia Wong, who will share the technology and stories behind their successful open source projects.

Production Spark and Tachyon Use Cases

Spark Summit Europe

During the past several years, Spark has significantly changed the landscape of big data computing. It improves performance of various applications dramatically. However, in certain Spark use cases, the bottleneck is in the I/O stack. In this talk, we will introduce Tachyon, a distributed memory-centric storage system. In addition, we will talk about several production use cases where Tachyon further improves Spark applications’ performance by orders of magnitude.

Data Driven #46 (a FirstMark Event)

Data Driven NYC

Check out our new blog post: “Internet of Things: Are We There Yet? (The 2016 IoT Landscape)”: The Internet of Things is all about data!

Fast big data analytics and machine learning using Alluxio and Spark in Baidu

Strata+Hadoop World San Jose

A few months ago, Baidu deployed Alluxio to accelerate its big data analytics workload. Bin Fan and Haojun Wang explain why Baidu chose Alluxio, as well as the details of how they achieved a 30x speedup with Alluxio in their production environment with hundreds of machines. Based on the success of the big data analytics engine, Baidu is currently expanding the Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.

Unified Namespace and Tiered Storage in Alluxio

Strata+Hadoop World San Jose

Calvin Jia and Jiri Simsa explain how the current Alluxio tiered storage can be easily configured to use memory, SSDs, and hard drives in different tiers. Alluxio users and administrators do not have to migrate data manually, because data in Alluxio is managed transparently across all the configured tiers, similar to the way a CPU manages L1, L2, and lower-level caches. Alluxio also gives users fine-grained control over their data so that they can plug in their own data-management strategies; users can pin files in Alluxio to a specific storage tier or specify a TTL for files. Calvin and Jiri also describe the interface for bringing heterogeneous data sources into the Alluxio namespace, which takes advantage of Alluxio’s ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, or Swift.
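
The tier behavior described in this abstract can be pictured with a small toy model. The sketch below is not Alluxio's API or implementation: the class `TieredStore`, its method names, the item-count capacities, and the least-recently-used demotion policy are all hypothetical, chosen only to illustrate transparent movement between tiers, pinning, and TTLs. Here "pinning" simply exempts a key from demotion, which is a simplification of pinning a file to a specific tier.

```python
import time
from collections import OrderedDict


class TieredStore:
    """Toy tiered store (hypothetical, not Alluxio's API).

    Tiers are ordered fastest first. New and recently read keys live in the
    top tier; when a tier is full, its least recently used unpinned key is
    demoted to the next tier down.
    """

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_items), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]
        self.pinned = set()   # keys exempt from demotion
        self.expiry = {}      # key -> absolute expiry time

    def pin(self, key):
        self.pinned.add(key)

    def put(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        self._drop(key)  # overwrite any older copy and its TTL
        if ttl is not None:
            self.expiry[key] = now + ttl
        self._insert(0, key, value)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        if key in self.expiry and now >= self.expiry[key]:
            self._drop(key)  # TTL elapsed: the key is gone
            return None
        for _, _, data in self.tiers:
            if key in data:
                value = data.pop(key)
                self._insert(0, key, value)  # promote on read
                return value
        return None

    def tier_of(self, key):
        for name, _, data in self.tiers:
            if key in data:
                return name
        return None

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            self.expiry.pop(key, None)  # fell off the slowest tier
            return
        _, cap, data = self.tiers[level]
        data[key] = value
        data.move_to_end(key)  # mark as most recently used
        while len(data) > cap:
            # Oldest unpinned key is the demotion victim.
            victim = next((k for k in data if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; allow the tier to overflow
            self._insert(level + 1, victim, data.pop(victim))

    def _drop(self, key):
        for _, _, data in self.tiers:
            data.pop(key, None)
        self.expiry.pop(key, None)
```

A caller would construct it with something like `TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])`, write with `put`, and read with `get`; no call ever names a tier explicitly, which is the point of the transparent management the talk describes.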