Big Data Bellevue Meetup May 19, 2022 Today, data engineering in modern enterprises has become increasingly more complex and resource-consuming, particularly because (1) the rich amount of organizational data is often distributed across data centers, cloud regions, or even cloud providers, and (2) the complexity of the big data stack has been quickly increasing over … Continued
Big Data Bellevue & Cloudy With a Chance of Data Meetup October 20, 2022 Distributed systems are made up of many components such as authentication, a persistence layer, stateless services, load balancers, and stateful coordination services. These coordination services are central to the operation of the system, performing tasks such as maintaining system configuration state, … Continued
This talk will go over a generic example of stateful coordination service moving from Zookeeper to Raft.
The presentation talks about the best practices to set up and techniques to build a cluster with open source Alluxio on AWS EKS, for one of our clients, which made it Scalable, Reliable, and Secure by adapting to Kubernetes RBAC.
In this talk, we will present how using Alluxio computation and storage ecosystems can better interact benefiting the “bringing the data close to the code” approach. Moving away from the complete disaggregation of computation and storage, data locality can enhance the computation performance. During this talk, we will present our observations and testing results that will show important enhancements in accelerating Spark Data Analytics on Ceph Objects Storage using Alluxio.
Alluxio and Presto are a powerful combination to address the compute problem, which is part of the strategy used by Simbiose Ventures to create a product called StorageQuery – A platform to query files in cloud storages with SQL.
In Alluxio, an Under File System is the plugin to connect to any file systems or object stores, so users can mount different storages like AWS S3 or HDFS into Alluxio namespace. This under filesystem is designed to be modular, in order to enable users to easily extend this framework with their own Under File System implementation and connect to a new or customized storage system.
For many latency-sensitive SQL workloads, Presto is often bound by retrieving distant data. In this talk, Rohit Jain, James Sun from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local on-SSD Alluxio cache inside Presto workers to improve unsatisfied Presto latency.
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced without network being I/O bottlenecked.