On-Demand Videos
video
AI/ML Infra Meetup | Open Source Michelangelo: Uber's Predictive to Generative end to end ML Lifecycle management platform

In this talk, Eric Wang, Senior Staff Software Engineer, introduces Michelangelo, Uber’s open-source generative end-to-end ML lifecycle management platform.
video
AI/ML Infra Meetup | Unlock the Future of Generative AI: TorchTitan's Latest Breakthroughs

In this talk, Jiani Wang, Software Engineer on Meta’s PyTorch team, gives an overview of TorchTitan and dives into its latest advancements.
video
AI/ML Infra Meetup | Bringing Data to GPUs Anywhere + Get Low-Latency on Object Store with Alluxio

In this talk, Bin Fan, VP of Technology at Alluxio, explores how to enable efficient data access across distributed GPU infrastructure, achieving low-latency performance for feature stores and RAG workloads.
video
ZooKeeper vs Raft: Stateful Distributed Coordination with HA and Fault Tolerance
Big Data Bellevue & Cloudy With a Chance of Data Meetup
October 20, 2022
Distributed systems are made up of many components such as authentication, a persistence layer, stateless services, load balancers, and stateful coordination services. These coordination services are central to the operation of the system, performing tasks such as maintaining system configuration state, ensuring service availability, name resolution, and storing other system metadata. Given their central role, it is essential that these services remain available, fault tolerant, and consistent. Apache ZooKeeper is often used to implement them because it provides a highly available, file-system-like abstraction as well as powerful recipes such as leader election. This talk will go over a generic example of a stateful coordination service moving from ZooKeeper to Raft.
Meetup Groups
Big Data Bellevue: https://www.meetup.com/big-data-bellevue-bdb/
Cloudy With a Chance of Data: https://www.meetup.com/meetup-datascience/
Large Scale Analytics Acceleration
Hybrid Multi-Cloud
video
Architecting Data Platform Across Regions and Clouds for Analytics and AI
Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines such as Spark, Presto, TensorFlow, or PyTorch. Whether your data is distributed across multiple datacenters, multiple clouds, or both, a successful heterogeneous data platform requires efficient data access.
In October’s Product School, Alluxio’s Lead Solutions Engineer Greg Palmer will present and demo how Alluxio enables you to adopt a cloud migration strategy or a multi-cloud architecture for large-scale analytics and AI workloads. Alluxio also helps scale out platform adoption for analytics and AI across multiple tenants and application teams.
Large Scale Analytics Acceleration
Model Training Acceleration
Model Distribution
Hybrid Multi-Cloud
Data Platform Modernization
video
Modern Data Stack in Motion
ALLUXIO DAY x APAC Modern Data Stack 2022
In this presentation, Yingjun Wu, Founder of RisingWave Labs, will talk about the birth, growth, and prosperity of the modern data stack. He will show why the modern data stack is more than a buzzword and how it may evolve over the next couple of years.
video
Apache Hudi: Community-Driven Development
ALLUXIO DAY x APAC Modern Data Stack 2022
September 22, 2022
Apache Hudi’s open-source community is very active and healthy. This talk presents an overview of major community-driven features, followed by a deep dive into two of them, the metastore and the table management service, both driven by ByteDance, to illustrate Hudi’s platform vision.
video
The Power of Data Orchestration: Storage Acceleration and Servitization at Shopee
ALLUXIO DAY x APAC Modern Data Stack 2022
September 22, 2022
Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Luo Li from Shopee will share the Data Infra team’s recent project on Presto acceleration and storage servitization. He will detail how Shopee leverages Alluxio to accelerate Presto queries and to provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.
Large Scale Analytics Acceleration
Data Platform Modernization
video
The Architecture Overview of OceanBase Database
ALLUXIO DAY XV 2022
September 15, 2022
OceanBase Database is an open-source, distributed Hybrid Transactional/Analytical Processing (HTAP) database management system that has set new world records in both the TPC-C and TPC-H benchmark tests. OceanBase Database dates back to 2010 and has been serving all of the critical systems in Alipay. Besides Alipay, OceanBase also serves customers from a variety of sectors, including the Internet, financial services, telecommunications, and retail industries.
In this tech talk, we will discuss the architecture of OceanBase and some typical use cases. The talk covers technical topics such as Paxos replication, two-phase commit (2PC), LSM-tree-like storage, the SQL optimizer and executor, and city-level disaster recovery.
video
Accelerating Cloud Training With Alluxio
ALLUXIO DAY XV 2022
September 15, 2022
This talk introduces three progressive, game-style levels of using Alluxio to speed up your cloud training, with production use cases from Microsoft, Alibaba, and BossZhipin.
- Level 1: Speed up data ingestion from cloud storage
- Level 2: Speed up data preprocessing and training workloads
- Level 3: Speed up full training workloads with a unified data orchestration layer
Model Training Acceleration
Cloud Cost Savings
Hybrid Multi-Cloud
Data Platform Modernization
video
ML-Based SQL Query Resource Usage Prediction
ALLUXIO DAY XV 2022
September 15, 2022
With the advent of the Big Data era, calculating the resource usage of a SQL query is usually computationally expensive. Can we estimate the resource usage of SQL queries more efficiently, without any computation in a SQL engine kernel? In this session, Chunxu and Beinan introduce how Twitter’s data platform leverages a machine learning-based approach in Presto and BigQuery to estimate query utilization with 90%+ accuracy.
Model Training Acceleration
Cloud Cost Savings
video
Real-Time Analytics: Going Beyond Stream Processing With Apache Pinot
ALLUXIO DAY XV 2022
September 15, 2022
Streaming systems form the backbone of the modern data pipeline as the stream processing capabilities provide insights on events as they arrive. But what if we want to go further than this and execute analytical queries on this real-time data? That’s where Apache Pinot comes in.
OLAP databases used for analytical workloads traditionally executed queries on yesterday’s data with query latency in the tens of seconds. The emergence of real-time analytics has changed all this, and the expectation now is that we should be able to run thousands of queries per second on fresh data with the query latencies typically seen on OLTP databases.
Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable real-time analytics with low latency. It can ingest data from streaming sources like Kafka, as well as from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage), and provides a layer of indexing techniques that can be used to maximize query performance.
Come to this talk to learn how you can add real-time analytics capability to your data pipeline.
video
Deconstructing a Machine Learning Pipeline with Virtual Data Lake
As more and more companies turn to AI / ML / DL to unlock insight, AI has become a mythical word that adds unnecessary barriers for new adopters. It is often regarded as a luxury reserved for big tech companies, but this should not be the case.
In this talk, Jingwen will first dissect the ML life cycle into five stages: data collection, data cleansing, model training, model validation, and model inference / deployment. For each stage, Jingwen will then go over its concept, functionality, characteristics, and use cases to demystify ML operations. Finally, Jingwen will showcase how Alluxio, a virtual data lake, could help simplify each stage.
Model Training Acceleration
Model Distribution
Hybrid Multi-Cloud
Data Platform Modernization
Cloud Cost Savings
video
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
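Alluxio foresaw the need for agility when accessing data across silos separated from compute engines like Spark, Presto, TensorFlow, and PyTorch. Embracing the separation of storage from compute, the Alluxio data orchestration platform simplifies adoption of the data lake and data mesh paradigms for analytics and AI/ML. In this talk, Bin Fan will share observations to help identify ways to use the platform to meet the needs of your data environment and workloads.
More and more enterprise architectures have shifted to hybrid-cloud and multi-cloud environments. While this shift brings greater flexibility and agility, it also means that compute must be separated from storage, which poses new challenges for managing and orchestrating enterprise data across frameworks, clouds, and storage systems. This talk gives the audience an in-depth look at how Alluxio’s data orchestration concept decouples storage and compute in the data middle platform, and at the innovative architecture that data orchestration proposes for compute-storage separation scenarios. It also draws on typical use cases from industries such as finance, telecom operators, and the Internet to show how Alluxio brings real acceleration to big data computing and how data orchestration technology can be applied to AI model training.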
*This is a bilingual presentation.
Large Scale Analytics Acceleration
Model Training Acceleration
Cloud Cost Savings
Storage Cost Savings
Hybrid Multi-Cloud
video
Alluxio and Apache Ranger Best Practices
As data stewards and security teams provide broader access to their organization’s data lake environments, having a centralized way to manage fine-grained access policies becomes increasingly important. Alluxio can use Apache Ranger’s centralized access policies in two ways: 1) directly controlling access to virtual paths in the Alluxio virtual file system or 2) enforcing existing access policies for the HDFS under stores. This presentation discusses how the Alluxio virtual filesystem can be integrated with Apache Ranger.
Large Scale Analytics Acceleration