Alluxio Community Newsletter - February 2024

HIGHLIGHTS

Alluxio featured in a16z’s LLM App Stack

The LLM App stack serves as a blueprint for developers from tech companies and AI-advanced companies building applications powered by LLMs. Alluxio is well-positioned in the LLM stack as part of the data pipeline. Alluxio sits at the intersection of compute and storage, and provides an efficient offline model training cache capable of serving datasets of any size directly to training nodes without impacting training performance, thereby accelerating model training and serving, boosting GPU utilization, and reducing costs for AI workloads.

Learn More

Top Data Predictions for 2024

Our Founder and CEO, Haoyuan Li, has revealed the top data predictions for 2024. Predictions include compute power as the new oil, moving GenAI from pilots to production, overcoming data silos, cloud cost optimization, and a shift towards commoditized storage, among others.

Read Now

Alluxio Named to the InsideBigData Impact 50 List and Emerging Startups 2024

We are honored to be named to the InsideBigData Impact 50 list and recognized as one of the Emerging Startups 2024: Top Big Data Analytics Startups 🎉

MINI VIDEOS & GOOD READS

Event Recap Videos | Front Row at Alluxio’s Data Infra Meetup: Recap & Highlights

Relive the excitement of Alluxio’s recent Data Infra Meetup! We’ve captured the essence of the event, highlighting the insightful presentations by tech leaders from Uber, ByteDance, CMU, and Alluxio. Gain insights on how to build a scalable and cost-effective data platform with epic performance!

Watch Now

Mini Videos | Rise of Data Access Challenges for AI Workloads Series

Alluxio’s Community Evangelist ChanChan Mao shares common solutions to address the challenges of data access for AI workloads. Uncover the shortcomings of those solutions and why they fail to offer a scalable and optimized data access architecture for AI/ML’s growing data volumes. Explore how Alluxio provides a performant and scalable data access layer to maximize the utilization of GPU resources and offers a better long-term solution as AI and ML data volumes continue to scale.

We have new videos releasing every 2 weeks. Subscribe to our channel and stay tuned!

UPCOMING EVENTS

AI User Group Meetup | Optimizing AI Platform for Automotive ADAS and Autonomous Driving | Tuesday, March 5, 6 pm PT

Alluxio is heading to San Francisco on Tuesday, March 5 to speak at the AI for Developers meetup. Join us and other speakers for an eventful night as we deep dive into all things AI. Alluxio’s Core Maintainer & Open Source Product Manager Shouwei Chen will share an exciting presentation on “Optimizing AI Platform for Automotive ADAS and Autonomous Driving” at 7:30pm PT.

Register In-person

Register for Livestream

Scale 21x | Cloud native data and model lifecycle management for AI | Saturday, March 16, 12:30 pm PT

Alluxio will be attending Scale 21x in Pasadena, CA next month! Look out for a session by Alluxio’s AI Platform Tech Lead Lu Qiu and Research Scientist Chunxu Tang on Saturday, March 16 @ 12:30pm. They will delve deep into the intricacies of managing the full lifecycle of AI data and models in a cloud-native environment.

Register Now

KubeCon Europe 2024 | Advanced CSI-FUSE Filesystem for AI/ML Data Management in Kubernetes | Tuesday, March 19, 15:50 CET

AI/ML workloads, known for their data intensity, often depend on cloud storage. However, Kubernetes faces significant challenges in accessing this cloud storage data efficiently, primarily due to the lack of a Kubernetes volume interface and a heavy reliance on object storage-specific APIs. In this session, Lu Qiu, AI Platform Tech Lead @ Alluxio will discuss how to use an advanced CSI-FUSE filesystem to address the above challenges and guide users in selecting the most suitable CSI-FUSE solution for their specific Kubernetes AI/ML workloads.

Learn More

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML | Thursday, March 21 @ 6:30 – 7:30 PM PT

Join the Big Data Bellevue Meetup (Seattle & Virtual, Thursday, March 21 @ 6:30 PM) to hear from Bin Fan (VP of Open Source @ Alluxio), who will discuss a critical challenge of optimizing data loading for distributed Python applications within AI/ML workloads in the cloud, focusing on popular frameworks like Ray and Hugging Face.

Register In-person

Register for Livestream

Data Council | Tackling I/O Challenges in Modern Data Lakes | Thursday, March 28

It has become increasingly popular to build modern open-source data lakes for big data analytics and AI workloads. On the application side, the I/O workload is also quickly evolving in its patterns. For example, recent machine learning jobs tend to retrieve hundreds of millions of relatively small files/objects in training, which increasingly challenge the scalability, cost-efficiency and throughput of metadata serving. In this talk, Hope Wang, Developer Advocate at Alluxio, will share the analysis of these industry trends, challenges, and success stories working in the open-source ecosystem.

Learn More

PAST EVENTS ON-DEMAND

Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader Should Understand

Catch up with the replay of our recent webinar, where Kevin Petrie (VP of Research at Eckerson Group) and Omid Razavi (SVP of Customer Success at Alluxio) shared key trends and guiding principles for data and AI leaders, including cost governance, applications and workflows, data pipelines, data stores and infrastructure.

Watch Now

Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Platform

Tarik Bennett shared insights into real-world examples and best practices for deploying AI across on-prem, hybrid, and multi-cloud environments during our latest monthly webinar. This on-demand session will introduce how to embrace the separation of storage from compute and simplify the adoption of multi-cloud for AI.

Watch Now

Got a tech question for the Alluxio Community? Chat with us on Slack!

Be our stargazers on GitHub ⭐

If you like our product, please give it a star on GitHub, and share the goodness!

WHITEPAPERS

Efficient Data Access Strategies For Large-scale AI – Architecture and Considerations in Machine Learning Pipeline

Rise of the Data Access Layer for Analytics & AI

Choosing the Right Architecture for Enterprise AI Workloads in Production

HOT JOBS

We currently have 30+ opportunities across the globe! Learn more about our job openings in Customer Success, Sales, Product, and Engineering teams. Are you awesome, or do you know someone to refer? Check out the full list of opportunities and apply here.

Senior Account Support Engineer (San Mateo, California)

Senior Solutions Engineer (San Mateo, California)

Senior Account Executive (San Mateo, California)

Software Engineering Manager (San Mateo, California)