As we step into 2024, we look back on and celebrate an incredible 2023 for the Alluxio community.
First and foremost, thank you to all of our contributors and the broader community! Together, we have achieved remarkable milestones. 💖
📈 Highlights by Numbers
Let’s take a look at Alluxio in 2023 by the numbers.
- 1,250 total contributors across the globe
- 106 Blog posts in multiple languages, 25 YouTube videos
- 42 Tech talks in 32 conferences and events
- 30 Alluxio-hosted events
🗓️ Key Milestones We Reached
We kicked off 2023 strong by being named one of the Best Open Source Projects. Since then, we’ve reached the following milestones:
- March – Alluxio named one of the Best Open Source Software by Datamation
- April – Alluxio “Modern Data Platform Excellence” eBook series launched, including tuning guides for Trino, Presto, and PyTorch
- May – Uber shares on the Uber engineering blog how its HDFS DataNode project with Alluxio cache saves millions in costs
- July – Large-scale AI case study published with Zhihu, which uses Alluxio as a high-performance data access layer for LLM training and deployment
- October – AI Infra Day, Alluxio’s flagship virtual AI event, brings together tech leaders and practitioners to share their success stories in building AI infra
- December – Alluxio Summit goes global, kicking off its first international event in Asia with 24 speakers from organizations such as Zhihu, iFLYTEK, Kuaishou, Shopee, and Alipay
😍 Ecosystems We Love
We also gave back to our ecosystem by furthering collaborations with open-source projects such as Trino, Presto, PyTorch, Ray, and more. Better together! Notable collaborations include:
- Alluxio-developed new Trino object caching (coming soon), Alluxio & Trino integration, 5-min tutorial, Trino performance tuning eBook and blogs
- Alluxio-developed Presto SDK cache enhancement, Presto & Alluxio integration, Presto performance tuning eBook and blogs
- Alluxio & PyTorch reference architecture on AWS, PyTorch training performance tuning eBook and blogs
- Alluxio & Ray integration
- Alluxio on Kubernetes to achieve data locality
💯 Stories We Told
A wide variety of organizations around the globe use Alluxio as the foundation of their production data platforms. Here are a few highlights of content written by community members sharing their experiences, designs, and best practices.
- Uber: Optimizing HDFS with DataNode Local Cache
- Shopee: “Data Access as a Service” – Using Alluxio to Accelerate Interactive Queries and Enhance Developer Experience with Flexible APIs
- Alipay: Optimizing Alluxio for Efficient Large-Scale Training on Billions of Files
- Zhihu: Building a High-performance Data Access Layer for Model Training and Model Serving for LLM
As noted in the milestones above, we published the “Modern Data Platform Excellence” eBook series, which gives developers performance tuning tips for Trino, Presto, and PyTorch. More than 2,000 people have downloaded these eBooks.
Our technical blogs explain how developers made their design decisions and how key features are implemented in Alluxio, including:
- Cross Cluster Synchronization
- Data Caching Strategies for Data Analytics and AI
- Call Chain Relationship Between Presto, Hive and Alluxio
- A Deep Dive into Caching in Presto
- Introducing DORA: The Next-generation Alluxio Architecture
- Consistent Hashing in Alluxio DORA
- Data Locality on Cloud for AI
As video grows more popular among developers, we kicked off our Alluxio Mini Series on YouTube, featuring product demos, user journeys, and educational content to help developers learn about Alluxio and get started more easily. Our videos include What is Alluxio?, Efficient Alluxio Caching with Paging Storage, Alluxio on Kubernetes Architecture, Model Training Platform Architecture with Alluxio, End-to-End Machine Learning Pipeline with Alluxio, and Speed Up Your Data Access by 8x With Alluxio vs S3FS.
🎙️ Community Events We Hosted and Participated In
In 2023, the Alluxio community put together a number of meetups and events across the globe to share the latest trends.
Among others, two Alluxio-hosted events received the most attention from community members.
- AI Infra Day 2023: With 500+ attendees globally, this one-day virtual event featured 7 speakers across 6 sessions from companies such as Uber and Meta. Watch the sessions here.
- Alluxio Summit Beijing 2023: This event took Alluxio Summit global with a first stop in Beijing, drawing 500+ in-person and 10,000+ online attendees. It featured 24 speakers from organizations such as Zhihu, iFLYTEK, Kuaishou, Shopee, Alipay, China Unicom, and Bilibili, with 7 keynotes on Alluxio and other technologies and 15 breakout sessions covering tech deep dives, use cases, and ecosystem talks on both big data and AI. Read the recap here.
In addition to the two major community events, we presented at leading data and AI conferences around the world.
- Scale 20x, Kubernetes BATCH + HPC Day Europe, Linux Foundation Open Source Summit North America & Europe, DBTA Data Summit, Trino Fest, Data+AI Summit, Ray Summit, AI Conference, Community Over Code North America, PyTorch Conference, KubeCon North America, Scale By The Bay, PrestoCon, Open Source Analytics Conference, Trino Summit, QCon Shanghai, Data&AI Con Shanghai, SACC Shanghai & Beijing, Intel Top 100 Innovative AI Companies Online Seminar, DataFunCon.
Local meetups are also a part of our effort to bring developers closer in smaller, in-person, more engaging settings:
- Seattle Spark+AI Meetup, PyData Meetup, Presto Meetup, Ray Meetup at PingCAP, University of San Francisco Meetup, Alluxio & Doris Joint Meetup, Alluxio local meetups in Beijing/Shanghai/Shenzhen.
🐲 Ride High on the Dragon’s Wings in 2024
Together with the community, we plan to double down on the Alluxio open-source project and on ecosystem collaborations. Roadmap priorities include enhancing support for AI workloads to meet the needs of developers working on data and AI infra. We will also deepen integration with your favorite compute frameworks, including Trino, Hugging Face, Ray, and more.
We also have an exciting lineup of events coming in 2024, including Alluxio Summit US and in-person meetups. In addition, we plan to deliver more developer-facing content and short videos to help developers get started, publish technical deep dives into Alluxio’s features, and share more user success stories. We welcome all community members to contribute content in any format.
To stay up to date with community news and discuss hot topics with other members, here are a few useful links:
- Join the Alluxio Community Slack to engage with Alluxio users and developers
- Follow us on Twitter and LinkedIn
- Subscribe to our YouTube channel
- Download Alluxio
- Contributor guide
- Quick start guide
Most of all, if you ❤️ Alluxio as much as we do, please ⭐ us on GitHub.