Founder Blog: Alluxio Chapter 2.0

March 28, 2019

Haoyuan Li

In the early 2000s, big data was born, and technology companies were racing to create next-generation compute frameworks and storage systems designed for the requirements that big data brought with it. By the time I was a first-year Ph.D. student at UC Berkeley's AMPLab in 2011, numerous advances in big data technologies, such as Apache Spark, were emerging. Working on Apache Spark and being exposed to other cutting-edge technologies, I realized that sharing data among data-driven applications running on different compute frameworks, and moving data across storage systems, would become the bottleneck for any organization that wants to extract value from its data. To solve these challenges, I created Alluxio (formerly Tachyon), which, for lack of a defined category, I called a virtualized distributed file system in my original thesis.

Since then, Alluxio has evolved as the data ecosystem has greatly expanded. We have seen the rise of hybrid and multi-cloud environments, the fast growth of AI/ML/DL workloads and technologies, the explosion of object stores, and an eagerness among leading companies to develop a culture of self-service data. All of these developments further increase the need for data mobility and accessibility. As a result, the value that Alluxio brings is more critical today than ever.

Today, Alluxio is deployed and trusted by industry-leading companies such as China Unicom and Development Bank of Singapore. Some of the largest deployments run more than 1,000 nodes in a single Alluxio cluster, powering some of the world's critical infrastructure. At the same time, our community has grown to more than 1,000 contributors, and our software can handle billions of files and manage petabyte-scale data.

I believe that for us to take full advantage of our opportunity as the leader in this market and realize our vision, I need a partner in crime, so to speak: someone who believes in the vision, has extensive open source go-to-market experience, and shares my passion for creating the future. I have found all of those qualities in Steven Mih, whom I am thrilled to welcome as our new CEO. I connected with Steven about a year ago, and I have greatly enjoyed learning from his experience and getting to know him. In addition to his deep go-to-market experience, Steven is an open source veteran, having held leadership roles at Couchbase and Mesosphere. With Steven on board, I will assume the role of CTO and chairman of the board, doubling down on the technology and product vision and spending time with users, all areas I am deeply passionate about.

I am more excited today than ever, as I believe that with Steven on board, Alluxio is perfectly positioned to realize its vision of being the data orchestration layer that enables new technology stacks and helps organizations unlock the power of data for all. Cheers!
