Uber builds one of the biggest data lakes in the industry, which stores exabytes of data. In this talk, we will introduce the evolution of our data storage architecture, and delve into multiple key initiatives during the past several years.
Specifically, we will introduce:
- Our on-prem HDFS cluster scalability challenges and how we solved them
- Our efficiency optimizations that significantly reduced the storage overhead and unit cost without compromising reliability and performance
- The challenges we are facing during the ongoing Cloud migration and our solutions
Jing Zhao is a Principal Engineer on the Data team at Uber. He is a committer and PMC member of Apache Hadoop and Apache Ratis.