Meetup at AI NextCon 2019: In-Stream Data Processing, Data Orchestration & More
Alluxio – Data Orchestration for Analytics and AI in the Cloud
Data storage is migrating from the colocated model (e.g., HDFS) to a more cost-effective and scalable, but often fully disaggregated and remote, data lake model (e.g., S3). This has created a strong need for data orchestration in the cloud, similar to what K8s does for container-based workloads, so that data can be presented in the right layout at the right location for data applications in the cloud. Originating from the UC Berkeley AMPLab project “Tachyon”, Alluxio (www.alluxio.io) implements the world’s first open-source data orchestration system in the cloud: a unified access layer for data-driven applications in big data and ML that enables Spark, Presto, or TensorFlow to transparently access different external storage systems while actively leveraging an in-memory cache to accelerate data access. In this talk, we will present: trends and challenges in the data ecosystem in the cloud era; data engineering in the cloud with data orchestration; and use cases of tech stacks (Presto or TensorFlow) with Alluxio on S3.
(H.Y.) Li is the Founder and CTO of Alluxio. He co-created Alluxio (formerly Tachyon), an open-source virtual distributed file system.
Bin Fan is the VP of Open Source at Alluxio. Prior to Alluxio, he worked for Google to build the next-generation storage infrastructure.