Nowadays, cloud native environments have attracted lots of data-intensive applications deployed and ran on them, due to the efficient-to-deploy and easy-to-maintain advantages provided by cloud native platforms and frameworks such as Docker, Kubernetes. However, cloud native frameworks does not provide the data abstraction support to the applications natively. Therefore, we build Fluid project, which co-orchestrate data and containers together. We use Alluxio as the cache runtime inside Fluid to warm up hot data. In this report, we will introduce the design and effects of the Fluid project.
This is an open source community conference focused on the key data engineering challenges and solutions around building cloud-native data and AI platforms using latest technologies such as Alluxio, Apache Spark, Apache Airflow, Presto, Tensorflow, and Kubernetes.
Today, many people run deep learning applications with training data from separate storage such as object storage or remote data centers. This presentation will demo the Intel Analytics Zoo + Alluxio stack, an architecture that enables high performance while keeping cost and resource efficiency balanced without network being I/O bottlenecked.
Dataproc is Google’s managed Hadoop and Spark platform. In this talk, we will showcase how to swiftly build a hybrid cloud data platform with Alluxio and Presto and migrate data seamlessly.
Data infrastructure on-premises is increasingly complex and cloud adoption is attractive for business agility. Operating a hybrid environment is an approach to start benefiting from cloud elasticity quickly without abandoning the infrastructure on-premises. In this session I will discuss the benefits of using Alluxio’s Data Orchestration Platform to dynamically burst Apache Spark and Presto workloads to Amazon EMR for best performance and agility.
This talk describes benefits and methods Alluxio enables secure data access in the Comcast’s dx hybrid data cloud.
Unicom’s traditional batch architecture consists mainly of IOE, Hive, and Greenplum systems. With the development of business, a large number of computing application modules based on diverse scenarios, chimney-like, decentralized applications have emerged. To solve the problem of resource fragmentation, we have introduced a unified computing platform for computing ecology with Spark and Alluxio as the core. Alluxio plays an important role in accelerating data processing and ensuring process stability.
This talk introduces T3Go’s solution in building an enterprise-level data lake based on Apache Hudi & Alluxio, and how to use Alluxio to accelerate the reading and writing of data on the data lake when compute and storage are segregated.
Presto & Alluxio on AWS: How we build a Up-To-Date Data-Platform at Ryte.