data orchestration summit Archives

Alluxio Architecture and Scaling Performance

December 13, 2020

In this talk, I will introduce the high-level architecture of the current system and present the various components of Alluxio. I will also discuss some of the main challenges of large-scale Alluxio deployments and the lessons we learned from those environments. This talk will detail some of the major scalability improvements added in the past several months, and how users can benefit from the changes.

Tags: architecture, data orchestration, data orchestration summit, scalability

Modernizing Global Shared Data Analytics Platform and our Alluxio Journey

December 13, 2020

In this keynote, you will learn about the evolution of the global data platform at Rakuten, which is spread across multiple regions and clouds. In addition, you will hear about the journey across the years and the use of data orchestration for multiple use cases.

Tags: data analytics, data orchestration, data orchestration summit, rakuten

Introducing the Hub for Data Orchestration

December 13, 2020

We introduce Data Orchestration Hub, a management service that makes it easy to build an analytics or machine learning platform on data sources across regions to unify data lakes. Easy-to-use wizards connect compute engines, such as Presto or Spark, to data sources across data centers or from a public cloud to a private data center. In this session, you will witness the use of “The Hub” to connect a compute cluster in the cloud with data sources on-premises using Alluxio. This new service allows you to build a hybrid cloud on your own, without the expertise needed to manage or configure Alluxio.

Tags: data orchestration, data orchestration summit, hub

The Future of Computing is Distributed

December 13, 2020

Distributed applications are not new. The first distributed applications were developed over 50 years ago with the arrival of computer networks, such as ARPANET. Since then, developers have leveraged distributed systems to scale out applications and services, including large-scale simulations, web serving, and big data processing. Until recently, however, distributed applications have been the exception rather than the norm. This is changing quickly.

Tags: data orchestration, data orchestration summit, distributed applications

The Pandemic Changes Everything, The need for speed and resiliency

December 13, 2020

This is an open source community conference focused on the key data engineering challenges and solutions around building cloud-native data and AI platforms using the latest technologies such as Alluxio, Apache Spark, Apache Airflow, Presto, TensorFlow, and Kubernetes.

Tags: data orchestration, data orchestration summit, intel

Alluxio Use Cases and Future Directions

December 13, 2020

In this keynote, Calvin Jia will share some of the hottest use cases in Alluxio 2 and discuss the future directions of the project being pioneered by Alluxio and the community. Bin Fan will provide an overview of the growth of the Alluxio open-source community, with highlights on community-driven collaboration with engineering teams from Microsoft and Alibaba to advance the technology.

Tags: alluxio engineering, data orchestration, data orchestration summit, use case

Data Orchestration for Analytics and AI in the Cloud Era

December 13, 2020

In this keynote from Haoyuan Li, founder and CEO of Alluxio, we will showcase how organizations have built data platforms based on data orchestration. The need to simplify data management and acceleration across different business personas has given rise to data orchestration as a requisite piece of the modern data platform. In addition, we will outline typical journeys for realizing a hybrid and multi-cloud strategy.

Tags: analytics, data orchestration, data orchestration summit, hybrid cloud

The practice of Presto & Alluxio in E-commerce big data platform

December 13, 2020

JD.com is one of the largest e-commerce corporations. The JD.com big data platform has tens of thousands of nodes and tens of petabytes of offline data, which require millions of Spark and MapReduce jobs to process every day. Thousands of machines run as Presto nodes, and Presto, as the main query engine, plays an important role in in-place analysis and BI tools. Meanwhile, Alluxio is deployed to improve the performance of Presto. The practice of Presto & Alluxio at JD.com benefits many engineers and analysts.

Tags: big data, data orchestration, data orchestration summit, presto

How to Build a new under filesystem in Alluxio: Apache Ozone as an example

December 13, 2020

In this talk, Baolong Mao from Tencent will share his experience developing the Apache Ozone under file system, showing how to create a new under file system in a few steps with minimal lines of code.

Tags: apache ozone, data orchestration, data orchestration summit
