Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds

ODSC WEST 2019

Cloud storage gives data scientists great flexibility in management and cost efficiency, but it also introduces new challenges in data accessibility and data locality for machine learning applications. For instance, when the input data is stored in remote cloud storage such as AWS S3 or Azure Blob Storage, direct data access is …


Why Data Orchestration?

The pace of innovation today is held back by the need to reinvent the wheel so that applications can access data efficiently. When an engineer or scientist writes an application to solve a problem, he or she must spend significant effort making the application access data efficiently and effectively, rather than focusing on the algorithms and the application's logic.


Data Orchestration Summit 2019

Alluxio Conference

Announcing the first Data Orchestration Summit in November 2019! The Summit brings together data engineers, cloud engineers, data scientists, and industry thought leaders who are solving data problems at the intersection of cloud, AI, and data.

Accelerating Spark with Kubernetes

Alluxio Tech Talk

This tech talk gives a quick overview of Alluxio and the use cases it powers for Spark and Presto on Kubernetes. We also show how to set up Alluxio and Spark/Presto to run on Kubernetes.

Accelerating Write-intensive Data Workloads on AWS S3

Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio 2.0 introduced Replicated Async Write, which lets applications complete writes to the Alluxio file system and return quickly, keeping application performance high, while still giving users peace of mind that the data will be persisted in the background to the chosen under storage, such as S3.
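As a rough illustration of the idea only (not Alluxio's implementation or API), the toy Python model below assumes that a write returns as soon as the data sits in memory on a configurable number of workers, and that a background thread persists it to a slow under store afterwards. `UnderStore`, `AsyncWriteCache`, `replicas`, and the simulated latency are all hypothetical names and values chosen for this sketch.

```python
import queue
import threading
import time
from typing import Dict, List


class UnderStore:
    """Hypothetical stand-in for a slow remote store such as S3."""

    def __init__(self, latency_s: float = 0.05) -> None:
        self.latency_s = latency_s
        self.objects: Dict[str, bytes] = {}
        self._lock = threading.Lock()

    def put(self, path: str, data: bytes) -> None:
        time.sleep(self.latency_s)  # simulate a network round trip
        with self._lock:
            self.objects[path] = data


class AsyncWriteCache:
    """Toy model (hypothetical): a write returns once `replicas` in-memory
    copies exist; a background thread persists each file to the under store."""

    def __init__(self, under: UnderStore, workers: int = 3, replicas: int = 2) -> None:
        if replicas > workers:
            raise ValueError("replicas cannot exceed the number of workers")
        self.under = under
        self.replicas = replicas
        self.worker_mem: List[Dict[str, bytes]] = [dict() for _ in range(workers)]
        self._pending: "queue.Queue[str]" = queue.Queue()
        threading.Thread(target=self._persist_loop, daemon=True).start()

    def write(self, path: str, data: bytes) -> None:
        # Copy the data to the first `replicas` workers; a real system
        # would choose workers by some placement policy.
        for mem in self.worker_mem[: self.replicas]:
            mem[path] = data
        # Schedule persistence and return without waiting for the under store.
        self._pending.put(path)

    def _persist_loop(self) -> None:
        while True:
            path = self._pending.get()
            self.under.put(path, self.worker_mem[0][path])
            self._pending.task_done()

    def wait_persisted(self) -> None:
        """Block until every scheduled write has reached the under store."""
        self._pending.join()


if __name__ == "__main__":
    store = UnderStore()
    cache = AsyncWriteCache(store)
    start = time.perf_counter()
    cache.write("/data/part-0", b"hello")
    print(f"write returned after {(time.perf_counter() - start) * 1000:.2f} ms")
    cache.wait_persisted()
    print("persisted:", store.objects["/data/part-0"])
```

The trade-off the sketch makes visible: the caller sees only in-memory latency, and the in-memory replicas are what protect the data during the window before persistence completes.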