Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds

ODSC WEST 2019 Cloud storage brings great flexibility in management and cost-efficiency to data scientists, but also introduces new challenges related to data accessibility and data locality for machine learning applications. For instance, when the input data is stored in a remote cloud storage like AWS S3 or Azure blob storage, direct data access is … Continued

Tags: , , , , , , ,

What can I do to speed up analytics performance on remote data?

Background Today’s advanced analytics applications run on more datasets that ever before. The locations of where data “lands” is becoming more dispersed. And the separation of compute and storage in modern environments lends well to running on these distributed datasets. Data can be stored in a remote location from the compute, such as in a … Continued

O’Reilly AI Conference Keynote: Data Orchestration for AI, Big Data, and Cloud

Haoyuan Li’s keynote at O’Reilly Beijing discusses open source data orchestration and the value of leveraging Alluxio with rising trends driving the need for a new architecture. Four big trends driving this need: Separation of compute & storage, hybrid-multi cloud environments, rise of object store and self-service data across the enterprise.

Tags: , , , , , , , , , , ,

Community Office Hour: Running Spark & Alluxio in Kubernetes

The data orchestration layer bridging the gap between data locality with improved performance and data accessibility for analytics workloads in Kubernetes, and enables portability across storage providers.
An overview of Alluxio and the cloud use case with Spark in Kubernetes. Learn how to set up Alluxio and Spark to run in Kubernetes.

Tags: , , , , , , , , , , , ,

Alluxio at Beijing Meetup

Haoyuan Li presents at Beijing Meetup on open source data orchestration and the value of leveraging Alluxio with rising trends driving the need for a new architecture. Four big trends driving this need: Separation of compute & storage, hybrid-multi cloud environments, rise of object store and self-service data across the enterprise.

Tags: , , , , , , , , ,

RocksDB Meetup at Twitter

Bay Area Meetup *

Twitter SF is hosting 2019’s half yearly RocksDB Meetup with speakers from Twitter, Facebook and the community on July 11th.

Unified Big Data Analytics – Any stack, Any Cloud

Boston Meetup *

This presentation focuses on how Alluxio helps the big data analytics stack to be cloud-native. The trending Cloud object storage systems provide more cost-effective and scalable storage solutions but also different semantics and performance implications compared to HDFS. Applications like Spark or Presto will not benefit from the node-level locality or cross-job caching when retrieving data from the cloud object storage. Deploying Alluxio to access cloud solves these problems because data will be retrieved and cached in Alluxio instead of the underlying cloud or object storage repeatedly.