Speeding Up I/O for Machine Learning

Alluxio Global Online Meetup *

This talk will show the audience how Alluxio can greatly simplify the data preparation phase when working with remote and possibly multiple data sources. We will share the lessons and benchmarks from Bill Zhao, an engineering lead at Apple, from building a machine learning platform using TensorFlow, NFS, DC/OS, and Alluxio.

Tech Talk: Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage

The ever-increasing challenge of processing and extracting value from exploding data volumes with AI and analytics workloads makes a memory-centric architecture with disaggregated storage and compute more attractive. This decoupled architecture enables users to innovate faster and scale on demand. Enterprises are also increasingly looking to object stores to power their big data and machine learning workloads cost-effectively. However, object stores provide neither big-data-compatible APIs nor the required performance.

In this webinar, the Intel and Alluxio teams will present a proposed reference architecture that uses Alluxio as the in-memory accelerator for object stores to enable modern analytical workloads such as Spark, Presto, TensorFlow, and Hive. We will also present a technical overview of Alluxio.

How do you run TensorFlow on a remote storage system?

Problem: It is increasingly popular among data scientists to train models with frameworks like TensorFlow on a local server or cluster while using remote shared storage such as S3 or Google Cloud Storage to hold massive amounts of input data. This stack provides high flexibility and cost efficiency, especially requires no dev-ops … Continued

How do you train and deploy ML models using TensorFlow and S3?

TensorFlow is an open-source machine learning platform used to build applications like deep neural networks. It consists of an ecosystem of tools, libraries, and community resources for machine learning, artificial intelligence, and data science applications. S3 is an object storage service originally created by Amazon. It has a rich set of APIs … Continued

Unified Big Data Analytics – Any stack, Any Cloud

Boston Meetup *

This presentation focuses on how Alluxio helps the big data analytics stack become cloud-native. Cloud object storage systems offer more cost-effective and scalable storage than HDFS, but they also come with different semantics and performance implications. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio in front of cloud storage solves these problems because data is retrieved once and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage.
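
The cross-job caching idea described above can be illustrated with a minimal read-through cache sketch in Python. This is not Alluxio's API or implementation: `ReadThroughCache` and `fake_object_store` are hypothetical names invented for illustration, and the "remote store" is a local function standing in for an object store request.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy read-through cache: fetch from the slow store once, then serve from memory."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote   # slow path (assumed: remote object store)
        self._cache: Dict[str, bytes] = {}  # fast path (in-memory copy)
        self.remote_reads = 0               # counts how often the slow path is used

    def read(self, path: str) -> bytes:
        # Only go to the remote store on a cache miss.
        if path not in self._cache:
            self.remote_reads += 1
            self._cache[path] = self._fetch_remote(path)
        return self._cache[path]


def fake_object_store(path: str) -> bytes:
    # Hypothetical stand-in for an object store GET request.
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_object_store)
first = cache.read("bucket/table/part-0")   # e.g. a first job reads the file
second = cache.read("bucket/table/part-0")  # a later job reads the same file
print(cache.remote_reads)
```

In this sketch the second read is served from memory, so the remote store is contacted only once even though two separate reads requested the same path.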