Tag: cloud

Q&A with Alluxio’s Bin Fan on Data Orchestration, Cloud Migration, and Data Engineering Challenges

October 10, 2019 By Amelia Wong

For today’s blog post I interviewed Bin Fan, Founding Engineer and VP of Open Source at Alluxio. Bin is the PMC maintainer of the Alluxio open source project. Prior to Alluxio, he worked for Google on the next-generation storage infrastructure.

Alluxio – Data Orchestration for Analytics and AI in the Cloud

October 9, 2019

In this talk, we present trends and challenges in the data ecosystem in the cloud era, data engineering in the cloud with data orchestration, and use cases for running tech stacks such as Presto or TensorFlow with Alluxio on S3.

Tags: aws s3, big data, cloud, data orchestration, hdfs, meetup, presto, spark, storage, tensorflow

From limited Hadoop compute capacity to increased data scientist efficiency

Alluxio Tech Talk * October 16, 2019

This tech talk shares approaches to bursting data to the cloud, along with how Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services such as EMR and Dataproc. Learn how DBS Bank uses Alluxio to work around limited on-prem compute capacity.

Data Orchestration for AI, Big Data, and Cloud

October 3, 2019

Haoyuan Li offers an overview of a data orchestration layer that provides a unified data access and caching layer for single cloud, hybrid, and multicloud deployments.

Tags: big data, cloud, compute, conference, data orchestration, storage

Why Data Orchestration?

September 18, 2019

Large-scale analytics and AI/ML applications require efficient data access, with data increasingly distributed across multiple data stores in private data centers and clouds. Data platform teams also need the flexibility to introduce new data sources and move to new storage options with minimal changes or downtime for their applications. This paper delves further into what is driving the need for an Alluxio data orchestration layer, and what problems it solves, as part of a modern data platform.

Tags: cloud, compute storage separation, data orchestration, performance, storage

Tech Talk: Accelerating analytics with EMR on your S3 data lake

September 12, 2019

EMR has become a widely used service for running big data analytics in the public cloud. But slow or inconsistent EMR performance caused by S3 data lakes creates challenges for organizations.

Alluxio is a data orchestration layer for the cloud that increases performance of analytic workloads running on AWS EMR using S3 as the storage.

Join us for this webinar where we will show you how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake. You’ll see the performance gains with Alluxio in your EMR/S3 stack.

Tags: cloud, emr, hive, s3, spark, tech talk

Community Office Hour: Building a Cloud Native Stack with EMR Spark, Alluxio, and S3

August 27, 2019

Learn how to set up EMR Spark with Alluxio so Spark jobs can seamlessly read from and write to S3. See the performance comparison between Spark on S3 and Spark with Alluxio on S3.

Tags: aws s3, cloud, emr, office hour, performance, spark

Austin Meetup: Efficient Data Engineering with Apache Spark, Hive, and Alluxio on S3

August 20, 2019

Alluxio’s first Austin meetup on cloud, data, and orchestration, featuring talks and demos on efficient data engineering with Apache Spark, Hive, and Alluxio on S3.

Tags: apache hive, aws s3, cloud, meetup, spark

What can I do to speed up analytics performance on remote data?

Background: Today’s advanced analytics applications run on more datasets than ever before. The locations where data “lands” are becoming more dispersed. And the separation of compute and storage in modern environments lends itself well to running on these distributed datasets. Data can be stored in a remote location from the compute, such as in a … Continued
