spark Archives

Improving Spark Memory Resource with Off-Heap In-Memory Storage

November 1, 2019 By Bin Fan and Adit Madan

In the previous tutorial, “Getting Started with Spark Caching using Alluxio in 5 Minutes”, we demonstrated how to get started with Spark and Alluxio. To share more thoughts and experiments on how Alluxio enhances Spark workloads, this article focuses on how Alluxio helps optimize the memory utilization of Spark applications. For users who are …

Online Meetup: Powering Data Science and AI with Apache Spark, Alluxio, and IBM

October 29, 2019

Learn why leading companies are moving towards a decoupled compute and storage architecture, and the associated challenges and requirements. Hear about how Spark and Alluxio together can solve the challenges.

Tags: analytics, compute storage separation, hdfs, meetup, performance, spark, use case

Building data lineage; Running Spark with Alluxio; Data Mesh

Big Data Application Meetup * November 21, 2019

Running Spark with Alluxio is a popular stack, particularly for hybrid environments. In this session, Dipti will briefly introduce Alluxio, share the top 10 tips for performance tuning for real-world workloads, and demo Alluxio with Spark.

Alluxio – Data Orchestration for Analytics and AI in the Cloud

October 9, 2019

In this talk, we present: trends and challenges in the data ecosystem in the cloud era; data engineering in the cloud with data orchestration; and use cases of tech stacks (Presto or TensorFlow) with Alluxio on S3.

Tags: aws s3, big data, cloud, data orchestration, hdfs, meetup, presto, spark, storage, tensorflow

From limited Hadoop compute capacity to increased data scientist efficiency

Alluxio Tech Talk * October 16, 2019

This tech talk will share approaches to burst data to the cloud, along with how Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services like EMR and Dataproc. Learn how DBS Bank uses Alluxio to address limited on-prem compute capacity.

Community Office Hour: Accelerating Hive with Alluxio on S3

October 3, 2019

Learn more about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3, along with how to set up Hive with Alluxio so that Hive jobs can seamlessly read from and write to S3.

Tags: alluxio engineering, aws s3, compute storage separation, hdfs, hive, office hour, spark

Powering Data Science and AI with Apache Spark, Alluxio, and IBM

Alluxio Global Online Meetup * October 15, 2019

In this online meetup, we will present the benefits of the fast analytics stack of Spark on Alluxio, and dive into China Unicom’s use case of leveraging Spark and Alluxio to process massive amounts of mobile data.

Tag: spark