Apache Spark Archives

Apache Spark DataFrame caching with Alluxio

October 27, 2017

Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. Qunar recently deployed Alluxio in production, and their Spark streaming jobs sped up by 15x on average and up to 300x during peak times. They noticed that some Spark jobs would slow down or would not finish, but with Alluxio, those jobs could finish quickly. In this blog post, we investigate how Alluxio helps Spark be more effective. Alluxio increases performance of Spark jobs, helps Spark jobs perform more predictably, and enables multiple Spark jobs to share the same data from memory.

Tags: apache spark, benchmark, caching, case study, performance

Alluxio at Spark Summit EU 2017

October 26, 2017 by Gene Pang

We briefly introduce Alluxio and present different ways Alluxio can help Spark jobs, along with best practices. We also discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud.

Tags: alluxio engineering, apache spark, architecture, aws s3, cloud, cloud storage, conference, developer tips, hybrid cloud, machine learning, rdd

Accelerating Spark Workloads in a Mesos Environment

October 26, 2017 by Gene Pang

MesosCon Europe 2017 – Gene Pang discusses how Mesos, Spark, and Alluxio fit together to form an optimal architecture for enterprises.

Tags: alluxio engineering, apache spark, architecture, aws s3, ceph, compute, conference, data, data engineering, Google Cloud Storage, hdfs, spark, storage, unified namespace

Best Practices for Using Alluxio with Apache Spark

June 6, 2017

Spark Summit SF 2017 – We briefly introduce Alluxio and present different ways Alluxio can help Spark jobs, along with best practices. We also discuss how Alluxio can be deployed and used with a Spark data processing pipeline in the cloud.

Tags: alluxio engineering, apache spark, aws, aws s3, cloud, cloud storage, conference, machine learning, spark

Alluxio (Formerly Tachyon): Unify Data At Memory Speed

April 2, 2017 by Gene Pang

Global Big Data Conference 2017 – In the past year, the Alluxio project saw significant improvements in performance and scalability and was extended with key new features, including tiered storage, transparent naming, and unified namespace.

Tags: alluxio engineering, apache spark, big data, compute, conference, data, data engineering, performance, scale, spark, storage, tiered storage

Alluxio at Strata + Hadoop World San Jose 2017

March 16, 2017 by Calvin Jia

Calvin Jia introduces Alluxio, explains how Alluxio can help Spark be more effective, shows benchmark results with Spark RDDs and DataFrames, and describes production deployments with both Alluxio and Spark working together.

Tags: alluxio engineering, apache spark, aws s3, ceph, conference, data, data engineering, data orchestration, Gluster, Google Cloud Storage, hdfs, NFS, performance, scale, spark, storage

Alluxio at Spark Summit East 2017

February 9, 2017 by Haoyuan Li & William Callaghan [eSentire]

In this talk, we briefly introduce Alluxio, present several ways Alluxio can help Spark be more effective, show benchmark results with Spark RDDs and DataFrames, and describe production deployments with both Alluxio and Spark working together.

Tags: alluxio engineering, apache spark, architecture, big data, cloud, compute storage separation, conference, data, performance, rdd, spark, storage

Arimo Leverages Alluxio’s In-Memory Capability, Improving Time-to-Results for Deep Learning Models

November 25, 2016 by Arimo Team

Deep learning algorithms have traditionally been used in specific applications, most notably computer vision, machine translation, text mining, and fraud detection. Deep learning truly shines when the model is big and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data and have been used extensively. Making deep learning available on Spark therefore broadens its range of applications, and businesses can take full advantage of deep learning capabilities using their existing Spark infrastructure.

Tag: apache spark