MOMO: Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

This post is guest authored by our friends at MOMO: Haojun (Reid) Chan and Wenchun Xu. Data Analysis Trends: The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options … Continued

Hedge Fund Improves Machine Learning Model Performance 4X with Alluxio

Quantitative hedge funds process large data sets with sophisticated financial models to drive investment decisions. Machine learning is used to continuously improve models and maximize financial return. One firm with billions of US dollars in assets under management turned to Alluxio to address the performance and cost challenges of large-scale data processing in a hybrid cloud … Continued

Enabling Decoupled Compute and Storage with Alluxio

This blog explores the benefits Alluxio brings to data platforms, including: the trends behind the rise of decoupled compute-storage architectures; how Alluxio addresses data access issues for decoupled compute-storage architectures; and an example of Alluxio’s benefits using a SparkSQL workload. Motivation: The primary appeal of a coupled compute-storage architecture, … Continued

Flexible and Fast Storage for Deep Learning with Alluxio

Introduction: In the age of growing datasets and increased computing power, deep learning has become a popular technique for AI. Deep learning models continue to improve their performance across a variety of domains, with access to more and more data, and the processing power to train … Continued

Effective Spark DataFrames with Alluxio

Introduction: Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. Qunar recently deployed Alluxio in production, and their Spark streaming jobs sped up by 15x on average and up to 300x during peak times. They noticed that some Spark jobs would slow down or would not finish, but with Alluxio, those … Continued

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Introduction: Alluxio is the world’s first memory-speed virtual distributed storage system that bridges applications and underlying storage systems, providing unified data access orders of magnitude faster than existing solutions. The Hadoop Distributed File System (HDFS) is a distributed file system for storing large volumes of data. HDFS popularized the paradigm of bringing computation to data … Continued

Effective Spark RDDs with Alluxio

Introduction: Organizations like Baidu and Barclays have deployed Alluxio with Spark in their architecture, and have achieved impressive benefits and gains. Recently, Qunar deployed Alluxio with Spark in production and found that Alluxio enables Spark streaming jobs to run 15x to 300x faster. In their case study, they described how Alluxio improved their system architecture, and mentioned that … Continued

Getting Started with Alluxio and Spark

Introduction: Spark has brought significant innovation to Big Data computing, but its results are even more extraordinary when paired with other open source projects in the ecosystem. Alluxio, formerly Tachyon, provides Spark with a reliable data sharing layer, enabling Spark to excel at performing application logic while Alluxio handles storage. For example, global financial powerhouse … Continued