Top 5 Performance Tuning Tips for Running Presto on Alluxio

Presto is an open source distributed SQL engine widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Alluxio is an open-source distributed file system that provides a unified data access layer at in-memory speed. The combination of Presto and Alluxio is getting more popular in many companies like JD, NetEase to leverage Alluxio … Continued

Achieving 10x acceleration of Spark and Hive Jobs on AWS S3 with Alluxio Tiered Storage

The data engineering team at Bazaarvoice, a software-as-a-service digital marketing company based in Austin, Texas, must handle data at massive Internet-scale to serve its customers. Facing challenges with scaling their storage capacity up and provisioning hardware, they turned to Alluxio’s tiered storage system and saw 10x acceleration of their Spark and Hive jobs running on AWS S3.

In this whitepaper you’ll learn:

  • How to build a big data analytics platform on AWS that includes technologies like Hive, Spark, Kafka, Storm, Cassandra, and more
  • How to setup a Hive metastore using a storage tier for hot tables
  • How to leverage tiered storage for maximized read performance

Tags: , , , , , ,

Hedge Fund Improves Machine Learning Model Performance 4X with Alluxio

Quantitative hedge funds process large data sets with sophisticated financial models to drive investment decisions. Machine Learning is used to continuously improve models and maximize financial return. One firm with billions ($US) of assets under management turned to Alluxio to address the performance and cost challenges of large scale data processing in a hybrid cloud … Continued

Tags: , , ,

New Whitepaper: Structured Big Data Federation

Enterprises are adopting big data technologies to analyze and derive insight from their growing volumes of structured and unstructured data. A familiar problem is the requirement to analyze data from multiple independent storage silos concurrently. In order to consolidate the data, large enterprises typically use custom solutions or build a data lake. These approaches present … Continued

Effective Spark DataFrames with Alluxio

Introduction Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. Qunar recently deployed Alluxio in production, and their Spark streaming jobs sped up by 15x on average and up to 300x during peak times. They noticed that some Spark jobs would slow down or would not finish, but with Alluxio, those … Continued

Tags: , , , ,