This blog explores an innovative platform that uses Presto as the computing engine and Alluxio as a data orchestration layer between Presto and S3 storage to support online services with instantaneous response in the gaming industry. The preliminary results show that Presto with Alluxio significantly outperforms Presto reading directly from S3 in all cases. Alluxio with metadata caching shows up to a 5.9x performance gain when handling large numbers of small files.
This article describes how engineers at datasapiens brought down S3 API costs by 200x by implementing Alluxio as a data orchestration layer between S3 and Presto.
Testing Methodology: A decision support workload is a typical workload that models multiple aspects of a decision support system, including queries and data maintenance. We selected 54 queries that represent typical SQL query behavior in Hadoop for the test. The tests cover three configurations: without Alluxio, Alluxio on PMem, and Alluxio on DRAM. …
This article describes how Alluxio accelerates the training of deep learning models in a hybrid cloud environment with Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.
This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.
China Unicom is one of the five largest telecom operators in the world. Its booming 4G and 5G business has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to the data processing infrastructure. The previous generation of its data processing system was based on IBM midrange computers, Oracle databases, and EMC storage devices. This architecture could not scale to process the amount of data generated by the rapidly expanding number of mobile users. Even after deploying Hadoop and the Greenplum database, it was still difficult to cover critical business scenarios with their varying, massive data processing requirements.
The data engineering team at Bazaarvoice, a software-as-a-service digital marketing company based in Austin, Texas, must handle data at massive Internet-scale to serve its customers. Facing challenges with scaling their storage capacity up and provisioning hardware, they turned to Alluxio’s tiered storage system and saw 10x acceleration of their Spark and Hive jobs running on AWS S3.
In this whitepaper you’ll learn:
- How to build a big data analytics platform on AWS that includes technologies like Hive, Spark, Kafka, Storm, Cassandra, and more
- How to set up a Hive metastore using a storage tier for hot tables
- How to leverage tiered storage for maximized read performance
In this article, Thai Bui from Bazaarvoice describes how Bazaarvoice leverages Alluxio to build a tiered storage architecture with AWS S3 to maximize performance and minimize operating costs when running big data analytics on AWS EC2.
The Alluxio sandbox is the easiest way to test drive the popular data analytics stack of Spark, Alluxio, and S3 deployed in a multi-node cluster in a public cloud environment. The sandbox cluster is fully configured and ready for users to run applications ranging from the hello-world example to the TPC-DS benchmark suite. Don’t take our word for it; kick off the benchmark yourself to see the performance benefits of running Spark jobs that interface through Alluxio on S3 compared to running Spark jobs directly on S3. It is extremely easy to request and launch a sandbox cluster as a playground for 24 hours at no cost to you.