Resource Hub
Blog
Accelerating Analytics by 200% with Impala, Alluxio, and HDFS at Tencent
In this article, Honghan Tian describes how engineers in the Data Service Center (DSC) at Tencent PCG (Platform and Content Business Group) leverage Alluxio to optimize analytics performance and minimize operating costs in building Tencent Beacon Growing, a real-time data analytics platform.
Large Scale Analytics Acceleration


Blog
Efficient Model Training in the Cloud with Kubernetes, TensorFlow, and Alluxio
This article presents the collaboration of Alibaba, Alluxio, and Nanjing University in tackling the problem of Deep Learning model training in the cloud. Various performance bottlenecks are analyzed with detailed optimizations of each component in the architecture. Our goal was to reduce the cost and complexity of data access for Deep Learning training in a hybrid environment, which resulted in over 40% reduction in training time and cost.
Hybrid Multi-Cloud
GPU Acceleration
Model Training Acceleration


Blog
Alluxio Accelerates Deep Learning in Hybrid Cloud using Intel’s Analytics Zoo open source platform powered by oneAPI
This article describes how Alluxio can accelerate the training of deep learning models in a hybrid cloud environment when using Intel’s Analytics Zoo open source platform, powered by oneAPI. It covers the new architecture and workflow, as well as Alluxio’s performance benefits and benchmark results.
Model Training Acceleration
Hybrid Multi-Cloud
Large Scale Analytics Acceleration
Blog
Everything you want to know about how to decouple SQL engines from Hive Data Warehouse
Are you using SQL engines, such as Presto, to query an existing Hive data warehouse and running into challenges such as an overloaded Hive Metastore with slow and unpredictable access, unoptimized data formats and layouts (for example, too many small files), or a lack of influence over the existing Hive system and other Hive applications?
Large Scale Analytics Acceleration


Presentation
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes Using Alluxio
In the on-prem days, one key performance optimization for Apache Hadoop or Apache Spark workloads was to run tasks on nodes with local HDFS data. However, while adoption of the cloud and Kubernetes makes scaling compute workloads exceptionally easy, HDFS is often not an option. Accessing data effectively from cloud-native storage services like AWS S3, or even from on-premises HDFS, becomes harder as data locality is lost.
Originating from UC Berkeley's AMPLab, the open source project Alluxio approaches this problem in a new way: it helps move data closer to compute workloads efficiently and on demand, and unifies data across multiple or remote clouds. This webinar will describe the concept and internal mechanism, using the stack of Spark+Alluxio in Kubernetes, for enhancing data locality even when the storage service is external or remote.
Particularly, we will go over:
- Why Spark is able to make locality-aware scheduling decisions when working with Alluxio in a K8s environment using the host network
- Why a pod running Alluxio can share data efficiently with a pod running Spark on the same host using a domain socket and a host path volume
- The roadmap of Alluxio to further improve running analytics jobs like Spark and Presto, including the ongoing closer integration with Presto