caching Archives

Alluxio-FUSE as a data access layer for Dask

April 27, 2021

At Aspect Analytics we intend to use Dask, a distributed computation library for Python, to deal with MSI data stored as large tensors. In this talk we explore using Alluxio and Alluxio FUSE as a data consolidation and caching layer for some of our bioinformatics workflows.

Tags: alluxio day, aspect analytics, caching, dask, fuse

Community Office Hour: Improving Memory Utilization of Spark Jobs Using Alluxio

November 26, 2019

Many Spark users may not be aware of the differences in memory utilization between caching data directly in memory inside the Spark JVM and storing it off-heap via an in-memory storage service like Alluxio. In this office hour, I will demonstrate both approaches and then open the floor for discussion.

Tags: caching, memory, office hour, spark

Tutorial: Presto + Alluxio + Hive Metastore on Your Laptop in 10 min

October 23, 2019 By Bin Fan

This tutorial walks you through setting up a stack of Presto, Alluxio, and Hive Metastore on your local server, and demonstrates how to use Alluxio as the caching layer for Presto queries.

Building a Large-scale Interactive SQL Query Engine using Presto and Alluxio in JD.com

September 24, 2019 By Baolong Mao

This article describes how JD built an interactive OLAP platform by combining two open-source technologies: Presto and Alluxio.

Accelerating Write-intensive Data Workloads on AWS S3

August 7, 2019 By Zac Blanco and Bin Fan

Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write, which lets applications complete writes to the Alluxio file system and return quickly, while the data is persisted in the background to the chosen under storage, such as S3.

NetEase and Alluxio joint meetup

Hangzhou Meetup * July 26, 2019

This joint meetup in Hangzhou covers three topics: an introduction to new features of the big data storage system Alluxio and the optimization of its cache performance, practice and exploration of Spark and Alluxio, and the interactive query system Impala.

Efficient Data Engineering with Apache Spark, Hive, and Alluxio on S3

Alluxio Meetup | Austin * August 15, 2019

Welcome to the first event of the Cloud, Data, & Orchestration Austin Meetup! This meetup will feature two talks and an opportunity to engage with other data engineers, developers, and Alluxio users. Thanks to Bazaarvoice for hosting!

Tag: caching