caching Archives | Page 3 of 5

Getting Started with the Alluxio-Presto Sandbox

July 11, 2019 By Zac Blanco

The Alluxio-Presto sandbox is a Docker application featuring installations of MySQL, Hadoop, Hive, Presto, and Alluxio. The sandbox lets you easily dive into an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio in a big data software stack.

Some compute frameworks, like Apache Spark, have single-node caching. How is Alluxio different from single-node caching?

While single-node caching may be sufficient for some users, for many it does not meaningfully improve performance. By definition, a single-node cache is limited to what that single node has accessed. Also, most frameworks with a single-node cache typically do not leverage the SSD or HDD in the node. Alluxio is … Continued

Running Presto with Alluxio on Amazon EMR

Alluxio Community Office Hour - May * May 21, 2019

Many organizations are leveraging EMR to run big data analytics on the public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.

Building a Distributed Data Access Layer for Analytics on Any Cloud

Data Council SF * April 18, 2019

In this talk, we will focus on Alluxio's design: its architecture, data flow, and metadata flow. We will dive into the choices in its design space and share our experience implementing features such as data tiering, storage options, and cache eviction policies. We will also share lessons in design, implementation, and operation from building an open source distributed storage system with 900 contributors over 5+ years.

Two Ways to Keep Files in Sync Between Alluxio and HDFS

April 16, 2019 By David Zhu

Alluxio provides a distributed data access layer that lets applications like Spark or Presto access different underlying file systems (UFS) through a single API in a unified file system namespace. If users only interact with the files in the UFS through Alluxio, Alluxio knows about every change the client makes to the UFS, so it keeps the Alluxio namespace in sync with the UFS namespace.

Unified Namespace and Tiered Storage in Alluxio

Strata+Hadoop World San Jose * March 30, 2016

Calvin Jia and Jiri Simsa explain how the current Alluxio tiered storage can be easily configured to use memory, SSDs, and hard drives in different tiers. Alluxio users and administrators do not have to manually migrate the data because data in Alluxio is managed transparently between all the configured tiers, similar to the way the CPU manages L1, L2, and lower-level caches. Meanwhile, Alluxio also gives users fine-grained control over data so they can plug in their own data-management strategies; users can also pin files in Alluxio to a specific storage or specify a TTL for files. Calvin and Jiri also describe the interface for managing heterogeneous data sources in the Alluxio namespace, which takes advantage of Alluxio’s ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, or Swift.
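
The transparent movement of data between tiers described above can be illustrated with a toy model. This is only a sketch, not Alluxio's implementation or API: the `TieredStore` class, its method names, the block-count capacities, and the choice of LRU as the eviction policy are all assumptions made for illustration. New and recently read blocks live in the fastest tier, and a full tier demotes its least recently used block to the next tier down.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical toy model of tiered storage (not Alluxio code)."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_blocks), fastest tier first.
        self.names = [name for name, _ in tiers]
        self.capacity = dict(tiers)
        # Each tier keeps its blocks in least-recently-used-first order.
        self.blocks = {name: OrderedDict() for name in self.names}

    def locate(self, key):
        """Return the name of the tier holding key, or None if absent."""
        for name in self.names:
            if key in self.blocks[name]:
                return name
        return None

    def put(self, key, value):
        """Write a block into the top tier, replacing any older copy."""
        tier = self.locate(key)
        if tier is not None:
            del self.blocks[tier][key]
        self._insert(0, key, value)

    def get(self, key):
        """Read a block and promote it to the top tier."""
        tier = self.locate(key)
        if tier is None:
            return None
        value = self.blocks[tier].pop(key)
        self._insert(0, key, value)
        return value

    def _insert(self, level, key, value):
        if level >= len(self.names):
            return  # Demoted past the last tier: the block is evicted.
        name = self.names[level]
        self.blocks[name][key] = value
        if len(self.blocks[name]) > self.capacity[name]:
            # Demote the least recently used block to the next tier down.
            old_key, old_value = self.blocks[name].popitem(last=False)
            self._insert(level + 1, old_key, old_value)


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for key in "abc":
    store.put(key, key.upper())
print(store.locate("a"))  # "a" was demoted when MEM filled up
store.get("a")            # reading "a" promotes it back to MEM
print(store.locate("a"), store.locate("b"))
```

The caller never moves data by hand: `put` and `get` are the only operations, and placement follows from access order, which is the analogy to CPU cache levels drawn in the talk description.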

Unified Big Data Analytics – Any stack, Any Cloud

Boston Meetup * January 22, 2019

This presentation focuses on how Alluxio helps the big data analytics stack become cloud-native. Cloud object storage systems, which are increasingly popular, offer more cost-effective and scalable storage, but they also come with different semantics and performance implications compared to HDFS. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio in front of cloud storage solves these problems because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage.

Interactive Big Data Analytics with the Presto + Alluxio stack for the Cloud

Alluxio Tech Talk * March 12, 2019

In this tech talk, we will introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly concurrent, low-latency analytics platform. This stack provides a strong solution for running fast SQL across multiple storage systems, including HDFS, S3, and others, in public cloud, hybrid cloud, and multi-cloud environments.
