Architecture Archives

Improve Presto Architectural Decisions with Shadow Cache

October 12, 2021

This talk describes the design of shadow cache, a lightweight component that tracks the working set size of the Alluxio cache. Shadow cache dynamically tracks the working set size over a recent time window and is implemented as a series of bloom filters. We have deployed the shadow cache in Facebook Presto and use the results to understand system bottlenecks and to inform routing design decisions.
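As a rough illustration of the idea, the sketch below keeps one bloom filter per time slice and estimates the number of distinct items seen across the whole window from the union of the filters. It is not Alluxio's implementation: the class names `BloomFilter` and `ShadowCacheSketch`, the parameters `num_slices`, `num_bits`, and `num_hashes`, and the choice to count distinct keys rather than bytes are all assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Minimal bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Double hashing: derive num_hashes positions from two base hashes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos


class ShadowCacheSketch:
    """Hypothetical sliding-window tracker: one bloom filter per time slice."""

    def __init__(self, num_slices=4, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.slices = [BloomFilter(num_bits, num_hashes) for _ in range(num_slices)]
        self.current = 0

    def record_access(self, key):
        # Every cache access is recorded in the filter for the current slice.
        self.slices[self.current].add(key)

    def advance(self):
        # Move to the next slice; the oldest filter is replaced by an empty one,
        # so items not accessed recently age out of the window.
        self.current = (self.current + 1) % len(self.slices)
        self.slices[self.current] = BloomFilter(self.num_bits, self.num_hashes)

    def estimated_working_set(self):
        # OR all slices together, then apply the standard bloom filter
        # cardinality estimate: n ~= -(m / k) * ln(1 - X / m),
        # where X is the number of set bits.
        union = 0
        for bloom in self.slices:
            union |= bloom.bits
        set_bits = bin(union).count("1")
        if set_bits >= self.num_bits:
            return float("inf")  # filter saturated; estimate is unbounded
        return -(self.num_bits / self.num_hashes) * math.log(1 - set_bits / self.num_bits)
```

In this sketch a caller would invoke `record_access` on every read and call `advance` on a timer, for example once per slice duration. The estimate is approximate, and its accuracy depends on sizing `num_bits` well above the expected number of distinct items.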

Tags: alluxio day, architecture, cache, facebook, presto, shadow cache

Alluxio Architecture and Scaling Performance

December 13, 2020

In this talk, I will introduce the high-level architecture of the current system and present the various components of Alluxio. I will also discuss some of the main challenges of large-scale Alluxio deployments and the lessons we learned from those environments. The talk will detail some of the major scalability improvements added in the past several months and how users can benefit from the changes.

Tags: architecture, data orchestration, data orchestration summit, scalability

360 & Alluxio Joint Meetup: Distributed Storage and Alluxio Application

September 4, 2019

The 360 & Alluxio joint meetup in Beijing covers distributed storage and Alluxio application practice.

Tags: architecture, community, meetup

NetEase and Alluxio joint meetup

Hangzhou Meetup * July 26, 2019

This joint meetup in Hangzhou covers an introduction to new features of the big data storage system Alluxio and the optimization of cache performance, practice and exploration with Spark and Alluxio, and the interactive query system Impala.

Evolution of big data stacks under computational and storage separation architecture

Shanghai * May 19, 2019

A new generation of open source big data systems, represented by Alluxio, which was born at the University of California, Berkeley, addresses this issue. Unlike systems such as HDFS, which tightly couple storage with compute to achieve low-cost, reliable storage, Alluxio provides data applications with a software-defined virtual data storage layer. This layer abstracts and integrates the underlying files and objects across multi-cloud, hybrid cloud, multi-data center, and other environments. Through intelligent workload analysis and data management, it brings data close to computation and provides data locality, so that big data and machine learning applications can achieve the same performance at lower cost.

Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with Disaggregated Compute and Storage

Alluxio | SwiftStack Tech Talk * March 2, 2019

Enterprises are increasingly looking to object stores to power their big data and machine learning workloads cost-effectively. The combination of SwiftStack and Alluxio enables users to move seamlessly toward a disaggregated architecture.

Moving From Apache Thrift to gRPC: A Perspective From Alluxio

April 13, 2019 By Gokturk Gezer and Bin Feng

As part of the Alluxio 2.0 release, we have moved our RPC framework from Apache Thrift to gRPC. In this article, we will talk about the reasons behind this change as well as some lessons we learned along the way.
In Alluxio 1.x, the RPC communication between clients and servers is built mostly on top of Apache Thrift. Thrift enabled us to define Alluxio service interfaces in simple IDL files and implement client bindings using native Java interfaces generated by the Thrift compiler. However, we faced several challenges as we continued developing new features and improvements for Alluxio.

Alluxio: Solving the Framework-Storage Gap in Big Data

DSI Conference San Mateo * June 12, 2016

In this talk, Haoyuan Li, co-creator of Tachyon, a founding committer of Spark, and CEO of Tachyon Nexus, will explain how the next wave of innovation in storage will be driven by separating the functional layer from the persistent storage layer, and how memory-centric architecture through Tachyon is making this possible. Li will describe the future of distributed file storage and highlight how Tachyon supports specific use cases.

Modern Software Architectures and Data Pipelines

Scalæ By The Bay San Francisco * November 12, 2016

Throughout our four-year history, Scala and Scale By the Bay has led the way in evangelizing and understanding modern software architectures. We have the best set of them here, including Akka, Kafka, Spark, Finagle, Lagom, and so on. How do they come together in a SMACK / MIND stack? What are the best practices to follow and the pitfalls to avoid? This panel of experienced practitioners will discuss and illuminate it all.

Tag: architecture