scale Archives | Alluxio

Testing Distributed System at Scale for the Cost of a Large Pizza on AWS

February 25, 2020

Building distributed systems is no small feat. Software testing is just one of many critical practices that engineers who build these systems need to use to ensure the quality and usability of their software. For distributed systems, scaling out testing frameworks to ensure that our software works for enterprises running it in highly distributed environments is a complicated (and expensive!) task.

Tags: aws, distributed systems, office hour, scale, testing

Bay Area Meetup: Alluxio 2.0 Deep Dive and Near Real-time Analytics with Spark

July 23, 2019

This meetup presents an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release, as well as a talk on real-time data processing for sales attribution analysis with Alluxio, Spark, and Hive at VIPShop.

Tags: alluxio engineering, apache hadoop, apache spark, compute, compute storage separation, data, data orchestration, hadoop, hdfs, meetup, scale, spark, storage

Efficient Data Engineering with Apache Spark, Hive, and Alluxio on S3

Alluxio Meetup | Austin * August 15, 2019

Welcome to the first event of the Cloud, Data, & Orchestration Austin Meetup! This meetup will feature two talks and an opportunity to engage with other data engineers, developers, and Alluxio users. Thanks to Bazaarvoice for hosting!

Scalable Filesystem Metadata Services with RocksDB

July 22, 2019

Alluxio maintainer and founding engineer Calvin Jia presents "Scalable Filesystem Metadata Services with RocksDB" at the RocksDB meetup at Twitter.

Tags: alluxio engineering, meetup, metadata management, performance, scale, storage, unified namespace

Building fast and scalable big data and ML platforms at Pinterest and JD.com

June 21, 2019 by Calvin Jia & Yongsheng Wu [Pinterest]

This talk shares our design, implementation, and optimization of the Alluxio metadata service to address scalability challenges, focusing on how to apply and combine techniques including tiered metadata storage (based on the off-heap key-value store RocksDB), a fine-grained file system inode tree locking scheme, an embedded replicated state machine (based on Raft), and exploration and performance tuning to choose the right RPC framework (Thrift vs. gRPC).

Tags: aws s3, data, machine learning, meetup, metadata management, performance, scale, tiered storage

Building fast and scalable big data and ML platforms at Pinterest and JD.com

Bay Area Meetup * June 19, 2019

This Alluxio Meetup features a chance to interact with other Alluxio users and developers, as well as three talks. Thanks to our joint host Data Council!

Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage System

Data by the Bay San Francisco * May 17, 2016

The goal is to make Alluxio accessible to an even wider set of users through a focus on security, new language bindings, and further increased stability. In addition, the team is working on new APIs to allow applications to access data more efficiently and manage data across different under storage systems.

Past, Present and Future of Alluxio [Chinese]

Shanghai Meetup * July 28, 2016

The Alluxio project has greatly improved system performance, scalability, and user experience, and has added a series of new features, including scalable tiered storage, transparent reading and writing of data in the under file system (UFS), and a unified namespace, all of which make Alluxio easier to use. At the same time, the Alluxio ecosystem has expanded to support different storage systems and computing frameworks. Alluxio now supports a variety of storage systems, including Amazon S3, Google Cloud Storage, Gluster, Ceph, HDFS, NFS, and OpenStack Swift, as well as big data processing frameworks such as Spark, MapReduce, and Flink. These integrations allow Alluxio to manage increasingly complex data.

Tag: scale