Unified Namespace and Tiered Storage in Alluxio

Strata+Hadoop World San Jose * March 30, 2016

Calvin Jia and Jiri Simsa explain how Alluxio’s tiered storage can be configured to use memory, SSDs, and hard drives as separate tiers. Users and administrators do not have to migrate data manually, because Alluxio manages data transparently across all configured tiers, much as a CPU manages its L1, L2, and lower-level caches. Alluxio also gives users fine-grained control over data so they can plug in their own data-management strategies; for example, users can pin files in Alluxio to a specific storage tier or set a TTL on files. Calvin and Jiri also describe the interface for bringing heterogeneous data sources into the Alluxio namespace, which takes advantage of Alluxio’s ability to interoperate with different underlying storage systems such as HDFS, S3, GlusterFS, or Swift.
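The tiered behavior described in this abstract can be illustrated with a small toy model. The sketch below is not Alluxio's API or its actual eviction policy; the class `TieredStore`, its methods, and the tier names are hypothetical, and it assumes a simple least-recently-used policy in which a full tier pushes its oldest unpinned entry down one level and a read promotes the entry back to the top tier. TTL handling is left out.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered store (hypothetical; not Alluxio's implementation)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_entries), fastest tier first.
        self.names = [name for name, _ in capacities]
        self.limits = [limit for _, limit in capacities]
        # One ordered map per tier; the least recently used entry comes first.
        self.tiers = [OrderedDict() for _ in capacities]
        self.pinned = set()

    def pin(self, key):
        """Mark a key so it is never chosen as an eviction victim."""
        self.pinned.add(key)

    def put(self, key, value):
        """Write to the top tier, replacing any older copy."""
        for tier in self.tiers:
            tier.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        """Read a key and promote it to the top tier; None if absent."""
        for tier in self.tiers:
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)
                return value
        return None

    def tier_of(self, key):
        """Return the name of the tier holding the key, or None."""
        for name, tier in zip(self.names, self.tiers):
            if key in tier:
                return name
        return None

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            return  # Dropped from the last tier in this toy model.
        tier = self.tiers[level]
        tier[key] = value
        while len(tier) > self.limits[level]:
            # Oldest unpinned entry moves down one tier.
            victim = next((k for k in tier if k not in self.pinned), None)
            if victim is None:
                break  # Everything is pinned; the toy allows the overflow.
            self._insert(level + 1, victim, tier.pop(victim))


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for name in ("a", "b", "c"):
    store.put(name, name.upper())
```

With a two-entry memory tier, writing a third entry moves the oldest one down to the next tier without the caller doing anything, which is the "transparent" part; calling `pin` on a key keeps it from being moved down.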

Fast big data analytics and machine learning using Alluxio and Spark in Baidu

Strata+Hadoop World San Jose * March 26, 2016

A few months ago, Baidu deployed Alluxio to accelerate its big data analytics workload. Bin Fan and Haojun Wang explain why Baidu chose Alluxio, as well as the details of how they achieved a 30x speedup with Alluxio in their production environment with hundreds of machines. Based on the success of the big data analytics engine, Baidu is currently expanding the Alluxio and Spark infrastructure to accelerate other applications, such as machine learning.

Alluxio: Unifying APIs, Accelerating ML, & Enabling Cloud Architectures

Bay Area Meetup * September 14, 2016

Using intermediate APIs means developers can learn just one framework and still access features offered by different technologies. It means writing job logic only once and being able to test it on a new underlying service with little extra effort. Modularity is not only a win for users; it also means creators of execution frameworks and storage systems can focus on performance and capability without having to worry about API maintenance.

Using Alluxio to Improve Spark & Hadoop HDFS System Performance and Reliability [Chinese]

Hadoop Summit China 2017 * March 15, 2017

How to Use Alluxio to Improve Spark and Hadoop HDFS Data Access Performance and System Reliability [Chinese]

Database Technology Conference China 2017 * May 9, 2017

China Unicom is one of the five largest telecom operators in the world. China Unicom’s booming business in 4G and 5G networks has to serve an exploding base of hundreds of millions of smartphone users. This unprecedented growth brought enormous challenges and new requirements to the data processing infrastructure. The previous generation of its data processing system was based on IBM midrange computers, Oracle databases, and EMC storage devices. This architecture could not scale to process the amounts of data generated by the rapidly expanding number of mobile users. Even after deploying Hadoop and the Greenplum database, it was still difficult to cover critical business scenarios with their varying, massive data processing requirements. The complicated architecture of its incumbent computing platform also created many new challenges in using resources effectively.

Alluxio (formerly Tachyon): An open source memory-speed virtual distributed storage system

December 7, 2016

Strata+Hadoop 2016 – In the past year, the Alluxio project experienced a tremendous improvement in performance and scalability and was extended with key new features including tiered storage, transparent naming, and unified namespace. At the same time, the Alluxio ecosystem has expanded to include support for more under storage systems and computation frameworks.

Tags: alluxio engineering, architecture, big data, cloud, compute storage separation, conference, hadoop, performance, scale, storage, strata, tiered storage, unified namespace

A Reliable Memory-Centric Distributed Storage System

October 16, 2015 by Haoyuan Li

A presentation by founder Haoyuan Li on Tachyon, a reliable memory-centric distributed storage system.

Tags: apache spark, big data, data, hadoop, mapreduce, performance, spark, storage

Tachyon: A Reliable Memory-Centric Distributed Storage System

July 15, 2015 by Bin Fan

We introduce Tachyon, a memory-centric, fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce.

Tags: apache spark, big data, data, hadoop, mapreduce, performance, spark, storage

Tag: hadoop