For today’s blog post I interviewed Bin Fan, Founding Engineer and VP of Open Source at Alluxio. Bin is the PMC maintainer of the Alluxio open source project. Prior to Alluxio, he worked for Google on the next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University on the design and … Continued
In this talk, we present: trends and challenges in the data ecosystem in cloud era; Data engineering in the cloud with data orchestration; Use cases of using tech stacks (Presto or Tensorflow) with Alluxio on S3.
Haoyuan Li offers an overview of a data orchestration layer that provides a unified data access and caching layer for single cloud, hybrid, and multicloud deployments.
Today’s current pace of innovation is hindered by the necessity of reinventing the wheel in order for applications to efficiently access data. When an engineer or scientist wants to write an application to solve a problem, he or she needs to spend significant effort on getting the application to access the data efficiently and effectively, rather than focusing on the algorithms and the application’s logic.
I recently worked on a PoC evaluating Nomad for a client. Since there were certain constraints limiting what was possible on the client environment, I put together something “quick” on my personal workstation to see what was required for Alluxio to play nice with Nomad.
Alluxio is a new layer on top of under storage systems that can not only improve raw I/O performance but also enables applications flexible options to read, write and manage files. This article focuses on describing different ways to write files to Alluxio, realizing the tradeoffs in performance, consistency, and also the level of fault tolerance compared to HDFS.
Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write to allow users to complete writes to Alluxio file system and return quickly with high application performance, while still providing users with peace of mind that data will be persisted to the chosen under storage like S3 in the background.
we introduce Tachyon, a memory centric fault-tolerant distributed file system, which enables reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce.
This meetup presents an overview of the motivations and design decisions behind the major changes in the Alluxio 2.0 release, and Real-time Data Processing for Sales Attribution Analysis with Alluxio, Spark and Hive at VIPShop.