Resource Hub


Presentation

Online Meetup: AWS S3 + Alluxio + Presto = ❤️ The Ryte Use Case
At Ryte, we analyze unstructured, semi-structured, and structured data for more than one million users worldwide. The whole Ryte platform is built on a scalable architecture that supports our heavy load and lets our customers drill down from a high-level overview to the last byte of their websites.
In this presentation, I will show why and how we use Alluxio to solve some challenging technical issues, improve the speed, and reduce the costs of our AWS EMR Hadoop and Presto backend.
Topics:
- What is Ryte: a platform to optimize your online marketing
- Requirements for the Ryte platform
- Why we use Presto on AWS EMR with S3
- When problems pop up
- How we solve them with Alluxio

Blog
Q&A with Alluxio's Bin Fan on Data Orchestration, Cloud Migration, and Data Engineering Challenges
For today’s blog post I interviewed Bin Fan, Founding Engineer and VP of Open Source at Alluxio. Bin is the PMC maintainer of the Alluxio open source project. Prior to Alluxio, he worked for Google on the next-generation storage infrastructure.


Presentation

Alluxio – Data Orchestration for Analytics and AI in the Cloud
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Data storage is migrating from the colocated model (e.g., HDFS) to a more cost-effective and scalable, but often fully disaggregated and remote, data lake model (e.g., S3). This has created a strong need for data orchestration in the cloud, much like what K8s does for container-based workloads, so that data can be presented in the right layout at the right location for data applications in the cloud. Originally developed as the UC Berkeley AMPLab project “Tachyon”, Alluxio (www.alluxio.io) implements the world’s first open-source data orchestration system in the cloud: a unified access layer for data-driven applications in big data and ML, enabling Spark, Presto, or TensorFlow to transparently access different external storage systems while actively leveraging an in-memory cache to accelerate data access. In this talk, we will present:
- Trends and challenges in the data ecosystem in the cloud era
- Data engineering in the cloud with data orchestration
- Use cases of tech stacks (Presto or TensorFlow) with Alluxio on S3

Blog
Effective Analytical Pipelines on AWS Using EMR, Alluxio, and S3
This article describes lessons from a previous project in which my team moved a data pipeline from a Hadoop cluster we managed ourselves to AWS, using EMR and S3. The goal was to leverage the elasticity of EMR to offload the operational work, and to make S3 a data lake where different teams can easily share data across projects.
Large Scale Analytics Acceleration

Blog
Implementing a Secure Plug-and-Play Distributed File System Service Using Alluxio in Baidu
In this article, you will learn how to use Alluxio to implement a unified distributed file system service, and how to add extensions on top of Alluxio, including customized authentication schemes and user-defined functions (UDFs) on Alluxio files.


Presentation

360 & Alluxio Joint Meetup: Distributed Storage and Alluxio Application
360 & ALLUXIO JOINT MEETUP
Using Alluxio POSIX (FUSE) API in JD.com
- Alluxio FUSE landing in Jingdong
- Deep analysis of Alluxio FUSE principle and architecture
- How to improve POSIX compatibility of Alluxio FUSE
- JD’s contribution to the Alluxio community


Presentation

Bay Area Meetup: Interactive Analytics in the Cloud with Presto and Alluxio
ALLUXIO BAY AREA MEETUP
This talk describes a stack that combines Presto, Alluxio, and cloud object storage systems (e.g., AWS S3) for high-concurrency, low-latency SQL queries on big data in the cloud. Presto, an open-source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Alluxio is an open-source data orchestration system that brings data closer to compute and provides a unified data access layer at in-memory speeds. Presto can use Alluxio as a distributed caching tier on top of S3 for the hot data it queries, avoiding repeated reads of the same data from the cloud.
This talk covers:
- The architecture of Presto, its separation of compute and storage, cloud-readiness, and recent advancements in the project such as the Cost-Based Optimizer and Kubernetes support
- An overview of Alluxio’s key concepts, architecture, and data flow
- Presto and Alluxio production use cases running hundreds of nodes, including ING Bank, JD.com, and NetEase Games
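The caching-tier idea in this abstract can be illustrated with a small read-through cache. This is only a toy sketch under stated assumptions: the `ReadThroughCache` class, the dictionary standing in for S3, and the LRU eviction policy are all hypothetical and are not Alluxio's or Presto's API. It shows why repeated queries over the same hot data stop reaching the object store after the first read.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy caching tier in front of a slow object store (hypothetical, not Alluxio's API)."""

    def __init__(self, under_store, capacity):
        self.under_store = under_store      # dict standing in for S3
        self.capacity = capacity            # maximum number of cached objects
        self.cache = OrderedDict()          # key -> bytes, kept in LRU order
        self.remote_reads = 0               # how often the object store was hit

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        self.remote_reads += 1              # cache miss: fetch from the store
        data = self.under_store[key]
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return data


s3 = {"orders/part-0": b"a", "orders/part-1": b"b", "logs/part-0": b"c"}
tier = ReadThroughCache(s3, capacity=2)
for _ in range(3):                          # the same hot query, repeated
    tier.read("orders/part-0")
    tier.read("orders/part-1")
print(tier.remote_reads)                    # only the first pass reaches the store
```

A real deployment distributes the cache across workers and handles consistency with the object store, which this sketch ignores.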


Presentation

Austin Meetup: Efficient Data Engineering with Apache Spark, Hive, and Alluxio on S3
Cloud, Data, & Orchestration – Austin Meetup
At Bazaarvoice, a software-as-a-service digital marketing company, the data engineering team is tasked to handle data at massive Internet-scale to serve over 1,900 of the biggest internet retailers and brands.
We built our data pipelines entirely in the cloud using Apache Spark and Hive on AWS EC2, accessing data in S3. AWS enables us to scale “out” the infrastructure capacity effortlessly to keep up with Internet-scale data and web traffic, but scaling out also exposes certain limitations, such as a limited ability to scale “up” further. While this cloud-native stack is scalable and elastic, we experience performance limitations because data access is limited by network bandwidth, and this is exacerbated for workloads that involve repeated queries.
To address the data access challenges, we leverage Alluxio, an open source data orchestration system for analytics in the cloud. Alluxio serves as a transparent caching layer for hot and warm data, such that Hive and Spark jobs are able to access all data transparently in S3. We have seen 10x performance acceleration of Spark and Hive jobs on S3 with Alluxio.

Blog
Four Different Ways to Write to Alluxio
Alluxio is a new layer on top of under storage systems that not only improves raw I/O performance but also gives applications flexible options to read, write, and manage files. This article describes the different ways to write files to Alluxio and the tradeoffs each one makes in performance, consistency, and level of fault tolerance compared to HDFS.
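The tradeoffs between write modes can be made concrete with a toy model. The four names below follow Alluxio's write types as I understand them (`MUST_CACHE`, `CACHE_THROUGH`, `THROUGH`, `ASYNC_THROUGH`); the abstract itself does not list them, so treat the names as an assumption. The `ToyFileSystem` class and its `flush` method are hypothetical stand-ins, not Alluxio's client API.

```python
from enum import Enum


class WriteType(Enum):
    MUST_CACHE = "MUST_CACHE"        # cache only: fastest, lost if the cache is lost
    CACHE_THROUGH = "CACHE_THROUGH"  # cache and under store, both synchronously
    THROUGH = "THROUGH"              # under store only, nothing cached
    ASYNC_THROUGH = "ASYNC_THROUGH"  # cache now, persist to under store later


class ToyFileSystem:
    """Hypothetical model of where data lands for each write type."""

    def __init__(self):
        self.cache = {}        # stand-in for Alluxio-managed storage
        self.under_store = {}  # stand-in for S3 or HDFS
        self.pending = []      # paths waiting for background persistence

    def write(self, path, data, write_type):
        if write_type is not WriteType.THROUGH:
            self.cache[path] = data
        if write_type in (WriteType.CACHE_THROUGH, WriteType.THROUGH):
            self.under_store[path] = data          # synchronous persist
        if write_type is WriteType.ASYNC_THROUGH:
            self.pending.append(path)              # persist later

    def flush(self):
        """Stand-in for the background job that persists pending files."""
        while self.pending:
            path = self.pending.pop(0)
            self.under_store[path] = self.cache[path]


fs = ToyFileSystem()
for wt in WriteType:
    fs.write("/" + wt.value.lower(), b"data", wt)
print(sorted(fs.cache))        # everything except the THROUGH file
print(sorted(fs.under_store))  # only the synchronously persisted files so far
fs.flush()
print(sorted(fs.under_store))  # the ASYNC_THROUGH file is now persisted too
```

The model ignores replication and failure handling; it only shows which copies exist when the write call returns.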


Blog

Creating Grafana Dashboards to Visualize Alluxio Metrics
Monitoring metrics is essential for operating distributed systems in production. Alluxio uses the Codahale Metrics Library to collect metrics on I/O throughput, RPC throughput, and resource usage. Alluxio metrics are shown in its web UI, and are also available through a REST endpoint or can be exported as time series to several third-party sinks (see docs).

Blog
Accelerating Write-Intensive Data Workloads on AWS S3
Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write, which lets applications complete writes to the Alluxio file system and return quickly, while still giving users peace of mind that the data will be persisted in the background to the chosen under storage, such as S3.


Presentation

Scalable Filesystem Metadata Services with RocksDB
Alluxio maintainer and founding engineer Calvin Jia presents on Scalable Filesystem Metadata Services with RocksDB at the RocksDB meetup at Twitter.
Alluxio provides a unified namespace where you can mount multiple different storage systems and access them through the same API. To serve the file system requests to operate on all the files and directories in this namespace, Alluxio masters must handle the file system metadata at a scale of all mounted systems combined. We are writing several engineering blogs describing the design and implementation of Alluxio master to address this scalability challenge. This is the first article focusing on metadata storage and service, particularly how to use RocksDB as an embedded persistent key-value store to encode and store the file system inode tree with high performance.
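One way to picture encoding an inode tree in a key-value store is sketched below. It is a minimal illustration under stated assumptions, not Alluxio's actual schema: a plain Python dict stands in for RocksDB, and the `InodeStore` class, the `i/` and `e/` key prefixes, and the JSON metadata format are all hypothetical. Inode metadata is stored under the inode ID, and each parent-child edge is stored under a key built from the parent ID and the child name, so resolving a path is one point lookup per component.

```python
import json


class InodeStore:
    """Hypothetical inode tree laid out as key-value pairs (dict stands in for RocksDB)."""

    ROOT_ID = 0

    def __init__(self):
        self.kv = {}       # bytes -> bytes, like an embedded key-value store
        self.next_id = 1
        self._put_inode(self.ROOT_ID, "", True)

    @staticmethod
    def _inode_key(inode_id):
        return b"i/" + inode_id.to_bytes(8, "big")

    @staticmethod
    def _edge_prefix(parent_id):
        # Fixed-width parent ID keeps all children of a directory under one prefix.
        return b"e/" + parent_id.to_bytes(8, "big") + b"/"

    def _put_inode(self, inode_id, name, is_dir):
        meta = json.dumps({"name": name, "is_dir": is_dir}).encode()
        self.kv[self._inode_key(inode_id)] = meta

    def create(self, parent_id, name, is_dir):
        inode_id = self.next_id
        self.next_id += 1
        self._put_inode(inode_id, name, is_dir)
        self.kv[self._edge_prefix(parent_id) + name.encode()] = inode_id.to_bytes(8, "big")
        return inode_id

    def resolve(self, path):
        """Walk the path one component at a time; return the inode ID or None."""
        inode_id = self.ROOT_ID
        for name in filter(None, path.split("/")):
            child = self.kv.get(self._edge_prefix(inode_id) + name.encode())
            if child is None:
                return None
            inode_id = int.from_bytes(child, "big")
        return inode_id

    def list_children(self, parent_id):
        # A sorted key-value store would do this as a prefix range scan.
        prefix = self._edge_prefix(parent_id)
        return sorted(k[len(prefix):].decode() for k in self.kv if k.startswith(prefix))


store = InodeStore()
data_dir = store.create(InodeStore.ROOT_ID, "data", is_dir=True)
store.create(data_dir, "a.parquet", is_dir=False)
store.create(data_dir, "b.parquet", is_dir=False)
print(store.resolve("/data/a.parquet"))
print(store.list_children(data_dir))
```

Keeping edges separate from inode metadata means listing a directory touches only the edge keys, which is the kind of access pattern a sorted embedded store handles well.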