Resource Hub

Blog
Four Different Ways to Write to Alluxio
Alluxio is a new layer on top of under storage systems that not only improves raw I/O performance but also gives applications flexible options to read, write, and manage files. This article describes the different ways to write files to Alluxio and the tradeoffs each makes in performance, consistency, and fault tolerance compared to HDFS.

Blog
Creating Grafana Dashboards to Visualize Alluxio Metrics
Monitoring metrics is essential for operating distributed systems in production. Alluxio uses the Codahale Metrics Library to collect metrics on I/O throughput, RPC throughput, and resource usage. These metrics are shown in the Alluxio web UI, and they are also available through a REST endpoint or can be exported as time series to several third-party sinks (see docs).

Blog
Accelerating Write-intensive Data Workloads on AWS S3
Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write, which lets a write to the Alluxio file system complete and return quickly for high application performance, while the data is persisted in the background to the chosen under storage, such as S3.

Presentation
Scalable Filesystem Metadata Services with RocksDB
Alluxio maintainer and founding engineer Calvin Jia presents on Scalable Filesystem Metadata Services with RocksDB at the RocksDB meetup at Twitter.
Alluxio provides a unified namespace where you can mount multiple different storage systems and access them through the same API. To serve the file system requests to operate on all the files and directories in this namespace, Alluxio masters must handle the file system metadata at a scale of all mounted systems combined. We are writing several engineering blogs describing the design and implementation of Alluxio master to address this scalability challenge. This is the first article focusing on metadata storage and service, particularly how to use RocksDB as an embedded persistent key-value store to encode and store the file system inode tree with high performance.

Presentation
Alluxio New York Meetup: Accelerating Analytical Workloads for Public & Hybrid Clouds
ALLUXIO NEW YORK MEETUP
Innovative organizations such as Uber and Twitter have moved to disaggregated stacks: one tier for computational frameworks like Spark and Presto, and a separate tier for storage. The need for more compute flexibility is also pushing users toward hybrid clouds.
In this meetup, Dipti and H.Y. presented a new approach to hybrid analytical workloads using Alluxio, an open source data orchestration layer that sits between the compute and storage layers. Applications like Apache Spark or TensorFlow can then seamlessly access multiple disparate data sources with consistent performance, thanks to the data locality and abstraction that the data orchestration tier provides.
Haoyuan Li (H.Y.), Alluxio
Haoyuan is the Founder and CTO of Alluxio. He graduated with a Computer Science Ph.D. from the AMPLab at UC Berkeley, where he co-created and led Alluxio (formerly Tachyon), an open source virtual distributed file system. Before UC Berkeley, he received an M.S. from Cornell University and a B.S. from Peking University, both in Computer Science.
Dipti Borkar, Alluxio
Dipti Borkar is the VP of Product & Marketing at Alluxio, with over 15 years of experience in data and database technology across relational and non-relational systems. Prior to Alluxio, Dipti was VP of Product Marketing at Kinetica and Couchbase. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.

Blog
The Practice of Alluxio in Ctrip Real-Time Computing Platform
Today, real-time computation platforms are becoming increasingly important in many organizations. In this article, we describe how ctrip.com applies Alluxio to accelerate Spark SQL real-time jobs and maintain the jobs’ consistency during downtime of our internal data lake (HDFS). In addition, we leverage Alluxio as a caching layer to dramatically reduce the workload pressure on our HDFS NameNode.
Large Scale Analytics Acceleration

Blog
Getting Started with the Alluxio-Presto Sandbox
The Alluxio-Presto sandbox is a Docker application featuring installations of MySQL, Hadoop, Hive, Presto, and Alluxio. The sandbox gives you an interactive environment where you can explore Alluxio, run queries with Presto, and see the performance benefits of using Alluxio in a big data software stack.
Large Scale Analytics Acceleration

Blog
2.0 is here! Embrace silos, orchestrate data, accelerate innovation
Here in New York, at the AWS Summit, we are excited to announce that Alluxio 2.0 is here, our biggest release since the Alluxio launch. A couple of months ago, we released the 2.0 Preview, which included some of these capabilities. The 2.0 release now includes even more, continuing to build on our data orchestration approach for the cloud.

Blog
Turn cloud storage or HDFS into your local file system for faster AI model training with TensorFlow
This article presents a different approach to making distributed file systems like HDFS or cloud storage systems look like a local file system to data processing frameworks: the Alluxio POSIX API. To explain the approach, we use the TensorFlow + Alluxio + AWS S3 stack as an example.
Large Scale Analytics Acceleration
Model Training Acceleration


Blog
Recap: Presto Summit SF 2019
Alluxio is a proud sponsor and exhibitor at the Presto Summit in San Francisco. If you missed the conference, don’t worry, we’ve got you covered!
Hybrid Multi-Cloud
Large Scale Analytics Acceleration


Presentation
Alluxio at Beijing Meetup
Open Source Data Orchestration for AI, Big Data, and Cloud
Haoyuan Li presents at the Beijing Meetup on open source data orchestration and the value of Alluxio amid rising trends that call for a new architecture. Four big trends drive this need: separation of compute and storage, hybrid and multi-cloud environments, the rise of object stores, and self-service data across the enterprise.
Separation of compute and storage creates new challenges in how data is managed and orchestrated across frameworks, clouds, and storage systems. Utilizing a unified data orchestration platform simplifies your data’s cloud journey.


Presentation
Community Office Hour: Running Spark & Alluxio in Kubernetes
ALLUXIO COMMUNITY OFFICE HOUR
Kubernetes is widely used to orchestrate computation, offering improved flexibility and portability in public or hybrid cloud environments across infrastructure providers. However, running data-intensive workloads introduces challenges such as efficiently moving data to compute frameworks, accessing data from multiple or remote clouds, and co-locating data with compute.
Alluxio addresses these problems as a new data orchestration layer that bridges the gap between data locality, which improves performance, and data accessibility for analytics workloads in Kubernetes, while also enabling portability across storage providers.
In this Office Hour:
- Overview of Alluxio and the cloud use case with Spark in Kubernetes
- How to set up Alluxio and Spark to run in Kubernetes
- Open session for discussion on topics such as solving the separation of compute and storage, and more