Resource Hub


Presentation

Hybrid Collaborative Tiered Storage with Alluxio
When an application reads data from AWS S3 or Alibaba Cloud OSS, performance often suffers because every read crosses a remote network. Alluxio can provide a transparent cache layer that automatically caches data read from remote OSS/S3. But when does Alluxio itself pull the remote data: does it cache everything by default, or cache on demand? This presentation introduces Alluxio’s tiered storage concept and shows how combining it with ZFS can maximize performance and reduce application development effort.
See results showing 10x performance improvements in Spark and Hive jobs running on AWS S3. Plus, learn how real-world user Bazaarvoice implemented a tiered storage architecture to boost performance, enabling it to handle data at massive Internet scale to serve its customers.


Presentation

Alluxio Overview: Unify Data at Memory Speed
Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a data orchestration layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer abstracting the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source.
Haoyuan Li and Bin Fan discuss the data center challenges Alluxio addresses, the benefits provided, and an overview of how it works.


Blog

A Better Big Data Ecosystem with Hadoop and Hitachi Content Platform Part1
This blog explores the challenges customers are facing with storing data long term in Hadoop, and discusses what the Hitachi Content Platform team is doing to help our customers solve these challenges with the help of Alluxio. Data is at the center of our digital world and for years Hadoop has been the go-to data processing platform because it is fast and scalable. While Hadoop has solved the data storage and processing problem for the last ~10 years, it achieves this by scaling storage and compute capacity in parallel. As a result, Hadoop environments have continued to expand compute capacity well beyond their needs as more and more of the storage is consumed by older, inactive data.
Cloud Cost Savings


Presentation

Alluxio in MOMO, JD.com, TalkingData, and Vipshop [Chinese]
Alluxio in MOMO: Accelerating Ad Hoc Analysis
From our friends at MOMO
MOMO, a leading pan-entertainment social platform in China, has deployed Alluxio to accelerate ad-hoc query analytics. In the course of evaluating the best fit for Alluxio in their infrastructure they conducted several performance tests to understand how ad-hoc query analytics behaved in several scenarios. These tests give real-world insight to the performance benefits Alluxio provides. The MOMO findings include:
- With Alluxio, performance was improved 3-5x over the current mode
- Even when initially reading ‘cold’ data Alluxio delivered superior performance in most cases
- Alluxio can effectively scale-out to improve performance as requirements grow

Blog
Effective caching for Spark RDDs with Alluxio
Recently, Qunar deployed Alluxio with Spark in production and found that Alluxio enables Spark streaming jobs to run 15x to 300x faster. In their case study, they described how Alluxio improved their system architecture, and mentioned that some existing Spark jobs would slow down or never finish because they would run out of memory. After adopting Alluxio, those jobs were able to finish, because the data could be stored in Alluxio instead of within Spark. In this blog, we show that by saving RDDs in Alluxio, larger data sets can be kept in memory for faster Spark applications, and RDDs can be shared across separate Spark applications.
Large Scale Analytics Acceleration

Blog
Starburst Presto Alluxio Better Together for Presto Caching
Presto was designed from the ground up to offer interactive analytics using a massively parallel processing SQL engine that can combine data from multiple sources using a variety of connectors. As more and more companies discover the power of “separation of storage and compute” along with querying the data where it lies, it’s no wonder Presto is being asked to add even more functionality. Alluxio focuses its innovation at the data layer as a key enabling technology for Presto and a wide range of analytics applications and use cases. Performance is always critical, but providing memory-speed response time is only part of the solution. If the application can’t access the data, it’s of no use.
Large Scale Analytics Acceleration


Blog

Announcing Alluxio v1.8.0
We are excited to announce the release of Alluxio Enterprise Edition (AEE), Alluxio Community Edition (ACE), and Alluxio Open Source (AOS) v1.8.0. Click HERE to download! This release brings features and enhancements to Alluxio that simplify cloud adoption (including hybrid cloud and migration from HDFS to object storage) for analytics and machine learning, and that improve usability. To help make it easier to get started using Alluxio, we have also collected a set of resources into a starter kit. The second item is a simple tutorial for how to mount a remote AWS S3 bucket and accelerate data access.

Blog
Data Location Awareness Optimize Performance and Lower Cost with Tiered Locality
Caching frequently used data in memory is not a new computing technique, but it is a concept that Alluxio has taken to the next level with the ability to aggregate data from multiple storage systems into a unified pool of memory. Alluxio's capabilities extend further to intelligently managing the data within that virtual data layer. Tiered locality uses awareness of network topology and configurable policies to manage data placement for performance and cost optimization. This feature is particularly useful in cloud deployments that span multiple availability zones. It can also save costs in environments where cross-zone or cross-location traffic is more expensive than intra-zone traffic.
Cloud Cost Savings
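As a rough illustration of the tiered locality idea described above, the following Python sketch picks the closest worker by comparing locality tiers from most to least specific. It is a toy model, not Alluxio's implementation: the tier names (`node`, `rack`, `zone`), the `pick_worker` function, and the worker records are all hypothetical.

```python
# Hypothetical tier order, most specific first; the names are illustrative.
TIERS = ["node", "rack", "zone"]

def pick_worker(client_locality, workers):
    """Return (worker, matched_tier) for the closest worker to the client."""
    for tier in TIERS:
        wanted = client_locality.get(tier)
        if wanted is None:
            continue
        for worker in workers:
            if worker["locality"].get(tier) == wanted:
                return worker, tier
    # No tier matched: fall back to any worker (or nothing if the list is empty).
    return (workers[0], None) if workers else (None, None)

client = {"node": "n1", "rack": "r1", "zone": "us-east-1a"}
workers = [
    {"name": "w-far", "locality": {"node": "n7", "rack": "r9", "zone": "us-east-1b"}},
    {"name": "w-zone", "locality": {"node": "n5", "rack": "r4", "zone": "us-east-1a"}},
    {"name": "w-rack", "locality": {"node": "n3", "rack": "r1", "zone": "us-east-1a"}},
]
chosen, tier = pick_worker(client, workers)
print(chosen["name"], tier)
```

In this sketch a same-rack worker is preferred over a same-zone worker, and a cross-zone worker is used only when nothing closer matches, which is the kind of choice that avoids more expensive cross-zone traffic.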

Blog
Asynchronous Caching in Alluxio High Performance for Partial Read Caching for Presto and Spark
An Alluxio cluster caches data from connected storage systems in memory to create a data layer that can be accessed concurrently by multiple application frameworks. This greatly improves performance for many analytics workloads. On-demand caching occurs when clients read blocks of data using a ‘CACHE’ read type from persistent storage systems connected to the Alluxio cluster. Prior to Alluxio v1.7, on-demand caching was on the critical path of read operations, requiring a full block to be read before the data was available for the application. Workloads which read partial blocks, for example SQL workloads, would be adversely affected on initial reads from connected storage.
Large Scale Analytics Acceleration
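To make the partial-read problem described above concrete, here is a small Python simulation that contrasts caching on the critical path with caching in the background. It is only a sketch under stated assumptions and does not use any Alluxio API: `BLOCK_SIZE`, `fetch`, `read_sync`, and `read_async` are hypothetical names, and the "remote store" is just a bytes object.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64  # hypothetical, tiny block size for illustration

def fetch(data, start, length):
    """Stand-in for a read from a remote persistent store."""
    return data[start:start + length]

def read_sync(data, cache, block_id, offset, length):
    """Caching on the critical path: fetch the whole block, then serve."""
    fetched = 0
    if block_id not in cache:
        cache[block_id] = fetch(data, block_id * BLOCK_SIZE, BLOCK_SIZE)
        fetched = len(cache[block_id])
    return cache[block_id][offset:offset + length], fetched

def read_async(data, cache, block_id, offset, length, pool):
    """Serve the requested range first; cache the full block in the background."""
    if block_id in cache:
        return cache[block_id][offset:offset + length], 0, None
    result = fetch(data, block_id * BLOCK_SIZE + offset, length)

    def fill():
        cache[block_id] = fetch(data, block_id * BLOCK_SIZE, BLOCK_SIZE)

    future = pool.submit(fill)
    return result, len(result), future

remote = bytes(range(256))
sync_cache, async_cache = {}, {}
sync_bytes, sync_cost = read_sync(remote, sync_cache, 1, 4, 8)
with ThreadPoolExecutor(max_workers=1) as pool:
    async_bytes, async_cost, pending = read_async(remote, async_cache, 1, 4, 8, pool)
    pending.result()  # wait so the background fill is visible below
print(sync_cost, async_cost, 1 in async_cache)
```

Both functions return the same 8 bytes, but the second value reports how many bytes had to be fetched before the caller could proceed: the full block in the synchronous case, only the requested range in the asynchronous one.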


Blog

TalkingData Leading Data Broker in China Leverages Alluxio to Unify Terabytes of Data Across Disparate Data Sources
TalkingData leverages Alluxio as a single platform to manage all the data across disparate data sources on-premise and in the cloud. Alluxio removes the complexity of our environment by abstracting the different data sources and providing a unified interface. Applications simply interact with Alluxio, and Alluxio manages data access to different storage systems on behalf of the applications. Alluxio effectively democratizes data access, allowing data scientists and analysts in various business units to accomplish their goals without needing to consider where the data is located or having to go to central IT or the engineering team to transfer or prepare the data.

Blog
Myntra Case Study Accelerating Analytics in the Cloud for Customized Mobile ECommerce
While looking for ways to streamline our data pipeline, we learned about Alluxio, an open source, memory-speed, virtual distributed file system. We deployed Alluxio as the shared data layer for all of the intermediate stages in the data pipeline. By reading and writing data in Alluxio, the data can be read concurrently and stay in memory for the next stage of the pipeline. This sped up the entire pipeline and increased its overall throughput, allowing us to provide interactive responses to our app users.
Large Scale Analytics Acceleration


Presentation

Using Alluxio as a Fault-Tolerant Pluggable Optimization Component to Compute Frameworks of JD System
STRATA DATA CONFERENCE LONDON 2018
JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.
Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed. In the big data ecosystem, Alluxio lies between computation frameworks or jobs and various kinds of storage systems. Additionally, Alluxio’s memory-centric architecture enables data access orders of magnitude faster than existing solutions.
Alluxio has run in JD.com’s production environment on 100 nodes for six months. Mao Baolong, Yiran Wu, and Yupeng Fu explain how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. To give just one example, one framework, JDPresto, has seen a 10x performance improvement on average. This work has also extended Alluxio and enhanced the syncing between Alluxio and HDFS for consistency.


Blog

Tencent Case Study Delivering Customized News to Over 100 Million Users per Month with Alluxio
Tencent is one of the largest technology companies in the world and a leader in multiple sectors such as social networking, gaming, e-commerce, mobile and web portal. Tencent News, one of Tencent’s many offerings, strives to create a rich, timely news application to provide users with an efficient, high-quality reading experience. To provide the best experience to more than 100 million monthly active users of Tencent News, we leverage Alluxio with Apache Spark to create a scalable, robust, and performant architecture.

Blog
MOMO Accelerating Ad Hoc Analysis with Spark SQL and Alluxio
Alluxio clusters act as a data access accelerator for remote data in connected storage systems. Temporarily storing data in memory, or other media near compute, accelerates access and provides local performance from remote storage. This capability is even more critical with the movement of compute applications to the cloud and data being located in object stores separate from compute. Caching is transparent to users, using read/write buffering to maintain continuity with persistent storage. Intelligent cache management utilizes configurable policies for efficient data placement and supports tiered storage for both memory and disk (SSD/HDD).
Large Scale Analytics Acceleration

Blog
Lenovo Case Study Analytics on Data from Multiple Locations and Eliminating ETL
Lenovo is an Alluxio customer with a common problem and use case in the world of data analytics. They have petabytes of data in multiple data centers in different geographic locations. Analyzing it requires an ETL process to get all of the data in the right place. This is both slow, because data has to be transferred across the network, and costly, because multiple copies of the data need to be stored. Freshness and quality can also suffer: the data may be out of date, and it may be incomplete because regulatory issues prevent certain data from being transferred.
Large Scale Analytics Acceleration

Blog
New Whitepaper Structured Big Data Federation
Alluxio helps organizations handle their big data by providing a unified view of all of the data in the enterprise, whether on premises, in the cloud, or hybrid. Applications access data using a standard interface to a global virtual namespace. Alluxio also employs a memory-centric architecture to enable data access at memory speed. With the combined unification and performance benefits, Alluxio can effectively provide big data federation for organizations by acting as a virtual data lake.
Hybrid Multi-Cloud