Powered By Alluxio
Alibaba Cloud is the largest cloud computing company in China. It integrates Alluxio with its OSS (Object Storage Service) and leverages Alluxio as a fast data-access layer on top of OSS.
Arimo leverages Alluxio’s in-memory capability, improving time-to-results for deep learning models by up to 60%.
Baidu uses Alluxio to run fast SQL queries over globally distributed databases. With petabyte-scale data distributed over multiple data centers, Alluxio accelerates remote data access and keeps frequently used “hot” data local to the compute nodes.
Barclays describes how they iteratively process raw data directly from the central data warehouse into Spark and how Alluxio is their key enabling technology.
Bazaarvoice stores massive amounts of data on AWS S3 and leverages Alluxio in production to speed up its big data analytics. In this architecture, Alluxio enables data locality and data caching and smooths over the semantic differences of AWS S3 storage, achieving a 5-10x speedup for Hive queries running on S3.
China Unicom is the world’s fourth-largest mobile service provider by subscriber base. It uses Alluxio in production to accelerate HDFS access for SparkSQL analytics and has seen a 6-10x performance improvement.
Comcast brings Alluxio into its framework stack for operationalizing predictive ML models, both to improve customer experience and to eliminate bottlenecks in the process from model inception to deployment and monitoring. Alluxio provides the universal data plane in the stack on top of various under-stores (e.g., S3, HDFS, RDBMS).
Cray has fused supercomputing with an open, standards-based framework to deliver an industry first: the Cray® Urika®-GX agile analytics platform. Alluxio provides a unified view of enterprise data, allows compute frameworks to access stored data at memory speed, and co-locates compute and data while virtualizing across different storage systems.
Ctrip is a leading Chinese provider of travel services, including accommodation reservation and transportation ticketing. It uses Alluxio to boost the performance of Spark SQL workloads and alleviate the pressure on the HDFS NameNode. In addition, Alluxio is deployed as the single entry point that unifies two HDFS clusters.
DBS is the leading financial services group headquartered in Singapore. DBS built its data-intensive compute to be independent of storage, based on technologies such as Spark, Alluxio, and object stores.
Didi Chuxing is a major Chinese ride-sharing, AI, and autonomous technology company. It leverages Alluxio for several purposes inside its data analytics platform: (1) accelerating data access from remote data centers; (2) integrating data from several different sources across different data centers; and (3) sharing data across jobs and compute frameworks.
eSentire leads the industry in Managed Detection and Response services. It uses Alluxio together with Spark Streaming/SQL and Cassandra to create an analytics architecture with mission-critical response times to fight cybercrime.
ESRI leverages Alluxio in its mapping and spatial analytics software to read and write geospatial data to a plethora of distributed data stores, such as Amazon S3, HDFS, or OpenStack Swift, including data stores that are not natively supported by the ArcGIS platform.
Guardant Health is the world leader in comprehensive liquid biopsy. With Alluxio, Minio, and Spark, Guardant Health is able to create a performant, robust, and scalable system that performs large-scale data processing in a cloud-native manner.
Huatai deploys Alluxio Enterprise as the storage layer that unifies data from disparate sources at memory speed, providing high performance and a predictable SLA for leveraging even petabytes of data.
Huawei has teamed up with Alluxio to release a big data storage acceleration solution that integrates Huawei’s FusionStorage with Alluxio’s memory-speed virtual distributed storage system. The solution aims to deliver unified data management, improved analysis efficiency, and faster application performance, and to popularize big data for processes including storage, analysis, and archiving.
Huya is the leading live streaming platform in China focused on gaming. Huya uses Alluxio to cache data across different data centers, speeding up analytics jobs and avoiding repeated remote reads of the same data.
IBM deploys Alluxio over Swift and SoftLayer to build a flexible and efficient big data analytics platform.
ING is leveraging Presto (interactive query), Alluxio (data orchestration & acceleration), S3 (massive storage), and DC/OS (container orchestration) to build and operate a modern Security Analytics & Machine Learning platform. Running this stack in several different data centers reduced query times from 10+ minutes to under 10 seconds.
Intel uses Alluxio in several scenarios: to share data across different applications and computing frameworks, to reduce applications’ memory consumption and GC overhead, and to cache remote data as a local storage manager.
JD.com is China’s largest online retailer. It uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component. One example of its computing frameworks, JDPresto, has gained a 10x performance improvement on average by deploying Alluxio.
Kyligence is a big data intelligence company that offers solutions for big data analytics. Alluxio enables effective data management across different storage systems through its transparent naming and mounting API. With Alluxio, the Kyligence Analytics Platform achieves a good balance among performance, cost, and management effort in the cloud.
Lenovo, the number one manufacturer of personal computers and one of the largest smartphone vendors in the world, can now seamlessly and securely access data from data centers worldwide without labor-intensive and error-prone ETL. With Alluxio, that data is available at in-memory speeds to analytics running in a single data center.
Lianjia is the leading online-to-offline real estate agency service in China. Lianjia built an OLAP platform using SparkSQL on top of Alluxio to accelerate ad hoc SQL queries on large amounts of data.
Ligadata is using Alluxio to speed up Spark workloads by 3x in production by accelerating file system metadata operations backed by object storage.
Lucidworks leverages Alluxio in the cloud to accelerate remote Solr data access and cloud recovery.
Microsoft AI leverages Alluxio to seamlessly bridge compute-intensive workloads, including TensorFlow jobs, with Azure Blob storage.
MOMO is a leading mobile pan-entertainment social platform in China. It leverages Alluxio with Spark SQL to speed up ad hoc analysis.
Myntra is a leading Indian fashion e-commerce marketplace company. With Alluxio in its data processing pipeline, Myntra improved customer experience (CX) with faster actionable business intelligence from its data. The Myntra team has also contributed to Alluxio open source by documenting how to use Alluxio with Azure Blob storage for other users.
Netease uses Alluxio to improve the performance of interactive queries on Presto. Alluxio is deployed together with the Presto workers, and accelerates the data access from the remote HDFS clusters.
Nielsen has run Alluxio since v1.5 on an AWS EMR stack to improve Spark performance.
Nvidia leverages Alluxio as part of its GPU-accelerated data analytics framework to manage different storage systems and provide quick and easy access to information within various data lakes.
Oracle’s Big Data File System (BDFS) is based on Alluxio and is designed to accelerate data access for data pipelines, with features that significantly improve the runtime performance of Spark applications. BDFS accelerates data access to and from Oracle Cloud Infrastructure Object Storage Classic by providing an active caching layer.
PerceptIn designed and implemented a cloud architecture using Alluxio that manages enormous amounts of incoming data in different storage systems with high throughput and low latency.
Qiniu Cloud Atlab has built AVA, a training platform for deep learning that uses Alluxio to effectively integrate GPU computing resources with storage resources from KODO (the object storage offered by Qiniu Cloud). Alluxio sped up training tasks that read large numbers of sample files, such as videos and images, by 50%.
Qunar leverages Alluxio in production to boost the performance of real-time data analytics, resulting in a 15x speedup on average. In addition, it uses Alluxio’s unified namespace to enable different applications and frameworks to easily interact with data from different storage systems.
Ryte speeds up Presto reads from S3 by using Alluxio to remove the performance bottleneck in metadata operations.
Samsung uses Alluxio with the different storage media available in its systems, including NVMe SSDs, while delivering performance in line with the speed of the underlying storage media.
Samsung built its big data analysis platform, “Brightics”, to leverage Alluxio for managing data in the Hadoop ecosystem for its user analysis and visualization tools.
Shopee deploys Alluxio as a distributed caching layer for Presto in a satellite cluster to hide the performance variance of HDFS as the data lake.
Sogou is one of the largest search engines in China. Alluxio is deployed in its production big data platform with more than 1000 nodes to help improve the reliability of Spark Shuffle service and improve Hive performance.
Suning is one of the largest non-government retailers in China. It uses Alluxio to unify storage systems and manage multiple HDFS clusters.
TalkingData is China’s largest data broker, covering more than 600M smart devices on a monthly basis. It leverages Alluxio as a single platform to manage all data across disparate data sources on-prem and in the cloud, removing complexity by masking the different data sources and providing a unified interface.
Tencent is a leader in social networking, gaming, e-commerce, mobile, and web portals. Tencent News leverages Alluxio with Apache Spark to create a scalable, robust, and performant architecture that provides the best experience to its more than 100 million monthly active users.
Tencent Cloud offers Alluxio in the Tencent EMR stack so that users can speed up analytics performance and mount external data sources.
Two Sigma is the fifth-largest hedge fund in the world. It uses Alluxio to accelerate data access from a remote HDFS cluster for Spark nodes provisioned in the cloud, making model training for algorithmic trading more than 10x faster.
Vipshop is a leading online retailer in China that processes and analyzes petabytes of data to answer complex questions like how users are behaving, why a purchase was made, and what ads are most effective. With Alluxio, Vipshop can access, store, and manage data across disparate storage systems on-prem and in the cloud.
Wells Fargo uses Alluxio to accelerate Spark workloads in its data preparation and exploration pipeline, saving dozens of minutes of load time per iteration. With Alluxio, data is loaded once and can be served from memory for subsequent accesses, saving hours in workload processing time.