Two Sigma Case Study – Cloud bursting with Spark for on-premise Hadoop

Two Sigma, a leading hedge fund with more than $50 billion under management, turned to Alluxio for help with bursting Spark workloads into a public cloud, enabling hybrid workloads against on-premise HDFS. With Alluxio, Two Sigma sees better performance, increased flexibility, and dramatically lower costs: the number of model runs per day increased by 4x and the cost of compute fell by 95%.

Alluxio Developer Tip: Why am I seeing the error “User yarn is not configured for any impersonation. impersonationUser: foo”?

What is User Impersonation? Impersonation is the ability of one user to act on behalf of another user. For example, say user ‘yarn’ has the credentials to connect to a service, but user ‘foo’ does not. On its own, user ‘foo’ would never be able to access the service. However, user ‘yarn’ can access the service … Continued

Using Alluxio as a Fault-tolerant Pluggable Optimization Component of JD.com’s Computation Frameworks

JD.com is China’s largest online retailer and its biggest overall retailer, as well as the country’s biggest internet company by revenue. Currently, JD.com’s BDP platform runs more than 400,000 jobs (15+ PB) daily, on a system with more than 15,000 cluster nodes and a total capacity of 210 PB.

Alluxio has run in JD.com’s production environment on 100 nodes for six months. See how JD.com uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component.

Tencent Case Study: Delivering Customized News to Over 100 Million Users per Month with Alluxio

This post is guest authored by our friends at Tencent: Can He. Tencent is one of the largest technology companies in the world and a leader in multiple sectors such as social networking, gaming, e-commerce, mobile, and web portals. Tencent News, one of Tencent’s many offerings, strives to create a … Continued

MOMO: Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

This post is guest authored by our friends at MOMO: Haojun (Reid) Chan and Wenchun Xu. Data Analysis Trends: The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options … Continued

MOMO: Accelerating Ad Hoc Analysis with Spark SQL and Alluxio

From our friends at MOMO: The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options to address our performance needs and focused our efforts on Alluxio, which improves performance … Continued

Flexible and Fast Storage for Deep Learning with Alluxio

Introduction: In the age of growing datasets and increased computing power, deep learning has become a popular technique for AI. Deep learning models continue to improve their performance across a variety of domains, with access to more and more data, and the processing power to train … Continued