The Architecture of Decoupling Compute and Storage with Alluxio

As Spark, MapReduce, and many other frameworks are widely deployed in enterprise production, designing an efficient and flexible compute and storage architecture has become a hot topic of debate among both IT and line-of-business (LOB) practitioners. There are good reasons to run compute in a traditional hyperconverged environment as part of a data lake implementation. Even so, decoupling storage and computation is becoming increasingly popular, as O’Reilly pointed out in a recent 2017 trends post. Teams from Alluxio, IBM, Huawei, EMC, and Red Hat, for example, have come together to examine real-world applications and provide joint solutions.

Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio. They explore the decision factors and considerations involved: application workload patterns, data locality, infrastructure cost, network bandwidth, cloud deployment, and so on. They also share production best practices and solutions for making the best use of CPUs, memory, and the different tiers of disaggregated compute and storage systems, with the goal of building a multitenant, high-performance platform that meets real-world business demands.