White Paper

Cray Analytics and Alluxio – Wrangling Enterprise Storage

For business to not just survive — but to flourish — it’s become imperative to make decisions with near immediacy, continuously pivot strategy and tactics, and merge streams of inquiries into meaningful action. Executing requires high-frequency insights — the competitive advantage in today’s frenetic business landscape. Together with Alluxio, Inc., we enable businesses to gain the competitive advantage with faster time to insights with our integrated solution of Cray high-performance analytics platform and Alluxio’s memory-speed virtual storage system — Alluxio Enterprise Edition.

For businesses on the cusp of innovation and seeking that information advantage, Cray has fused supercomputing with an open, standards-based framework to deliver an industry first: the Cray® Urika®-GX agile analytics platform. This advanced platform has an unprecedented combination of versatility and speed to tackle the largest problems at super scale and uncover hidden patterns with a fast time to insight.

Alluxio provides a unified view of enterprise data that spans disparate storage systems, locations and clouds, allowing any big data compute framework to access stored data at memory speed. Alluxio Enterprise Edition gives Cray customers the ability to co-locate compute and data with memory-speed access to data while virtualizing across different storage systems under a unified namespace. Alluxio Enterprise Edition supercharges data storage and gives customers an easy-to-install integrated package, with an administrative interface as well as mission-critical features such as enterprise security, data replication and enterprise-grade support and professional services.

Alluxio proves its value in production scenarios where customers need to analyze vast amounts of data in real-time. It overcomes I/O limitations for running big data workloads that access remote storage systems and allows big data workloads to share data at memory speed for orders of magnitude higher performance in running jobs.

Benefits of the Urika-GX platform and Alluxio Enterprise Edition integration

Easy to deploy for access to multiple storage types

With this integration, you can deploy Alluxio in minutes and realize fast on-demand analytics by unified in-memory access to a multitude of data sources such as Ceph, HDFS, Amazon S3 and Google Cloud Storage. Choose between command-line and GUI-based installation to suit your preferences on any number of high-memory compute nodes on the Urika-GX platform.

The Urika-GX system is optimized for Apache Spark™ in-memory analytics, and combining that with Alluxio on Urika-GX gives you single data access gateway. You can expand this in-memory computation capabilities to other frameworks with Alluxio on the Urika-GX platform.

Unified view and management of all your data

Alluxio connects any compute framework, such as Spark or Presto, to disparate storage systems via a unified global file system namespace, and enables any application to interact with data stored on premise and in the cloud at memory speed. This eliminates data duplication and migration, as well as enables new workflows across data stored in any storage system, including cross-data center storage.

Efficient resource utilization

By decoupling compute from storage, Alluxio allows organizations to scale compute and storage resources independently. You can scale number of compute nodes with high-performance and reliable components and leave the job of data access to Alluxio on the Urika-GX platform, which utilizes Cray’s supercomputing network for node-to-node communication, thus ensuring effective “data-locality” even for remote data sources.

High-performance and predictable SLA

Alluxio is co-located with compute, and thereby provides memory-speed access to data, and orders of magnitude improvement in runtime for different types of jobs. The Urika-GX system, with its dense memory hierarchy, enables Alluxio to cache large amounts of multiple data types and sources to enable fast data access, high computational performance and predictable SLAs.