Problem: Sometimes big data analytics need to process input data from two different storage systems at the same time. For instance, a data scientist may need to join two tables, one from an HDFS cluster and one from S3. Existing Solutions: Certain computation frameworks may be able to connect to storage systems including HDFS and popular cloud … Continued
Tag: compute storage separation
Two Sigma, a leading hedge fund with more than $50 billion under management, turned to Alluxio for help with bursting Spark workloads in a public cloud to enable hybrid workloads for on-premise HDFS. With Alluxio, Two Sigma sees better performance, increased flexibility and dramatically lower costs with the number of model runs per day increased by 4x and the cost of compute reduced by 95%.
This is a recap of the Two Sigma and Alluxio joint meetup hosted in New York. Two Sigma is a leading hedge fund that leverages cutting-edge technology to train its models with petabytes of data in on-premise storage. Special thanks to Two Sigma for hosting. Here are the slides from the presentation. In this meetup, Bin Fan from … Continued
Two Sigma Open Source Meetup
This presentation focuses on how Alluxio enables the big data analytics stack to be cloud-native. Today’s cloud object storage systems provide more cost-effective and scalable storage, but their semantics and performance characteristics differ from those of HDFS. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Deploying Alluxio in front of cloud storage addresses these problems, because data is retrieved and cached in Alluxio rather than fetched repeatedly from the underlying cloud or object storage.
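As a rough illustration of the caching idea described above, the following Python sketch models a read-through cache sitting between compute jobs and a remote store. It is a toy under stated assumptions, not Alluxio's API or implementation: the `ReadThroughCache` class, the `fake_object_store` function, and the `s3://bucket/...` path are all hypothetical names invented for this example.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy stand-in for a caching layer between compute and remote storage."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote  # hypothetical slow remote read
        self._cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often the remote store is hit

    def read(self, path: str) -> bytes:
        # Only go to the remote store on a cache miss.
        if path not in self._cache:
            self.remote_reads += 1
            self._cache[path] = self._fetch_remote(path)
        return self._cache[path]


def fake_object_store(path: str) -> bytes:
    # Hypothetical remote read; a real one would be a network call.
    return f"contents of {path}".encode()


cache = ReadThroughCache(fake_object_store)

# Two separate "jobs" read the same (hypothetical) object.
job_a = cache.read("s3://bucket/table/part-0")
job_b = cache.read("s3://bucket/table/part-0")

# Number of times the remote store was actually contacted.
print(cache.remote_reads)
```

The second read is served from the in-memory dictionary, so the remote store is contacted only for the first one. A real deployment would also need eviction, consistency handling, and distribution across nodes, none of which this sketch attempts.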
Learn more about Alluxio, a virtual unified file system and data orchestration layer for big data and machine learning workloads in the cloud.
Today’s enterprises are decoupling storage and compute as they migrate to the cloud, and that’s where Alluxio comes in. Alluxio is the data orchestration layer between storage and compute, bringing your data closer to your Presto workloads for better performance on top of S3.
See how Presto + Alluxio gives you the performance needed for your compute, regardless of where it is – in the cloud or on-premise.
The cloud is rapidly becoming ubiquitous, with continued adoption focused on the flexibility and cost benefits of a utility infrastructure model. Enterprises are increasingly taking a “data first” view of infrastructure, which demands a new way of thinking in a world in which data is stored and accessed from multiple locations and providers. Performance and interoperability challenges, however, can present obstacles to cloud adoption and complicate data management. Techniques such as the use of data silos, ETL processes and multiple data copies, which are commonly employed to accommodate cloud limitations, often tend to offset the expected benefits of cloud infrastructure. Alluxio offers a new way to enhance the benefits of cloud infrastructure without the performance limitations or interoperability challenges resulting from accessing disparate data sources in multiple, often remote, locations.