The latest advances in container orchestration by Kubernetes bring cost savings and flexibility to compute workloads in public or hybrid cloud environments. On the other hand, they introduce new challenges, such as how to move data to compute efficiently, how to unify data across multiple or remote clouds, and how to co-locate data with compute, among others. Alluxio approaches these problems in a new way. It helps elastic compute workloads realize the true benefits of the cloud, while bringing data locality and data accessibility to workloads orchestrated by Kubernetes.
Tag: compute storage separation
Problem: Sometimes big data analytics needs to process input data from two different storage systems at the same time. For instance, a data scientist may need to join two tables, one from an HDFS cluster and one from S3. Existing Solutions: Certain computation frameworks may be able to connect to storage systems including HDFS and popular cloud … Continued
A new generation of open source big data systems, represented by Alluxio, which was born at the University of California, Berkeley, takes on this issue. Unlike systems such as HDFS, which use a tightly coupled design to achieve low-cost, reliable storage, Alluxio provides data applications with a virtual data storage layer that is defined and implemented in software. This layer abstracts and integrates the underlying files and objects across multi-cloud, hybrid cloud, multi-data-center, and other environments. Through intelligent workload analysis and data management, it brings data close to compute and provides data locality, so that big data and machine learning applications can achieve the same performance at lower cost.
Hear how DBS Bank is taking a new approach to making data-intensive compute independent of the storage. They will share the challenges as well as the new technology stack that includes technologies like Spark, Alluxio and object stores.
Many organizations are leveraging EMR to run big data analytics on public cloud. However, reading data from and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.
Enterprises are increasingly looking towards object stores to power their big data and machine learning workloads in a cost-effective way. The combination of SwiftStack and Alluxio enables users to move seamlessly towards a disaggregated architecture.
Alluxio can help data scientists and data engineers interact with different storage systems in a hybrid cloud environment. Using Alluxio as a data access layer for Big Data and Machine Learning applications, data processing pipelines can improve efficiency without explicit data ETL steps and the resulting data duplication across storage systems.
Wenbo Zhao (Two Sigma) and Bin Fan (Alluxio) will be presenting on how Two Sigma uses Alluxio to make data-intensive compute independent of the storage beneath.
The goal is to make Alluxio accessible to an even wider set of users through a focus on security, new language bindings, and further increased stability. In addition, the team is working on new APIs to allow applications to access data more efficiently and manage data across different under storage systems.