Data processing is increasingly making use of NVIDIA computing for massive parallelism. Advancements in accelerated compute mean that access to storage must also be quicker, whether in analytics, artificial intelligence (AI), or machine learning (ML) pipelines.
Software Engineer, Alluxio
This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).
We’re pleased to announce the general availability of Alluxio Data Orchestration Hub, your single pane of glass to orchestrate data for analytics and AI. The data ecosystem is complex with the separation of storage and compute across data centers and cloud providers. With this release we’ve made great strides towards simplifying data access and management across multiple environments.
Migrating SQL workloads from a fully on-premises environment to cloud infrastructure has numerous benefits, including alleviating resource contention and reducing costs by paying for compute resources on demand. In the case of Presto running on data stored in HDFS, the separation of compute in the cloud and storage on-premises is apparent since Presto’s … Continued
Alluxio 2.3.0 focuses on streamlining the user experience in hybrid cloud deployments where Alluxio is deployed with compute in the cloud to access data on-prem. Features such as environment validation tools and concurrent metadata synchronization greatly improve Alluxio’s functionality. Integrations with AWS EMR, Google Dataproc, K8s, and AWS Glue make Alluxio easy to use in a variety of cloud environments. In this article, we will share some of the highlights of the release. For more, please visit our release notes page.
We are excited to announce the release of Alluxio Enterprise Edition (AEE) and Community Edition (ACE) v1.7.0. This release brings enhanced caching policies, further ecosystem integrations, and significant usability improvements. One highlight is the Alluxio FUSE API, which lets users interact with Alluxio through a local filesystem mount. Alluxio FUSE is particularly useful for integrating with deep learning frameworks such as TensorFlow.
Open source Alluxio 1.5.0 has been released with a large number of new features and improvements. Alluxio allows any application to access data from any storage system transparently and at memory speed. Interoperability with other technologies in the ecosystem is an important step for enabling this, and in the 1.5.0 release, we have improved the accessibility of Alluxio in several key ways.
Alluxio 1.4.0 has been released with a large number of new features and improvements. This blog highlights some standout aspects of the Alluxio 1.4.0 open source release: the improved Alluxio Under Storage API, the native file system REST interface, and packet streaming.
This is an excerpt from the Accelerating Data Analytics on Ceph Object Storage with Alluxio whitepaper.
As the volume of data collected by enterprises has grown, there is a continual need to find efficient storage solutions. Owing to its simplicity, scalability, and cost-efficiency, object storage, including Ceph, has become an increasingly popular alternative to traditional file systems. In most cases the object storage system, whether on-premises or in the cloud, is decoupled from the compute nodes where analytics run. There are several benefits of this separation.