Alluxio is the data orchestration platform to unify data silos across heterogeneous environments. The following blog will discuss the architecture combining Spark with Alluxio.
Unisound is an artificial intelligence company focusing on Internet of Things services. Unisound’s AI technology stacks include the perception and expression capabilities of signals, voices, images, and texts, and the cognitive technologies such as knowledge, understanding, analysis, and decision-making, towards a multi-modal AI system. Atlas is the supercomputing platform supporting all kinds of AI applications including model training and reasoning inferencing.
This blog is the first in a series introducing Alluxio as the data platform to unify data silos across heterogeneous environments. The next blog will include insights from PrestoDB committer Beinan Wang to uncover the value for analytics use cases, specifically with PrestoDB as the compute engine.
Alluxio 2.6 significantly improves the performance of data-intensive AI/ML workloads across any storage, and also improves the general maintainability and visibility of Alluxio clusters, especially for large-scale deployments. We have taken the feedback and contributions from the community and introduced features which simplify deployment, introduce new data management capabilities, optimize performance, and provide enhanced visibility into system behavior.
Alluxio 2.5 focuses on improving interface support to broaden the set of data driven applications which can benefit from data orchestration. The POSIX and S3 client interfaces have greatly improved in performance and functionality as a result of the widespread usage and demand from AI/ML workloads and system administration needs. Alluxio is rapidly evolving to meet the needs of enterprises that are deploying it as a key component of their AI/ML stacks.
Data processing is increasingly making use of NVIDIA computing for massive parallelism. Advancements in accelerated compute mean that access to storage must also be quicker, whether in analytics, artificial intelligence (AI), or machine learning (ML) pipelines.
This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).
How T3Go’s high-performance data lake using Apache Hudi and Alluxio shortened the time for data ingestion into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio in conjunction to query data on the lake saw queries speed up by 10 times faster.
When applications are only reading and writing through Alluxio, the Alluxio file system provides strong consistency. However, when clients are writing data across both Alluxio and under storage, the consistency depends on the Alluxio write type and under storage type. This article discusses what to expect in each scenario.