In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.
International Data Corporation (IDC) reported that the global datasphere will grow from 33 zettabytes in 2018 to 175 zettabytes by 20251. This trend becomes more and more complicated with the variety and velocity of data growth, and it continuously changes the ways data is collected, stored, processed, and analyzed. New analytics solutions, including machine learning, deep learning, and artificial intelligence (AI), and new architectures and tools are being developed to extract and deliver value from the huge datasphere.
This article describes how Alluxio accelerates the training of deep learning models in a hybrid cloud environment with Intel’s Analytics Zoo open source platform, powered by oneAPI. Details on the new architecture and workflow, as well as Alluxio’s performance benefits and benchmarks results will be discussed.
This webinar will describe the concept and internal mechanism using the stack of Spark+Alluxio in Kubernetes to enhance data locality even when the storage service is outside or remote.
Learn why leading companies are moving towards a decoupled compute and storage architecture, and the associated challenges and requirements. Hear about how Spark and Alluxio together can solve the challenges.
This is a guest blog by Ashwin Sinha with an original blog source. This blog introduces Wormhole— open source Dockerized solution for deploying Presto & Alluxio clusters for blazing fast analytics on file system (we use S3, GCS, OSS). When it comes to analytics, generally people are hands-on in writing SQL queries and love to analyse data which resides in a warehouse (e.g. MySQL database). But as data grows, these … Continued
In this article, you will learn how to incorporate Alluxio to implement a unified distributed file system service as well as how to add extensions on top of Alluxio including customized authentication schemes and UDF (user-defined functions) on Alluxio files.
Alluxio 2.0 is the most ambitious platform upgrade since the inception of Alluxio with greatly expanded capabilities to empower users to run analytics and AI workloads on private, public or hybrid cloud infrastructures leveraging valuable data wherever it might be stored.
This release, now available for download, includes many advancements that will allow users to push the limits of their data-workloads in the cloud.
In this tech talk, we will introduce the key new features and enhancements such as:
we introduce Tachyon, a memory centric fault-tolerant distributed file system, which enables reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce.