Learn about Alibaba’s use case in deep learning and gene computing acceleration using Alluxio in Kubernetes.
Alluxio video presentations
This talk includes why Netflix needed to build Iceberg, the project’s high-level design, and will highlight the details that unblock better query performance.
This talk covers an overview of the project and highlight best practices for creating performant input pipelines.
Learn why leading companies are moving towards a decoupled compute and storage architecture, and the associated challenges and requirements. Hear about how Spark and Alluxio together can solve the challenges.
Want to leverage your existing investments in Hadoop with your data on-premise and still benefit from the elasticity of the cloud?
Like other Hadoop users, you most likely experience very large and busy Hadoop clusters, particularly when it comes to compute capacity. Bursting HDFS data to the cloud can bring challenges – network latency impacts performance, copying data via DistCP means maintaining duplicate data, and you may have to make application changes to accomodate the use of S3.
“Zero-copy” hybrid bursting with Alluxio keeps your data on-prem and syncs data to compute in the cloud so you can expand compute capacity, particularly for ephemeral Spark jobs.
Learn more about Bazaarvoice’s use case leveraging Apache Spark, Hive, and Alluxio on S3. Along with how to set up Hive with Alluxio so that Hive jobs can seamlessly read from/write to S3.
Haoyuan Li offers an overview of a data orchestration layer that provides a unified data access and caching layer for single cloud, hybrid, and multicloud deployments.
In this online presentation, we present how ING is leveraging Presto (interactive query), Alluxio (data orchestration & acceleration), S3 (massive storage), and DC/OS (container orchestration) to build and operate our modern Security Analytics & Machine Learning platform. We will share the challenges we encountered and how we solved them.
EMR has become a widely used service to run big data analytics in the public cloud. But issues around slow/inconsistent EMR performance due to S3 data lakes creates challenges for organizations.
Alluxio is a data orchestration layer for the cloud that increases performance of analytic workloads running on AWS EMR using S3 as the storage.
Join us for this webinar where we will show you how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake. You’ll see the performance gains with Alluxio in your EMR/S3 stack.