Learn how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. See how to run Dataproc Spark against a remote HDFS cluster.
JD.com is China’s largest online retailer. It uses Alluxio to support ad hoc and real-time stream computing, with Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component.
This session covers the challenges of querying diverse data sources at Walmart and how they are tackled using Presto and Alluxio.
In this talk, we share our lessons in building and rebuilding our monitoring systems and data platforms at Electronic Arts (EA).
This talk covers why Netflix needed to build Iceberg and the project’s high-level design, and highlights the details that unblock better query performance.
Learn why leading companies are moving towards a decoupled compute and storage architecture, and the associated challenges and requirements. Hear about how Spark and Alluxio together can solve the challenges.
Today’s pace of innovation is hindered by the need to reinvent the wheel so that applications can access data efficiently. When an engineer or scientist wants to write an application to solve a problem, he or she must spend significant effort getting the application to access the data efficiently and effectively, rather than focusing on the algorithms and the application’s logic.
Learn how to set up EMR Spark with Alluxio so Spark jobs can seamlessly read from and write to S3. See the performance comparison between Spark on S3 and Spark with Alluxio on S3.
Alluxio is an open-source data orchestration system widely used to speed up data-intensive workloads in the cloud. Alluxio v2.0 introduced Replicated Async Write, which lets writes to the Alluxio file system complete and return quickly for high application performance, while still giving users peace of mind that data will be persisted in the background to the chosen under storage, such as S3.