This post outlines a solution for building a hybrid data lake with Alluxio to leverage analytics and AI on Amazon Web Services (AWS) alongside a multi-petabyte on-premises data lake. Alluxio’s solution is called “zero-copy” hybrid cloud, indicating a cloud migration approach without first copying data to Amazon Simple Storage Service (Amazon S3).
Electronic Arts (EA) is a leading company in the gaming industry, providing over a thousand games to serve billions of users worldwide. The EA Data & AI Department builds hundreds of platforms to manage petabytes of data generated by games and users every day. These platforms consist of a wide range of data analytics, from real-time data ingestion to ETL pipelines. Formatted data produced by our department is widely adopted by executives, producers, product managers, game engineers, and designers for marketing and monetization, game design, customer engagement, player retention, and end-user experience.
When applications are only reading and writing through Alluxio, the Alluxio file system provides strong consistency. However, when clients are writing data across both Alluxio and under storage, the consistency depends on the Alluxio write type and under storage type. This article discusses what to expect in each scenario.
This blog explores an innovative platform with Presto as the computing engine and Alluxio as a data orchestration layer between Presto and S3 storage, to support online services with instantaneous response within the gaming industry. The preliminary results show that Presto with Alluxio outperforms S3 significantly in all cases.Alluxio with metadata caching shows up to 5.9x performance gain when handling large numbers of small files.
Many organizations are leveraging EMR to run big data analytics on public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.
This talk will guide the audience on how Alluxio can greatly simplify the data preparation phase in with remote and possibly multiple data sources. We will share the lessons and benchmark from Bill Zhao an engineer led in Apple when building a Machine Learning platform using Tensorflow, NFS, DC/OS and Alluxio.
EMR has become a widely used service to run big data analytics in the public cloud. But issues around slow/inconsistent EMR performance due to S3 data lakes creates challenges for organizations.
Alluxio is a data orchestration layer for the cloud that increases performance of analytic workloads running on AWS EMR using S3 as the storage.
Join us for this webinar where we will show you how to set up EMR Spark and Hive with Alluxio so jobs can seamlessly read from and write to your S3 data lake. You’ll see the performance gains with Alluxio in your EMR/S3 stack.