Apache Spark has been widely adopted for in-memory data analytics at scale, however, efficient memory utilization is a common challenge, and users will either run out of memory or experience low and unstable performance. Many Spark users may not be aware of the differences in memory utilization between caching data directly in-memory into the Spark JVM versus storing data off-heap via an in-memory storage service like Alluxio. In this office hour, I will highlight the two approaches with a demo and open up for discussions
In this Office Hour we’ll go over:
- How to run Spark shell with Alluxio such that Spark jobs
- A demo to compare the memory usage between Spark cache and using Alluxio as the external off-heap caching service
- Open Session for discussion on any topics such as running Presto on Alluxio, and more
Bin Fan is the founding engineer of Alluxio, Inc. and the PMC member of Alluxio open source project. Prior to Alluxio, he worked for Google where he won the Technical Infrastructure Award. Bin received his Ph.D. in Computer Science from Carnegie Mellon University working on distributed systems
Questions? Slack with the speakers, users, and many other community members!
Welcome to join Alluxio Global Online Meetup Group to attend online meetups like this!