Tech Talk Slide Deck

Accelerating workloads and bursting data with Google Dataproc & Alluxio


Google Cloud Dataproc is a popular managed, on-demand service for running Spark, Presto, and many other compute workloads. Alluxio, an open source data orchestration technology, helps speed up Dataproc workloads by providing a distributed caching layer within the Dataproc cluster. In addition, Alluxio enables "zero-copy" bursting, which lets users run compute workloads on remote data, whether it resides on-premises or in another cloud. In this session, Dipti from Alluxio and Roderick from Google Cloud will give an overview of Alluxio and Google Dataproc and the benefits of using the two together. The session will include a demo of initializing a Dataproc cluster with Alluxio to run workloads on remote data.


Dipti Borkar is VP, Products at Alluxio. She has deep experience in data and database technology across relational and non-relational systems. Prior to Alluxio, Dipti was VP of Product Marketing at Kinetica and Couchbase. Earlier in her career, Dipti managed development teams at IBM DB2, where she started as a database software engineer. Dipti holds an M.S. in Computer Science from UC San Diego and an MBA from the Haas School of Business at UC Berkeley.

Roderick Yao is a Strategic Cloud Engineer at Google. He focuses on designing innovative solutions that help Google Cloud customers build and manage data pipelines and migrate data to Google.

Questions? Chat on Slack with the speakers, users, and many other community members!
Join the Alluxio Global Online Meetup Group to find more events.