Over the last few years, organizations have worked towards separating storage from compute to gain benefits in cost, data duplication and data latency. The cloud resolves most of these issues, but it comes at the expense of needing a way to query data on remote storage. Alluxio and Presto are a powerful combination for addressing the compute problem, and they are part of the strategy Simbiose Ventures used to create a product called StorageQuery, a platform for querying files in cloud storage with SQL.
Alluxio meetups, conferences, events and more
Join us for this webinar where Alex Ma of Alluxio, an open source data orchestration platform, will discuss how a data orchestration approach offers a solution for connecting traditional on-prem data centers with the cloud, data centers with other data centers, and clouds with other clouds. With Alluxio’s “zero-copy” burst solution, companies can bridge remote data centers with computing frameworks in other locations, enabling them to offload compute and leverage the flexibility, scalability, and power of the cloud for their remote data.
Adit Madan and Parviz Peiravi offer an overview of the Alluxio data orchestration layer, which provides unified data access for hybrid and multi-cloud deployments and leverages Intel® Optane™ Persistent Memory for higher-performance caching at reduced cost. The data access layer enables distributed compute engines like Presto, TensorFlow, and PyTorch to transparently access data from various storage systems (including S3, HDFS, and Azure) while actively leveraging a multi-tier cache to accelerate data access.
In this talk, we will describe how we solved the problem of large S3 API costs incurred by Presto at several levels of query concurrency by implementing Alluxio as a data orchestration layer between S3 and Presto. We will also show the results of an experiment that estimates per-query S3 API costs using the TPC-DS dataset.
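As a rough illustration of the kind of per-query estimate mentioned above, the sketch below computes request charges from request counts and per-1,000-request prices. It is not the method used in the talk: the function names, the idea that a cache hit ratio scales down GET requests, and every number in the usage example are assumptions chosen for illustration, and real prices should be taken from the storage provider's current price list.

```python
def request_cost(get_requests: float, list_requests: float,
                 get_price_per_1k: float, list_price_per_1k: float) -> float:
    """Cost of the requests that actually reach the object store.

    Prices are passed in by the caller; none are hard-coded here because
    they vary by provider, region, and date.
    """
    return (get_requests / 1000.0) * get_price_per_1k \
        + (list_requests / 1000.0) * list_price_per_1k


def request_cost_with_cache(get_requests: float, list_requests: float,
                            get_price_per_1k: float, list_price_per_1k: float,
                            hit_ratio: float) -> float:
    """Assumption: a cache in front of the store absorbs `hit_ratio` of the
    GET requests, while LIST requests still reach the store unchanged."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    remaining_gets = get_requests * (1.0 - hit_ratio)
    return request_cost(remaining_gets, list_requests,
                        get_price_per_1k, list_price_per_1k)


if __name__ == "__main__":
    # Placeholder figures only: 10,000 GETs and 1,000 LISTs for one query,
    # with made-up prices of 0.5 and 5.0 currency units per 1,000 requests.
    print(request_cost(10_000, 1_000, 0.5, 5.0))
    print(request_cost_with_cache(10_000, 1_000, 0.5, 5.0, hit_ratio=0.75))
```

Summing this figure over the queries in a benchmark run gives a total that can be compared across concurrency levels, provided the request counts come from measured access logs rather than guesses.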
Alluxio 2.3 was released at the end of June 2020. Calvin and Bin will go over the new features and integrations available and share learnings from the community. Any questions about the release and ongoing community feature development are welcome.
This is a casual online video chat where all attendees are welcome to bring their own questions. Our host Bin will have suggested topics, such as the top challenges around leveraging popular compute frameworks including Presto and Spark to access remote data, and the latest developments in Alluxio open source such as Alluxio Catalog Services.
In Alluxio, an Under File System is the plugin that connects to a file system or object store, so users can mount different storage systems such as AWS S3 or HDFS into the Alluxio namespace. The Under File System framework is designed to be modular, so that users can easily extend it with their own Under File System implementation and connect to a new or customized storage system.
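The plugin-and-mount idea described above can be sketched in a few lines. This is a conceptual illustration only, not Alluxio's actual interface (Alluxio itself is written in Java): the names `UnderStore`, `InMemoryStore`, `Namespace`, `read` and `list_paths` are all hypothetical, and the in-memory backend stands in for a real system such as S3 or HDFS.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class UnderStore(ABC):
    """Hypothetical plugin contract that every storage backend implements."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents of the file at `path` within this store."""

    @abstractmethod
    def list_paths(self, prefix: str) -> List[str]:
        """Return the sorted paths in this store that start with `prefix`."""


class InMemoryStore(UnderStore):
    """Stand-in backend; a real plugin would call S3, HDFS, and so on."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]

    def list_paths(self, prefix: str) -> List[str]:
        return sorted(p for p in self._files if p.startswith(prefix))


class Namespace:
    """Single namespace into which several stores are mounted."""

    def __init__(self) -> None:
        self._mounts: Dict[str, UnderStore] = {}

    def mount(self, mount_point: str, store: UnderStore) -> None:
        self._mounts[mount_point.rstrip("/")] = store

    def _resolve(self, path: str) -> Tuple[UnderStore, str]:
        # The longest matching mount point wins, so nested mounts work.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if path == mount_point or path.startswith(mount_point + "/"):
                return self._mounts[mount_point], path[len(mount_point):] or "/"
        raise FileNotFoundError(path)

    def read(self, path: str) -> bytes:
        store, relative_path = self._resolve(path)
        return store.read(relative_path)


if __name__ == "__main__":
    ns = Namespace()
    ns.mount("/s3", InMemoryStore({"/sales.csv": b"id,amount"}))
    ns.mount("/hdfs", InMemoryStore({"/logs/app.log": b"started"}))
    print(ns.read("/s3/sales.csv"))
    print(ns.read("/hdfs/logs/app.log"))
```

Supporting a new storage system in this sketch means writing one more `UnderStore` subclass and mounting it; callers of `Namespace.read` do not change, which is the modularity the abstract describes.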
In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.
Join us for this tech talk where we will show you how Alluxio can help burst your private computing environment to Google Cloud, minimizing costs and I/O overhead. Alluxio coupled with Google’s open source data and analytics processing engine, Dataproc, enables zero-copy burst for faster query performance in the cloud so you can take advantage of resources that are not local to your data, without the need for managing the copying or syncing of that data.