Alluxio Product School Webinar – Hands-on Lab: Get Started with Alluxio on Kubernetes

Shawn Sun, a software engineer at Alluxio, shares how to get started with Alluxio on Kubernetes in April’s Product School Webinar. To simplify DevOps for a stack that pairs Alluxio with a query engine, Alluxio provides two ways to deploy on Kubernetes: Helm and an operator. They significantly simplify the deployment, configuration, and life cycle management of … Continued

The Trino Optimization Handbook

Your 🐰 queries are slow 🐢 … you’re frustrated 😩 … Don’t let suboptimal Trino performance hold you back any longer! Unlock the full potential of Trino and transform your data analytics game. Discover the secrets behind Trino’s query engine and learn how to overcome bottlenecks to achieve ⚡ blazing-fast query performance. In this comprehensive guide, … Continued

Modern Data Stack in Motion

In this presentation, I will talk about the birth, growth, and prosperity of the modern data stack. I will show you why the modern data stack is more than a buzzword, and how it may evolve over the next couple of years.

Real-Time Analytics: Going Beyond Stream Processing With Apache Pinot

Streaming systems form the backbone of the modern data pipeline as the stream processing capabilities provide insights on events as they arrive. But what if we want to go further than this and execute analytical queries on this real-time data? That’s where Apache Pinot comes in.

OLAP databases used for analytical workloads traditionally executed queries on yesterday’s data with query latency in the tens of seconds. The emergence of real-time analytics has changed all this, and the expectation now is that we can run thousands of queries per second on fresh data with query latencies typically seen on OLTP databases.

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable real-time analytics with low latency. It can ingest data from streaming sources like Kafka, as well as from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage), and provides a layer of indexing techniques that can be used to maximize query performance.

Come to this talk to learn how you can add real-time analytics capability to your data pipeline.

The Architecture Overview of OceanBase Database

OceanBase Database is an open-source, distributed Hybrid Transactional/Real-time Operational Analytics (HTAP) database management system that has set new world records in both the TPC-C and TPC-H benchmark tests. Development of OceanBase Database started in 2010, and it has been serving all of the critical systems in Alipay. Besides Alipay, OceanBase also serves customers from a variety of sectors, including Internet, financial services, telecommunications, and retail.

Building data lineage; Running Spark with Alluxio; Data Mesh

Big Data Application Meetup

Running Spark with Alluxio is a popular stack, particularly for hybrid environments. In this session, Dipti will briefly introduce Alluxio, share the top 10 tips for tuning the performance of real-world workloads, and demo Alluxio with Spark.