real time Archives

Real-Time Analytics: Going Beyond Stream Processing With Apache Pinot

September 15, 2022

Streaming systems form the backbone of the modern data pipeline because their stream processing capabilities provide insights on events as they arrive. But what if we want to go further and run analytical queries on this real-time data? That’s where Apache Pinot comes in.

OLAP databases used for analytical workloads have traditionally executed queries on yesterday’s data, with query latencies in the tens of seconds. The emergence of real-time analytics has changed all this: the expectation now is that we can run thousands of queries per second on fresh data, with the query latencies typically seen on OLTP databases.

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable, low-latency real-time analytics. It can ingest data from streaming sources like Kafka as well as from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage), and it provides a range of indexing techniques that can be used to maximize query performance.

Come to this talk to learn how you can add real-time analytics capability to your data pipeline.

Tags: alluxio day, analytics, apache pinot, data, real time, startree

Building High-Performance Data Lake Using Apache Hudi and Alluxio at T3Go

November 20, 2020 By Trevor Zhang (T3Go), Vino Yang (T3Go), Jasmine Wang and Bin Fan

T3Go’s high-performance data lake, built on Apache Hudi and Alluxio, shortened the time for data ingestion into the lake by up to a factor of 2. Data analysts using Presto, Hudi, and Alluxio together to query data on the lake saw their queries speed up by a factor of 10.

Tag: real time