analytics Archives

Trino Optimization With Distributed Caching on Data Lakes: Trino Fest 2023 Session Recap

July 21, 2023 By Hope Wang, Beinan Wang and Cole Bowden (Trino)

Originally published on trino.io: https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html

By 2025, there will be 100 zettabytes stored in the cloud. That’s 100,000,000,000,000,000,000,000 bytes – a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space is used for 50% of the data they access …

Architecting Data Platform Across Regions and Clouds for Analytics and AI

October 13, 2022

Data platform teams are increasingly challenged with accessing multiple data stores that are separated from compute engines such as Spark, Presto, TensorFlow or PyTorch. Whether your data is distributed across multiple datacenters, multiple clouds, or both, a successful heterogeneous data platform requires efficient data access.

Tags: ai, ai platform, analytics, data platform, product school

Real-Time Analytics: Going Beyond Stream Processing With Apache Pinot

September 15, 2022

Streaming systems form the backbone of the modern data pipeline because their stream processing capabilities provide insights on events as they arrive. But what if we want to go further and execute analytical queries on this real-time data? That’s where Apache Pinot comes in.

OLAP databases used for analytical workloads traditionally executed queries on yesterday’s data with query latency in the tens of seconds. The emergence of real-time analytics has changed all this, and the expectation is that we should now be able to run thousands of queries per second on fresh data with query latencies typically seen on OLTP databases.

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable real-time analytics with low latency. It can ingest data from streaming sources like Kafka, as well as from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage), and provides a layer of indexing techniques that can be used to maximize query performance.

Come to this talk to learn how you can add real-time analytics capability to your data pipeline.

Tags: alluxio day, analytics, apache pinot, data, real time, startree

Modernize your analytics workloads with NetApp and Alluxio

June 1, 2022 By Joseph Kandatilparambil

Imagine as an IT leader having the flexibility to choose any services that are available in public cloud and on premises. And imagine being able to scale your storage for your data lakes with control over data locality and protection for your organization. With these goals in mind, NetApp and Alluxio are joining forces to help our customers adapt to new requirements for modernizing data architecture with low-touch operations for analytics, machine learning, and artificial intelligence workflows.

Simplify and Accelerate Your Geo-Distributed Analytics Platform at Scale

April 19, 2022

Today, many organizations are running a multitude of data-driven applications and data platforms that span multiple geographic regions and heterogeneous environments – public, private, hybrid, or multi-cloud. Further, the trend of separating compute resources from storage resources makes it easier to scale compute and storage independently, allowing organizations to keep up with new trends in data analytics and AI. In response, more organizations are modernizing their data platforms to meet their needs.

Tags: analytics, netapp, storagegrid
