Online Meetup: Cybersecurity and fraud detection at ING Bank using Presto & Alluxio on S3

September 27, 2019

Mariusz Derela

DevOps Engineer

ING

Krzysztof Kuźnik

Product Owner

Hunt Squad

ING Bank is a multinational financial services company headquartered in Amsterdam with over $1 trillion in assets. As a leading bank, we place a great emphasis on cybersecurity. One aspect of this is the Security incident and event management (SIEM), which is the process of identifying, monitoring, recording and analyzing security events or incidents within a real-time IT environment. SIEM requires our data platform to have high and consistent performance, so we use open source technologies Presto and Alluxio for fast SQL analytics in the cloud.

In this online presentation, we are going to present how ING is leveraging Presto (interactive query), Alluxio (data orchestration & acceleration), S3 (massive storage), and DC/OS (container orchestration) to build and operate our modern Security Analytics & Machine Learning platform. We will share the challenges we encountered and how we solved them. Today we run this platform in several different data centers, and we have reduced our 10+ minutes queries to under 10 seconds!

Video:

Presentation slides:

Cybersecurity and fraud detection at ING Bank using Presto & Alluxio on S3 from Alluxio, Inc.

‍

Videos:

Presentation Slides:

Online Meetup: Cybersecurity and fraud detection at ING Bank using Presto & Alluxio on S3 from Alluxio, Inc.

Video:

Presentation slides:

Cybersecurity and fraud detection at ING Bank using Presto & Alluxio on S3 from Alluxio, Inc.

‍

Videos:

Presentation Slides:

Online Meetup: Cybersecurity and fraud detection at ING Bank using Presto & Alluxio on S3 from Alluxio, Inc.

Complete the form below to access the full overview:

Videos

Optimizing Tiered Storage for Low-Latency Real-Time Analytics at AI Scale

Real-time OLAP databases are optimized for speed and often rely on tightly coupled storage-compute architectures using disks or SSDs. Decoupled architectures, which use cloud object storage, introduce an unavoidable tradeoff: cost efficiency at the expense of performance. This makes them unsuitable for databases that need to provide low-latency, real-time analytics, especially the new wave of LLM-powered dashboards, retrieval-augmented generation (RAG), and vector-embedding searches that thrive only when fresh data is milliseconds away. Can we achieve both cost efficiency and performance?

In this talk, we’ll explore the engineering challenges of extending Apache Pinot—a real-time OLAP system—onto cloud object storage while still maintaining sub-second P99 latencies.

We’ll dive into how we built an abstraction in Apache Pinot to make it agnostic to the location of data. We’ll explain how we can query data directly from the cloud (without needing to download the entire dataset, as with lazy-loading) while achieving sub-second latencies. We’ll cover the data fetch and optimization strategies we implemented, such as pipelining fetch and compute, prefetching, selective block fetches, index pinning, and more. We'll also share our latest work about integration with open table formats like iceberg, and how we will continue to achieve fast analytics directly on parquet files by implementing all the same techniques that apply to tiered storage.

‍

July 15, 2025

Introduction to Apache Iceberg™ & Tableflow

The data lake is a fantastic, low-cost place to put data at rest for offline analytics, but we've built it under the terms of a terrible bargain: all that cheap storage at scale was a great thing, but we gave up schema management and transactions along the way. Apache Iceberg has emerged as king of the Open Table Formats to fix this very problem.

Built on the foundation of Parquet files, Iceberg adds a simple yet flexible metadata layer and integration with standard data catalogs to provide robust schema support and ACID transactions to the once ungoverned data lake. In this talk, we'll build Iceberg up from the basics, see how the read and write path work, and explore how it supports streaming data sources like Apache Kafka™. Then we'll see how Confluent's Tableflow brings Kafka together with open table formats like Iceberg and Delta Lake to make operational data in Kafka topics instantly visible to the data lake without the usual ETL—unifying the operational/analytical divide that has been with us for decades.

July 15, 2025

Meet in the Middle: Solving the Low-Latency Challenge for Agentic AI

Storing data as Parquet files on S3 is increasingly used not just as a data lake but also as a lightweight feature store for ML training/inference or a document store for RAG. However, querying petabyte- to exabyte-scale data lakes directly from cloud object storage remains notoriously slow (e.g., latencies ranging from hundreds of milliseconds to several seconds on AWS S3).

In this talk, we show how architecture co-design, system-level optimizations, and workload-aware engineering can deliver over 1000× performance improvements for these workloads—without changing file formats, rewriting data paths, or provisioning expensive hardware.

We introduce a high-performance, low-latency S3 proxy layer powered by Alluxio, deployed atop hyperscale data lakes. This proxy delivers sub-millisecond Time-to-First-Byte (TTFB)—on par with Amazon S3 Express—while preserving compatibility with standard S3 APIs. In real-world benchmarks, a 50-node Alluxio cluster sustains over 1 million S3 queries per second, offering 50× the throughput of S3 Express for a single account, with no compromise in latency.

Beyond accelerating access to Parquet files byte-to-byte, we also offload partial Parquet processing from query engines via a pluggable interface into Alluxio. This eliminates the need for costly index scans and file parsing, enabling point queries with 0.3 microseconds latency and up to 3,000 QPS per instance (measured using a single-thread)—a 100× improvement over traditional query paths.

July 15, 2025

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer

Request a demo

Alluxio Enterprise AI

Alluxio Enterprise Data

Videos:

Presentation Slides:

Videos:

Presentation Slides:

Complete the form below to access the full overview:

Videos

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer