For a while now, I’ve had Apache Flink on my “things I really need to understand properly” list. I’d seen it mentioned alongside Kafka, heard it come up in conversations about real-time pipelines, and sort-of understood the use case. But I’d never actually sat down and learned it properly.
If you feel the same way, you’re in good company. There’s good reason to learn Flink: it’s one of the most popular data-processing tools in software engineering right now. Netflix uses it for near-real-time anomaly detection in their streaming infrastructure. Alibaba reportedly runs one of the largest Flink deployments in the world — processing hundreds of billions of events per day across tens of thousands of machines. Uber built their analytical platform around it. Flink has become the backbone of how some of the most data-intensive companies in the world process information as it happens. So if Flink has been on your list too, this is a good time to actually understand it.
So I dove in. And I was honestly surprised, not just by what Flink is, but by why it exists and how it’s built. The story of Flink is really the story of a much deeper idea: how to make sense of high-scale, continuously streaming data. The problem statement is actually pretty simple: how do you compute practical, real-world answers over a massive, continuous flow of data? This post is my attempt to explain that idea from the ground up, and show you where Flink fits into it.
Let’s dive in.
Before We Start
Two concepts come up constantly in this post that are worth making sure we’re on the same page about before we go further.
What is a stream? A stream is a continuous, potentially never-ending sequence of records arriving over time. Think about a user browsing a website — every page view, every click, every scroll is an event being produced. One after another, in real time. There’s no natural “end” to this — as long as the user is active, events keep coming. That’s a stream.
What is batch processing? Batch processing means taking a finite, bounded collection of data and processing it all at once. Instead of reacting to each event as it arrives, you collect events for a period of time — say, an hour — and then run a computation over all of them together. The computation has a clear start and a clear end.

Both are legitimate ways to process data. The tension between them is what Flink was built to resolve — and we’ll get there.
Back To The Problem: How We Actually Produce Data
Let me make this concrete with an example we’ll use throughout this post.
Imagine you’re building a recommendation engine — the kind that shows users “you might also like these” based on what they’ve been viewing. To do this well, your system needs to know things like:
What has this user been clicking on in the last few minutes?
What items are trending right now across all users?
Which products did this user view but not purchase in the last session?
Now, where does that data come from? Every time a user opens a product page, you record an event. Every click, every purchase, every search — your application is continuously writing records that look roughly like this:
{ "user_id": "u-8821", "item_id": "p-443", "event_type": "view", "timestamp": "2024–03–10T14:32:01Z" }
{ "user_id": "u-1042", "item_id": "p-117", "event_type": "purchase", "timestamp": "2024–03–10T14:32:03Z" }
{ "user_id": "u-8821", "item_id": "p-501", "event_type": "click", "timestamp": "2024–03–10T14:32:07Z" }
One record every few seconds for every user, across millions of concurrent users, continuously. That’s your data. Not a file. Not a table that refreshes once a day. A stream — an ongoing, never-ending sequence of events that describes what your users are doing right now.
And yet the dominant paradigm for years was to take that stream and… ignore the fact that it was a stream. Dump the events into files every hour. Wait for the batch job to run. Then serve recommendations based on what users were doing last hour.

Why? Because batch processing is conceptually simple. You know exactly what data you have. You can reason about the computation clearly — it starts, it runs, it finishes. Systems like Hadoop and MapReduce (you don’t need to know these in depth for this post) were built around this model and scaled to enormous data sizes. They worked.
But there’s a fundamental cost: latency. If your batch job runs every hour, then in the worst case a user’s behavior right now won’t influence their recommendations for up to an hour. For a recommendation engine, that means a user who just showed strong interest in hiking gear gets shown laptop accessories — because the system hasn’t caught up yet. The user searched for a hiking rucksack, and you need to show them tents and hiking poles on the next page load, not an hour later.
For fraud detection, hourly latency means fraudulent transactions go undetected for an hour. For a live dashboard, it means your “real-time” metrics can be up to 59 minutes stale. The cost of batch is that events happen in real time, but your system only learns about them on a schedule.
So as data volumes grew and latency requirements tightened, engineers started building streaming systems alongside their batch systems — systems that could process each event as it arrived, in milliseconds. Apache Storm was an early leader here, followed by Amazon Kinesis and LinkedIn’s Samza.

But building a new streaming system while maintaining an existing batch system isn’t so straightforward. Now you have two systems to maintain. Your streaming pipeline computes approximate, real-time results. Your batch pipeline runs overnight and produces accurate, complete results. You have to write the same business logic twice — once for each system, in different frameworks, sometimes in different languages, kept in sync manually. And when the batch job and the streaming job disagree on a number (and they always disagree eventually), you have to figure out which one is wrong.
Your recommendation engine in this new world now looks like this: a streaming component that updates recommendations in near-real-time based on recent events, and a batch component that rebuilds the full recommendation model every night based on historical data.
Two codebases. Two deployment pipelines. Two sets of bugs. One serving layer trying to reconcile them. This pattern has a name: the Lambda Architecture.
The Key Insight: Batch Is Just a Special Case of Streaming
Here’s the idea at the heart of Flink, and it’s pretty simple:
A bounded data set is just a special case of an unbounded data stream that happens to end.
Your historical database of 5 years of user events — that’s a stream that started 5 years ago and stopped today. Your log files from last month — that’s a stream with a beginning and an end. The difference between “batch data” and “streaming data” is not a fundamental distinction about the nature of the data. At the end of the day, it’s just JSON events of what the user searched and clicked on. The question is whether the stream is still flowing or has stopped.
Going back to our recommendation engine: the “historical data” you process in your nightly batch job and the “real-time events” you process in your streaming pipeline are both just records in the same sequence of user events. The only difference is when you read them. The nightly batch job reads records from 6 months ago. The streaming pipeline reads records from 6 seconds ago. Same data, different time window.
If you build a system that processes streams natively — and handles both infinite streams and finite ones — you don’t need separate systems. You don’t need to maintain two codebases. You have one engine, one set of logic, and you point it at whatever slice of the stream you need.
That’s what Flink tries to do.
So What Is Apache Flink?
Apache Flink is a distributed stream processing framework. It takes a potentially unbounded stream of data (or a bounded batch of data — same thing), processes it in parallel across a cluster of machines, and produces results continuously as data flows through.

Flink jobs are written as ordinary code and converted internally into a DAG (directed acyclic graph) of operations. For example, this is roughly what the code for a Flink job looks like (it’s not important to understand all the details; this pseudocode is just meant to give a rough idea):
// ── 1. SOURCES ──────────────────────────────────────────────
searches = readFromKafka("search-events")
clicks = readFromKafka("click-events")
// ── 2. PER-USER ACTIVITY (windowed aggregation) ─────────────
// group events by user, compute rolling features over last 30 min
userActivity = (searches + clicks)
.keyBy(userId)
.window(slidingWindow(size=30min, slide=1min))
.aggregate(activityAggregator)
// → { userId, recentQueries, recentClicks, categories, ... }
// ── 3. USER EMBEDDING (call user-tower model) ───────────────
// turn the activity features into a vector
userState = userActivity.asyncMap(callUserTowerModel)
// → { userId, embedding[128], features }
// ── 4. CANDIDATE GENERATION (2 sources, then merge) ─────────
annCandidates = userState.asyncMap(vectorAnnLookup) // ~500 items
trendingCandidates = userState.asyncMap(trendingLookup) // ~200 items
allCandidates = (annCandidates + trendingCandidates)
.keyBy(userId)
.window(2sec)
.reduce(mergeAndDedupe)
// → { userId, candidates: ~1000 itemIds }
// ── 5. FETCH ITEM FEATURES (batched lookup) ─────────────────
scoringInputs = allCandidates
.joinWith(userState, on=userId)
.asyncMap(fetchItemFeatures)
// → { userId, userFeatures, [(itemId, itemFeatures) × ~1000] }
// ── 6. RANKING (call ranking model) ─────────────────────────
ranked = scoringInputs.asyncMap(callRankingModel)
// → { userId, top 100 (itemId, score) pairs }
// ── 7. SINK ─────────────────────────────────────────────────
ranked.writeTo(redis)
Internally, Flink breaks this code down into a graph of physical tasks, and splits each task into a set of parallel “subtasks” that can run on different machines.

Flink pushes tasks to worker nodes. Each worker runs its assigned tasks continuously, sends periodic heartbeats back to Flink, and reports if a task fails so Flink can restart it.
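To make that concrete, here is what one piece of the pipeline above (the per-user windowed aggregation from step 2) might look like in Flink’s actual Java DataStream API. This is a simplified, hypothetical sketch: the in-memory sources stand in for the Kafka topics, CountAggregator is a toy replacement for the real activityAggregator, and I use processing time instead of event time to keep it short.

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class UserActivitySketch {
    // Toy stand-in for activityAggregator: just counts events per user per window.
    static class CountAggregator implements AggregateFunction<Tuple2<String, String>, Long, Long> {
        public Long createAccumulator() { return 0L; }
        public Long add(Tuple2<String, String> event, Long acc) { return acc + 1; }
        public Long getResult(Long acc) { return acc; }
        public Long merge(Long a, Long b) { return a + b; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In-memory stand-ins for the "search-events" and "click-events" Kafka topics.
        DataStream<Tuple2<String, String>> searches = env.fromElements(
                Tuple2.of("u-8821", "search"), Tuple2.of("u-1042", "search"));
        DataStream<Tuple2<String, String>> clicks = env.fromElements(
                Tuple2.of("u-8821", "click"));

        searches.union(clicks)                  // (searches + clicks)
                .keyBy(event -> event.f0)       // .keyBy(userId)
                .window(SlidingProcessingTimeWindows.of(Time.minutes(30), Time.minutes(1)))
                .aggregate(new CountAggregator())
                .print();                       // stand-in sink

        env.execute("user-activity-sketch");
    }
}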

Let’s break down the core concepts of Flink.
Core Concepts
Streams and Operators
Let me start from the simplest possible picture and build up.
Every Flink program is a dataflow graph: a set of operators connected by data streams. Don’t worry if this sounds abstract right now — we’ll build the picture piece by piece and it’ll click quickly.
Sources produce data (reading from Kafka, a file, a database).
Operators transform it.
Sinks consume the output (writing to a database, another Kafka topic, a dashboard).
An operator is a unit of processing logic. For our recommendation engine, an operator might filter out bot traffic, or enrich an event with product metadata, or count how many times each product was viewed. Each operator receives records from one or more input streams, does something to them, and emits records to one or more output streams.
A stream is the sequence of records flowing between operators. In our case, a stream of user events: view events, click events, purchase events, one after another as they happen.
This is the basic shape of any Flink job.
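As a minimal sketch in Flink’s Java API, that shape is just a chain of calls. Here, env is a StreamExecutionEnvironment (as in the earlier sketch), a tiny in-memory source stands in for Kafka, and stdout stands in for a real sink:

env.fromElements("view,u-8821,p-443", "click,u-8821,p-501", "bot,x-01,p-999") // source
   .filter(event -> !event.startsWith("bot"))                                 // operator: drop bot traffic
   .map(event -> "enriched: " + event)                                        // operator: transform each record
   .print();                                                                  // sink: write to stdout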
Parallelism
A single machine can process events fast — but if you’re handling millions of users, a single machine isn’t enough. Flink solves this by running every operator in parallel: each operator is split into multiple subtasks that run simultaneously on different machines in your cluster.
If you have a Filter operator with parallelism 4, there are 4 instances running simultaneously, each processing a different portion of the stream. Add more machines, get more subtasks, handle higher volumes. This is how Flink scales to billions of events per day.
For our recommendation engine, this means the window aggregation for 10 million users isn’t running sequentially on one machine — it’s split across dozens of workers.
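In code, parallelism is a setting, not a redesign. A sketch (the numbers are arbitrary, and events is an assumed DataStream of raw event strings):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(8);                        // default: every operator runs as 8 parallel subtasks

events.filter(event -> !event.startsWith("bot"))
      .setParallelism(4);                     // per-operator override: this filter runs as 4 subtasks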
State
Going back to our recommendation engine: when a user views a product, that single event on its own tells you almost nothing. You need context. What else has this user been viewing in the past few minutes? Have they been looking at products in the same category? Did they almost purchase something similar last session? To answer these questions, your system needs memory — it needs to remember what happened before.
In the early days of stream processing, most systems were stateless. Each event was processed in isolation: the operator saw the event, transformed it, moved on. No memory of what came before. This worked fine for simple pipelines — filtering out bot traffic, enriching events with metadata from a lookup table. But it was fundamentally too limited for anything that required reasoning about patterns over time.
Think about what our recommendation engine actually needs to do. For every incoming event, it needs to ask: “What has user u-8821 done in the last 10 minutes?” To answer that question, someone needs to be keeping a running list of user u-8821’s recent events. And user u-1042’s recent events. And all the other users. That’s state — data that accumulates and evolves as records flow through the operator, rather than being derived fresh from each individual record.
Flink makes state a first-class concept. An operator can declare state explicitly — a counter, a hash map keyed by user ID, a sorted list of recent events. Flink gives you that state as a managed object you can read and write during processing. For our recommendation engine, the state might be a hash map from user ID to “list of item IDs viewed in the last 10 minutes.” Every time a new view event arrives, you look up the user in the map, append the item, and trim events older than 10 minutes.
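Here is a sketch of what declaring that state looks like, using Flink’s ListState API on a keyed stream of (userId, itemId) pairs. The 10-minute trimming is omitted for brevity; a production version would expire old entries with timers or state TTL.

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class RecentViews extends KeyedProcessFunction<String, Tuple2<String, String>, String> {
    // Keyed state: Flink automatically scopes this to the current key (the user).
    private transient ListState<String> recentItems;

    @Override
    public void open(Configuration parameters) {
        // Declare the state explicitly; Flink manages and snapshots it for you.
        recentItems = getRuntimeContext().getListState(
                new ListStateDescriptor<>("recent-items", String.class));
    }

    @Override
    public void processElement(Tuple2<String, String> event, Context ctx, Collector<String> out)
            throws Exception {
        recentItems.add(event.f1);  // remember the item this user just viewed
        out.collect("user " + event.f0 + " recently viewed: " + recentItems.get());
    }
}

// Usage (sketch): events.keyBy(event -> event.f0).process(new RecentViews())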
But managing state in a distributed system is genuinely hard. What happens when the machine running your operator crashes? That in-memory hash map is gone. Flink handles this: it periodically snapshots all operator state to durable storage, so on recovery it can restore everything to where it was before the failure. And it guarantees that state updates are applied exactly once — even if a machine crashes and the same events are replayed during recovery, your counts won’t be doubled.
We’ll go deep on how Flink achieves exactly-once guarantees in a future architecture post. For now, just know that Flink gives you state that feels as reliable as writing to a database, with the performance of an in-memory hash map.
Windows
We’ve got a stream of user events, operators running in parallel, and state accumulating per user. Now here’s a problem that comes up almost immediately in any real aggregation.
Let’s say you want to compute “the 10 most viewed products in the last 5 minutes” — to power a “trending now” section of your site. You have an operator that’s counting views per product. But your stream is infinite. When do you emit a result? You can’t wait until “all the events” arrive — they never stop arriving.
You need a way to slice the infinite stream into finite pieces and compute over each piece. That’s a window.
A window is a bounded chunk of your stream. You define it, Flink groups the events into that chunk, and when the chunk is “complete,” it runs your aggregation and emits a result. Flink has several window types: tumbling windows, sliding windows, session windows, and more. The differences between them aren’t critical for this post; the gist is that a window lets you compute over the data from some bounded period of time.
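For the “trending now” example, a sliding-window version might look like this. A sketch: views is an assumed DataStream of (itemId, 1) tuples, and picking the top 10 from the per-item counts is left out.

import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

views.keyBy(view -> view.f0)                                                // one window per product
     .window(SlidingProcessingTimeWindows.of(Time.minutes(5), Time.minutes(1)))
     .sum(1)                                                                // view count per product, refreshed every minute
     .print();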
Tidbits from the Original Paper
I spent some time reading the 2015 Apache Flink paper — “Apache Flink: Stream and Batch Processing in a Single Engine” by Carbone, Katsifodimos, Ewen, Markl, Haridi, and Tzoumas. A few things from the paper add useful color to what we covered above:
On Fault Tolerance and Exactly-Once Guarantees
The paper describes exactly-once semantics this way: “Flink offers strict exactly-once-processing consistency guarantees for stateful operators through a combination of distributed snapshots and partial re-execution upon recovery.” The key phrase there is “partial” re-execution — when a machine fails, Flink doesn’t restart the entire job from the beginning. It rolls back all operators to their last successful snapshot, then replays only the input from that point forward. The maximum amount of reprocessing is bounded by the gap between two consecutive checkpoints — which is a tunable parameter.
The mechanism that makes this work without pausing the computation is called Asynchronous Barrier Snapshotting (ABS) — and it’s genuinely clever. We’ll cover it in full detail in the next post. But the headline is: Flink injects special “barrier” markers into the data stream, which flow through the operators like regular records. When an operator receives a barrier, it snapshots its state to durable storage and forwards the barrier downstream — all while continuing to process records. No pause, no freeze, no missed events.
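Enabling this is a configuration choice rather than a code rewrite. A sketch (the interval and storage path are arbitrary examples):

import org.apache.flink.streaming.api.CheckpointingMode;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Start a snapshot (inject barriers) every 10 seconds; after a failure,
// at most roughly 10 seconds of input has to be replayed.
env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);

// Durable storage for the snapshots (hypothetical path).
env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/flink-checkpoints");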
On Unified Batch and Stream Processing
One of the clearest statements in the paper is this: “A bounded data set is a special case of an unbounded data stream.” The authors are making a philosophical claim, not just a technical one. And they back it up: “Batch computations are executed by the same runtime as streaming computations. The runtime executable may be parameterized with blocked data streams to break up large computations into isolated stages that are scheduled successively.”
In plain terms: there is no separate batch engine in Flink. Batch jobs run on the exact same distributed dataflow runtime that processes your Kafka streams. The only difference is that batch jobs use “blocked” data exchange between stages — the upstream operator finishes fully before the downstream one starts. Everything else — the operator model, the state management, the serialization — is identical.
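Concretely, the same DataStream program can be switched to batch execution with a single setting (a sketch; BATCH mode requires all sources to be bounded):

import org.apache.flink.api.common.RuntimeExecutionMode;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Same operators, same state model; BATCH tells the runtime the input is bounded,
// so it can schedule stages successively with blocked data exchange.
env.setRuntimeMode(RuntimeExecutionMode.BATCH);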
Going back to our recommendation engine: this means the job that counts real-time view trends and the job that processes 6 months of historical events for model retraining can share the same operators, the same cluster, and the same codebase. The paper’s promise is that the Lambda Architecture — with its two systems and two codebases — is simply no longer necessary.
Wrapping Up
Let’s quickly do a TLDR:
Data is produced as continuous streams, but we’ve historically forced it into batches — creating latency and the operational pain of maintaining two systems
Flink is built on the insight that batch is just a special case of streaming — and unifies both in a single engine
The core building blocks are: operators (processing logic), streams (data in motion), state (memory that persists across records), and windows (bounded slices of a stream for computation)
Fault tolerance with exactly-once guarantees is built in.
Ideally, I would have really liked to go in-depth to each of these topics (and there is a lot of depth in them), but this post has already become pretty long, so I’ll defer that to future Sanil for now. You can also follow me on LinkedIn for more byte-sized posts and to know what I am learning about right now.
We talked a lot about Apache Kafka (given it’s the backbone of most data architectures), but did you ever wonder how Apache Kafka works and what its architecture is like? I was surprised to learn how simple Kafka really is under the hood. I wrote a comprehensive blog post about it here —
System Design Series: Apache Kafka from 10,000 feet
Let’s look at what Kafka is, how it works, and when we should use it!
If you’re looking for something more in-depth, I’d recommend checking out one of my most popular posts on Temporal, a workflow orchestration tool, with in-depth explanations on how events are scheduled, started and completed —
System Design Series: A Step-by-Step Breakdown of Temporal’s Internal Architecture
A step-by-step deep dive into Temporal’s architecture — covering workflows, tasks, shards, partitions, and how Temporal…

