Learning Kafka - Kafka Fundamentals
Hello! I'm Yuvraj. I'm a Computer Science student. I love to learn, create, and explore new things. I am currently pursuing a Bachelor of Computer Science at the University of Delhi.
Kafka Basic Concepts
Now that we have Kafka running, let's understand the key concepts that make Kafka work. I'll explain these in simple terms without jargon.
The Big Picture
Kafka is a system that lets different parts of your application talk to each other by passing messages. It's designed to handle huge amounts of data reliably.
Here's how it works at a high level:
Producers send messages to Kafka
Kafka stores these messages in Topics
Consumers read messages from Topics
Let's dive into each concept:
Topics: The Message Categories
A Topic is like a category or channel for your messages. Think of it like:
A folder where related messages are stored
A TV channel that broadcasts specific content
A mailbox for a specific type of mail
For example, you might have topics like:
user-signups for new user registrations
order-placed for new orders
payment-processed for payment confirmations
Topics have these important characteristics:
They have a name (like "first-topic")
They can be split into multiple Partitions (more on this below)
They keep messages in order within each partition (Kafka does not guarantee an order across different partitions of the same topic)
Partitions: Splitting Up Topics for Scale
A Partition is a way to divide a topic into multiple parts. This is important because:
It allows Kafka to store more data than can fit on a single server
It enables parallel processing of messages
Think of partitions like:
Multiple checkout lines at a grocery store
Multiple lanes on a highway
Multiple workers in an assembly line
Each partition:
Is an ordered sequence of messages
Has its main copy (the leader) stored on one server, called a broker
Can be replicated to other servers for fault tolerance
Has messages identified by their position (called an offset)
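To make "ordered sequence" and "offset" concrete, here is a minimal sketch of a single partition. It assumes nothing about Kafka's real on-disk storage. The hypothetical `Partition` class is just an append-only list, where each new message gets the next offset, starting at 0.

```python
class Partition:
    """A toy, in-memory stand-in for one Kafka partition (not Kafka's real storage)."""

    def __init__(self) -> None:
        self._log: list[str] = []  # messages in the order they were appended

    def append(self, message: str) -> int:
        """Append a message to the end of the log and return its offset."""
        self._log.append(message)
        return len(self._log) - 1  # offsets start at 0 and grow by 1

    def read(self, offset: int, max_messages: int = 10) -> list[tuple[int, str]]:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        end = min(offset + max_messages, len(self._log))
        return [(i, self._log[i]) for i in range(offset, end)]


p = Partition()
for text in ["signup: alice", "signup: bob", "signup: carol"]:
    print(p.append(text))  # prints 0, then 1, then 2
print(p.read(1))  # [(1, 'signup: bob'), (2, 'signup: carol')]
```

Messages are only ever added at the end, so reading from any offset always returns them in the order they were written.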
Messages: The Data Being Sent
A Message is the basic unit of data in Kafka. It's what producers send and consumers read.
A message consists of:
A key (optional): Determines which partition the message goes to; messages with the same key go to the same partition, so they stay in order relative to each other
A value: The actual data being sent (can be text, JSON, binary, etc.)
A timestamp: When the message was created
Headers (optional): Additional metadata
Messages are immutable: once they're written to Kafka, they don't change.
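The four parts of a message can be sketched as a small Python data class. This is a hypothetical model, not the Kafka client's record type. The timestamp is in milliseconds, like Kafka's. Marking the class `frozen=True` mimics immutability: any attempt to change a field raises an error.

```python
import time
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Toy model of a Kafka record; frozen=True mirrors 'messages are immutable'."""

    value: bytes                                 # the payload: text, JSON, or any binary data
    key: Optional[bytes] = None                  # optional; used to pick the partition
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: tuple[tuple[str, bytes], ...] = ()  # optional (name, value) metadata pairs


msg = Message(value=b'{"order_id": 42}', key=b"user-7",
              headers=(("source", b"web"),))

try:
    msg.value = b"changed"  # attempting to modify an already-created message
except FrozenInstanceError:
    print("messages cannot be modified")
```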
Producers: Sending Messages
A Producer is an application that sends messages to Kafka topics.
Producers:
Connect to Kafka brokers
Serialize messages (convert them to a format that can be transmitted)
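Can choose which partition to send messages to (or let the producer's partitioner pick one, usually based on the message key)
Can wait for brokers to acknowledge that messages were received (controlled by the producer's acks setting)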
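Here is a rough sketch of the two jobs a producer does before sending: serializing the value and picking a partition. Some details are assumptions. The topic is assumed to have 3 partitions. CRC32 stands in for the murmur2 hash that Kafka's default partitioner actually uses. Keyless messages use plain round-robin, whereas recent Kafka producers use a "sticky" strategy instead. The function names are hypothetical.

```python
import json
import zlib
from typing import Optional

NUM_PARTITIONS = 3  # assumption: the topic was created with 3 partitions


def serialize(value: dict) -> bytes:
    """Turn a Python dict into bytes, as a JSON value serializer would."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def choose_partition(key: Optional[bytes], num_partitions: int, counter: int) -> int:
    """Pick a partition for a message.

    Keyed messages: hash the key so the same key always maps to the same partition
    (CRC32 is a stand-in; Kafka's default partitioner uses murmur2).
    Keyless messages: simple round-robin via a counter (a simplification).
    """
    if key is not None:
        return zlib.crc32(key) % num_partitions
    return counter % num_partitions


for i, (key, order) in enumerate([(b"user-7", {"order_id": 1}),
                                  (b"user-9", {"order_id": 2}),
                                  (b"user-7", {"order_id": 3})]):
    partition = choose_partition(key, NUM_PARTITIONS, i)
    print(key, "->", partition, serialize(order))
```

Both `user-7` orders land in the same partition, which is what keeps them in order for any consumer that reads them.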
Consumers: Reading Messages
A Consumer is an application that reads messages from Kafka topics.
Consumers:
Connect to Kafka brokers
Subscribe to one or more topics
Deserialize messages (convert them back from transmission format)
Keep track of which messages they've read using offsets
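The consumer side can be sketched the same way. The hypothetical `ToyConsumer` below is not a real Kafka client. It reads from one partition, deserializes JSON bytes back into dictionaries, and keeps two numbers: its current position and its last committed offset. A restarted consumer resumes from the committed offset.

```python
import json


class ToyConsumer:
    """Toy consumer that tracks its position in a single partition (not a real Kafka client)."""

    def __init__(self, partition_log: list[bytes]) -> None:
        self.log = partition_log  # stand-in for one partition's messages
        self.position = 0         # offset of the next message to read
        self.committed = 0        # saved offset a restarted consumer would resume from

    def poll(self, max_records: int = 2) -> list[dict]:
        """Read up to max_records messages, deserializing JSON bytes back into dicts."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return [json.loads(raw.decode("utf-8")) for raw in batch]

    def commit(self) -> None:
        """Save the current position (real Kafka stores committed offsets per consumer group)."""
        self.committed = self.position


log = [json.dumps({"order_id": n}).encode("utf-8") for n in range(5)]
consumer = ToyConsumer(log)
print(consumer.poll())  # [{'order_id': 0}, {'order_id': 1}]
consumer.commit()
print(consumer.poll())  # [{'order_id': 2}, {'order_id': 3}]
# A crash here would restart from the committed offset 2, re-reading orders 2 and 3.
```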
Consumer Groups: Scaling Consumption
A Consumer Group is a set of consumers that work together to process messages from topics.
Consumer groups allow you to:
Process messages in parallel (each consumer handles a subset of partitions)
Scale processing by adding more consumers
Provide fault tolerance (if one consumer fails, others take over)
The key rule: within a group, each partition is consumed by only one consumer.
This means:
If you have more consumers than partitions, some consumers will be idle
If you have fewer consumers than partitions, some consumers will handle multiple partitions
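The "one consumer per partition" rule and both of its consequences can be checked with a small sketch. It uses a simplified round-robin assignment, and the function name is hypothetical. Real Kafka ships several assignment strategies (range, round-robin, sticky, cooperative-sticky) and reassigns partitions whenever a consumer joins or leaves.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each partition to exactly one consumer in the group (simplified round-robin)."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]  # each partition gets exactly one owner
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: each consumer handles several partitions.
print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

# More consumers than partitions: the extra consumer sits idle.
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```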
Brokers: The Kafka Servers
A Broker is a Kafka server that:
Stores partitions
Handles producer and consumer requests
Manages replication of partitions
A Kafka cluster consists of multiple brokers working together.
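Brokers store partitions and replicate them, and this sketch shows how that could look. It is a simplified round-robin layout under assumed broker IDs and is not Kafka's actual placement algorithm, which also takes server racks into account. The first broker listed for each partition is treated as the leader. The remaining brokers are followers holding copies.

```python
def place_replicas(num_partitions: int, brokers: list[int],
                   replication_factor: int) -> dict[int, list[int]]:
    """Map each partition to the brokers holding its copies (first one = leader)."""
    if replication_factor > len(brokers):
        # Kafka also refuses a replication factor larger than the number of brokers.
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        start = p % len(brokers)  # rotate the leader so load spreads across brokers
        placement[p] = [brokers[(start + r) % len(brokers)]
                        for r in range(replication_factor)]
    return placement


print(place_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2))
# {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

If broker 1 fails in this layout, partition 0 still has a copy on broker 2 and partition 2 still has its leader on broker 3. That is the fault tolerance replication buys.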
ZooKeeper: The Coordinator
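ZooKeeper is a separate service that helps coordinate the Kafka cluster. (Newer Kafka releases can run without it using the built-in KRaft mode, and Kafka 4.0 removed ZooKeeper support entirely.) In a ZooKeeper-based setup, it: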
Keeps track of which brokers are alive
Helps elect a controller (a broker that manages the cluster)
Stores configuration information
Putting It All Together
Here's how all these concepts work together:
Producers send messages to topics
Topics are divided into partitions for scalability
Partitions are stored on brokers (Kafka servers)
Consumers read messages from topics
Consumer groups allow parallel processing
ZooKeeper coordinates the whole system
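The whole flow can be tied together in one small in-memory simulation. A producer writes keyed messages into a partitioned topic, and a consumer group splits the partitions between its members. Several simplifications are assumed. The hypothetical `ToyTopic` stands in for the brokers, ZooKeeper is left out, and CRC32 again replaces Kafka's real key hash.

```python
import zlib


class ToyTopic:
    """A topic split into partitions; each partition is an append-only list.

    This in-memory stand-in replaces the brokers (and leaves out ZooKeeper).
    """

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Producer side: hash the key to a partition, append, return (partition, offset)."""
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)  # stand-in for murmur2
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1


def consume_as_group(topic: ToyTopic, consumers: list[str]) -> dict[str, list[str]]:
    """Consumer-group side: each partition is read by exactly one consumer."""
    received: dict[str, list[str]] = {c: [] for c in consumers}
    for p, log in enumerate(topic.partitions):
        owner = consumers[p % len(consumers)]  # one owner per partition
        received[owner].extend(log)            # read the partition in offset order
    return received


orders = ToyTopic("order-placed", num_partitions=3)
for user, item in [("alice", "book"), ("bob", "pen"), ("alice", "lamp"), ("carol", "mug")]:
    print(user, item, "->", orders.produce(user, item))

print(consume_as_group(orders, ["worker-1", "worker-2"]))
```

Both of alice's orders share a key, so they land in the same partition and are read back in the order they were produced. Orders from different users may end up with different workers.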
Visual Representation
┌─────────────┐     ┌───────────────────────────────────┐     ┌─────────────┐
│             │     │               KAFKA               │     │             │
│  Producers  │────▶│  ┌─────────┐       ┌─────────┐    │     │  Consumers  │
│             │     │  │ Topic A │       │ Topic B │    │     │             │
└─────────────┘     │  │ Part 1  │       │ Part 1  │    │     └─────────────┘
                    │  │ Part 2  │       │ Part 2  │    │            ▲
                    │  │ Part 3  │       │ Part 3  │    │            │
                    │  └─────────┘       └─────────┘    │            │
                    │                                   │            │
                    └───────────────────────────────────┘            │
                                      │                              │
                                      │                              │
                                      ▼                              │
                    ┌───────────────────────────────────┐            │
                    │             ZooKeeper             │            │
                    │    (Coordinates Kafka Brokers)    │            │
                    └───────────────────────────────────┘            │
                                                                     │
                    ┌───────────────────────────────────┐            │
                    │          Consumer Group           │────────────┘
                    │      (Distributes work among      │
                    │        multiple consumers)        │
                    └───────────────────────────────────┘
Next Steps
Now that you understand the basic concepts, let's create our first Kafka producer in the next section.