Apache Kafka is one of the most widely used distributed event streaming platforms, adopted by companies such as Netflix, LinkedIn, Uber, and Amazon. This guide explains Kafka's architecture, internal components, and real-world usage for software engineers.
1. What is a Message Broker?
A message broker is a middleware system that enables applications to communicate asynchronously by sending messages between producers and consumers.
Message brokers decouple services and improve scalability in distributed systems.
Architecture
Producer --> Message Broker --> Consumer
Benefits
- Loose coupling between services
- Asynchronous communication
- Improved scalability
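The decoupling described above can be sketched with a toy in-memory broker. This is a minimal illustration, not how Kafka stores data: the class name InMemoryBroker and its methods are hypothetical, and a single queue stands in for the broker.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical in-memory broker: one queue stands in for a real message broker.
class InMemoryBroker {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();

    // Producer side: hand the message to the broker and return immediately.
    void send(String message) {
        queue.add(message);
    }

    // Consumer side: take the oldest pending message, or null if none is waiting.
    String receive() {
        return queue.poll();
    }
}
```

The producer never calls the consumer directly; it only knows the broker. The consumer can read later, or at a different pace, which is the loose coupling listed in the benefits.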
2. What is ZooKeeper?
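ZooKeeper is a distributed coordination service that Kafka has traditionally used to manage cluster metadata, broker information, and leader election (in Kafka's case, choosing the controller broker). Newer Kafka releases replace it with the built-in KRaft consensus protocol, and Kafka 4.0 removes ZooKeeper support entirely.
In ZooKeeper-based clusters, ZooKeeper maintains the state of the Kafka cluster, including broker registration and topic configuration.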
Responsibilities
- Leader election
- Cluster metadata storage
- Configuration management
Kafka Brokers
|
ZooKeeper Cluster
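In ZooKeeper-based clusters, brokers elect a controller by racing to create a single ephemeral node; the first to succeed wins, and the node disappears when that broker's session ends. The sketch below imitates that idea with an in-memory map. ToyCoordinator and its methods are hypothetical names and do not use the ZooKeeper client API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy stand-in for a coordination service: a map of node paths to owner broker IDs.
class ToyCoordinator {
    private final ConcurrentMap<String, Integer> nodes = new ConcurrentHashMap<>();

    // Atomically create the node if it is absent; returns true if this broker won.
    boolean tryCreate(String path, int brokerId) {
        return nodes.putIfAbsent(path, brokerId) == null;
    }

    // Simulates ephemeral nodes vanishing when their owner's session ends.
    void sessionExpired(int brokerId) {
        nodes.values().removeIf(owner -> owner == brokerId);
    }

    // Returns the broker ID that currently owns the node, or null if nobody does.
    Integer ownerOf(String path) {
        return nodes.get(path);
    }
}
```

If broker 1 creates /controller first, broker 2's attempt fails until broker 1's session expires, at which point broker 2 can take over.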
3. Major Components of Apache Kafka
Kafka architecture consists of producers, brokers, topics, partitions, consumers, and a metadata layer (ZooKeeper in older deployments, KRaft controllers in newer ones).
Components
| Component | Description |
|---|---|
| Producer | Sends messages to Kafka topics |
| Broker | Kafka server storing messages |
| Topic | Category of messages |
| Consumer | Reads messages from topics |
| Partition | Splits topics for scalability |
Architecture
Producer --> Topic --> Partition --> Consumer
4. What is a Partition in Kafka?
A partition is a subset of a Kafka topic that enables parallel processing and horizontal scalability.
Each partition is an ordered, immutable sequence of messages. Ordering is guaranteed within a partition, not across the partitions of a topic.
Example
Topic: Orders
  Partition 1
  Partition 2
  Partition 3
Benefits
- Parallel message processing
- Improved throughput
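How a message lands in a particular partition can be sketched as a hash of its key modulo the partition count. This is a simplified illustration: Kafka's default partitioner hashes the key bytes with murmur2, while the sketch below substitutes Arrays.hashCode as an assumption to stay dependency-free. ToyPartitioner is a hypothetical name, and partitions are numbered from 0, as Kafka numbers them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ToyPartitioner {
    // Maps a key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the murmur2 hash Kafka uses.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }
}
```

Because the mapping is deterministic, every message with the same key goes to the same partition, which preserves per-key ordering while different keys spread across partitions.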
5. How Kafka Prevents Duplicates
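Kafka prevents duplicate writes caused by producer retries using idempotent producers, and extends this guarantee to atomic writes across multiple partitions with transactional messaging.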
Mechanisms
- Idempotent Producer
- Producer retries with sequence numbers
- Exactly-once semantics
Producer Example
props.put("enable.idempotence", "true"); // idempotence also requires acks=all
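The sequence-number mechanism can be sketched from the broker's point of view. This is a simplified model under stated assumptions: one partition, one sequence number per record (Kafka tracks sequences per batch), and hypothetical names throughout (DedupLog, append).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy partition log that ignores retried sends, keyed by producer ID and sequence number.
class DedupLog {
    private final List<String> log = new ArrayList<>();
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    // Returns true if the record was appended, false if it was a duplicate retry.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already written: acknowledge without appending again
        }
        if (sequence != last + 1) {
            throw new IllegalStateException("out-of-order sequence " + sequence);
        }
        log.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    // Number of records actually stored in the log.
    int size() {
        return log.size();
    }
}
```

When a producer resends sequence 1 because the first acknowledgement was lost, the log recognizes the repeat and stores the record only once.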
6. What is Kafka Schema Registry?
Schema Registry stores message schemas and ensures producers and consumers maintain consistent data formats.
It is commonly used with Avro, Protobuf, or JSON schemas.
Benefits
- Schema validation
- Backward compatibility
- Data governance
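Backward compatibility means that a consumer using the new schema can still read data written with the old one. A rough sketch of one such rule follows: a field added in the new schema must declare a default value. The representation (field name mapped to a has-default flag) and the name ToyCompatibility are assumptions for illustration; real compatibility checks also compare types and are performed by the registry itself.

```java
import java.util.Map;

class ToyCompatibility {
    // Each schema maps field name -> whether the field declares a default value.
    // A new reader can decode old data if every field the old schema lacks has a default.
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema,
                                        Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean missingInOld = !oldSchema.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (missingInOld && !hasDefault) {
                return false; // old records carry no value and the reader has no fallback
            }
        }
        return true;
    }
}
```

Adding an optional currency field with a default passes the check, while adding the same field without a default fails it.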
7. What are In-Sync Replicas (ISR)?
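The in-sync replica set (ISR) of a partition consists of the leader plus every follower replica that has kept up with the leader's log within a configured time window (replica.lag.time.max.ms).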
Kafka uses replication to ensure high availability.
Example
Leader Partition
|
Follower Replica 1
Follower Replica 2
If the leader fails, a replica from the ISR becomes the new leader, because ISR members already hold all committed messages.
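ISR membership can be sketched as a time-based lag check. The model below is a simplification with hypothetical names (ToyIsr, inSync): each replica records when it last caught up with the leader, and it stays in sync while that moment is within an allowed window, loosely mirroring the replica.lag.time.max.ms broker setting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class ToyIsr {
    // Returns the sorted IDs of replicas that caught up with the leader within maxLagMs of now.
    // lastCaughtUpMs maps replica ID -> time (ms) it last matched the leader's log end.
    static List<Integer> inSync(Map<Integer, Long> lastCaughtUpMs, long nowMs, long maxLagMs) {
        List<Integer> isr = new ArrayList<>();
        for (Map.Entry<Integer, Long> entry : lastCaughtUpMs.entrySet()) {
            if (nowMs - entry.getValue() <= maxLagMs) {
                isr.add(entry.getKey());
            }
        }
        Collections.sort(isr);
        return isr;
    }
}
```

A replica that falls behind for longer than the window drops out of the returned list and is no longer eligible for a clean leader election until it catches up again.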
8. What is a Consumer Group?
A consumer group is a group of consumers that collectively read data from a Kafka topic.
Key Points
- Each partition is consumed by only one consumer in the group, so consumers beyond the partition count sit idle
- Provides parallel processing
Example
Topic (3 partitions)
Consumer Group A
  Consumer 1 -> Partition 1
  Consumer 2 -> Partition 2
  Consumer 3 -> Partition 3
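Partition assignment within a group can be sketched as a round-robin distribution. Kafka ships several assignment strategies; this is only a minimal illustration of the one-consumer-per-partition rule. ToyAssignor is a hypothetical name, and partitions are numbered from 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToyAssignor {
    // Spreads partitions 0..numPartitions-1 across the consumers in round-robin order.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With two consumers and three partitions, one consumer receives two partitions. With four consumers and three partitions, the fourth consumer receives none.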
9. How Kafka Enforces Security
Kafka provides security through authentication, authorization, and encryption mechanisms.
Security Methods
- SSL/TLS Encryption
- SASL Authentication
- ACL Authorization
Example
security.protocol=SASL_SSL
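A fuller client configuration for SASL over TLS might look like the sketch below. The property keys are standard Kafka client settings, but the broker address, truststore path, choice of the SCRAM-SHA-512 mechanism, and the SecureClientConfig class are placeholders and assumptions, not values from this guide.

```java
import java.util.Properties;

class SecureClientConfig {
    // Builds client settings for SASL authentication over a TLS-encrypted connection.
    static Properties build(String username, String password) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093"); // placeholder address
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512"); // assumed mechanism
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"" + username + "\" password=\"" + password + "\";");
        props.put("ssl.truststore.location", "/path/to/client.truststore.jks"); // placeholder path
        return props;
    }
}
```

The resulting Properties object would be passed to a producer or consumer constructor. In practice, credentials should come from a secret store rather than source code.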
10. Understanding Kafka Message Format
A Kafka message (record) consists of a key, a value, a timestamp, and optional headers. The broker assigns the offset when the message is appended to a partition.
Message Structure
Message
|
+-- Key
+-- Value
+-- Timestamp
+-- Offset
Explanation
- Key determines partition (by default via a hash of the key, so messages with the same key land in the same partition)
- Value contains actual data
- Offset identifies message position
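The structure above can be modeled with a small class, along with a toy log showing that the offset is assigned on append rather than chosen by the producer. ToyRecord and ToyPartitionLog are hypothetical names, and this is not Kafka's on-disk record format.

```java
import java.util.ArrayList;
import java.util.List;

// Toy record mirroring the fields in the structure above.
class ToyRecord {
    final String key;
    final String value;
    final long timestampMs;
    final long offset;

    ToyRecord(String key, String value, long timestampMs, long offset) {
        this.key = key;
        this.value = value;
        this.timestampMs = timestampMs;
        this.offset = offset;
    }
}

class ToyPartitionLog {
    private final List<ToyRecord> records = new ArrayList<>();

    // The log assigns the offset: the record's position within this partition.
    ToyRecord append(String key, String value, long timestampMs) {
        ToyRecord record = new ToyRecord(key, value, timestampMs, records.size());
        records.add(record);
        return record;
    }
}
```

Offsets start at 0 and grow by one per appended record, so a consumer can resume reading by remembering the last offset it processed.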
11. Message Size in Kafka
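By default, Kafka brokers accept messages (more precisely, record batches) of up to roughly 1 MB and reject larger ones. Raising the broker limit usually also means raising the producer's max.request.size and the consumer's fetch limits.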
Configuration
message.max.bytes=1000000
Best Practice
- Keep messages small
- Use object storage for large payloads
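The second practice is often called the claim-check pattern: store the large payload elsewhere and publish only a reference. The sketch below uses an in-memory map as a hypothetical stand-in for an object store; ClaimCheck, the ref: prefix, and the threshold constant are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class ClaimCheck {
    static final int MAX_INLINE_BYTES = 1_000_000; // mirrors the limit in the configuration above

    // Hypothetical stand-in for an external object store.
    private final Map<String, byte[]> objectStore = new HashMap<>();

    // Returns the bytes to publish: the payload itself, or a short reference to it.
    byte[] toMessage(byte[] payload) {
        if (payload.length <= MAX_INLINE_BYTES) {
            return payload;
        }
        String ref = "ref:" + UUID.randomUUID();
        objectStore.put(ref, payload);
        return ref.getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: resolve a reference back to the stored payload.
    byte[] fetch(String ref) {
        return objectStore.get(ref);
    }
}
```

A real implementation would also need a way for consumers to tell inline payloads from references, for example a message header, which this sketch leaves out.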
12. Apache Kafka vs Apache Camel
| Feature | Kafka | Camel |
|---|---|---|
| Purpose | Distributed event streaming | Integration framework |
| Architecture | Message broker | Routing engine |
| Usage | Event streaming | System integration |