Apache Kafka

 

Apache Kafka is one of the most widely used distributed event streaming platforms, adopted by companies such as Netflix, LinkedIn, Uber, and Amazon. This guide explains Kafka's architecture, internal components, and real-world usage for software engineers.

1. What is a Message Broker?

A message broker is a middleware system that enables applications to communicate asynchronously by sending messages between producers and consumers.

Message brokers decouple services and improve scalability in distributed systems.

Architecture

Producer --> Message Broker --> Consumer

Benefits

  • Loose coupling between services
  • Asynchronous communication
  • Improved scalability
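
To make the decoupling concrete, here is a minimal in-memory sketch of the Producer --> Message Broker --> Consumer flow. It is a toy stand-in, not Kafka: the class name InMemoryBroker and its methods are hypothetical, and a real broker adds persistence, networking, and replication.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical toy broker: a thread-safe FIFO queue sitting between two services.
class InMemoryBroker {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Producer side: hand the message over and return immediately.
    void publish(String message) {
        queue.add(message);
    }

    // Consumer side: take the next message, or null if nothing is waiting.
    String poll() {
        return queue.poll();
    }

    // Number of messages waiting for a consumer.
    int pending() {
        return queue.size();
    }

    public static void main(String[] args) {
        InMemoryBroker broker = new InMemoryBroker();
        // The producer does not need the consumer to be running yet.
        broker.publish("order-created");
        broker.publish("order-paid");
        // The consumer reads later, in the order the messages were published.
        System.out.println(broker.poll()); // order-created
        System.out.println(broker.poll()); // order-paid
    }
}
```

The producer and consumer only share the broker, never a direct reference to each other, which is the loose coupling listed above.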

2. What is ZooKeeper?

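ZooKeeper is a distributed coordination service that Kafka has traditionally used to manage cluster metadata, broker membership, and controller election. Newer Kafka versions replace it with the built-in KRaft consensus layer, and Kafka 4.0 drops ZooKeeper support entirely, so this section applies to ZooKeeper-based clusters.

In those clusters, ZooKeeper maintains the state of the Kafka cluster, including broker registration and topic configuration.

Responsibilities

  • Controller election (the controller then elects partition leaders)
  • Cluster metadata storage
  • Configuration management

Kafka Brokers
     |
ZooKeeper Cluster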

3. Major Components of Apache Kafka

A Kafka deployment consists of producers, brokers, topics, partitions, consumers, and a metadata layer (ZooKeeper in older clusters, KRaft in newer ones).

Components

Component   Description
Producer    Sends messages to Kafka topics
Broker      Kafka server storing messages
Topic       Category of messages
Consumer    Reads messages from topics
Partition   Splits topics for scalability

Architecture

Producer --> Topic --> Partition --> Consumer

4. What is a Partition in Kafka?

A partition is a subset of a Kafka topic that enables parallel processing and horizontal scalability.

Each partition is an ordered, immutable sequence of messages.

Example

Topic: Orders

Partition 1
Partition 2
Partition 3

Benefits

  • Parallel message processing
  • Improved throughput
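
The "ordered, immutable sequence" idea can be sketched as an append-only log in which each record receives the next offset. This is a simplified in-memory model under stated assumptions: the PartitionLog class is hypothetical, and a real partition is stored as segment files on a broker's disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single partition.
class PartitionLog {
    private final List<String> records = new ArrayList<>();

    // Appends at the end of the log and returns the offset assigned to the record.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Reads the record stored at the given offset; existing records are never rewritten.
    String read(long offset) {
        return records.get((int) offset);
    }

    // The offset the next appended record will receive.
    long endOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        System.out.println(log.append("order-1")); // 0
        System.out.println(log.append("order-2")); // 1
        System.out.println(log.read(0));           // order-1
    }
}
```

Because each partition is an independent log like this one, different partitions can be written and read in parallel, while ordering is only guaranteed within a single partition.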

5. How Kafka Prevents Duplicates

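Kafka reduces duplicates in two ways: idempotent producers stop retried sends from being written twice, and transactions extend this to exactly-once processing when an application reads from Kafka, processes, and writes back to Kafka.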

Mechanisms

  • Idempotent Producer
  • Producer retries with sequence numbers
  • Exactly-once semantics

Producer Example

props.put("enable.idempotence", "true");
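
The sequence-number mechanism can be illustrated with a small simulation. This is a sketch of the idea only, not broker code: the DedupingPartition class is hypothetical, it tracks one sequence number per producer id, and it ignores details such as batches, producer epochs, and rejection of out-of-order sequences.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how a partition can drop retried writes.
class DedupingPartition {
    // Last sequence number accepted from each producer id.
    private final Map<Long, Integer> lastSequence = new HashMap<>();
    private int appended = 0;

    // Returns true if the record was appended, false if it was recognized as a duplicate retry.
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return false; // already written once; the producer is retrying
        }
        lastSequence.put(producerId, sequence);
        appended++;
        return true;
    }

    // Number of records actually stored.
    int size() {
        return appended;
    }

    public static void main(String[] args) {
        DedupingPartition partition = new DedupingPartition();
        System.out.println(partition.append(7L, 0, "payment")); // true
        // The acknowledgement was lost, so the producer retries with the same sequence number.
        System.out.println(partition.append(7L, 0, "payment")); // false
        System.out.println(partition.size());                   // 1
    }
}
```

The retry is harmless because the sequence number, not the payload, identifies the write.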

6. What is Kafka Schema Registry?

Schema Registry stores message schemas and ensures producers and consumers maintain consistent data formats.

It is commonly used with Avro, Protobuf, or JSON schemas.

Benefits

  • Schema validation
  • Backward compatibility
  • Data governance
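
As a rough illustration of a backward compatibility check, the sketch below applies one simplified rule loosely modelled on Avro: a new schema can still read old data if every field it adds has a default value. The SchemaCompat class and the map-based schema representation (field name to "has a default") are assumptions made for this example; a real registry also checks field types and supports other compatibility modes.

```java
import java.util.Map;

// Hypothetical, heavily simplified compatibility check.
class SchemaCompat {
    // Each schema maps a field name to whether that field declares a default value.
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema,
                                        Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean addedField = !oldSchema.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (addedField && !hasDefault) {
                return false; // old records carry no value for this field
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("id", false);
        Map<String, Boolean> v2 = Map.of("id", false, "email", true);  // added with a default
        Map<String, Boolean> v3 = Map.of("id", false, "email", false); // added without a default
        System.out.println(isBackwardCompatible(v1, v2)); // true
        System.out.println(isBackwardCompatible(v1, v3)); // false
    }
}
```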

7. What are In-Sync Replicas (ISR)?

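The ISR is the set of replicas of a partition, including the leader, that are caught up with the leader's log. A follower that falls too far behind (controlled by the broker setting replica.lag.time.max.ms) is removed from the ISR until it catches up again.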

Kafka uses replication to ensure high availability.

Example

Leader Partition
     |
Follower Replica 1
Follower Replica 2

If the leader fails, a replica from the ISR becomes the new leader.
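
The failover rule can be sketched as follows: choose the first replica that is both in the ISR and still alive. This is an illustrative model only; the LeaderElection class and the use of integer broker ids are assumptions for the example, and it assumes unclean leader election is disabled, so no leader is chosen when no in-sync replica is available.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of picking a new partition leader from the ISR.
class LeaderElection {
    // Returns the first replica, in assignment order, that is in the ISR and on a live broker.
    static Optional<Integer> electLeader(List<Integer> replicas,
                                         Set<Integer> isr,
                                         Set<Integer> liveBrokers) {
        return replicas.stream()
                .filter(isr::contains)
                .filter(liveBrokers::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Integer> replicas = List.of(1, 2, 3);
        // Broker 1 (the old leader) has failed; brokers 2 and 3 are alive, but only 2 is in sync.
        System.out.println(electLeader(replicas, Set.of(1, 2), Set.of(2, 3))); // Optional[2]
        // No live replica is in sync, so no leader is elected.
        System.out.println(electLeader(replicas, Set.of(1), Set.of(2, 3)));    // Optional.empty
    }
}
```

Restricting the choice to the ISR is what keeps acknowledged messages from being lost during failover.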

8. What is a Consumer Group?

A consumer group is a group of consumers that collectively read data from a Kafka topic.

Key Points

  • Each partition is consumed by only one consumer in the group at a time
  • Consumers beyond the number of partitions stay idle
  • Provides parallel processing, up to one consumer per partition

Example

Topic (3 partitions)

Consumer Group A
Consumer 1 -> Partition 1
Consumer 2 -> Partition 2
Consumer 3 -> Partition 3
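
The mapping above can be reproduced with a simple round-robin assignment. This is a sketch, not Kafka's assignor code: the GroupAssignment class is hypothetical, real assignors (range, round-robin, sticky, cooperative sticky) are configurable and handle rebalances, and partition ids here start at 0 as they do in Kafka.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment of partitions to the consumers of one group.
class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment; // nobody to assign partitions to
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            // Each partition gets exactly one owner within the group.
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three partitions, three consumers: one partition each.
        System.out.println(assign(List.of("c1", "c2", "c3"), 3)); // {c1=[0], c2=[1], c3=[2]}
        // Four consumers: the fourth has nothing to read.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3)); // {c1=[0], c2=[1], c3=[2], c4=[]}
    }
}
```

Adding consumers beyond the partition count does not increase throughput, which is why the partition count sets the upper bound on parallelism within a group.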

9. How Kafka Enforces Security

Kafka provides security through authentication, authorization, and encryption mechanisms.

Security Methods

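  • TLS (SSL) encryption of data in transit
  • SASL authentication (for example PLAIN, SCRAM, GSSAPI/Kerberos, or OAUTHBEARER)
  • ACL authorization on resources such as topics and consumer groups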

Example

security.protocol=SASL_SSL
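
A fuller client configuration might look like the sketch below, which builds the settings with plain java.util.Properties. The configuration keys are standard Kafka client settings, but the choice of SCRAM-SHA-512 and every value shown (host name, user name, passwords, file path) are placeholders assumed for illustration.

```java
import java.util.Properties;

// Sketch of client-side security settings; all values are placeholders.
class SecureClientConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093"); // placeholder address
        // SASL authentication carried over a TLS-encrypted connection.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512"); // assumed mechanism for this example
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"app-user\" password=\"change-me\";");
        // Trust store used to verify the brokers' TLS certificates.
        props.put("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.put("ssl.truststore.password", "change-me");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("security.protocol")); // SASL_SSL
    }
}
```

The resulting Properties object would be passed to a producer or consumer constructor. Authorization is not configured here, because ACLs are defined on the broker side.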

10. Understanding Kafka Message Format

A Kafka message (record) consists of a key, a value, a timestamp, and optional headers. The broker assigns each record an offset when it is appended to a partition.

Message Structure

Message
 |
 +-- Key
 +-- Value
 +-- Timestamp
 +-- Offset

Explanation

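  • Key determines the partition: records with the same key go to the same partition, while records without a key are spread across partitions
  • Value contains the actual data
  • Offset identifies the message's position within its partition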
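
How a key selects a partition can be sketched with a hash and a modulo. Note the assumption: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this example uses String.hashCode as a stand-in, so the partition numbers it produces will not match a real cluster. The KeyPartitioner class is hypothetical.

```java
// Hypothetical illustration of key-based partition selection.
class KeyPartitioner {
    // Stand-in hash (String.hashCode); Kafka itself uses murmur2 over the key bytes.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int first = partitionFor("customer-42", 3);
        int second = partitionFor("customer-42", 3);
        // The same key always maps to the same partition, which preserves per-key ordering.
        System.out.println(first == second); // true
    }
}
```

One consequence of this scheme is that changing the number of partitions changes where existing keys land.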

11. Message Size in Kafka

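By default, a Kafka broker accepts record batches of up to roughly 1 MB (the broker setting message.max.bytes defaults to 1048588 bytes). Raising the limit usually also means raising the producer's max.request.size and the consumer's max.partition.fetch.bytes.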

Configuration

message.max.bytes=1000000

Best Practice

  • Keep messages small
  • Use object storage for large payloads
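
The second best practice is often called the claim-check pattern: store the large payload elsewhere and send only a reference through Kafka. The sketch below assumes a hypothetical ClaimCheck class, an in-memory map standing in for an object store, made-up "inline:" and "ref:" prefixes, and a threshold that mirrors the message.max.bytes value shown above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical claim-check helper; the map stands in for a real object store.
class ClaimCheck {
    static final int MAX_INLINE_BYTES = 1_000_000; // mirrors message.max.bytes=1000000

    private final Map<String, byte[]> objectStore = new HashMap<>();

    // Returns the value that would be sent to Kafka: the payload itself or a reference to it.
    String toMessageValue(byte[] payload) {
        if (payload.length <= MAX_INLINE_BYTES) {
            return "inline:" + new String(payload, StandardCharsets.UTF_8);
        }
        // Too large for a message: store it and send only the object key.
        String objectKey = "payloads/" + UUID.randomUUID();
        objectStore.put(objectKey, payload);
        return "ref:" + objectKey;
    }

    // Consumer side: resolve a reference back to the stored payload.
    byte[] fetch(String objectKey) {
        return objectStore.get(objectKey);
    }

    public static void main(String[] args) {
        ClaimCheck claimCheck = new ClaimCheck();
        System.out.println(claimCheck.toMessageValue("hello".getBytes(StandardCharsets.UTF_8))); // inline:hello
        String value = claimCheck.toMessageValue(new byte[MAX_INLINE_BYTES + 1]);
        System.out.println(value.startsWith("ref:")); // true
    }
}
```

The consumer pays an extra lookup for large payloads, but the Kafka messages themselves stay small.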

12. Apache Kafka vs Apache Camel

Feature        Kafka                          Camel
Purpose        Distributed event streaming    Integration framework
Architecture   Message broker                 Routing engine
Usage          Event streaming                System integration
