Apache Kafka
Apache Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records, store them durably, and process them in real time.
Key Features
High Throughput: Kafka can handle millions of messages per second.
Scalability: Easily scales horizontally by adding more brokers.
Fault Tolerance: Replicates data across multiple brokers so that records survive individual broker failures.
Durability: Data is persisted to disk and can be configured to be retained indefinitely (see the retention sketch after this list).
Real-time Processing: Enables real-time stream processing with low latency.
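As a minimal sketch of the durability point above (assuming the kafkajs client used in the examples below, placeholder broker addresses, and an existing topic named 'test-topic'), retention can be made indefinite by setting the topic's retention.ms to -1 through the admin client:

const { Kafka, ConfigResourceTypes } = require('kafkajs');
const kafka = new Kafka({ clientId: 'retention-demo', brokers: ['kafka1:9092'] });
const admin = kafka.admin();
const run = async () => {
  await admin.connect();
  // retention.ms = -1 tells the broker to keep this topic's records forever.
  await admin.alterConfigs({
    validateOnly: false,
    resources: [{
      type: ConfigResourceTypes.TOPIC,
      name: 'test-topic', // placeholder topic name
      configEntries: [{ name: 'retention.ms', value: '-1' }]
    }]
  });
  await admin.disconnect();
};
run().catch(console.error);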
Core Concepts
Producer: Sends records to Kafka topics.
Consumer: Reads records from Kafka topics.
Broker: A server that stores data and serves clients.
Topic: A named category or feed to which records are published and from which they are consumed.
Partition: Topics are divided into partitions, enabling parallel processing (see the topic-creation sketch after this list).
Offset: A unique identifier for each record within a partition.
Zookeeper: Coordinates the Kafka cluster and maintains configuration metadata (newer Kafka releases can run without it in KRaft mode).
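To make topics, partitions, and replication concrete, here is a minimal sketch using the kafkajs admin client; the topic name, partition count, and replication factor are illustrative values, not recommendations:

const { Kafka } = require('kafkajs');
const kafka = new Kafka({ clientId: 'admin-demo', brokers: ['kafka1:9092', 'kafka2:9092'] });
const admin = kafka.admin();
const run = async () => {
  await admin.connect();
  // Create a topic with three partitions, each replicated to two brokers.
  await admin.createTopics({
    topics: [{ topic: 'test-topic', numPartitions: 3, replicationFactor: 2 }]
  });
  await admin.disconnect();
};
run().catch(console.error);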
Example Usage
Producer Example
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const producer = kafka.producer();

const run = async () => {
  await producer.connect();
  // Send a single keyed record to 'test-topic', then disconnect.
  await producer.send({
    topic: 'test-topic',
    messages: [{ key: 'key1', value: 'hello world' }]
  });
  await producer.disconnect();
};

run().catch(console.error);
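Records that share a key are hashed to the same partition by the default partitioner, so per-key ordering is preserved; records without a key are spread across partitions.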
Consumer Example
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const consumer = kafka.consumer({ groupId: 'test-group' });

const run = async () => {
  await consumer.connect();
  // Read 'test-topic' from its earliest offset and log every record.
  await consumer.subscribe({ topic: 'test-topic', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log({
        partition,
        offset: message.offset,
        value: message.value.toString()
      });
    }
  });
};

run().catch(console.error);
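Because the consumer joins the 'test-group' consumer group, Kafka assigns it a share of the topic's partitions. Running additional instances with the same groupId spreads partitions across them, scaling consumption up to one consumer per partition.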
Benefits
High Performance: Kafka's architecture is optimized for high throughput and low latency.
Scalability: Easily scales to handle increasing amounts of data.
Reliability: Data is replicated across brokers, keeping the system fault-tolerant.
Real-time Processing: Ideal for applications that require real-time data processing.
Use Cases
Log Aggregation
Real-time Analytics
Event Sourcing
Messaging Systems
Data Integration