Apache Kafka

Kafka Documentation

Apache Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records, store them durably, and process them in real time.

Key Features

  • High Throughput: Kafka can handle millions of messages per second.

  • Scalability: Easily scales horizontally by adding more brokers.

  • Fault Tolerance: Replicates data across multiple brokers so that records survive individual broker failures.

  • Durability: Data is persisted to disk and can be configured to be retained indefinitely (see the topic-creation sketch after this list).

  • Real-time Processing: Enables real-time stream processing with low latency.
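
To make the fault-tolerance and durability points concrete, here is a minimal sketch that creates a topic with the KafkaJS admin client. The client id, broker addresses, and topic name are placeholders that mirror the examples further down, and a replication factor of 3 assumes a cluster with at least three brokers.

const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const admin = kafka.admin();

const run = async () => {
  await admin.connect();
  // Spread the topic over 3 partitions, replicate each partition to 3 brokers,
  // and disable time-based deletion (retention.ms = -1 keeps records indefinitely).
  await admin.createTopics({
    topics: [{
      topic: 'test-topic',
      numPartitions: 3,
      replicationFactor: 3,
      configEntries: [{ name: 'retention.ms', value: '-1' }]
    }]
  });
  await admin.disconnect();
};

run().catch(console.error);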

Core Concepts

  1. Producer: Sends records to Kafka topics.

  2. Consumer: Reads records from Kafka topics.

  3. Broker: A server that stores data and serves clients.

  4. Topic: A category or feed name to which records are published.

  5. Partition: Topics are divided into partitions; each partition is an ordered log of records, which enables parallel processing.

  6. Offset: A sequential identifier for each record within a partition (see the sketch after this list).

  7. ZooKeeper: Coordinates the brokers and stores cluster metadata (newer Kafka releases can run without ZooKeeper, using KRaft instead).
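
To see partitions and offsets in practice, the following sketch (using the same hypothetical client and topic as the examples below) lists each partition of a topic along with the lowest and highest offsets it currently holds.

const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const admin = kafka.admin();

const run = async () => {
  await admin.connect();
  // One entry per partition: its number plus the low/high offsets stored in it.
  const offsets = await admin.fetchTopicOffsets('test-topic');
  offsets.forEach(({ partition, low, high }) => {
    console.log(`partition ${partition}: offsets ${low}..${high}`);
  });
  await admin.disconnect();
};

run().catch(console.error);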

Example Usage

Producer Example

const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const producer = kafka.producer();

const run = async () => {
  await producer.connect();
  await producer.send({
    topic: 'test-topic',
    messages: [{ key: 'key1', value: 'hello world' }]
  });
  await producer.disconnect();
};

run().catch(console.error);
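
Records that share the same key are always written to the same partition, which preserves their relative order; records without a key are distributed across partitions. By default, KafkaJS waits for acknowledgement from all in-sync replicas before the send call resolves.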

Consumer Example

const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const consumer = kafka.consumer({ groupId: 'test-group' });

const run = async () => {
  await consumer.connect();
  await consumer.subscribe({ topic: 'test-topic', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log({
        partition,
        offset: message.offset,
        value: message.value.toString()
      });
    }
  });
};

run().catch(console.error);
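
Consumers that share a groupId divide the topic's partitions among themselves, so running additional instances of this script scales consumption horizontally. Offsets are committed back to Kafka automatically as messages are processed, and fromBeginning only takes effect when the group has no committed offset yet. A common companion pattern (not part of the example above, and assuming the consumer variable is in scope) is to disconnect cleanly on shutdown:

// Hypothetical shutdown hook: disconnecting lets the group rebalance promptly
// instead of waiting for the session timeout to expire.
process.once('SIGTERM', async () => {
  await consumer.disconnect();
  process.exit(0);
});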

Benefits

  1. High Performance: Kafka's architecture is optimized for high throughput and low latency.

  2. Scalability: Easily scales to handle increasing amounts of data.

  3. Reliability: Replication across brokers keeps data available when individual brokers fail.

  4. Real-time Processing: Ideal for applications that require real-time data processing.

Use Cases

  • Log Aggregation

  • Real-time Analytics

  • Event Sourcing

  • Messaging Systems

  • Data Integration