Apache Kafka
Apache Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records, store them durably, and process them in real time.
Key Features
High Throughput: Kafka can handle millions of messages per second.
Scalability: Easily scales horizontally by adding more brokers.
Fault Tolerance: Replicates data across multiple brokers so that records survive individual broker failures.
Durability: Data is persisted to disk and can be configured to be retained indefinitely (see the retention sketch after this list).
Real-time Processing: Enables real-time stream processing with low latency.
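As a minimal sketch of the durability point above (assuming the kafkajs client used in the examples below, placeholder broker addresses, and an existing topic named 'test-topic'), retention can be made indefinite by setting the topic's retention.ms to -1 through the admin client:

const { Kafka, ConfigResourceTypes } = require('kafkajs');
const kafka = new Kafka({ clientId: 'retention-demo', brokers: ['kafka1:9092'] });
const admin = kafka.admin();
const run = async () => {
  await admin.connect();
  // retention.ms = -1 tells the broker to keep this topic's records forever.
  await admin.alterConfigs({
    validateOnly: false,
    resources: [{
      type: ConfigResourceTypes.TOPIC,
      name: 'test-topic', // placeholder topic name
      configEntries: [{ name: 'retention.ms', value: '-1' }]
    }]
  });
  await admin.disconnect();
};
run().catch(console.error);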
Core Concepts
Producer: Sends records to Kafka topics.
Consumer: Reads records from Kafka topics.
Broker: A server that stores data and serves clients.
Topic: A named category or feed to which records are published and from which they are consumed.
Partition: Topics are divided into partitions, enabling parallel processing (see the topic-creation sketch after this list).
Offset: A unique identifier for each record within a partition.
Zookeeper: Coordinates the Kafka cluster and maintains configuration metadata (newer Kafka releases can run without it in KRaft mode).
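To make topics, partitions, and replication concrete, here is a minimal sketch using the kafkajs admin client; the topic name, partition count, and replication factor are illustrative values, not recommendations:

const { Kafka } = require('kafkajs');
const kafka = new Kafka({ clientId: 'admin-demo', brokers: ['kafka1:9092', 'kafka2:9092'] });
const admin = kafka.admin();
const run = async () => {
  await admin.connect();
  // Create a topic with three partitions, each replicated to two brokers.
  await admin.createTopics({
    topics: [{ topic: 'test-topic', numPartitions: 3, replicationFactor: 2 }]
  });
  await admin.disconnect();
};
run().catch(console.error);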
Example Usage
Producer Example
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const producer = kafka.producer();

const run = async () => {
  await producer.connect();
  // Send a single keyed record to 'test-topic', then disconnect.
  await producer.send({
    topic: 'test-topic',
    messages: [{ key: 'key1', value: 'hello world' }]
  });
  await producer.disconnect();
};

run().catch(console.error);
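Records that share a key are hashed to the same partition by the default partitioner, so per-key ordering is preserved; records without a key are spread across partitions.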
Consumer Example
const { Kafka } = require('kafkajs');

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['kafka1:9092', 'kafka2:9092']
});

const consumer = kafka.consumer({ groupId: 'test-group' });

const run = async () => {
  await consumer.connect();
  // Read 'test-topic' from its earliest offset and log every record.
  await consumer.subscribe({ topic: 'test-topic', fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log({
        partition,
        offset: message.offset,
        value: message.value.toString()
      });
    }
  });
};

run().catch(console.error);
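Because the consumer joins the 'test-group' consumer group, Kafka assigns it a share of the topic's partitions. Running additional instances with the same groupId spreads partitions across them, scaling consumption up to one consumer per partition.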
Benefits
High Performance: Kafka's architecture is optimized for high throughput and low latency.
Scalability: Easily scales to handle increasing amounts of data.
Reliability: Data is replicated across brokers, keeping the system fault-tolerant.
Real-time Processing: Ideal for applications that require real-time data processing.
Use Cases
Log Aggregation
Real-time Analytics
Event Sourcing
Messaging Systems
Data Integration