Apache Kafka
Introduction
In this article, we are going to talk about Apache Kafka, an open-source distributed streaming platform developed by the Apache Software Foundation. It is designed to handle large volumes of real-time data and enables efficient processing, storage, and analysis of data streams.
Kafka Cluster
Our cluster can be illustrated as below:
Producer:
In Kafka, a producer is a client application that is responsible for publishing data to one or more Kafka topics. Producers can be written in various programming languages such as Java, Python, or Scala using the Kafka client libraries.
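Since Kafka ultimately transports raw bytes, a big part of a producer's job is serializing application records before sending them. As an illustrative sketch (the `user` record and its fields are made up for this example), a JSON value serializer might look like:

```python
import json

def serialize_value(record: dict) -> bytes:
    """Serialize a record to UTF-8 encoded JSON bytes, the form a Kafka
    producer hands to the broker."""
    return json.dumps(record).encode("utf-8")

# Hypothetical record; with kafka-python, these bytes would be published
# via producer.send(topic, value=serialize_value(user)).
user = {"name": "alice", "age": 30}
payload = serialize_value(user)
print(payload)  # b'{"name": "alice", "age": 30}'
```

The same idea applies to keys, and client libraries usually let you plug such a function in directly (e.g. the `value_serializer` argument of kafka-python's `KafkaProducer`).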
Consumer:
In Kafka, a consumer is a client application that is responsible for reading data from one or more Kafka topics. Consumers can be written in various programming languages such as Java, Python, or Scala using the Kafka client libraries. Consumers can also be organized into consumer groups, where each group consists of one or more consumers that share the workload of consuming messages from a set of partitions. Kafka ensures that each partition is only consumed by one consumer within a group, which allows for load balancing and fault-tolerance.
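To make that load balancing concrete, here is a toy round-robin assignment of partitions to the consumers in a group. This is a simplified sketch, not Kafka's actual assignor code (real assignors such as range or sticky are more sophisticated), but it shows the invariant: each partition is consumed by exactly one group member.

```python
def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Toy round-robin assignment: every partition goes to exactly one
    consumer in the group, so the workload is split with no overlap."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment

# Six partitions shared by a group of two consumers: three partitions each.
print(assign_partitions(6, ["consumer-a", "consumer-b"]))
```

If a consumer joins or leaves the group, Kafka triggers a rebalance and recomputes an assignment with the same one-consumer-per-partition property.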
Broker:
In Kafka, a broker is a server responsible for storing and managing Kafka topics and partitions. Brokers receive messages from producers and serve them to consumers, and they also replicate messages across a cluster of brokers for fault tolerance and scalability.
Topic:
A topic is a category or feed name to which producers write messages and from which consumers read them. A topic is divided into one or more partitions, where each partition is an ordered, immutable sequence of messages.
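Which partition a message lands in is typically derived from its key, so all records with the same key end up in the same partition and keep their relative order. A toy sketch of the idea (Kafka's default partitioner actually uses murmur2 hashing, not Python's built-in `hash`):

```python
def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative key-based partitioning: hash the key and take it modulo
    the partition count. Kafka's real default partitioner uses murmur2."""
    return hash(key) % num_partitions

# The same key always maps to the same partition within a run,
# which is what preserves per-key ordering.
assert partition_for("user-42", 3) == partition_for("user-42", 3)
```

Messages sent without a key are instead spread across partitions (e.g. round-robin or sticky batching, depending on the client version).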
Replication factor:
The replication factor in Apache Kafka is the number of replicas (copies) of a partition maintained across multiple brokers in a cluster. A higher replication factor provides better fault tolerance and durability, ensuring that data remains available even if one or more brokers or disks fail.
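As a rough illustration of how replicas spread across brokers (real Kafka placement is more involved, accounting for racks and leader balancing), assigning each partition's replicas round-robin over the brokers might look like:

```python
def place_replicas(num_partitions: int, replication_factor: int, brokers: list[str]) -> dict[int, list[str]]:
    """Toy replica placement: give each partition `replication_factor`
    replicas on distinct brokers, shifting the starting broker per partition.
    Requires replication_factor <= len(brokers), as in real Kafka."""
    assert replication_factor <= len(brokers)
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return placement

# With replication factor 3, each partition survives the loss of 2 brokers.
print(place_replicas(2, 3, ["b1", "b2", "b3", "b4"]))
```

Note that the replication factor can never exceed the number of brokers, which is why the single-broker demo below uses a factor of 1.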
Demo
For the sake of this demo, we are going to use a ready-made Docker Compose file to spin up an Apache Kafka cluster with one broker, coordinated by Apache ZooKeeper.
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.2
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  broker:
    image: confluentinc/cp-kafka:7.3.2
    container_name: broker
    ports:
      - "9092:9092"
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://broker:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
This file can be found in Confluent's documentation.
Apache ZooKeeper:
Apache ZooKeeper is a distributed coordination service that is often used with Apache Kafka to manage and coordinate a cluster of brokers. ZooKeeper provides a highly available and fault-tolerant way of storing and managing configuration information, naming, synchronization, and other shared data among the nodes in a distributed system. So, we start by executing the following command to bring up the broker and ZooKeeper containers:
docker compose up -d
Our containers are ready. Now we write our producer and consumer code:
import json
import time

import kafka

import data  # local helper module that generates fake user records

def data_to_json():
    # Serialize a fake record to UTF-8 encoded JSON bytes
    return json.dumps(data.data_faker()).encode("utf-8")

producer = kafka.KafkaProducer(bootstrap_servers=["localhost:9092"])

def produce():
    # Publish a new record to the "registred_user" topic every 3 seconds
    while True:
        print("producing message")
        producer.send("registred_user", data_to_json())
        time.sleep(3)

produce()
import kafka

# Join the "my_favorite_group" consumer group and subscribe to the topic
consumer = kafka.KafkaConsumer(
    "registred_user",
    group_id="my_favorite_group",
    bootstrap_servers=["localhost:9092"],
)

def consume():
    # Block and print each message value as it arrives
    for msg in consumer:
        print(msg.value)

consume()
Now we run our scripts, with the producer on the left and the consumer on the right:
And voilà, every message generated by the producer is read by the consumer under the same topic.
Was this helpful? Confusing? If you have any questions, feel free to comment below! Make sure to follow me on LinkedIn: https://www.linkedin.com/in/malekzaag/ and GitHub: https://github.com/Malek-Zaag if you’re interested in similar content and want to keep learning alongside me!