Apache Kafka

Introduction

In this article, we are going to talk about Apache Kafka, an open-source distributed streaming platform developed by the Apache Software Foundation. It is designed to handle large volumes of real-time data and enables the efficient processing, storage, and analysis of data streams.

Kafka Cluster

Cluster :

Our cluster can be illustrated as below :

Producer :

In Kafka, a producer is a client application that is responsible for publishing data to one or more Kafka topics. Producers can be written in various programming languages such as Java, Python, or Scala using the Kafka client libraries.
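As a quick sketch using the kafka-python library (the broker address and the "events" topic name are placeholders; the full demo producer comes later in this article), publishing a message looks like this :

from kafka import KafkaProducer

# Connect to a single local broker (placeholder address).
producer = KafkaProducer(bootstrap_servers=["localhost:9092"])

# Publish a raw bytes payload to a hypothetical "events" topic.
producer.send("events", b"hello kafka")
producer.flush()  # block until outstanding messages are delivered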

Consumer :

In Kafka, a consumer is a client application that is responsible for reading data from one or more Kafka topics. Consumers can be written in various programming languages such as Java, Python, or Scala using the Kafka client libraries. Consumers can also be organized into consumer groups, where each group consists of one or more consumers that share the workload of consuming messages from a set of partitions. Kafka ensures that each partition is only consumed by one consumer within a group, which allows for load balancing and fault-tolerance.
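For instance, here is a minimal sketch of a consumer joining a group (topic and group names are placeholders): running the same script in two terminals puts both consumers in the same group, and Kafka assigns each partition of the topic to only one of them.

from kafka import KafkaConsumer

# Both processes running this script join the "demo-group" consumer group,
# so the partitions of the "events" topic are split between them.
consumer = KafkaConsumer(
    "events",                            # placeholder topic name
    group_id="demo-group",               # same group id in every process
    bootstrap_servers=["localhost:9092"],
)

for message in consumer:
    print(message.partition, message.offset, message.value)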

Broker :

In Kafka, a broker is a server that is responsible for storing and managing Kafka topics and partitions. Brokers receive messages from producers and serve them to consumers, and they also replicate messages across a cluster of brokers for fault-tolerance and scalability. A topic is a category or feed name to which producers can write messages and consumers can subscribe to read messages. A topic is divided into one or more partitions, where each partition is an ordered, immutable sequence of messages.
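To make the topic/partition relationship concrete, here is a small sketch that creates a topic with three partitions using kafka-python's admin client (the topic name is an illustrative assumption; with a single broker the replication factor cannot exceed 1):

from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the single-broker cluster used in the demo below.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# A topic split into 3 ordered, immutable partitions.
admin.create_topics(new_topics=[
    NewTopic(name="demo-topic", num_partitions=3, replication_factor=1)
])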

Replication factor :

The replication factor in Apache Kafka is the number of replicas (copies) of a partition that are maintained across multiple brokers in a cluster. A higher replication factor provides better fault tolerance and durability, as it ensures that the data is still available even if one or more brokers or disks fail.
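The replication factor is chosen per topic at creation time. As a sketch, assuming a cluster with at least three brokers (the single-broker demo below only allows a factor of 1):

from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder broker address; requires a cluster of at least 3 brokers.
admin = KafkaAdminClient(bootstrap_servers="broker1:9092")

# Each of the topic's partitions is stored on 3 brokers
# (1 leader plus 2 follower replicas).
admin.create_topics(new_topics=[
    NewTopic(name="demo-replicated", num_partitions=3, replication_factor=3)
])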

Demo

For the sake of this demo, we are going to use a ready-made Docker Compose file to spin up an Apache Kafka cluster with a single broker, along with Apache ZooKeeper.

version: "3"
services:
	zookeeper:
		image: confluentinc/cp-zookeeper:7.3.2
		container_name: zookeeper
		environment:
		ZOOKEEPER_CLIENT_PORT: 2181
		ZOOKEEPER_TICK_TIME: 2000
	broker:
		image: confluentinc/cp-kafka:7.3.2
		container_name: broker
		ports:
		- "9092:9092"
		depends_on:
			- zookeeper
		environment:
			KAFKA_BROKER_ID: 1
			KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
			KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
			KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://broker:29092
			KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
			KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
			KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

This file can be found in Confluent's example Docker Compose files.

Apache ZooKeeper :

Apache ZooKeeper is a distributed coordination service that is often used with Apache Kafka to manage and coordinate a cluster of brokers. ZooKeeper provides a highly available and fault-tolerant way of storing and managing configuration information, naming, synchronization, and other shared data among the nodes in a distributed system. So, we start by executing the following command to spin up the broker and ZooKeeper containers :

docker compose up -d

Our containers are now ready. Next, we write our producer and consumer code :

import json
import time

import kafka

import data  # local helper module that generates fake user records

def data_to_json():
    # Serialize one fake record to UTF-8 encoded JSON bytes.
    return json.dumps(data.data_faker()).encode("utf-8")

producer = kafka.KafkaProducer(bootstrap_servers=["localhost:9092"])

def produce():
    # Publish a new fake record to the "registred_user" topic every 3 seconds.
    while True:
        print("here")
        producer.send("registred_user", data_to_json())
        time.sleep(3)

produce()
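The data module imported above is a local helper that is not shown here. A possible implementation, assuming the Faker library is used to generate fake user records (the field names are illustrative assumptions), could look like this :

# data.py -- hypothetical helper module (not shown in the original snippet).
from faker import Faker

fake = Faker()

def data_faker():
    # Return one fake "registered user" record as a plain dict,
    # ready to be serialized to JSON by the producer.
    return {
        "name": fake.name(),
        "email": fake.email(),
        "address": fake.address(),
        "created_at": fake.iso8601(),
    }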

import kafka

# Subscribe to the "registred_user" topic as part of the "my_favorite_group" consumer group.
consumer = kafka.KafkaConsumer(
    "registred_user",
    group_id="my_favorite_group",
    bootstrap_servers=["localhost:9092"],
)

def consume():
    # Print the value of every message received on the topic.
    for msg in consumer:
        print(msg.value)

consume()
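Assuming the two snippets above are saved as producer.py and consumer.py (hypothetical filenames) and the kafka-python package is installed (pip install kafka-python), each script can be started in its own terminal :

python producer.py

python consumer.py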

Now we run our scripts, with the producer on the left and the consumer on the right :


And voilà, every message generated by the producer is read by the consumer under the same topic.


Was this helpful? Confusing? If you have any questions, feel free to comment below! Make sure to follow me on LinkedIn: https://www.linkedin.com/in/malekzaag/ and GitHub: https://github.com/Malek-Zaag if you're interested in similar content and want to keep learning alongside me!