Introduction
Apache Kafka is a distributed streaming platform used to build real-time data pipelines and streaming applications. It is designed to handle high-throughput data streams while providing fault tolerance, scalability, and durability. Kafka is widely used across industries for real-time analytics, monitoring, and event-driven architectures. This cheat sheet is a quick reference to the most commonly used Kafka concepts and commands.
Apache Kafka Concepts
Concept | Description |
---|---|
Topic | A category or feed name to which records are published. Topics are partitioned and can have multiple consumers. |
Partition | A division of a topic, allowing data to be distributed across multiple brokers. Each partition is an ordered, append-only log: records are immutable once written, and ordering is guaranteed only within a partition. |
Broker | A Kafka server that stores data and serves clients. Multiple brokers form a Kafka cluster. |
Producer | An application that writes records to a Kafka topic. |
Consumer | An application that reads records from a Kafka topic. Consumers can be part of a consumer group. |
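Consumer Group | A group of consumers that work together to consume data from a topic. Each partition is consumed by exactly one consumer within a group at any given time. |
ZooKeeper | In ZooKeeper-based clusters, manages and coordinates the Kafka brokers. It handles controller election, configuration management, and more. Newer Kafka versions replace it with the built-in KRaft controller quorum. |
Offset | A sequential identifier for a record within a partition. Consumers record their progress by committing offsets, which is how Kafka knows which records a group has already consumed. |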
Replication | The process of duplicating partitions across multiple brokers to ensure data availability and fault tolerance. |
Retention | How long (retention.ms) or how much data (retention.bytes) Kafka keeps in a topic before older records are deleted. |
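To make the consumer-group rule concrete, the following sketch spreads a topic's partitions across the consumers of one group in round-robin fashion. It is a simplified illustration only, not Kafka's actual assignment logic (the built-in assignors are more involved), and the function name `assign_partitions` is made up for this example.

```shell
#!/bin/sh
# Illustrative only: hand out partitions to the consumers of one group,
# round-robin. Kafka's real assignors (range, round-robin, sticky, ...)
# are more sophisticated; this just shows "one partition, one consumer".
assign_partitions() {
  partitions=$1   # number of partitions in the topic
  consumers=$2    # number of consumers in the group
  p=0
  while [ "$p" -lt "$partitions" ]; do
    echo "partition $p -> consumer $((p % consumers))"
    p=$((p + 1))
  done
}

assign_partitions 3 2
# partition 0 -> consumer 0
# partition 1 -> consumer 1
# partition 2 -> consumer 0
```

With 3 partitions and 2 consumers, one consumer handles two partitions. With more consumers than partitions, the extra consumers receive nothing and sit idle.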
Apache Kafka Commands Cheat Sheet
Command | Description |
---|---|
kafka-topics.sh --create | Creates a new topic with specified configurations. |
kafka-topics.sh --list | Lists all the topics available in the Kafka cluster. |
kafka-topics.sh --describe | Describes the details of a specific topic, including partition and replica details. |
kafka-topics.sh --delete | Deletes a topic from the Kafka cluster. |
kafka-console-producer.sh --topic <topic-name> | Sends data to a Kafka topic using the console producer. |
kafka-console-consumer.sh --topic <topic-name> | Reads data from a Kafka topic using the console consumer. |
kafka-console-consumer.sh --from-beginning | Reads all data from the beginning of the topic using the console consumer. |
kafka-console-consumer.sh --bootstrap-server <server> | Specifies the Kafka broker to connect to when using the console consumer. |
kafka-consumer-groups.sh --list | Lists all consumer groups available in the Kafka cluster. |
kafka-consumer-groups.sh --describe --group <group-name> | Describes the details of a specific consumer group, including offsets and lag. |
kafka-consumer-groups.sh --reset-offsets | Resets the offsets of a consumer group to a specific point (e.g., beginning, end, or a specific offset). |
kafka-replica-verification.sh --broker-list <server> | Checks that the replicas of each partition hold the same data as their leaders. |
kafka-acls.sh --add | Adds access control lists (ACLs) for users to restrict or allow access to Kafka resources. |
kafka-configs.sh --alter | Alters the configuration of a Kafka topic, broker, or client. |
kafka-configs.sh --describe | Describes the current configuration of a Kafka topic, broker, or client. |
kafka-configs.sh --bootstrap-server <server> | Specifies the Kafka broker to connect to when configuring Kafka resources. |
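Resetting offsets is the one command in this table that changes what a consumer group will read next, so it can help to preview the result first. The sketch below wraps the tool's `--dry-run` mode with `--to-earliest`. The function name `preview_offset_reset` and the default broker address `localhost:9092` are assumptions made for illustration.

```shell
#!/bin/sh
# preview_offset_reset GROUP TOPIC [BOOTSTRAP]
# Shows the offsets the group WOULD be moved to, without changing anything.
# Hypothetical helper name; localhost:9092 is an assumed default broker.
preview_offset_reset() {
  kafka-consumer-groups.sh --reset-offsets \
    --group "$1" --topic "$2" \
    --to-earliest --dry-run \
    --bootstrap-server "${3:-localhost:9092}"
}

# Example (needs a running broker):
#   preview_offset_reset my-group my-topic
```

Replacing `--dry-run` with `--execute` applies the reset. The tool only resets offsets for a group that has no active members, so stop the group's consumers first.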
Explanation and Examples of Apache Kafka Commands
kafka-topics.sh --create
Description: Creates a new topic with specified configurations such as partition count, replication factor, and more.
Example:
kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 2 --bootstrap-server localhost:9092
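Explanation: This command creates a topic named my-topic with 3 partitions and a replication factor of 2. The replication factor cannot exceed the number of brokers, so this example needs a cluster with at least two brokers.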
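In setup scripts that may run more than once, creating a topic that already exists produces an error. A minimal sketch of a repeatable version uses the `--if-not-exists` flag of `kafka-topics.sh`. The helper name `create_topic_if_missing` and the default broker address are assumptions for illustration.

```shell
#!/bin/sh
# create_topic_if_missing TOPIC PARTITIONS REPLICATION_FACTOR [BOOTSTRAP]
# Creates the topic only when it does not exist yet, so the call can be
# repeated safely. Hypothetical helper; localhost:9092 is an assumed default.
create_topic_if_missing() {
  kafka-topics.sh --create --if-not-exists \
    --topic "$1" \
    --partitions "$2" \
    --replication-factor "$3" \
    --bootstrap-server "${4:-localhost:9092}"
}

# Example (needs a running cluster):
#   create_topic_if_missing my-topic 3 2
```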
kafka-console-producer.sh --topic <topic-name>
Description: Sends data to a Kafka topic using the console producer.
Example:
kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092
Explanation: This command starts a console producer that sends messages to the topic my-topic. Each line entered on standard input is sent as a separate record.
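The console producer sends records without a key by default. Since records with the same key land in the same partition under the default partitioner, it is sometimes useful to send keyed records from the console. The sketch below uses the `parse.key` and `key.separator` properties. The helper name `produce_keyed` and the `:` separator are choices made for this example.

```shell
#!/bin/sh
# produce_keyed TOPIC [BOOTSTRAP]
# Reads "key:value" lines from standard input and sends each one as a
# keyed record. Hypothetical helper; localhost:9092 is an assumed default.
produce_keyed() {
  kafka-console-producer.sh --topic "$1" \
    --property parse.key=true \
    --property key.separator=: \
    --bootstrap-server "${2:-localhost:9092}"
}

# Example (needs a running broker):
#   printf 'user1:login\nuser2:logout\n' | produce_keyed my-topic
```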
kafka-console-consumer.sh --topic <topic-name> --from-beginning
Description: Reads all data from the beginning of the topic using the console consumer.
Example:
kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092
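Explanation: This command starts a console consumer that reads all messages from the topic my-topic from the beginning. Without --from-beginning, a new console consumer starts at the end of the topic and only shows messages produced after it connects.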
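The console consumer keeps running until it is interrupted, which is awkward when you only want a quick look at a topic. A small sketch that reads a fixed number of records and prints their keys, using `--max-messages` and the `print.key` property, is shown here. The helper name `peek_topic` is made up for this example.

```shell
#!/bin/sh
# peek_topic TOPIC COUNT [BOOTSTRAP]
# Reads COUNT records starting from the beginning of the topic, prints
# each record's key next to its value, then exits.
# Hypothetical helper; localhost:9092 is an assumed default broker.
peek_topic() {
  kafka-console-consumer.sh --topic "$1" \
    --from-beginning \
    --max-messages "$2" \
    --property print.key=true \
    --bootstrap-server "${3:-localhost:9092}"
}

# Example (needs a running broker):
#   peek_topic my-topic 10
```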
kafka-consumer-groups.sh --describe --group <group-name>
Description: Describes the details of a specific consumer group, including offsets and lag.
Example:
kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092
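Explanation: This command prints, for each partition consumed by the group my-group, the current offset, the log end offset, and the lag between them, along with the consumer assigned to that partition.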
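Lag is the difference between a partition's log end offset and the group's current offset, and a single total is often easier to monitor than a per-partition table. The following sketch sums the LAG column of the describe output with awk. The column layout can differ between Kafka versions, so the script locates the column by its header name. The sample rows are made-up data for illustration, and `total_lag` and `sample_output` are hypothetical names.

```shell
#!/bin/sh
# total_lag: reads `kafka-consumer-groups.sh --describe` output on
# standard input and prints the sum of the LAG column.
total_lag() {
  awk '
    # Until the header row is seen, look for the column named LAG.
    lag_col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i; next }
    # Data rows: add numeric lag values; non-numeric cells such as "-" are skipped.
    $lag_col ~ /^[0-9]+$/ { sum += $lag_col }
    END { print sum + 0 }
  '
}

# Made-up sample output, shaped like the describe table.
sample_output() {
  cat <<'EOF'

GROUP     TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST        CLIENT-ID
my-group  my-topic  0          120             125             5    consumer-1   /127.0.0.1  client-1
my-group  my-topic  1          98              98              0    consumer-1   /127.0.0.1  client-1
my-group  my-topic  2          40              52              12   consumer-2   /127.0.0.1  client-2
EOF
}

sample_output | total_lag   # prints 17
```

Against a real cluster, the same function can be fed directly: `kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092 | total_lag`.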
kafka-configs.sh --alter
Description: Alters the configuration of a Kafka topic, broker, or client.
Example:
kafka-configs.sh --alter --entity-type topics --entity-name my-topic --add-config retention.ms=604800000 --bootstrap-server localhost:9092
Explanation: This command sets the retention period of the topic my-topic to 7 days (604800000 milliseconds).
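Because retention.ms is given in milliseconds, it is easy to mistype a value by a factor of ten. A small sketch that derives the number from days with shell arithmetic is shown below. The helper names `days_to_ms` and `set_retention_days` and the default broker address are assumptions for illustration.

```shell
#!/bin/sh
# days_to_ms DAYS: convert whole days to milliseconds.
days_to_ms() {
  echo $(( $1 * 24 * 60 * 60 * 1000 ))
}

# set_retention_days TOPIC DAYS [BOOTSTRAP]
# Sets retention.ms on the topic from a value given in days.
# Hypothetical helper; localhost:9092 is an assumed default broker.
set_retention_days() {
  kafka-configs.sh --alter --entity-type topics --entity-name "$1" \
    --add-config "retention.ms=$(days_to_ms "$2")" \
    --bootstrap-server "${3:-localhost:9092}"
}

days_to_ms 7   # prints 604800000

# Example (needs a running broker):
#   set_retention_days my-topic 7
```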
Conclusion
Apache Kafka is a robust and versatile platform for building real-time streaming data pipelines and applications. This cheat sheet provides a quick reference to the key concepts and commands in Kafka, helping you manage and operate Kafka clusters more effectively. Keep this guide handy as you work with Kafka to ensure smooth and efficient data streaming. Happy coding!