Building a Kafka cluster is crucial for a production system.
A Kafka cluster gives the following advantages:
- Failover support when a node goes down
- Replication of topic data across nodes
- Consumer scale-out (a standalone Kafka also supports consumer scale-out)
Step 1 – Build Zookeeper ensemble
Kafka depends on Zookeeper for its configuration management, so Zookeeper also needs to run as a cluster (an ensemble). (Refer to Zookeeper ensemble)
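For readers who have not set up an ensemble yet, here is a minimal sketch of three Zookeeper instances on one PC. The working directory, the client ports 2181 to 2183 and the peer/election ports 2888-2890/3888-3890 are my assumptions for illustration, not values from the original setup. The zoo.cfg keys and the per-instance myid file are standard Zookeeper configuration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory for this local demo.
BASE="${TMPDIR:-/tmp}/zk-ensemble-demo"

for id in 1 2 3; do
  dir="$BASE/zk$id"
  mkdir -p "$dir/data"
  # On one PC, each instance needs its own data directory and client port.
  cat > "$dir/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$dir/data
clientPort=$((2180 + id))
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
EOF
  # myid tells each instance which server.N line describes itself.
  echo "$id" > "$dir/data/myid"
done
```

Each instance is then started with its own zoo.cfg. Kafka's zookeeper.connect can list all members of the ensemble, e.g. localhost:2181,localhost:2182,localhost:2183.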
Step 2 – Run multiple Kafka brokers
Each Kafka broker needs no special configuration except:
- broker.id : each Kafka process must have a unique broker id
- zookeeper.connect : every broker in the cluster must connect to the same Zookeeper ensemble
For testing, I also set different listeners and log.dirs for each broker so that multiple Kafka brokers can run on one PC.
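As a sketch of Step 2, the loop below writes three broker config files containing the two required settings plus the per-broker listeners and log.dirs used for a single-PC test. Broker ids 0 to 2 match the ids in the describe output later in this post; the output directory, the ports 9092 to 9094 and the /tmp/kafka-logs-N paths are my assumptions. All other settings are left at Kafka's defaults.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output directory for the generated broker configs.
CONF_DIR="${TMPDIR:-/tmp}/kafka-cluster-demo"
mkdir -p "$CONF_DIR"

for id in 0 1 2; do
  cat > "$CONF_DIR/server-$id.properties" <<EOF
# Must be unique per broker.
broker.id=$id
# Must be identical on every broker in the cluster.
zookeeper.connect=localhost:2181
# Distinct ports and log dirs are needed only because all brokers share one PC.
listeners=PLAINTEXT://localhost:$((9092 + id))
log.dirs=/tmp/kafka-logs-$id
EOF
done
```

Each broker would then be started with its own file, for example `${KAFKA_HOME}/bin/kafka-server-start.sh -daemon "$CONF_DIR/server-0.properties"`.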
Step 3 – Create cluster-enabled Topics
Creating a cluster-enabled Topic is no different from creating a standalone Topic.
${KAFKA_HOME}/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 10 --topic test-topic4
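replication-factor : This sets data redundancy. The value is the total number of copies of each partition, not the number of extra copies. For example, if the value is 2, each partition is stored on 2 brokers, the leader included.
partitions : the number of partitions in the Topic. Within a consumer group, each partition is read by at most one consumer, so the partition count caps how many consumers in a group can read the Topic in parallel.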
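A quick worked example of what these two options mean for the command above, assuming three brokers as in Step 2. The even spread is an idealization; Kafka's actual assignment may differ slightly.

```shell
#!/usr/bin/env bash
set -euo pipefail

partitions=10          # value passed to --partitions above
replication_factor=2   # value passed to --replication-factor above
brokers=3              # assumption: three brokers (ids 0, 1, 2) as in Step 2

# Kafka rejects a topic whose replication factor exceeds the broker count.
if [ "$replication_factor" -gt "$brokers" ]; then
  echo "replication factor $replication_factor > $brokers brokers" >&2
  exit 1
fi

# Every partition is stored replication_factor times, leader included.
total_replicas=$((partitions * replication_factor))
# Assuming an even spread, each broker holds floor or ceil of total/brokers.
min_per_broker=$((total_replicas / brokers))
max_per_broker=$(( (total_replicas + brokers - 1) / brokers ))
# Replicas of one partition sit on distinct brokers, so this many brokers
# can be lost while every partition still keeps at least one copy.
tolerated_failures=$((replication_factor - 1))

echo "total replicas: $total_replicas"
echo "replicas per broker: $min_per_broker to $max_per_broker"
echo "broker failures tolerated without losing a partition: $tolerated_failures"
```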
After creating a Topic, you can check the status.
[tkstone@localhost bin]$ ./kafka-topics.sh --describe --zookeeper localhost:2181
Topic:test-topic4 PartitionCount:10 ReplicationFactor:2 Configs:
	Topic: test-topic4 Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2
	Topic: test-topic4 Partition: 1 Leader: 0 Replicas: 0,2 Isr: 0,2
	Topic: test-topic4 Partition: 2 Leader: 1 Replicas: 1,0 Isr: 0,1
	Topic: test-topic4 Partition: 3 Leader: 2 Replicas: 2,0 Isr: 0,2
	Topic: test-topic4 Partition: 4 Leader: 0 Replicas: 0,1 Isr: 0,1
	Topic: test-topic4 Partition: 5 Leader: 1 Replicas: 1,2 Isr: 1,2
	Topic: test-topic4 Partition: 6 Leader: 2 Replicas: 2,1 Isr: 1,2
	Topic: test-topic4 Partition: 7 Leader: 0 Replicas: 0,2 Isr: 0,2
	Topic: test-topic4 Partition: 8 Leader: 1 Replicas: 1,0 Isr: 0,1
	Topic: test-topic4 Partition: 9 Leader: 2 Replicas: 2,0 Isr: 0,2
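To see how the leaders and replicas in this output are spread over the brokers, the lines can be tallied per broker id. This is a minimal awk-based sketch; the function name count_per_broker is hypothetical, and it assumes the "Label: value" layout shown above. It matches fields by label, so extra columns in other Kafka versions should simply be ignored, though I have not checked every version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin and
# prints, per broker id, how many partitions it leads and how many replicas it hosts.
count_per_broker() {
  awk '
    {
      for (i = 1; i < NF; i++) {
        if ($i == "Leader:") leaders[$(i + 1)]++
        if ($i == "Replicas:") {
          n = split($(i + 1), ids, ",")
          for (j = 1; j <= n; j++) replicas[ids[j]]++
        }
      }
    }
    END {
      # Brokers that lead nothing print leader=0 (uninitialized awk value).
      for (b in replicas) printf "broker %s: leader=%d replicas=%d\n", b, leaders[b], replicas[b]
    }
  ' | sort   # awk array order is unspecified, so sort the result
}
```

Usage: `./kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic4 | count_per_broker`. For the output above this gives 3 leaders and 7 replicas on broker 0, 3 and 6 on broker 1, and 4 and 7 on broker 2.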
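The result above shows which partition is placed on which brokers. Some important points:
- Isr means “in-sync replicas”. It lists the ids of the brokers whose copy of the partition is currently caught up with the leader
- Leader is the broker that serves the partition. If the leader shuts down, one of the in-sync replicas is chosen as the new leader
- Replicas is the assigned list of brokers (leader and followers) for the partition. If one of them is down, it drops out of Isr, so Replicas and Isr no longer match. Note that the order can differ even when all replicas are in sync (e.g. Replicas: 2,1 and Isr: 1,2)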
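Because Isr may list the same brokers in a different order (Partition 0 above shows Replicas: 2,1 and Isr: 1,2), a plain string comparison would report false alarms. The sketch below compares the two lists as sets and prints only the brokers that are missing from Isr. The function name find_out_of_sync is hypothetical. kafka-topics.sh also accepts an --under-replicated-partitions option with --describe for a similar purpose; check your version's help output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin and
# prints every partition whose Isr lacks one of its assigned Replicas.
find_out_of_sync() {
  awk '
    {
      part = ""; reps = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:") reps = $(i + 1)
        if ($i == "Isr:") isr = $(i + 1)
      }
      if (part == "") next            # skip the topic summary line
      split("", in_isr)               # portable way to clear the set
      n = split(isr, ids, ",")
      for (j = 1; j <= n; j++) in_isr[ids[j]] = 1
      missing = ""
      n = split(reps, ids, ",")
      for (j = 1; j <= n; j++)
        if (!(ids[j] in in_isr)) missing = missing (missing == "" ? "" : ",") ids[j]
      if (missing != "") printf "Partition %s: missing from ISR: %s\n", part, missing
    }
  '
}
```

Usage: `./kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic4 | find_out_of_sync`. With all brokers up, as in the output above, it prints nothing.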
You can also check the status in Zookeeper.
[zk: localhost:2181(CONNECTED) 22] ls /brokers/topics/test-topic4/partitions
[3, 2, 1, 0, 7, 6, 5, 4, 9, 8]
[zk: localhost:2181(CONNECTED) 21] get /brokers/topics/test-topic4/partitions/0/state
{"controller_epoch":8,"leader":2,"version":1,"leader_epoch":5,"isr":[1,2]}
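The partition state znode is a small JSON document carrying the same leader and isr values that kafka-topics.sh reports. As a dependency-free sketch, the hypothetical helpers below pull those two fields out with sed; a real JSON parser such as jq would be more robust.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers: read a partition state value such as
# {"controller_epoch":8,"leader":2,"version":1,"leader_epoch":5,"isr":[1,2]}
# on stdin. sed -n ... p prints only matching lines, so other shell output is ignored.
partition_leader() {
  # "leader": (not "leader_epoch":); -1 would mean no leader is elected.
  sed -n 's/.*"leader":\([0-9-]*\).*/\1/p'
}
partition_isr() {
  # Comma-separated broker ids inside "isr":[...]
  sed -n 's/.*"isr":\[\([0-9,]*\)].*/\1/p'
}
```

Kafka ships bin/zookeeper-shell.sh, which can run the same get command non-interactively, e.g. `${KAFKA_HOME}/bin/zookeeper-shell.sh localhost:2181 get /brokers/topics/test-topic4/partitions/0/state | partition_leader`.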
Next time, I’ll write a post on managing Kafka Topics.