Zookeeper is a cluster management solution, so Zookeeper itself should be set up as a cluster. Otherwise it becomes a single point of failure. A Zookeeper cluster is called an “ensemble”.
How to set up an ensemble (based on v3.4.10)
The basic config for a single Zookeeper is as follows.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/tkstone/some/path/zoo1
clientPort=2181
To turn it into an ensemble, add a few more options. For a simple test, I configured 3 Zookeepers on the same host.
# Zookeeper#1
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/tkstone/some/path/zoo1
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# Zookeeper#2
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/tkstone/some/path/zoo2
clientPort=2182
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

# Zookeeper#3
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/tkstone/some/path/zoo3
clientPort=2183
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
In the above configurations, only clientPort and dataDir differ. “dataDir” is where Zookeeper writes its snapshots (and, unless dataLogDir is set, its transaction logs).
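As a rough sketch, the three configs above could be generated instead of written by hand. The helper name render_config and the port arithmetic are assumptions made for illustration; they only reproduce the values used in this post for a same-host test.

```python
# Settings shared by every node, copied from the configs above.
COMMON = {"tickTime": 2000, "initLimit": 10, "syncLimit": 5}


def render_config(node_id, base_dir, node_count=3):
    """Return the config text for one node of a same-host test ensemble.

    Hypothetical helper: the port offsets below are chosen only so that
    the output matches the three example configs in this post.
    """
    lines = [f"{key}={value}" for key, value in COMMON.items()]
    # Only dataDir and clientPort differ from node to node.
    lines.append(f"dataDir={base_dir}/zoo{node_id}")
    lines.append(f"clientPort={2180 + node_id}")
    # The server list is identical in every file.
    for other in range(1, node_count + 1):
        lines.append(f"server.{other}=localhost:{2887 + other}:{3887 + other}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(2, "/home/tkstone/some/path"))
```

Because all three processes share one host here, every port has to be distinct. On separate hosts the same port pair could be reused for each server entry.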
The most important option is “server.id=zoo_ip:port1:port2”
- id : the id of each Zookeeper. It must be unique within the ensemble
- port1 : the port followers use to connect to the leader (only the leader process listens on it)
- port2 : the port used for leader election (every process listens on it)
The config alone is not enough. Each dataDir must also contain a file named “myid” that holds the id of that Zookeeper. For example, for Zookeeper#1, /home/tkstone/some/path/zoo1/myid must contain the value “1”.
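A minimal sketch of creating the myid files follows. The function name write_myid is hypothetical, and the demo writes into a temporary directory that stands in for /home/tkstone/some/path.

```python
import tempfile
from pathlib import Path


def write_myid(data_dir, server_id):
    """Create data_dir if needed and write server_id into data_dir/myid."""
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)
    # The file holds nothing but the id, e.g. "1" for Zookeeper#1.
    path.write_text(f"{server_id}\n")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the real base path here.
    with tempfile.TemporaryDirectory() as base:
        for server_id in (1, 2, 3):
            path = write_myid(Path(base) / f"zoo{server_id}", server_id)
            print(path.parent.name, path.name, path.read_text().strip())
```

The id written to each file has to match the number in the corresponding server.id line of the config.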
How it works
When Zookeeper starts up, it reads the myid file to learn its own id. It then tries to connect to the other nodes listed in the server.id options in order to elect a leader.
By default, a Zookeeper ensemble works only if a majority of its nodes are running (i.e. 2 nodes in a 3-node ensemble). If that condition is not met, Zookeeper does not serve znode data.
Once a majority of the nodes are running, a leader is elected. The others become followers.
The majority requirement exists to avoid the split-brain problem. With an experimental option (the Java system property readonlymode.enabled), you can enable a read-only mode that still answers read requests while a majority of the nodes is not running.
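The majority rule described above comes down to simple arithmetic, sketched here. The function names quorum_size and has_quorum are illustrative and not part of Zookeeper.

```python
def quorum_size(ensemble_size):
    """Smallest number of running nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(running, ensemble_size):
    """True if enough nodes are running for the ensemble to work."""
    return running >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for size in (3, 4, 5):
        needed = quorum_size(size)
        # Columns: ensemble size, nodes needed, failures tolerated.
        print(size, needed, size - needed)
```

For 3, 4 and 5 nodes this gives 2, 3 and 3 required nodes. A 4-node ensemble therefore tolerates only one failure, the same as a 3-node ensemble, which is why odd ensemble sizes are the usual choice.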
How to determine which node is the leader
When a process is elected leader, it writes the following log line
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:Leader@371] - LEADING - LEADER ELECTION TOOK - 217
The other processes write the following log line instead
[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@64] - FOLLOWING - LEADER ELECTION TOOK - 3711
According to the above logs, Zookeeper#2 (myid=2) is the leader.
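If the logs of all nodes are collected in one place, the leader can be picked out mechanically. The sketch below assumes log lines shaped exactly like the two samples above; the pattern and the function name find_roles are illustrative, and other Zookeeper versions may format these lines differently.

```python
import re

# Matches the two log shapes quoted above and captures myid and role.
ROLE_PATTERN = re.compile(r"QuorumPeer\[myid=(\d+)\].* - (LEADING|FOLLOWING) - ")


def find_roles(log_lines):
    """Map each myid found in the lines to 'LEADING' or 'FOLLOWING'."""
    roles = {}
    for line in log_lines:
        match = ROLE_PATTERN.search(line)
        if match:
            roles[int(match.group(1))] = match.group(2)
    return roles


if __name__ == "__main__":
    sample = [
        "[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:Leader@371] - LEADING - LEADER ELECTION TOOK - 217",
        "[QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@64] - FOLLOWING - LEADER ELECTION TOOK - 3711",
    ]
    roles = find_roles(sample)
    leaders = [node for node, role in roles.items() if role == "LEADING"]
    print(roles, leaders)
```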