Building a robust ZooKeeper client: connection management

To provide a highly available ZooKeeper service, we need to configure a ZooKeeper ensemble. But this is not enough: we also need to understand how the client manages its connection to the ZooKeeper servers.

For testing, I set up a ZooKeeper ensemble (ports 2181, 2182, 2183). Then I wrote a test client that takes 2 important parameters: 1) the connect string and 2) the session timeout.

Case 1) The connect string targets a single ZooKeeper server (e.g. localhost:2181)

In this case, the ZooKeeper client connects only to the server that the connect string targets.

If the target server is shut down,

    • A “Disconnected” event is triggered, if a connection watcher is registered
    • The ZooKeeper client library tries to reconnect in the background
    • When the server resumes, the connection is recovered automatically (if this happens within the session timeout), so we don’t need to write code to reconnect
    • However, a “SyncConnected” event is triggered when the connection is recovered, so the watcher has to handle it
    • But if the session timeout has already passed, the recovered connection is closed again. In this case, an “Expired” event is triggered and we need to write code to reconnect with a new ZooKeeper instance
    • The following messages are shown when the session of a recovered connection has expired
WARN : org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x169561caf4a0000 has expired
INFO : org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x169561caf4a0000 has expired, closing socket connection
INFO : org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x169561caf4a0000
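
The behaviour in Case 1 can be summarised as a small model. The sketch below is not ZooKeeper code: the class `OutageOutcomeSketch` and its methods are hypothetical names, and it assumes the simplification that a session survives exactly when the outage is shorter than the negotiated session timeout. The event names are the ones observed in the test above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the Case 1 observations; it does not talk to ZooKeeper.
final class OutageOutcomeSketch {

    // Events a connection watcher would see for an outage of the given length,
    // assuming the session survives only if the outage is shorter than the timeout.
    static List<String> eventsAfterOutage(long outageMs, long negotiatedTimeoutMs) {
        if (outageMs < negotiatedTimeoutMs) {
            return Arrays.asList("Disconnected", "SyncConnected");
        }
        return Arrays.asList("Disconnected", "Expired");
    }

    // After "Expired" the old session is gone, so the application has to
    // create a new client itself; otherwise the library has already recovered.
    static boolean needsNewClient(List<String> events) {
        return events.contains("Expired");
    }

    public static void main(String[] args) {
        long timeoutMs = 20000; // same session timeout as in the test program
        for (long outageMs : new long[] {5000, 30000}) {
            List<String> events = eventsAfterOutage(outageMs, timeoutMs);
            System.out.println(outageMs + " ms outage: " + events
                    + ", new client needed: " + needsNewClient(events));
        }
    }
}
```

The point of the model is that only the “Expired” path requires reconnection code in the application; the other path is handled by the client library.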

Case 2) The connect string targets multiple ZooKeeper servers (e.g. localhost:2181,localhost:2182,localhost:2183)

In this case, the ZooKeeper client connects to one of the servers arbitrarily. If the connected server is shut down,

  • The ZooKeeper client internally connects to another server in the connect string
  • The following messages are shown when connection failover happens.
INFO : org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x169565171760000, likely server has closed socket, closing socket connection and attempting reconnect
INFO : org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2182. Will not attempt to authenticate using SASL (unknown error)
INFO : org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2182, initiating session
DEBUG: org.apache.zookeeper.ClientCnxn - Session establishment request sent on localhost/127.0.0.1:2182
  • However, “Disconnected” and “SyncConnected” events are triggered in sequence.
  • We don’t need to write code to reconnect, but we may need to register watchers again if needed
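
The failover in Case 2 can be pictured as rotating through the servers listed in the connect string. The following is a simplified model of that idea, not the client library's actual implementation: the class `ServerRotationSketch` is a hypothetical name, the seed parameter exists only to make the shuffle repeatable, and the sketch ignores details such as name resolution and back-off between attempts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical model: pick servers from a connect string in shuffled, round-robin order.
final class ServerRotationSketch {
    private final List<String> servers = new ArrayList<>();
    private int index = -1;

    ServerRotationSketch(String connectString, long seed) {
        // Split "host1:port1,host2:port2,..." into individual server entries.
        for (String entry : connectString.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed);
            }
        }
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("connect string contains no servers");
        }
        // Shuffle once so that the first server tried is arbitrary.
        Collections.shuffle(servers, new Random(seed));
    }

    // Next server to try; wraps around after the last entry.
    String next() {
        index = (index + 1) % servers.size();
        return servers.get(index);
    }

    public static void main(String[] args) {
        ServerRotationSketch rotation =
                new ServerRotationSketch("localhost:2181,localhost:2182,localhost:2183", 42L);
        System.out.println("first connection: " + rotation.next());
        System.out.println("after failover:   " + rotation.next());
    }
}
```

With a single-server connect string the rotation always returns the same entry, which corresponds to Case 1: the client can only wait for that one server to come back.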

How is the session timeout decided?

The session timeout value is important because it determines whether a resumed connection can still use its session.

Three factors decide the session timeout:

  • server’s minSessionTimeout
  • server’s maxSessionTimeout
  • client’s session timeout parameter

The negotiation rule is

  • if the client session timeout > server maxSessionTimeout, then the server maxSessionTimeout is chosen
  • if the client session timeout < server minSessionTimeout, then the server minSessionTimeout is chosen
  • if the client session timeout is between the server minSessionTimeout and maxSessionTimeout, then the client session timeout is chosen
The negotiated value can be verified with
  • the ZooKeeper client log
INFO : org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1695661ae580001, negotiated timeout = 20000
  • ZooKeeper.getSessionTimeout()
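
The negotiation rule above amounts to clamping the client's value into the server's range. Here is a minimal sketch of that rule; it is a model of the rule as described, not ZooKeeper's own code, and the class and method names are hypothetical. The example values assume a server with tickTime = 2000 ms and the documented default bounds of 2 and 20 times tickTime, which may differ from your configuration.

```java
// Hypothetical model of the session timeout negotiation rule.
final class SessionTimeoutSketch {

    // Clamp the client's requested timeout into [minSessionTimeout, maxSessionTimeout].
    static int negotiate(int requestedMs, int minSessionTimeoutMs, int maxSessionTimeoutMs) {
        if (requestedMs < minSessionTimeoutMs) {
            return minSessionTimeoutMs;
        }
        if (requestedMs > maxSessionTimeoutMs) {
            return maxSessionTimeoutMs;
        }
        return requestedMs;
    }

    public static void main(String[] args) {
        // Assumed server settings: tickTime = 2000 ms with default min/max bounds.
        int tickTimeMs = 2000;
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        for (int requestedMs : new int[] {1000, 20000, 60000}) {
            System.out.println("requested " + requestedMs
                    + " ms, negotiated " + negotiate(requestedMs, minMs, maxMs) + " ms");
        }
    }
}
```

Under these assumed bounds, a requested timeout of 20000 ms lies inside the range and is kept, which is consistent with the "negotiated timeout = 20000" line in the log above.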

Conclusion

To build a robust ZooKeeper client, we need to
  • Use a connect string that lists all servers in the ensemble
  • Use an adequate session timeout value
  • Write code for the connection watcher (especially for the SyncConnected, Disconnected, and Expired events)

Here is the test program I used.

package test;

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;

public class ZookeeperConnectionTest3 implements Watcher{
	private ZooKeeper zooKeeper;
	private String testNode = "/zookeeper";
	// volatile: written by the ZooKeeper event thread, read by the checker thread
	private volatile boolean checkContinue;

	public static void main(String[] args) throws Exception{
		//String zooConnStr = "127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183";
		String zooConnStr = "127.0.0.1:2181";
		ZookeeperConnectionTest3 test = new ZookeeperConnectionTest3();
		test.zooKeeper = new ZooKeeper(zooConnStr, 20000, test);
		synchronized(test) {
			test.wait();
		}
	}

	@Override
	public void process(WatchedEvent event) {
		try {
			if(event.getState().equals(KeeperState.SyncConnected)) {
				System.out.println("********* Session Connected");
				startChecker();
			}
			else if(event.getState().equals(KeeperState.Disconnected)) {
				System.out.println("********* Session Disconnected");
				stopChecker();
			}
			else if(event.getState().equals(KeeperState.Expired)) {
				System.out.println("********* Session expired");
				synchronized(this) {
					this.notify();
				}
			}
		}
		catch(Exception e) {
			e.printStackTrace();
		}
	}

	private void startChecker() {
		this.checkContinue = true;
		Thread testThread = new Thread(new NodeCheckerRunnable());
		testThread.start();
	}

	private void stopChecker() {
		this.checkContinue = false;
	}

	private void checkNode() throws Exception{
		Stat stat = this.zooKeeper.exists(this.testNode, false);
		System.out.println("Check node successful : " + Boolean.toString(stat != null));
	}

	private class NodeCheckerRunnable implements Runnable{

		@Override
		public void run() {
			while(true) {
				if(checkContinue == false) break;
				try {
					checkNode();
				}
				catch(Exception e) {
					System.out.println(e.getMessage());
				}
				sleep(1000);
			}
			System.out.println("Node checker stopped");
		}

		private void sleep(long time) {
			try {
				Thread.sleep(time);
			}
			catch(Exception e) {

			}
		}
	}
}
