To provide a highly available ZooKeeper service, we need to configure a ZooKeeper ensemble. But this is not enough: we also need to understand how the client manages its connection to the ZooKeeper servers.
For testing, I set up a ZooKeeper ensemble of three servers (ports 2181, 2182, 2183). Then I wrote a test client with two important parameters: 1) the connect string and 2) the session timeout.
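A minimal sketch of how those two parameters reach the client, assuming the standard `org.apache.zookeeper` client jar is on the classpath. The names `ConnectSketch`, `ConnectedLatch` and `connect` are hypothetical. The `ZooKeeper` constructor returns before the session is established, so the sketch waits on a `CountDownLatch` for the first `SyncConnected` event before handing out the handle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class ConnectSketch {

    // Hypothetical connection watcher: opens the latch on the first SyncConnected event.
    static final class ConnectedLatch implements Watcher {
        final CountDownLatch connected = new CountDownLatch(1);

        @Override
        public void process(WatchedEvent event) {
            if (event.getState() == KeeperState.SyncConnected) {
                connected.countDown();
            }
        }
    }

    // The constructor starts connecting in the background and returns immediately,
    // so wait (at most sessionTimeoutMs) until the session is actually established.
    static ZooKeeper connect(String connectString, int sessionTimeoutMs) throws Exception {
        ConnectedLatch watcher = new ConnectedLatch();
        ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, watcher);
        if (!watcher.connected.await(sessionTimeoutMs, TimeUnit.MILLISECONDS)) {
            zk.close();
            throw new IllegalStateException("Could not connect to " + connectString);
        }
        return zk;
    }

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = connect("127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183", 20000);
        System.out.println("Negotiated session timeout: " + zk.getSessionTimeout() + " ms");
        zk.close();
    }
}
```

This watcher only cares about the first connection; a long-running client would also react to `Disconnected` and `Expired`, as the test program at the end of this post does.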
Case 1) The connect string targets a single ZooKeeper server (e.g. localhost:2181)
In this case, the ZooKeeper client connects only to the server that the connect string targets.
If the target server is shut down:
- A “Disconnected” event is triggered, if a connection watcher is registered
- The ZooKeeper client library tries to reconnect in the background
- When the server comes back, the connection is recovered automatically (if this happens within the session timeout), so we don’t need to write code to reconnect
- The connection watcher does, however, receive a “SyncConnected” event
- But if the session timeout has already elapsed, the connection is closed again. In this case, an “Expired” event is triggered and we need to write code to reconnect
- The following messages are shown when the recovered connection’s session has already expired:
WARN : org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x169561caf4a0000 has expired
INFO : org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x169561caf4a0000 has expired, closing socket connection
INFO : org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x169561caf4a0000
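An expired session cannot be revived, so “write code to reconnect” means closing the dead handle and creating a new `ZooKeeper` object. A minimal sketch, assuming the single-server connect string from Case 1; `ExpiryReconnectSketch` and `ExpiryWatcher` are hypothetical names. The marked comment is where an application would re-create its ephemeral nodes and watches, since the server discards both when the session expires.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class ExpiryReconnectSketch {

    // One watcher per ZooKeeper handle: its latch opens when that handle's session expires.
    static final class ExpiryWatcher implements Watcher {
        final CountDownLatch expired = new CountDownLatch(1);

        @Override
        public void process(WatchedEvent event) {
            if (event.getState() == KeeperState.Expired) {
                expired.countDown();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String connectString = "127.0.0.1:2181";
        int sessionTimeoutMs = 20000;
        while (true) {
            ExpiryWatcher watcher = new ExpiryWatcher();
            ZooKeeper zk = new ZooKeeper(connectString, sessionTimeoutMs, watcher);
            // Hypothetical spot: re-create ephemeral nodes and watches for this new session.
            watcher.expired.await();
            System.out.println("Session 0x" + Long.toHexString(zk.getSessionId())
                    + " expired, creating a new session");
            // The expired handle is useless; release it and loop to open a fresh session.
            zk.close();
        }
    }
}
```

The loop runs on the main thread rather than inside `process`, so the new handle is never created from the event thread of the handle that just expired.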
Case 2) The connect string targets multiple ZooKeeper servers (e.g. localhost:2181,localhost:2182,localhost:2183)
In this case, the ZooKeeper client connects to one of the servers, chosen arbitrarily. If the connected server is shut down:
- The ZooKeeper client internally connects to another server listed in the connect string
- The following messages are shown when connection failover happens:
INFO : org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x169565171760000, likely server has closed socket, closing socket connection and attempting reconnect
INFO : org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2182. Will not attempt to authenticate using SASL (unknown error)
INFO : org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2182, initiating session
DEBUG: org.apache.zookeeper.ClientCnxn - Session establishment request sent on localhost/127.0.0.1:2182
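- However, the connection watcher still receives “Disconnected” and “SyncConnected” events, in that order.
- We don’t need to write code to reconnect. Watches set earlier in the same session are re-registered by the client library after it reconnects, but any application work we paused on “Disconnected” has to be resumed on “SyncConnected”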
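Requests issued between “Disconnected” and “SyncConnected” fail (typically with a `KeeperException.ConnectionLossException`), so during a failover it can help to let application threads wait for the client to reconnect instead of failing outright. A sketch of one way to do that, assuming a hypothetical `ConnectionGate` class registered as the connection watcher; it is not part of the ZooKeeper API.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;

// Hypothetical helper: tracks connection state so worker threads can wait out a failover.
public class ConnectionGate implements Watcher {
    private boolean connected; // guarded by this
    private boolean expired;   // guarded by this

    @Override
    public synchronized void process(WatchedEvent event) {
        KeeperState state = event.getState();
        if (state == KeeperState.SyncConnected) {
            connected = true;
            notifyAll(); // failover finished: wake waiting threads
        } else if (state == KeeperState.Disconnected) {
            connected = false; // the client library is already trying other servers
        } else if (state == KeeperState.Expired) {
            connected = false;
            expired = true;
            notifyAll(); // this handle is dead: waiting threads must give up
        }
    }

    // Blocks until connected. Returns false on timeout or if the session has expired.
    public synchronized boolean awaitConnected(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!connected && !expired) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining); // releases the monitor until process() calls notifyAll()
        }
        return connected;
    }
}
```

A worker would call `gate.awaitConnected(...)` after catching a connection-loss error and retry only if it returns true; a false result after expiry means a new `ZooKeeper` handle is needed.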
How is the session timeout decided?
The session timeout value is important because it decides whether the session is still valid when a connection is resumed.
There are 3 factors that decide the session timeout:
- the server’s minSessionTimeout
- the server’s maxSessionTimeout
- the client’s session timeout parameter
The negotiation rule is:
- if the client session timeout > the server’s maxSessionTimeout, the server’s maxSessionTimeout is chosen
- if the client session timeout < the server’s minSessionTimeout, the server’s minSessionTimeout is chosen
- if the client session timeout is between the server’s minSessionTimeout and maxSessionTimeout, the client session timeout is chosen
The negotiated value can be checked in two places:
- the ZooKeeper client log:
INFO : org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1695661ae580001, negotiated timeout = 20000
- ZooKeeper.getSessionTimeout() on the client handle
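The negotiation rule above amounts to clamping the requested value into the server’s range. A small stand-alone sketch of that rule (`SessionTimeoutNegotiation` and `negotiate` are hypothetical names, not ZooKeeper API). It assumes `tickTime=2000`, the value in ZooKeeper’s sample `zoo.cfg`, and the server defaults of minSessionTimeout = 2 × tickTime and maxSessionTimeout = 20 × tickTime when neither is configured.

```java
public class SessionTimeoutNegotiation {

    // Clamp the client's requested timeout into [minSessionTimeoutMs, maxSessionTimeoutMs].
    static int negotiate(int requestedMs, int minSessionTimeoutMs, int maxSessionTimeoutMs) {
        if (requestedMs > maxSessionTimeoutMs) {
            return maxSessionTimeoutMs;
        }
        if (requestedMs < minSessionTimeoutMs) {
            return minSessionTimeoutMs;
        }
        return requestedMs;
    }

    public static void main(String[] args) {
        int tickTime = 2000;     // assumed zoo.cfg tickTime
        int min = 2 * tickTime;  // default minSessionTimeout: 4000 ms
        int max = 20 * tickTime; // default maxSessionTimeout: 40000 ms
        System.out.println(negotiate(20000, min, max)); // prints 20000 (within range)
        System.out.println(negotiate(60000, min, max)); // prints 40000 (capped at max)
        System.out.println(negotiate(1000, min, max));  // prints 4000 (raised to min)
    }
}
```

Under these assumed defaults, the test client’s requested 20000 ms falls inside the range and is kept unchanged.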
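Conclusion
- Use a connect string that lists all servers in the ensemble
- Use an adequate session timeout value (long enough to survive a failover, and within the server’s minSessionTimeout/maxSessionTimeout range)
- Write code for the connection watcher (especially for the SyncConnected, Disconnected, and Expired events)
Here is the test program I used.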
package test;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZookeeperConnectionTest3 implements Watcher {

    // volatile: assigned by main, read by the checker thread
    private volatile ZooKeeper zooKeeper;
    private final String testNode = "/zookeeper";
    // volatile: written by the ZooKeeper event thread, read by the checker thread
    private volatile boolean checkContinue;
    // set when the session expires (guarded by this)
    private boolean expired;

    public static void main(String[] args) throws Exception {
        //String zooConnStr = "127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183"; // Case 2
        String zooConnStr = "127.0.0.1:2181"; // Case 1
        ZookeeperConnectionTest3 test = new ZookeeperConnectionTest3();
        // Requested session timeout: 20000 ms (the server may adjust it)
        test.zooKeeper = new ZooKeeper(zooConnStr, 20000, test);

        // Wait until the session expires; the loop guards against spurious wakeups
        synchronized (test) {
            while (!test.expired) {
                test.wait();
            }
        }
        test.zooKeeper.close();
    }

    // Connection watcher, called on the ZooKeeper event thread
    @Override
    public void process(WatchedEvent event) {
        KeeperState state = event.getState();
        if (state == KeeperState.SyncConnected) {
            System.out.println("********* Session Connected");
            startChecker();
        } else if (state == KeeperState.Disconnected) {
            System.out.println("********* Session Disconnected");
            stopChecker();
        } else if (state == KeeperState.Expired) {
            System.out.println("********* Session expired");
            stopChecker();
            synchronized (this) {
                expired = true;
                notifyAll();
            }
        }
    }

    private void startChecker() {
        this.checkContinue = true;
        Thread testThread = new Thread(new NodeCheckerRunnable());
        testThread.start();
    }

    private void stopChecker() {
        this.checkContinue = false;
    }

    // Reads the test node to show whether requests currently succeed
    private void checkNode() throws Exception {
        Stat stat = this.zooKeeper.exists(this.testNode, false);
        System.out.println("Check node successful : " + (stat != null));
    }

    // Checks the node once a second until stopChecker() is called
    private class NodeCheckerRunnable implements Runnable {
        @Override
        public void run() {
            while (checkContinue && !Thread.currentThread().isInterrupted()) {
                try {
                    checkNode();
                } catch (Exception e) {
                    System.out.println(e.getMessage());
                }
                sleep(1000);
            }
            System.out.println("Node checker stopped");
        }

        private void sleep(long time) {
            try {
                Thread.sleep(time);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt so run() exits
            }
        }
    }
}