Some people say that Tomcat session clustering is not suitable for high-traffic web services, so I ran a performance test (with both DeltaManager and BackupManager).
In an ideal cluster, total TPS grows linearly with the node count. The Tomcat session cluster, however, showed a logarithmic curve. (Its coefficient depends on the usage pattern.)
I analyzed why this happens. Here is what I found.
- The Tomcat session manager shares session data with all the other nodes. Therefore, as the node count increases, the background communication overhead increases linearly.
- A : node count
- B : total incoming traffic
- Total traffic = B + (B / A) * (A - 1) * A = B + (A - 1) * B
- Without clustering, total traffic equals B. With a session cluster, each node forwards its share (B / A) to the other A - 1 nodes, so (A - 1) * B is additional traffic, and it grows as the node count (A) increases.
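The formula above can be checked with a small Java sketch. The class and method names (`ClusterTraffic`, `replicationTraffic`, `totalTraffic`) are made up for illustration and are not part of Tomcat. The model assumes every incoming request triggers exactly one replication message to each of the other nodes, which is a simplification of the all-to-all behavior described above.

```java
public class ClusterTraffic {

    // Extra node-to-node traffic: each of the A nodes receives B / A
    // and forwards it to the other A - 1 nodes, i.e. (B / A) * (A - 1) * A,
    // which simplifies to (A - 1) * B.
    public static long replicationTraffic(int nodeCount, long incomingTraffic) {
        return (long) (nodeCount - 1) * incomingTraffic;
    }

    // Total traffic = incoming traffic B plus the replication traffic.
    public static long totalTraffic(int nodeCount, long incomingTraffic) {
        return incomingTraffic + replicationTraffic(nodeCount, incomingTraffic);
    }

    public static void main(String[] args) {
        long b = 1000L; // hypothetical incoming traffic, e.g. requests per second
        for (int a = 1; a <= 8; a *= 2) {
            long total = totalTraffic(a, b);
            System.out.printf("A=%d total=%d per-node=%d%n", a, total, total / a);
        }
    }
}
```

Under this model the total simplifies to A * B, so the load per node stays at B no matter how many nodes are added. Each node still handles as many messages as a single standalone server would.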
To mitigate this issue, Tomcat introduced 1) DeltaManager and 2) BackupManager.
DeltaManager copies only the changed (incremental) session data instead of the whole session. BackupManager copies the session data to just one additional node (the backup node), but all the other nodes still receive the session ID and the location of the primary and backup copies. Therefore, the total number of messages does not change, even though the network bandwidth used decreases.
When I put the clustered servers under heavy traffic, total TPS saturated and average response time grew longer, yet OS CPU usage decreased. This suggests that blocking I/O, not CPU, is the bottleneck.
The alternatives are 1) to speed up Tribes (Tomcat's group communication module) or 2) to remove node-to-node communication.
The way to remove node-to-node communication is to use a fast shared session store. (An RDBMS is not a good fit for session storage.)
Next time, I'll explain how to develop a custom session manager that uses shared session storage.