"ZooKeeper: Because coordinating distributed systems is a Zoo"
This post is short and not especially useful: it offers no technical recommendations of its own. I would just like to quote the SolrCloud wiki. SolrCloud is "the set of Solr features that take Solr's distributed search to the next level, enabling and simplifying the creation and use of Solr clusters." It uses Apache ZooKeeper (originally a subproject of Hadoop) as a distributed coordination service that keeps track of the state of the Solr cluster. In a distributed system every component can potentially crash, yet the system is expected to keep providing its service to its users. If a single Solr node crashes, its replica takes over. If ZooKeeper crashes, the cluster still keeps serving user requests, but changes to the cluster state are no longer visible to its nodes (sounds interesting, I know). To improve on that, this is what is possible:
"Running multiple zookeeper servers in concert (a zookeeper ensemble) allows for high availability of the zookeeper service. Every zookeeper server needs to know about every other zookeeper server in the ensemble, and a majority of servers are needed to provide service. For example, a zookeeper ensemble of 3 servers allows any one to fail with the remaining 2 constituting a majority to continue providing service. 5 zookeeper servers are needed to allow for the failure of up to 2 servers at a time."
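The majority rule in the quote is simple arithmetic: an ensemble of n servers needs floor(n/2) + 1 of them alive to keep serving, so it can lose the rest. Here is a minimal Python sketch of that calculation. The function names are hypothetical and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of live servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print quorum size and failure tolerance for small ensembles.
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

Running it shows that 3 servers survive 1 failure and 5 survive 2, matching the quote. It also shows why ensembles are usually odd-sized: going from 3 to 4 servers (or 5 to 6) raises the quorum without letting any more servers fail.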
So, if you have a big zoo with a wide variety of animals in it, make sure you have 5 zookeepers, so that at least 3 of them can take care of your pigs and elephants while the other 2 are stuck somewhere else.