6.2. Monitoring Overall Cluster Health


In addition to information about database activity, the Enterprise Manager also provides a quick view of the overall health of your database clusters. In the list of databases to the left of the dashboard, each database is represented by a colored icon, with different colors indicating the health of that entity. When the database is not running, the icon is gray. When the database is running properly, the icon is green.

If communication with a server fails while the database is running (for example, if the network fails or the VoltDB process stops on that node), the icon turns yellow or red, depending on the consequences. If it is a "K-safe" database (that is, the K-safety value allows the remaining nodes of the cluster to continue), the database's icon turns yellow, indicating there is a problem but the database is still operational. If there are insufficient nodes to continue, the database stops and the icon turns red.
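The color rules above can be summarized in a small decision function. The following Python sketch is illustrative only and is not part of the Enterprise Manager: the function name and its parameters are hypothetical, and it assumes, as a simplification, that a cluster keeps running as long as the number of unreachable nodes does not exceed the K-safety value.

```python
def database_icon_color(started: bool, total_nodes: int,
                        reachable_nodes: int, k_safety: int) -> str:
    """Return the icon color described in this section (hypothetical helper).

    Simplifying assumption: the cluster survives as long as the number of
    unreachable nodes is no greater than the K-safety value.
    """
    if not started:
        # The database is not running.
        return "gray"
    failed_nodes = total_nodes - reachable_nodes
    if failed_nodes == 0:
        # Every server is reachable: the database is running properly.
        return "green"
    if failed_nodes <= k_safety:
        # There is a problem, but the remaining nodes can continue.
        return "yellow"
    # Too few nodes remain, so the database stops.
    return "red"


if __name__ == "__main__":
    # Three-node cluster with a K-safety value of 1 and one node unreachable.
    print(database_icon_color(True, 3, 2, 1))
```

With a K-safety value of 0, the same function reports red as soon as any one node becomes unreachable, which matches the description of a database that cannot continue without its full set of nodes.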

By clicking on the icon or name of the database in the list and choosing View from the popup menu, you can switch the dashboard to show that database and examine the situation more closely. In the dashboard, not only is the database icon color coded, but the servers in the server list are as well, so you can determine which server is in trouble.

Within the dashboard, a gray server icon indicates that the server is stopped. After determining and fixing the problem, you can choose Live Rejoin or Rejoin from the server's popup menu to have the server restart and rejoin the cluster. If the problem is hardware-related, you can choose Replace to replace the current server with another server from the Enterprise Manager's list of servers.

Once the problems are resolved and the database cluster is back to its full complement of nodes, the database icon turns green again. See Chapter 8, Maintaining and Repairing the Cluster, for more information on handling error conditions and performing maintenance activities on a running VoltDB database using the Enterprise Manager.