Chapter 7. What to Do When Problems Arise

As with any high-performance application, events related to the database process, the operating system, and the network environment can affect how well VoltDB performs. When faced with performance issues or outright failures, the most important task is identifying and resolving the root cause. VoltDB and the server produce a number of log files and other artifacts that can help you diagnose the problem. This chapter explains:

  • Where to look for log files and other information about the VoltDB server process

  • What to do when recovery fails

  • How to collect the log files and other system information when reporting a problem to VoltDB

7.1. Where to Look for Answers

The first place to look when an unrecognized problem occurs with your VoltDB database is the console where the database process was started. VoltDB echoes key messages and errors to the console. For example, if a server becomes unreachable, the other servers in the cluster will report an error indicating which node has failed. Assuming the cluster is K-safe, the remaining nodes will then re-establish a quorum and continue, logging this event to the console as well.

However, not all messages are echoed on the console.[] A more complete record of errors, warnings, and informational messages is written to a log file, log/volt.log, inside the voltdbroot directory. (So, for example, if you start the database using the command voltdb start --dir=~/db, the log file is ~/db/voltdbroot/log/volt.log.) The volt.log file can be extremely helpful for identifying unexpected but non-fatal events that occurred earlier and may point to the cause of the current issue.
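As a minimal sketch of the path rule described above, the following Python helper derives the log file location from the value passed to --dir and prints the lines that mention a Log4j severity of WARN or higher. The function names volt_log_path and notable_lines are hypothetical, not part of VoltDB, and the plain substring match on level names is an assumption, since the actual line layout depends on your Log4j configuration.

```python
from pathlib import Path

# Log4j severity names to look for. Matching on bare substrings is an
# assumption, because the line layout depends on the Log4j configuration.
LEVELS = ("WARN", "ERROR", "FATAL")


def volt_log_path(root_dir):
    """Return the volt.log path for the directory given to `voltdb start --dir`."""
    return Path(root_dir).expanduser() / "voltdbroot" / "log" / "volt.log"


def notable_lines(log_file, levels=LEVELS):
    """Yield (line_number, text) for each line that mentions one of the levels."""
    with open(log_file, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(level in line for level in levels):
                yield number, line.rstrip("\n")


if __name__ == "__main__":
    # "~/db" mirrors the example command in the text; substitute your own root.
    log_file = volt_log_path("~/db")
    if log_file.exists():
        for number, text in notable_lines(log_file):
            print(f"{number}: {text}")
    else:
        print(f"No log file found at {log_file}")
```

Because the log can grow large, the sketch reads the file line by line rather than loading it into memory.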

If VoltDB encounters a fatal error and exits, shutting down the database process, it also attempts to write out a crash file in the current working directory. The crash file name has the prefix "voltdb_crash" followed by a timestamp identifying when the file was created. This file can be useful in diagnosing exactly what caused the crash, since it includes the last error message, a brief profile of the server, and a dump of the Java threads running in the server process before it crashed.
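To locate crash files after an unexpected exit, a short script along the following lines may help. This is only a sketch: it assumes the directory you pass in is the one the server process was started from, and it matches on the voltdb_crash prefix alone. The helper name find_crash_files is hypothetical and not part of VoltDB.

```python
from pathlib import Path


def find_crash_files(working_dir="."):
    """Return voltdb_crash* files in working_dir, most recently modified first."""
    candidates = [p for p in Path(working_dir).glob("voltdb_crash*") if p.is_file()]
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    crash_files = find_crash_files()
    if not crash_files:
        print("No crash files found in the current directory")
    for crash_file in crash_files:
        print(crash_file)
```

Sorting by modification time puts the most recent crash first, which is usually the one relevant to the current problem.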

To summarize, when looking for information to help analyze system problems, three places to look are:

  1. The console where the server process was started.

  2. The log file, log/volt.log, inside the voltdbroot directory.

  3. The crash file named voltdb_crash{timestamp}.txt in the server process's working directory.



[] Note that you can change which messages are echoed to the console and which are logged by modifying the Log4j configuration file. See the chapter on logging in the Using VoltDB manual for details.