Monitoring is an important aspect of systems administration. This is true of both databases and the infrastructure they run on. The goals for database monitoring include ensuring the database meets its expected performance target as well as identifying and resolving any unexpected changes or infrastructure events (such as server failure or network outage) that can impact the database. This chapter explains:
How to monitor overall database health and performance using VoltDB
How to automatically pause the database when resource limits are exceeded
How to integrate VoltDB monitoring with other enterprise monitoring infrastructures
VoltDB provides several tools for monitoring overall database activity. The following sections describe the two primary monitoring tools within VoltDB:
The VoltDB Management Center provides a graphical display of key aspects of database performance, including throughput, memory usage, query latency, and partition usage. To use the Management Center, connect to one of the cluster nodes using a web browser, specifying the HTTP port (8080 by default) as shown in the example URL above. The Management Center shows graphs for cluster throughput and latency as well as CPU and memory usage for the current server. You can also use the Management Center to examine the database schema and to issue ad hoc SQL queries.
VoltDB provides callable system procedures that return detailed information about the usage and performance of the database. In particular, the @Statistics system procedure provides a wide variety of information depending on the selector keyword you give it. Some selectors that are particularly useful for monitoring include the following:
MEMORY — Provides statistics about memory usage for each node in the cluster. Information includes the resident set size (RSS) for the server process, the Java heap size, heap usage, available heap memory, and more. This selector provides the type of information displayed by the Process Memory Report, except that it returns information for all nodes of the cluster in a single call.
PROCEDUREPROFILE — Summarizes the performance of individual stored procedures. Information includes the minimum, maximum, and average execution time as well as the number of invocations, failures, and so on. The information is summarized from across the cluster as whole. This selector returns information similar to the latency graph in VoltDB Management Center.
TABLE — Provides information about the size, in number of tuples and amount of memory consumed, for each table in the database. The information is segmented by server and partition, so you can use it to report the total size of the database contents or to evaluate the relative distribution of data across the servers in the cluster.
When using the @Statistics system procedure with the PROCEDUREPROFILE selector for monitoring, it is a good idea to set the second parameter of the call to "1" so each call returns information since the last call. In other words, statistics for the interval since the last call. Otherwise, if the second parameter is "0", the procedure returns information since the database started and the aggregate results for minimum, maximum, and average execution time will have little meaning.
When calling @Statistics with the MEMORY or TABLE selectors, you can set the second parameter to "0" since the results are always a snapshot of the memory usage and table volume at the time of the call. For example, the following Python script uses @Statistics with the MEMORY and PROCEDUREPROFILE selectors to check for memory usage and latency exceeding certain limits. Note that the call to @Statistics uses a second parameter of 1 for the PROCEDUREPROFILE call and a parameter value of 0 for the MEMORY call.
import sys from voltdbclient import * nano = 1000000000.0 memorytrigger = 4 * (1024*1024) # 4gbytes avglatencytrigger = .01 * nano # 10 milliseconds maxlatencytrigger = 2 * nano # 2 seconds server = "localhost" if (len(sys.argv) > 1): server = sys.argv client = FastSerializer(server, 21212) stats = VoltProcedure( client, "@Statistics", [ FastSerializer.VOLTTYPE_STRING, FastSerializer.VOLTTYPE_INTEGER ] ) # Check memory response = stats.call([ "memory", 0 ]) for t in response.tables: for row in t.tuples: print 'RSS for node ' + row + "=" + str(row) if (row > memorytrigger): print "WARNING: memory usage exceeds limit." # Check latency response = stats.call([ "procedureprofile", 1 ]) avglatency = 0 maxlatency = 0 for t in response.tables: for row in t.tuples: if (avglatency < row): avglatency = row if (maxlatency < row): maxlatency = row print 'Average latency= ' + str(avglatency) print 'Maximum latency= ' + str(maxlatency) if (avglatency > avglatencytrigger): print "WARNING: Average latency exceeds limit." if (maxlatency > maxlatencytrigger): print "WARNING: Maximum latency exceeds limit." client.close()
The @Statistics system procedure is the the source for many of the monitoring options discussed in this chapter. Two other system procedures, @SystemCatalog and @SystemInformation, provide general information about the database schema and cluster configuration respectively and can be used in monitoring as well.
The system procedures are useful for monitoring because they let you customize your reporting to whatever level of detail you wish. The other advantage is that you can automate the monitoring through scripts or client applications that call the system procedures. The downside, of course, is that you must design and create such scripts yourself. As an alternative for custom monitoring, you can consider integrating VoltDB with existing third party monitoring applications, as described in Section 6.3, “Integrating VoltDB with Other Monitoring Systems”. You can also set the database to automatically pause if certain system resources run low, as described in the next section.