Monitoring is an important aspect of systems administration. This is true of both databases and the infrastructure they run on. The goals for database monitoring include ensuring the database meets its expected performance target as well as identifying and resolving any unexpected changes or infrastructure events (such as server failure or network outage) that can impact the database. This chapter explains:
How to monitor overall database health and performance using VoltDB
How to integrate VoltDB monitoring with other enterprise monitoring infrastructures
VoltDB provides several tools for monitoring overall database activity. The following sections describe the three main monitoring tools within VoltDB:
A separate book, the VoltDB Enterprise Manager Guide, describes the VoltDB Enterprise Manager, a complete database management dashboard that includes monitoring of its own.
Web Studio provides a graphical display of key aspects of database performance, including throughput, memory usage, query latency, and partition usage. To use Web Studio, connect to one of the cluster nodes using a web browser, specifying the httpd port (8080 by default) and the page /studio/, as shown in the example URL above. Once the studio page loads, click on the icon of a graph in the menu bar to bring up the monitor display. The monitor shows two graphs, side by side. You can use the pull down menus above each graph to switch between the four available metrics.
The web studio graphs update automatically. Note that Web Studio also lets you issue ad hoc queries to the database and review the database schema.
The Process Memory Report is another graphical display focused specifically on memory usage. The report separates out Java heap from non-Java memory usage, so you can better understand what aspects of memory utilization are being exercised. To view the process memory report for a specific VoltDB server, connect to the server using a web browser, specifying the httpd port (8080 by default) and the page /memory/, as shown in the example URL above. Use the browser's refresh button to update the graph.
Note that the process memory report is server-specific, unlike the web studio monitor which shows statistics for the cluster as a whole. If you think you have memory issues with specific servers — particularly if the servers are not all the same size or your data is not evenly distributed across partitions — it is a good idea to examine the process memory report for all of the servers to determine where the issue lies.
VoltDB provides callable system procedures that return detailed information about the usage and performance of the database. In particular, the @Statistics system procedure provides a wide variety of information depending on the selector keyword you give it. Some selectors that are particularly useful for monitoring include the following:
MEMORY — Provides statistics about memory usage for each node in the cluster. Information includes the resident set size (RSS) for the server process, the Java heap size, heap usage, available heap memory, and more. This selector provides the type of information displayed by the Process Memory Report, except that it returns information for all nodes of the cluster in a single call.
PROCEDUREPROFILE — Summarizes the performance of individual stored procedures. Information includes the minimum, maximum, and average execution time as well as the number of invocations, failures, and so on. The information is summarized from across the cluster as whole. This selector returns information similar to the latency graph in Web Studio
TABLE — Provides information about the size, in number of tuples and amount of memory consumed, for each table in the database. The information is segmented by server and partition, so you can use it to report the total size of the database contents or to evaluate the relative distribution of data across the servers in the cluster.
When using the @Statistics system procedure with the PROCEDUREPROFILE selector for monitoring, it is a good idea to set the second parameter of the call to "1" so each call returns information since the last call. In other words, statistics for the interval since the last call. Otherwise, if the second parameter is "0", the procedure returns information since the database started and the aggregate results for minimum, maximum, and average execution time will have little meaning.
When calling @Statistics with the MEMORY or TABLE selectors, you can set the second parameter to "0" since the results are always a snapshot of the memory usage and table volume at the time of the call. For example, the following Python script uses @Statistics with the MEMORY and PROCEDUREPROFILE selectors to check for memory usage and latency exceeding certain limits. Note that the call to @Statistics uses a second parameter of 1 for the PROCEDUREPROFILE call and a parameter value of 0 for the MEMORY call.
import sys from voltdbclient import * nano = 1000000000.0 memorytrigger = 4 * (1024*1024) # 4gbytes avglatencytrigger = .01 * nano # 10 milliseconds maxlatencytrigger = 2 * nano # 2 seconds server = "localhost" if (len(sys.argv) > 1): server = sys.argv client = FastSerializer(server, 21212) stats = VoltProcedure( client, "@Statistics", [ FastSerializer.VOLTTYPE_STRING, FastSerializer.VOLTTYPE_INTEGER ] ) # Check memory response = stats.call([ "memory", 0 ]) for t in response.tables: for row in t.tuples: print 'RSS for node ' + row + "=" + str(row) if (row > memorytrigger): print "WARNING: memory usage exceeds limit." # Check latency response = stats.call([ "procedureprofile", 1 ]) avglatency = 0 maxlatency = 0 for t in response.tables: for row in t.tuples: if (avglatency < row): avglatency = row if (maxlatency < row): maxlatency = row print 'Average latency= ' + str(avglatency) print 'Maximum latency= ' + str(maxlatency) if (avglatency > avglatencytrigger): print "WARNING: Average latency exceeds limit." if (maxlatency > maxlatencytrigger): print "WARNING: Maximum latency exceeds limit." client.close()
The @Statistics system procedure is the the source for many of the monitoring options discussed in this chapter. Two other system procedures, @SystemCatalog and @SystemInformation, provide general information about the database schema and cluster configuration respectively and can be used in monitoring as well.
The system procedures are useful for monitoring because they let you customize your reporting to whatever level of detail you wish. The other advantage is that you can automate the monitoring through scripts or client applications that call the system procedures. The downside, of course, is that you must design and create such scripts yourself. As an alternative for custom monitoring, you can consider integrating VoltDB with existing third party monitoring applications, as described in next section.