Chapter 6. Monitoring the Cluster

The VoltDB Enterprise Manager does more than simplify basic administrative functions such as stopping and starting the database. Its dashboard also helps you understand the performance characteristics of your application so you can identify issues and make informed decisions about configuration changes and tuning. Once a database is running, the Enterprise Manager dashboard helps you monitor:

  • Database activity and performance

  • Cluster health

6.1. Monitoring Database Activity

The right side of the dashboard provides real-time statistics on the currently selected database. There are four graphs showing you key aspects of database performance.

Latency

The latency graph shows you the latency for transactions being processed by the database. The graph shows latency for the 99th percentile of the transactions. (That is, 99% of the transactions complete within the time indicated.)

Latency measures the length of time (in milliseconds) between when the server receives the stored procedure request and when the response is queued for return to the client. (Note that latency measurements do not include network latency between the client and the VoltDB server.) Latency tends to go up when the servers receive more requests than they can process in a given amount of time.
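To make the 99th percentile concrete, the following is a minimal sketch of a nearest-rank percentile calculation over a set of latency samples. The function name and the sample values are hypothetical, and the sketch is not meant to describe how the dashboard itself computes the figure.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based nearest rank: the position that covers pct% of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with two slow outliers.
latencies_ms = [2.0] * 98 + [5.0, 40.0]

p99 = percentile_nearest_rank(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
print(f"p99 = {p99} ms, mean = {mean:.2f} ms")
```

In this made-up data set, 99 of the 100 transactions complete within 5 ms, so the 99th percentile is 5 ms even though one transaction took 40 ms. A percentile hides the single worst outlier while still reflecting what nearly all clients experience.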

Transactions

The transactions graph shows the number of transactions processed per second (TPS), a common measurement of database throughput. The goal for most OLTP applications is to maximize TPS. Several conditions can make the transactions graph go down, including increased latency from too many multi-partition stored procedures, exhausting system resources such as CPU and memory, or simply reduced load from the clients. The transactions graph is therefore most informative when viewed in conjunction with the other graphs.
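As an illustration of what a TPS figure represents, the sketch below derives per-interval throughput from periodic samples of a cumulative transaction counter. The sampling scheme, function name, and numbers are assumptions made for the example, not a description of how the Enterprise Manager collects its data.

```python
def transactions_per_second(samples):
    """Compute TPS for each interval between consecutive samples.

    samples: list of (timestamp_seconds, cumulative_transaction_count),
    ordered by time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / elapsed)
    return rates


# Hypothetical counter readings taken every five seconds.
samples = [(0, 0), (5, 50_000), (10, 110_000), (15, 140_000)]

tps = transactions_per_second(samples)
print(tps)  # [10000.0, 12000.0, 6000.0]
```

The drop to 6,000 TPS in the last interval does not by itself say why throughput fell. As noted above, it could reflect higher latency, exhausted resources, or simply fewer client requests.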

CPU

The CPU graph shows the percentage of the system's computational power being utilized by VoltDB on each server. As with any application, it is important to keep CPU usage below the total available CPU to ensure peak performance. (Each graph is cumulative for all cores on a processor. So, for example, if there are four cores, the total CPU available for that server is 400%.) The CPU graph provides an overall view of the amount of raw processing power that is being used by the VoltDB database process.
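The cumulative scale can be turned into a fraction of total capacity with simple arithmetic. The following sketch assumes you know the core count of the server; the function names and the 260% reading are hypothetical.

```python
def cpu_capacity_percent(cores):
    """Total CPU available on a server, on the cumulative per-core scale."""
    if cores < 1:
        raise ValueError("a server needs at least one core")
    return cores * 100


def normalized_cpu_usage(cumulative_percent, cores):
    """Express a cumulative CPU reading as a fraction of total capacity."""
    return cumulative_percent / cpu_capacity_percent(cores)


# Hypothetical reading: 260% on a four-core server (400% available).
capacity = cpu_capacity_percent(4)
usage = normalized_cpu_usage(260, 4)
print(f"capacity = {capacity}%, usage = {usage:.0%} of the server")
```

Here a reading of 260% on a four-core server corresponds to 65% of the available processing power, which leaves headroom before the server becomes CPU-bound.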

Memory

The memory graph shows the resident set size (RSS) of the VoltDB process on each server. Since VoltDB is an in-memory database, memory capacity is critical. The memory graph helps you understand current usage and trending of memory consumption.

Depending on your application, you may wish to see more or less data at a time. For example, during development you may be interested in short runs for testing, whereas for long-running applications, you will want to see extended graphs. You can change the horizontal scale of the graphs using the pulldown View menu on the top right. Set the view to "minutes" to see up to 30 minutes of data in the graph or to "day" to see a maximum of 24 hours displayed.

The graphs give you an overview of the database performance and resource utilization, which can be very helpful in detecting issues or performance regressions in an application. The latency and transactions graphs show average statistics across the entire database cluster. The CPU and memory graphs show statistics per server, with the statistics for multiple servers "stacked" in the graph. The list of servers on the left side of the dashboard includes a color swatch to the right of each server name that acts as a legend for the color coding of the CPU and memory graphs.

However, the graphs alone are not necessarily sufficient for identifying the root cause of any issues you detect. To help you further diagnose problems or discrepancies, the dashboard provides detailed tabular statistics below the graphs. (The data tables are collapsed by default. Click on the label Data or the triangle next to it to expand them.)

There are two types of tables provided: volume and invocation. Click on the headings in the data section to switch between the two types of data table.

  • The volume table shows you the number of rows in each table, the type of table (partitioned, replicated, or view), and the maximum and minimum number of rows per partition. A large discrepancy between the minimum and maximum volume usually indicates a problem with the partitioning, due either to too few partitions or to a partitioning key whose values are not well distributed. Such a discrepancy can result in serious differences in memory usage between servers.

  • The invocation table shows the total number of invocations for each stored procedure. In other words, the table shows you how often each stored procedure is called as well as the maximum, minimum, and average execution time in each case. This table is useful in determining if a particular stored procedure is taking longer than expected to execute or creating a bottleneck for the application. The invocation table is also useful in validating the expected distribution of transactions during normal operations.
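The checks described for the two tables can be sketched in a few lines, assuming you have copied the values out of the dashboard by hand. The row layouts, table and procedure names, and the skew threshold of 2.0 are all hypothetical choices made for illustration.

```python
def partition_skew(min_rows, max_rows):
    """Ratio of the fullest partition to the emptiest one (1.0 means even)."""
    if min_rows < 0 or max_rows < min_rows:
        raise ValueError("expected 0 <= min_rows <= max_rows")
    if max_rows == 0:
        return 1.0  # empty table: nothing to be skewed
    if min_rows == 0:
        return float("inf")  # at least one partition holds no rows at all
    return max_rows / min_rows


def skewed_tables(volume_rows, threshold=2.0):
    """Names of partitioned tables whose max/min ratio exceeds the threshold."""
    return [
        name
        for name, kind, min_rows, max_rows in volume_rows
        if kind == "partitioned" and partition_skew(min_rows, max_rows) > threshold
    ]


def invocation_share(invocation_rows):
    """Fraction of all invocations attributed to each stored procedure."""
    total = sum(count for _, count in invocation_rows)
    if total == 0:
        return {name: 0.0 for name, _ in invocation_rows}
    return {name: count / total for name, count in invocation_rows}


# Hypothetical volume data: (table, type, min rows, max rows per partition).
volume = [
    ("orders", "partitioned", 9_800, 10_200),
    ("events", "partitioned", 1_000, 45_000),
    ("countries", "replicated", 250, 250),
]
# Hypothetical invocation data: (stored procedure, total invocations).
invocations = [("InsertOrder", 90_000), ("LookupOrder", 9_000), ("DailyReport", 1_000)]

flagged = skewed_tables(volume)
shares = invocation_share(invocations)
print(flagged)
print(shares)
```

With this sample data only the events table is flagged, because its fullest partition holds 45 times as many rows as its emptiest. The invocation shares can then be compared against the transaction mix you expect from the application.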
