5.2. Setting the Database to Read-Only Mode When System Resources Run Low

VoltDB, like all software, uses system resources to perform its tasks. First and foremost, as an in-memory database, VoltDB relies on having sufficient memory available for storing the data and processing queries. However, it also makes use of disk resources for snapshots and caching data for other features, such as export and database replication.

If system resources run low, one or more nodes may fail, reducing availability or, worse, causing a service interruption. The best solution for this situation is to plan ahead and provision sufficient resources for your needs. The goal of the VoltDB Planning Guide is to help you do this.

However, even with the best planning, unexpected conditions can result in resource shortages or overuse. In these situations, you want the database to protect itself against all-out failure.

You can do this by setting resource limits in the configuration. System resource limits are set within the deployment.systemsettings.resourcemonitor property. For example:

deployment:
  systemsettings:
    resourcemonitor:
      frequency: 30
      memorylimit:
        size: "70%"
        alert: "60%"
      disklimit:
        feature:
        - name: snapshots
          size: "75%"
          alert: "60%"
        - name: droverflow
          size: "60%"

The deployment file lets you set limits on two types of system resources:

  • Memory usage

  • Disk usage

For each resource type you can set the maximum size and, optionally, the level at which an alert is sent if SNMP is enabled. In all cases, the allowable amount of the resource to be used can be specified as either a value representing a number of gigabytes or a percentage of the total available. If the limit set by the alert property is exceeded and SNMP is enabled, an SNMP alert is sent. If the limit set by the size property is exceeded, the database will be "paused", putting it into read-only mode to avoid using any further resources or possibly failing when the resource becomes exhausted. When the database pauses, an error message is written to the log file (and the console) reporting the event. This allows you as the system administrator to correct the situation by reducing memory usage or deleting unnecessary files. Once sufficient resources are freed up, you can return the database to normal operation using the voltadmin resume command.
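
The relationship between the alert and size properties can be illustrated with a small model. The following Python sketch is not VoltDB code; the function names and return values are hypothetical and only mirror the rules described above (a limit is either a number of gigabytes or a percentage, an alert is sent only when SNMP is enabled, and exceeding size pauses the database).

```python
from typing import Optional, Union

Limit = Union[int, float, str]  # e.g. 10 (gigabytes) or "70%" (percentage)


def resolve_limit_gb(limit: Limit, total_gb: float) -> float:
    """Convert a limit given in gigabytes or as a percentage into gigabytes."""
    if isinstance(limit, str) and limit.strip().endswith("%"):
        return total_gb * float(limit.strip()[:-1]) / 100
    return float(limit)


def check_resource(used_gb: float, total_gb: float, size: Limit,
                   alert: Optional[Limit] = None,
                   snmp_enabled: bool = False) -> str:
    """Return "pause", "alert", or "ok" for a single resource reading."""
    if used_gb > resolve_limit_gb(size, total_gb):
        # Size limit exceeded: the database is paused (read-only)
        # until an administrator runs `voltadmin resume`.
        return "pause"
    if (alert is not None and snmp_enabled
            and used_gb > resolve_limit_gb(alert, total_gb)):
        # Alert level exceeded: an SNMP alert is sent, the database keeps running.
        return "alert"
    return "ok"


# A node with 64 GB of memory, size "70%" (44.8 GB) and alert "60%" (38.4 GB).
print(check_resource(40, 64, "70%", "60%", snmp_enabled=True))  # alert
print(check_resource(45, 64, "70%", "60%", snmp_enabled=True))  # pause
```

In this model the most severe outcome wins, so a reading above the size limit reports only the pause.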

The resource limits are checked every 60 seconds by default. However, you can adjust how frequently they are checked, to accommodate the relative stability or volatility of your resource usage, using the deployment.systemsettings.resourcemonitor.frequency property. In the preceding example, the interval between checks has been shortened to 30 seconds.

Of course, the ideal is to catch excessive resource use before the database is forced into read-only mode. Using SNMP and system monitors such as Nagios and New Relic to generate alerts at limits lower than those of the VoltDB resource monitor is strongly recommended. You can also integrate other VoltDB monitoring with these utilities, as described in Section 5.3, “Integrating VoltDB with Prometheus”. The resource monitor size limit is provided as a last resort, to ensure the database does not completely exhaust resources and crash before the issue can be addressed.

The following sections describe how to set limits for the individual resource types.

5.2.1. Monitoring Memory Usage

You specify a memory limit in the configuration using the memorylimit property and specifying the maximum allowable resident set size (RSS) for the VoltDB process in the size subproperty. You can express the limit as a fixed number of gigabytes or as a percentage of total available memory. Use a percent sign to specify a percentage.

In addition to pausing the database, you can specify that it runs a full compaction of table data to recover whatever unused space is available due to fragmentation. This is the equivalent of running the voltadmin defrag --full command manually. When the compact property is set to true and the memory limit is exceeded, the database pauses, defragments all table data on the affected node, and, if enough space is recovered to bring memory usage down under the limit, automatically resumes normal operation. See the chapter on "Understanding Memory Usage" in the Volt Performance and Customization guide for more information about memory compaction.

For example, the following setting will cause the VoltDB database to go into read-only mode and perform a full compaction if the RSS exceeds 10 gigabytes on any of the cluster nodes.

deployment:
  systemsettings:
    resourcemonitor:
      memorylimit:
        size: 10
        compact: true

Whereas the following example sets the limit at 70% of total available memory but does not automatically compact memory used for table data.

deployment:
  systemsettings:
    resourcemonitor:
      memorylimit:
        size: "70%"
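
To make the difference between these two settings concrete, the following Python sketch models the outcome once RSS exceeds the memory limit on a node. It is an illustration only, not VoltDB code: the function name, the state strings, and the recoverable_gb parameter (how much memory a full compaction would free) are hypothetical.

```python
def state_after_memory_check(rss_gb: float, limit_gb: float,
                             compact: bool = False,
                             recoverable_gb: float = 0.0) -> str:
    """Model the database state after a memory limit check on one node."""
    if rss_gb <= limit_gb:
        return "running"  # limit not exceeded, nothing happens
    # Limit exceeded: the database pauses (read-only mode).
    if compact and rss_gb - recoverable_gb < limit_gb:
        # Compaction freed enough space to get back under the limit,
        # so normal operation resumes automatically.
        return "running"
    # Otherwise the database stays paused until an administrator
    # frees memory and runs `voltadmin resume`.
    return "paused"


print(state_after_memory_check(11, 10, compact=True, recoverable_gb=2))  # running
print(state_after_memory_check(11, 10, compact=False))                   # paused
```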

You can also set a trigger value for SNMP alerts — assuming SNMP is enabled — using the alert property. For instance, the following example sets the SNMP trigger value to 60%.

deployment:
  systemsettings:
    resourcemonitor:
      memorylimit:
        size: "70%"
        alert: "60%"

If you do not specify a limit in the configuration file, VoltDB automatically sets a maximum size limit of 80% and an SNMP alert level of 70% by default.
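
As a rough guide to what these defaults mean on a given server, the following Python sketch converts them to gigabytes. It assumes the percentages are taken of the node's total memory; the function name is hypothetical and the 64 GB figure is only an example.

```python
DEFAULT_SIZE_PCT = 80   # default maximum size limit
DEFAULT_ALERT_PCT = 70  # default SNMP alert level


def default_memory_thresholds_gb(total_memory_gb: float) -> dict:
    """Return the default alert and size thresholds in gigabytes."""
    return {
        "alert": total_memory_gb * DEFAULT_ALERT_PCT / 100,
        "size": total_memory_gb * DEFAULT_SIZE_PCT / 100,
    }


# On a node with 64 GB of memory:
print(default_memory_thresholds_gb(64))  # {'alert': 44.8, 'size': 51.2}
```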

5.2.2. Monitoring Disk Usage

You specify disk usage limits in the configuration using the disklimit property. Within disklimit you use the feature subproperty to identify a list of limits for a device based on the Volt feature that utilizes it. For example, to set a limit on the amount of space used on the device where automatic snapshots are stored, you identify the feature as "snapshots" and specify the limit as a number of gigabytes or as a percentage of total space on the disk. The following configuration sets the disk limit for snapshots at 200 gigabytes and the limit for command logs at 70% of the total available space:

deployment:
  systemsettings:
    resourcemonitor:
      disklimit:
        feature:
        - name: snapshots
          size: 200
        - name: commandlog
          size: "70%"

You can also set a trigger value for SNMP alerts — assuming SNMP is enabled — using the alert subproperty. For instance, the following example sets the SNMP trigger value to 150 gigabytes for the snapshots disk and 60% for the commandlog disk.

deployment:
  systemsettings:
    resourcemonitor:
      disklimit:
        feature:
        - name: snapshots
          size: 200
          alert: 150
        - name: commandlog
          size: "70%"
          alert: "60%"

Note that you specify the device based on the feature that uses it. However, the limit applies to all data on that device, not just the space used by that feature. If you specify limits for two features that use the same device, the lower of the two limits will be applied. So, in the previous example, if snapshots and command logs both use a device with 250 gigabytes of total space, the database will be set to read-only mode if the total amount of used space exceeds the command log limit of 70%, or 175 gigabytes.
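
The arithmetic for features that share a device can be checked with a short Python sketch. This is an illustration of the rule described above, not VoltDB code, and the function names are hypothetical.

```python
def to_gb(limit, device_total_gb: float) -> float:
    """Convert a limit in gigabytes or a percentage string into gigabytes."""
    if isinstance(limit, str) and limit.endswith("%"):
        return device_total_gb * float(limit[:-1]) / 100
    return float(limit)


def effective_device_limit(feature_limits: dict, device_total_gb: float):
    """Return (feature, limit_gb) for the lowest limit on a shared device."""
    resolved = {name: to_gb(limit, device_total_gb)
                for name, limit in feature_limits.items()}
    lowest = min(resolved, key=resolved.get)
    return lowest, resolved[lowest]


# Snapshots (200 GB) and command logs (70%) on one 250 GB device.
feature, limit_gb = effective_device_limit(
    {"snapshots": 200, "commandlog": "70%"}, 250)
print(feature, limit_gb)  # commandlog 175.0
```

On a larger device the result changes: with 400 gigabytes of total space, 70% is 280 gigabytes, so the 200 gigabyte snapshots limit becomes the lower one.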

It is also important to note that there are no default resource limits or alerts for disks. If you do not explicitly specify a disk limit, there is no protection against running out of disk space. Similarly, unless you explicitly set an SNMP alert level, no alerts will be sent for the associated device.

You can identify disk limits and alerts for any of the following Volt features, using the specified keywords:

  • Automated snapshots (snapshots)

  • Command logs (commandlog)

  • Command log snapshots (commandlogsnapshot)

  • Database replication overflow (droverflow)

  • Export overflow (exportoverflow)
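
Because features are identified by keyword, a misspelled name is easy to miss. The following Python sketch checks a disklimit section, represented as a Python dictionary mirroring the configuration structure, against the keywords listed above. The helper is hypothetical and not part of VoltDB.

```python
KNOWN_FEATURES = {
    "snapshots",
    "commandlog",
    "commandlogsnapshot",
    "droverflow",
    "exportoverflow",
}


def unknown_features(disklimit: dict) -> list:
    """Return the feature names that are not among the supported keywords."""
    return [entry.get("name")
            for entry in disklimit.get("feature", [])
            if entry.get("name") not in KNOWN_FEATURES]


disklimit = {
    "feature": [
        {"name": "snapshots", "size": 200},
        {"name": "commandlogs", "size": "70%"},  # typo: should be "commandlog"
    ]
}
print(unknown_features(disklimit))  # ['commandlogs']
```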