An important aspect of system sizing is planning for the memory required to support the application. Because VoltDB is an in-memory database, allocating sufficient memory is vital.
Section 3.3, “Sizing for Capacity” provides some simple equations for estimating the memory requirements of a prospective application. If you are already at the stage where the database schema is well-defined and want more precise measurements of potential memory usage, this chapter provides details about how memory gets allocated.
For VoltDB databases, there are three aspects of memory sizing that must be considered:

- Understanding how much memory is required to store the data itself; that is, the contents of the database
- Evaluating how that data is distributed across the cluster, based on the proportion of partitioned to replicated tables and the K-safety value
- Determining the memory requirements of the server process
The sum of the estimated data requirements per server and the Java heap size required by the server process provides the total memory requirement for each server.
To plan effectively for database capacity, you must know in advance both the structure of the data and the projected volume. This means you must have at least a preliminary database schema, including tables, columns, and indexes, as well as the expected number of rows for each table.
It is often useful to write this initial sizing information down, for example in a spreadsheet. Your planning may even allow for growth, assigning values for both the initial volume and projected long-term growth. For example, here is a simplified example of a spreadsheet for a database supporting a flight reservation system:
| Name | Type | Size | Initial Volume | Future Volume |
|------|------|------|----------------|---------------|
| - FlightByID | Hash Index | | 5,000 | 20,000 |
| - FlightByDepartTime | Tree Index | | 5,000 | 20,000 |
| - AirportByCode | Tree Index | | 10,000 | 10,000 |
| - ReservByFlight | Hash Index | | 100,000 | 200,000 |
| - CustomerByID | Hash Index | | 200,000 | 1,000,000 |
| - CustomerByName | Tree Index | | 200,000 | 1,000,000 |
Using the database schema, it is possible to calculate the size of the individual table records and indexes, which, when multiplied by the volume projections, gives a good estimate of the total memory needed to store the database contents. The following sections explain how to calculate the Size column for individual table rows and indexes.
The size of individual table rows depends on the number and datatype of the columns in the table. For fixed-size datatypes, such as INTEGER and TIMESTAMP, the column is stored inline in the table record using the specified number of bytes. Table 4.1, “Memory Requirements For Tables By Datatype” specifies the length (in bytes) of fixed size datatypes.
For variable length datatypes, such as VARCHAR and VARBINARY, how the data is stored and, consequently, how much space it requires, depends on both the actual length of the data and the maximum possible length. If the maximum length is less than 64 bytes, the data is stored inline in the tuple as fixed-length data consuming the maximum number of bytes plus one for the length. So, for example, a VARCHAR(32 BYTES) column takes up 33 bytes, no matter how long the actual data is.
Note that VARCHAR columns can be declared in characters (the default) or bytes. For storage calculations, variable-length strings declared in characters are assumed to consume 4 bytes for every character. In other words, a variable-length string declared as VARCHAR(8) in characters consumes the same amount of space as a string declared as VARCHAR(32 BYTES).
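As a quick check of the inline-storage rules, here is a minimal sketch (the helper name `inline_varchar_bytes` is hypothetical, not part of VoltDB):

```python
def inline_varchar_bytes(declared_length, in_characters=True):
    """Inline storage (in bytes) for a variable-length column whose
    maximum length is under 64 bytes: the maximum size plus 1 byte
    for the length. Columns declared in characters are budgeted at
    4 bytes per character."""
    max_bytes = declared_length * 4 if in_characters else declared_length
    if max_bytes >= 64:
        raise ValueError("64 bytes or more is stored in pooled memory, not inline")
    return max_bytes + 1

# VARCHAR(32 BYTES) always consumes 33 bytes, regardless of content.
print(inline_varchar_bytes(32, in_characters=False))  # 33
# VARCHAR(8) in characters is budgeted as 32 bytes, so also 33 bytes.
print(inline_varchar_bytes(8))  # 33
```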
If the maximum length is 64 bytes or more, the data is stored in pooled memory rather than inline. To do this, there is an 8-byte pointer stored inline in the tuple, a 24-byte string reference object, and the space required to store the data itself in the pool. Within the pool, the data is stored as a 4-byte length, an 8-byte reverse pointer to the string reference object, and the data.
To complicate the calculation somewhat, data stored in pooled memory is not stored as arbitrary lengths. Instead, data is incremented to the smallest appropriate "pool size", where pool sizes are powers of 2 and intermediary values. In other words, pool sizes include 2, 4, 6 (2+4), 8, 12 (8+4), 16, 24 (8+16), 32 and so on up to a maximum of 1 megabyte for data plus 12 bytes for the pointer and length. For example, if the LastName column in the Customer table is defined as VARCHAR(32) (that is, a maximum length of 128 bytes) and the actual content is 95 bytes, the column consumes 160 bytes:
  8   Inline pointer
 24   String reference object
  4   Data length
  8   Reverse pointer
 95   Data
107   Pool total, incremented to next pool size: 128
160   Total
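The same arithmetic can be sketched in code; `pool_size` and `pooled_column_bytes` are hypothetical helpers that assume the pool-size progression described above (powers of 2 plus the intermediate values):

```python
def pool_size(n):
    """Round n up to the smallest pool allocation size. Pool sizes are
    the powers of 2 and the values midway between them:
    2, 4, 6, 8, 12, 16, 24, 32, 48, ..."""
    candidates = [2, 4]
    power = 4
    while candidates[-1] < n:
        candidates.append(power + power // 2)  # 6, 12, 24, 48, ...
        power *= 2
        candidates.append(power)               # 8, 16, 32, 64, ...
    return next(c for c in candidates if c >= n)

def pooled_column_bytes(actual_length):
    """Total memory for a non-inlined variable-length column: an 8-byte
    inline pointer and a 24-byte string reference object, plus a pooled
    allocation holding a 4-byte length, an 8-byte reverse pointer, and
    the data itself, rounded up to the next pool size."""
    return 8 + 24 + pool_size(4 + 8 + actual_length)

# The worked example: 95 bytes of data -> 107 bytes in the pool,
# incremented to a 128-byte pool size, plus 32 bytes inline.
print(pooled_column_bytes(95))  # 160
```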
Note that if a variable length column is defined with a maximum length of 64 bytes or more, it is not stored inline, even if the actual content is less than 64 bytes. Variable length columns are stored inline only if the maximum length is less than 64 bytes.
Table 4.1, “Memory Requirements For Tables By Datatype” summarizes the memory requirements for each datatype.
Table 4.1. Memory Requirements For Tables By Datatype
| Datatype | Size (in bytes) | Notes |
|----------|-----------------|-------|
| VARCHAR (<64 bytes) | maximum size + 1 | Stored inline |
| VARBINARY (<64 bytes) | maximum size + 1 | Stored inline |
| VARCHAR (>=64 bytes) | 32 + (actual size + 12 + padding) | Pooled resource. Total size includes an 8-byte inline pointer and a 24-byte string reference object, plus the pooled resource itself. |
| VARBINARY (>=64 bytes) | 32 + (actual size + 12 + padding) | Same as VARCHAR. |
For tables whose variable length columns are all under 64 bytes, memory usage can be sized very accurately using the preceding table. However, for tables with variable length columns of 64 bytes or more, sizing is approximate at best. Besides the variability introduced by pooling, any sizing calculation must be based on an estimate of the average length of data in the variable columns.
For the safest and most conservative estimates, you can use the maximum length when calculating variable length columns. If, on the other hand, there are many variable length columns or you know the data will vary widely, you can use an estimated average or 90th percentile figure to avoid over-estimating memory consumption.
Indexes are sized in a way similar to tables, where the size and number of the index columns determine the size of the index. However, the calculations for tree and hash indexes are distinct.
For tree indexes, which are the default index type, you can calculate the size of the individual index entries by adding up the size of each column in the index plus 40 bytes for overhead (pointers and lengths). The sizes of the columns are identical to those used when sizing tables, as described in Table 4.1, “Memory Requirements For Tables By Datatype”, with the exception of non-inlined data. For variable length columns with a maximum length of 64 bytes or more, the index contains only an 8-byte pointer; the data itself is not replicated.
So, for example, the CustomerByName index on the Customer table, which is a tree index containing the VARCHAR(32) fields LastName and FirstName, has a length of 56 bytes for each entry:
 8   Pointer to LastName
 8   Pointer to FirstName
40   Overhead
56   Total
The following equation summarizes how to calculate the size of a tree index.
(sum-of-column-sizes + 8 + 32) * rowcount
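A minimal sketch of the tree-index equation (the function name is hypothetical):

```python
def tree_index_bytes(column_sizes, rowcount):
    """Tree index size: the sum of the index column sizes plus 40 bytes
    of overhead per entry, multiplied by the row count. Non-inlined
    columns contribute only an 8-byte pointer each."""
    return (sum(column_sizes) + 40) * rowcount

# CustomerByName: two non-inlined VARCHAR(32) columns (8-byte pointers
# each), over 1,000,000 rows.
print(tree_index_bytes([8, 8], 1_000_000))  # 56000000
```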
For hash indexes, the total size of the index is not a direct multiple of the individual index entries. Hash indexes are stored as a hash table, the size of which is determined both by the size of the content and by the number of entries. The size of an individual entry is the sum of the sizes of the index columns plus 32 bytes of overhead. (Hash indexes support integer columns only, so there are no variable length keys.) But in addition to the individual entries, the index requires space for the hash table itself, based on the total number of entries. This space is calculated (in bytes) as two times the number of rows, plus one, multiplied by eight.
The following equation summarizes how to calculate the size of a hash index.
(((2 * rowcount) + 1) * 8) + ((sum-of-column-sizes + 32) * rowcount)
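The hash-index equation as a sketch (again, a hypothetical helper name):

```python
def hash_index_bytes(column_sizes, rowcount):
    """Hash index size: the hash table itself at ((2 * rows) + 1) * 8
    bytes, plus each entry's column sizes and 32 bytes of overhead."""
    return ((2 * rowcount) + 1) * 8 + (sum(column_sizes) + 32) * rowcount

# FlightByID: a single 4-byte INTEGER key over 20,000 rows.
print(hash_index_bytes([4], 20_000))  # 1040008
```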
Note that hash indexes do not grow linearly, since incrementally allocating small blocks of memory is inefficient. Instead, when the index needs more space (because additional entries have been added), the index doubles in size all at once. Similarly, the index does not shrink immediately when entries are removed.
In general, the space allocated to hash indexes doubles in size whenever the index usage reaches 75% of capacity and the memory allocation is cut in half if usage drops below 15%.
Using the preceding formulas, it is possible to size the sample flight database mentioned earlier. For example, you can size the individual rows of the Flight table based on its columns and datatypes. The following table demonstrates the sizing of the Flight table.
| Column | Datatype | Size in Bytes |
|--------|----------|---------------|
The same calculations can be done for the other tables and indexes. When combined with the expected volumes (described in Section 4.1, “Planning for Database Capacity”), you get an estimate for the total memory required for storing the database content of approximately 500 megabytes, as shown in the following table.
| Name | Type | Size | Final Volume | Total Size |
|------|------|------|--------------|------------|
| - FlightByID | Hash Index | 36[*] | 20,000 | 1,040,008 |
| - FlightByDepartTime | Tree Index | 48 | 20,000 | 960,000 |
| - AirportByCode | Tree Index | 44 | 10,000 | 440,000 |
| - ReservByFlight | Hash Index | 36[*] | 200,000 | 10,400,008 |
| - CustomerByID | Hash Index | 36[*] | 1,000,000 | 52,000,008 |
| - CustomerByName | Tree Index | 56 | 1,000,000 | 56,000,000 |

[*] For hash indexes, Size specifies the individual entry size only. The total size includes the sum of the entries plus the hash table itself.
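Putting the two index formulas together reproduces the index totals above. The key sizes below are inferred from the per-entry sizes in the table (4-byte INTEGER keys, an 8-byte TIMESTAMP key for FlightByDepartTime, and two 8-byte pointers for the non-inlined CustomerByName columns); they are illustrative assumptions, not part of the original schema listing:

```python
def tree_index_bytes(column_sizes, rowcount):
    # (sum of column sizes + 40 bytes of overhead) per entry
    return (sum(column_sizes) + 40) * rowcount

def hash_index_bytes(column_sizes, rowcount):
    # hash table space plus (column sizes + 32 bytes of overhead) per entry
    return ((2 * rowcount) + 1) * 8 + (sum(column_sizes) + 32) * rowcount

indexes = [
    ("FlightByID",         hash_index_bytes([4], 20_000)),
    ("FlightByDepartTime", tree_index_bytes([8], 20_000)),
    ("AirportByCode",      tree_index_bytes([4], 10_000)),
    ("ReservByFlight",     hash_index_bytes([4], 200_000)),
    ("CustomerByID",       hash_index_bytes([4], 1_000_000)),
    ("CustomerByName",     tree_index_bytes([8, 8], 1_000_000)),
]
for name, size in indexes:
    print(f"{name:<20}{size:>12,}")
```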