Part 4: Schema Updates and Durability

Thus far in the tutorial we have re-initialized and restarted the database from scratch and reloaded the schema and data manually each time we changed the schema. This is sometimes the easiest way to make changes when you are first developing your application and making frequent changes. However, as your application — and the data it uses — becomes more complex, you want to maintain your database state across sessions.

You may have noticed that in the previous section of the tutorial we defined the States table but did not add it to the running database yet. That is because we want to demonstrate ways of modifying the database without having to start from scratch each time.

Preserving the Database

First let's talk about durability. VoltDB is an in-memory database. Each time you re-initialize and start the database with the init and start commands, it starts a new, empty database. Obviously, in real business situations you want the data to persist. VoltDB has several features that preserve the database contents across sessions.

The easiest way to preserve the database is to use command logging, which is enabled by default for the VoltDB Enterprise Edition. Command logging logs all of the database activity, including schema and data changes, to disk. If the database ever stops, you can recover the command log simply by restarting the database with the voltdb start command.

If you are using the Enterprise Edition, try it now. Stop the database process with voltadmin shutdown, then use voltdb start (without voltdb init) to restore the database to its previous state:

$ voltadmin shutdown
$ voltdb start

Command logging makes saving and restoring your database easy and automatic. Alternately, you can save and restore your database using snapshots. Snapshots are a complete disk-based representation of a VoltDB database, including everything needed to reproduce the database after a shutdown. You can create a snapshot of a running VoltDB database at anytime using the voltadmin save command. For example:

 voltadmin save

By default, the snapshot is saved to a subfolder of the database root directory. Alternately, you can specify the location and name of the snapshot files as arguments to the voltadmin save command. But there is an advantage to saving the snapshot to the default location. Because if there are any snapshots in the root directory, the voltdb start command automatically restores the most recent snapshot when the database restarts.

To make it even easier, VoltDB let's you create a final snapshot when you shutdown the database simply by adding the --save argument to the shutdown command. This is the recommended way to shutdown the database when not using command logging. Let's try it:

$ voltadmin shutdown --save
$ voltdb start

We can verify that the database was restored by doing some simple SQL queries in our other terminal session:

$ sqlcmd
SQL Command :: localhost:21212
1> select count(*) from towns;
C1     
-------
 193297


(1 row(s) affected)
2> select count(*) from people;
C1    
------
 81691


(1 row(s) affected)

Adding and Removing Tables

Now that we know how to save and restore the database, we can add the States table we defined in Part Three. Adding and dropping tables can be done "on the fly", while the database is running, using the sqlcmd utility. To add tables, you simply use the CREATE TABLE statement, like we did before. When modifying existing tables you can use the ALTER TABLE statement. Alternately, if you are not concerned with preserving existing data in the table, you can do a DROP TABLE followed by CREATE TABLE to replace the table definition.

In the case of the States table we are adding a new table so we can simply type (or copy & paste) the CREATE TABLE statement into the sqlcmd prompt. We can also use the show tables directive to verify that our new table has been added.

$ sqlcmd
SQL Command :: localhost:21212
1> CREATE TABLE states (
2>    abbreviation VARCHAR(20),
3>    state_num TINYINT,
4>    name VARCHAR(20),
5>    PRIMARY KEY (state_num)
6> );
Command successful
7> show tables;

--- User Tables --------------------------------------------
PEOPLE
STATES
TOWNS

--- User Views --------------------------------------------

--- User Export Streams --------------------------------------------

8> exit

Next we can load the state information from the data file. Finally, we can use the voltadmin save command to save a complete copy of the database.

$ csvloader --skip 1 -f data/states.csv states
$ voltadmin save

Updating Existing Tables

Now that we have a definitive lookup table for information about the states, we no longer need the redundant columns in the Towns and People tables. We want to keep the FIPS column, State_num, but can remove the State column from each table. Our updated schema for the two tables looks like this:

CREATE TABLE towns (
   town VARCHAR(64),
--   state VARCHAR(2),
   state_num TINYINT NOT NULL,
   county VARCHAR(64),
   county_num SMALLINT NOT NULL,
   elevation INTEGER
);
CREATE TABLE people (
  state_num TINYINT NOT NULL,
  county_num SMALLINT NOT NULL,
--  state VARCHAR(20),
  town VARCHAR(64),
  population INTEGER
);

It is good to have the complete schema on file in case we want to restart from scratch. (Or if we want to recreate the database on another server.) However, to modify an existing schema under development it is often easier just to use ALTER TABLE statements. So to modify the running database to match our new schema, we can use ALTER TABLE with the DROP COLUMN clause from the sqlcmd prompt:

$ sqlcmd
SQL Command :: localhost:21212
1> ALTER TABLE towns DROP COLUMN state;
Command successful
2> ALTER TABLE people DROP COLUMN state;
Command successful

Many schema changes, including adding, removing, and modifying tables, columns, and indexes can be done on the fly. However, there are a few limitations. For example, you cannot add new unique constraints to a table that already has data in it. In this case, you can DROP and CREATE the table with the new constraints if you do not need to save the data. Or, if you need to preserve the data and you know that it will not violate the new constraint, you can save the data to a snapshot using an explicit directory, re-initialize the database root directory, restart, and reload the new schema, then restore the data from the snapshot into the updated schema using the voltadmin restore command.

This ends Part Four of the tutorial.

Welcome to VoltDB

Part 4: Schema Updates and Durability

Preserving the Database

Adding and Removing Tables

Updating Existing Tables