5.3. Installing Stored Procedures into the Database

Documentation

VoltDB Home » Documentation » Using VoltDB

5.3. Installing Stored Procedures into the Database

When your stored procedure code is ready, you need to get the procedures into the database and ready to use. You first compile the procedure code, create a jar file, and load the resulting jar file into the database. Then you need to declare in the schema which procedures are stored procedures. Finally, depending on which table each stored procedure accesses, you need to partition each procedure to match the table partitioning. These processes are covered in the following sections:

These sections show how to use DDL to declare and partition stored procedures in the database schema. If you find you need to modify the schema, see Section 4.6, “Modifying the Schema”.

5.3.1. Compiling, Packaging, and Loading Stored Procedures

The VoltDB stored procedures are written as Java classes, so you compile them using the Java compiler. Anytime you update your stored procedure code, remember to recompile, package, and reload it into the database using the following steps:

$ javac -classpath "./:/opt/voltdb/voltdb/*" \         1
        -d ./obj  \
         *.java
$ jar  cvf  myproc.jar –C obj .                        2
$ sqlcmd                                               3
1> load classes myproc.jar;
2> show classes;

The steps are described as follows:

1

Use the javac command to compile the procedure Java code.

You include libraries by using the -classpath argument on the command line or by defining the environment variable CLASSPATH. You must include the VoltDB libraries in the classpath so Java can resolve references to the VoltDB classes and methods. This example assumes that the VoltDB software has been installed in the folder /opt/voltdb. If you installed VoltDB in a different directory, you need to include your installation path. Also, if your client application depends on other libraries, they need to be included in the classpath as well.

Use the -d flag to specify an output directory in which to create the resulting class files.

2

Use the jar command to package your Java classes into a Java archive, or JAR file.

The JAR file must have the same Java package structure as the classes in the JAR file. For example, if a class has a structure such as myapp.procedures.ProcedureFoo, then the JAR file has to have myapp/procedures/ProcedureFoo.class as the class structure for this file.

The JAR file must include any inner classes or other dependent classes used by the stored procedures.

3

Use the sqlcmd load classes directive to load the stored procedure classes into the database.

You can use the show classes command to display information about the classes installed in the cluster.

Before a stored procedure can be called by a client application, you need to declare it in the schema, which is described next.

5.3.2. Declaring Stored Procedures in the Schema

To make your stored procedures accessible in the database, you must declare them in the schema using the CREATE PROCEDURE statement. Be sure to identify all of your stored procedures or they will not be available to the client applications at runtime. Also, before you declare a procedure, ensure the tables and columns the procedure accesses are in the schema.

The following DDL statements declare five stored procedures, identifying them by their class name:

$ sqlcmd
1> CREATE PROCEDURE FROM CLASS fadvisor.procedures.LookupFlight;
2> CREATE PROCEDURE FROM CLASS fadvisor.procedures.HowManySeats;
3> CREATE PROCEDURE FROM CLASS fadvisor.procedures.MakeReservation;
4> CREATE PROCEDURE FROM CLASS fadvisor.procedures.CancelReservation;
5> CREATE PROCEDURE FROM CLASS fadvisor.procedures.RemoveFlight;

For some situations, you can create stored procedures directly in the schema using SQL instead of loading Java code. See how to use the CREATE PROCEDURE AS statement in Section 7.2, “Shortcut for Defining Simple Stored Procedures”.

For more about modifying a schema with DDL, see Section 4.6, “Modifying the Schema”.

5.3.3. Partitioning Stored Procedures in the Schema

We want the most frequently used stored procedures to be single-partitioned. This means that the procedure executes in the one partition that also has the data it needs. Single-partitioned stored procedures do not have the overhead of processing across multiple partitions and servers, wasting time searching through the data of the entire table. To ensure single-partitioned efficiency, the parameter the stored procedure uses to identify its required data must be the same as the column on which the table rows are partitioned.

Remember that in our sample application the RESERVATION table is partitioned on FLIGHTID. Let's say you create a stored procedure, MakeReservation(), with two arguments, flight_id and customer_id. The following figure shows how the stored procedure will automatically execute in the partition that has the requested row.

Figure 5.3. Stored Procedures Execute in the Appropriate Partition Based on the Partitioned Parameter Value

Stored Procedures Execute in the Appropriate Partition Based on the Partitioned Parameter Value

If you do not declare a procedure as single-partitioned, it is assumed to be multi-partitioned by default. The advantage of multi-partitioned stored procedures is that they have full access to all of the data in the database, across all partitions. However, the real focus of VoltDB, and the way to achieve maximum throughput for your application, is through the use of single-partitioned stored procedures.

5.3.3.1. How to Declare Single-Partition Procedures

Before declaring a single-partitioned procedure, ensure the following prerequisites:

  1. The table that the stored procedure accesses has been partitioned in the schema. See Section 4.4, “Partitioning Database Tables”.

  2. If the procedure is implemented with Java code, it is loaded into the database. See Section 5.3.1, “Compiling, Packaging, and Loading Stored Procedures”.

  3. The stored procedure has been declared in the schema with either of the CREATE PROCEDURE statements. See Section 5.3.2, “Declaring Stored Procedures in the Schema”.

When you declare a stored procedure as single-partitioned, you must specify both the associated table and the column on which it is partitioned using the PARTITION PROCEDURE statement in the schema. The following example uses the RESERVATION table and the FLIGHTID column as the partitioning column. For example:

PARTITION PROCEDURE MakeReservation ON TABLE Reservation COLUMN FlightID;

The PARTITION PROCEDURE statement assumes that the partitioning column value is also the first parameter to the stored procedure. Suppose you wish to partition a stored procedure on the third parameter such as the procedure GetCustomerDetails(), where the third parameter is a customer_id. You must specify the partitioning parameter using the PARAMETER clause and an index for the parameter position. The index is zero-based so the third parameter would be "2" and the PARTITION PROCEDURE statement would be as follows:

PARTITION PROCEDURE GetCustomerDetails
    ON TABLE Customer COLUMN CustomerID
    PARAMETER 2;

5.3.3.2. Queries in Single-Partitioned Stored Procedures

Single-partitioned stored procedures are special because they operate independently of other partitions, which is why they are so fast. At the same time, single-partitioned stored procedures operate on only a subset of the entire data, that is, only the data within the specified partition.

Caution

It is the application developer's responsibility to ensure that the queries in a single-partitioned stored procedure are truly single-partitioned. VoltDB does not warn you about SELECT or DELETE statements that might return incomplete results. For example, if your single-partitioned procedure attempts to operate on a range of values for the partitioning column, the range is incomplete and includes only a subset of the table data that is in the current partition.

VoltDB does generate a runtime error if you attempt to INSERT a row that does not belong in the current partition.

After you partition a procedure, your stored procedure can operate on only those records in the partitioned table that are identified by the partitioning column, in this example the RESERVATION table identified by a FLIGHTID. Your stored procedure can operate on records in replicated tables because all partitions have the same copy of a replicated table. However, for other partitioned tables, the stored procedure can only operate on those records if both tables are partitioned on the same attribute. In this example that would be FLIGHTID.

In other words, the following rules apply:

  • Any SELECT, UPDATE, or DELETE queries must use the constraint, WHERE identifier=?

    The question mark is replaced at runtime by the input value that identifies the row of data in the table. In our example, queries on the RESERVATION table must use the constraint, WHERE FLIGHTID=?

  • SELECT statements can join the partitioned table to replicated tables, as long as the preceding WHERE constraint is also applied.

  • SELECT statements can join the partitioned table to other partitioned tables as long as the following are true:

    • The two tables are partitioned on the same attribute or column (in our example, FLIGHTID).

    • The tables are joined on the shared partitioning column.

    • The following WHERE constraint is also used: WHERE partitioned-table. identifier=? In this example, WHERE RESERVATION.FLIGHTID=?

For example, the RESERVATION table can be joined with the FLIGHT table (which is replicated). However, the RESERVATION table cannot be joined with the CUSTOMER table in a single-partitioned stored procedure because the two tables use different partitioning columns. (CUSTOMER is partitioned on the CUSTOMERID column.)

The following are examples of invalid SQL queries for a single-partitioned stored procedure partitioned on FLIGHTID:

  • INVALID: SELECT * FROM reservation WHERE reservationid=?

    The RESERVATION table is being constrained by a column (RESERVATIONID) which is not the partitioning column.

  • INVALID: SELECT c.lastname FROM reservation AS r, customer AS c WHERE r.flightid=? AND c.customerid = r.customerid

    The correct partitioning column is being used in the WHERE clause, but the tables are being joined on a different column. As a result, not all CUSTOMER rows are available to the stored procedure since the CUSTOMER table is partitioned on a different column than RESERVATION.