SELECT

SELECT — Fetches the specified rows and columns from the database.

Synopsis

[common-table-expression] Select-statement [{set-operator} Select-statement ] ...

Select-statement:
SELECT [ TOP integer-value ]
{ * | [ ALL | DISTINCT ] { column-name | selection-expression } [AS alias] [,...] }
FROM { table-reference } [ join-clause ]...
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
[clause...]

table-reference:
{ table-name [AS alias] | view-name [AS alias] | sub-query AS alias }

sub-query:
(Select-statement)

join-clause:
, table-reference
[INNER | {LEFT | RIGHT | FULL } [OUTER]] JOIN [{table-reference}] [join-condition]

join-condition:
ON conditional-expression
USING (column-reference [,...])

clause:
ORDER BY { column-name | alias } [ ASC | DESC ] [,...]
GROUP BY { column-name | alias } [,...]
HAVING boolean-expression
LIMIT integer-value [OFFSET row-count]

set-operator:
UNION [ALL]
INTERSECT [ALL]
EXCEPT

common-table-expression:
WITH common-table-name [(column-name [,...])] AS ( Select-statement )
WITH RECURSIVE common-table-name [(column-name [,...])] AS (
      Select-statement UNION ALL Select-statement
)

Description

The SELECT statement retrieves the specified rows and columns from the database, filtered and sorted by any clauses that are included in the statement. In its simplest form, the SELECT statement retrieves the values associated with individual columns. However, the selection expression can also be a function such as COUNT or SUM.

The following features and limitations are important to note when using the SELECT statement with VoltDB:

  • See Appendix C, SQL Functions for a full list of the SQL functions that VoltDB supports.

  • VoltDB supports the following operators in expressions: addition (+), subtraction (-), multiplication (*), division (/) and string concatenation (||).

  • TOP n is a synonym for LIMIT n.

  • The WHERE expression supports the boolean operators: equals (=), not equals (!= or <>), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), LIKE, STARTS WITH, IS NULL, IS DISTINCT FROM, IS NOT DISTINCT FROM, AND, OR, and NOT. Note, however, that although OR is supported syntactically, VoltDB does not optimize these operations, and use of OR may reduce the performance of your queries.

  • The boolean expression LIKE provides text pattern matching in a VARCHAR column. The syntax of the LIKE expression is {string-expression} LIKE '{pattern}' where the pattern can contain text and wildcards, including the underscore (_) for matching a single character and the percent sign (%) for matching zero or more characters. The string comparison is case sensitive.

    Where an index exists on the column being scanned and the pattern starts with a text prefix (rather than starting with a wildcard), VoltDB will attempt to use the index to maximize performance. For example, a query limiting the results to rows from the EMPLOYEE table where the primary index, the JOB_CODE column, begins with the characters "Temp" looks like this:

    SELECT * from EMPLOYEE where JOB_CODE like 'Temp%';
  • The STARTS WITH clause is useful in stored procedures because it uses indexed scans where the LIKE clause cannot. The expression STARTS WITH '{string-expression}' is functionally equivalent to LIKE '{string-expression}%' in that it matches any string value that starts with string-expression. The difference is that in a stored procedure, use of the STARTS WITH clause with a placeholder (that is, "STARTS WITH ?") utilizes available indexes, whereas LIKE ? requires a sequential scan, since the compiler cannot tell whether the replacement text ends in a percent sign and must plan for any possible string value. For example, if KEYWORD is the primary key for the ENTRY table, then VoltDB can use the primary key index to optimize the following stored procedure:

    CREATE PROCEDURE SimpleSearch AS
       SELECT keyword FROM entry WHERE keyword STARTS WITH ?;
  • The boolean expression IN determines if a given value is found within a list of alternatives. For example, in the following code fragment the IN expression looks to see if a record is part of Hispaniola by evaluating whether the column COUNTRY is equal to either "Dominican Republic" or "Haiti":

    WHERE Country IN ('Dominican Republic', 'Haiti')

    Note that the list of alternatives must be enclosed in parentheses. The result of an IN expression is equivalent to a sequence of equality conditions separated by OR. So the preceding code fragment produces the same boolean result as:

    WHERE Country='Dominican Republic' OR Country='Haiti'

    The advantages are that the IN syntax provides more compact and readable code and can provide improved performance by using an index on the initial expression where available.

  • The boolean expression BETWEEN determines if a value falls within a given range. The evaluation is inclusive of the end points. In this way BETWEEN is a convenient alias for two boolean expressions determining if a value is greater than or equal to (>=) the starting value and less than or equal to (<=) the end value. For example, the following two WHERE clauses are equivalent:

    WHERE salary BETWEEN ? AND ?
    WHERE salary >= ? AND salary <= ?
  • The boolean expressions IS DISTINCT FROM and IS NOT DISTINCT FROM are similar to the not equals ("<>") and equals ("=") operators respectively, except when evaluating null operands. If either or both operands are null, the equals and not equals operators return a boolean null value, which a WHERE clause treats as false. IS DISTINCT FROM and IS NOT DISTINCT FROM consider null a valid operand. So if only one operand is null, IS DISTINCT FROM returns true and IS NOT DISTINCT FROM returns false. If both operands are null, IS DISTINCT FROM returns false and IS NOT DISTINCT FROM returns true.

  • When using placeholders in SQL statements involving the IN list expression, you can either do replacement of individual values within the list or replace the list as a whole. For example, consider the following statements:

    SELECT * from EMPLOYEE where STATUS IN (?, ?,?);
    SELECT * from EMPLOYEE where STATUS IN ?;

    In the first statement, there are three parameters that replace individual values in the IN list, allowing you to specify exactly three selection values. In the second statement the placeholder replaces the entire list, including the parentheses. In this case the parameter to the procedure call must be an array and allows you to change not only the values of the alternatives but the number of criteria considered.

    The following Java code fragment demonstrates how these two queries can be used in a stored procedure, resulting in equivalent SQL statements being executed:

    String arg1 = "Salary";
    String arg2 = "Hourly";
    String arg3 = "Parttime";
    voltQueueSQL( query1, arg1, arg2, arg3);
    
    String listargs[] = new String[3];
    listargs[0] = arg1;
    listargs[1] = arg2;
    listargs[2] = arg3;
    voltQueueSQL( query2, (Object) listargs);

    Note that when passing arrays as parameters in Java, it is a good practice to explicitly cast them as an object to avoid the array being implicitly expanded into individual call parameters.

  • VoltDB supports the use of CASE-WHEN-THEN-ELSE-END for conditional operations. For example, the following SELECT expression uses a CASE statement to return different values based on the contents of the price column:

    SELECT Prod_name, 
        CASE WHEN price > 100.00 
              THEN 'Expensive'
              ELSE 'Cheap'
        END 
    FROM products ORDER BY Prod_name;                      

    For more complex conditional operations with multiple alternatives, use of the DECODE() function is recommended.

  • VoltDB supports both inner and outer joins.

  • The SELECT statement supports subqueries as a table reference in the FROM clause. Subqueries must be enclosed in parentheses and must be assigned a table alias.

  • You can only join two or more partitioned tables if those tables are partitioned on the same value and joined on equality of the partitioning column. Joining two partitioned tables on non-partitioned columns or on a range of values is not supported. However, there are no limitations on joining to replicated tables.

  • Extremely large result sets (greater than 50 megabytes in size) are not supported. If you execute a SELECT statement that generates a result set of more than 50 megabytes, VoltDB will return an error.
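A few of the points in the list above can be sketched as short queries. These are illustrations only, not tested against a particular VoltDB release. The products table with its prod_name and price columns is borrowed from the CASE example above, while the office and previous_office columns on the employee table are hypothetical names introduced here.

```sql
-- Arithmetic and string concatenation in selection expressions.
SELECT prod_name || ' (half price)' AS label,
       price / 2 AS half_price
    FROM products;

-- TOP n and LIMIT n are two spellings of the same row limit.
SELECT TOP 10 prod_name, price
    FROM products ORDER BY price, prod_name;
SELECT prod_name, price
    FROM products ORDER BY price, prod_name LIMIT 10;

-- IS DISTINCT FROM treats null as a comparable value, so this also
-- returns rows where exactly one of the two columns is null.
-- (office and previous_office are hypothetical columns.)
SELECT * FROM employee
    WHERE office IS DISTINCT FROM previous_office;
```

Writing the last query with <> instead would silently drop every row in which either column is null, which is the usual reason to reach for IS DISTINCT FROM.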

Window Functions

Window functions, which can appear in the selection list, allow you to perform more selective calculations on the statement results than you can do with plain aggregation functions such as COUNT() or SUM(). Window functions execute the specified operation on a subset of the total selection results, controlled by the PARTITION BY and ORDER BY clauses. The overall syntax for a window function is as follows:

function-name( [expression] ) OVER ( [ PARTITION BY {expression [,...]} ] [ORDER BY { expression [,...]} ] )

Where:

  • The PARTITION BY[1] clause defines how the selection results are grouped.

  • The ORDER BY clause defines the order in which the rows are evaluated within each group.

An example may help explain the behavior of the two clauses. Say you have a database table that lists the population of individual cities and includes columns for country and state. You can use the window function COUNT(city) OVER (PARTITION BY state) to include a count of all of the cities within each state as part of each city record. You can also control the order in which the records are evaluated using the ORDER BY clause. Note, however, that when you use the ORDER BY clause the window function results are calculated sequentially. So rather than showing the count of all cities in the state each time, the window function returns the count of cities incrementally up to the current record in the group. In that case, using RANK() rather than COUNT() indicates more accurately what is being returned. For example, RANK() OVER (PARTITION BY state ORDER BY city_population) lists the cities for each state with a rank value showing their ranking in order of their population.
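Written out as complete statements, the two window functions described in this example might look like the following sketch. The table name cities and the column aliases are assumptions made for illustration; the column names city, state, and city_population follow the description above, and city_population is assumed to be an integer column.

```sql
-- Count of all cities in the state, repeated on every city row.
SELECT city, state,
       COUNT(city) OVER (PARTITION BY state) AS cities_in_state
    FROM cities;

-- Rank of each city within its state, evaluated in population order.
SELECT city, state, city_population,
       RANK() OVER (PARTITION BY state ORDER BY city_population) AS pop_rank
    FROM cities;
```

The two functions are shown as separate statements because only one window function is allowed per SELECT statement.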

Please be aware of the following limitations when using the window functions:

  • There can be only one window function per SELECT statement.

  • You cannot use a window function and GROUP BY in the same SELECT statement.

  • The argument(s) to the ORDER BY clause can be either integer or TIMESTAMP expressions only.

The following list describes the operation and constraints for each window function separately.

RANK() OVER ( [ PARTITION BY {expression [,...]} ] ORDER BY {expression [,...]} )

The RANK() window function generates a BIGINT value (starting at 1) representing the ranking of the current result within the group defined by the PARTITION BY expression(s) or of the entire result set if PARTITION BY is not specified. No function argument is allowed and the ORDER BY clause is required.

For example, if you rank a column (say, city_population) and use the country column as the partitioning column for the ranking, the cities of each country will be ranked separately. If you use both state and country as partitioning columns, then the cities for each state in each country will be ranked separately.

DENSE_RANK() OVER ( [ PARTITION BY {expression [,...]} ] ORDER BY {expression [,...]} )

The DENSE_RANK() window function generates a BIGINT value (starting at 1) representing the ranking of the current result, in the same way the RANK() window function does. The difference between RANK() and DENSE_RANK() is how they handle ranking when there is more than one row with the same ORDER BY value.

If more than one row has the same ORDER BY value, those rows receive the same rank value in both cases. However, with the RANK() function, the rank of the next row after a tie is one more than the total number of preceding rows, so rank values are skipped. For example, if the ORDER BY values of four rows are 100, 98, 98, and 73, the respective rank values using RANK() will be 1, 2, 2, and 4. With the DENSE_RANK() function, by contrast, the next rank value is always incremented by only one. So, for the same ORDER BY values, the respective rank values using DENSE_RANK() will be 1, 2, 2, and 3.

As with the RANK() window function, no function argument is allowed for the DENSE_RANK() function and the ORDER BY clause is required.
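The difference between the two functions can be seen by running both over the same data. The sketch below assumes a hypothetical table scores with an integer column score holding the four values 73, 98, 98, and 100, and assumes the default ascending sort order, so the rows are evaluated from 73 upward.

```sql
-- RANK(): the tied rows share rank 2 and the next rank is skipped.
--   score  73 -> 1,  98 -> 2,  98 -> 2,  100 -> 4
SELECT player, score,
       RANK() OVER (ORDER BY score) AS score_rank
    FROM scores;

-- DENSE_RANK(): the tied rows share rank 2 and no rank is skipped.
--   score  73 -> 1,  98 -> 2,  98 -> 2,  100 -> 3
SELECT player, score,
       DENSE_RANK() OVER (ORDER BY score) AS score_rank
    FROM scores;
```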

ROW_NUMBER() OVER ( [ PARTITION BY {expression [,...]} ] [ ORDER BY {expression [,...]} ] )

The ROW_NUMBER() window function generates a BIGINT value representing the ordinal order of the current result within the group defined by the PARTITION BY expression(s) or of the entire result set if PARTITION BY is not specified. No function argument is allowed.

For example, if you order a column (say, animal) and use the class column as the partitioning column, the animals in each class will be ordered separately. So "angelfish" might receive row number 1 in the class "finned fish" while "aardvark" is row number 1 in the class "mammal". But if you do not specify PARTITION BY, "angelfish" would be numbered after "aardvark".

Note that an ORDER BY clause is not required. However, use of ORDER BY is strongly recommended, preferably with sufficient columns to make the ordering unique. Without the ORDER BY clause the results of the query are nondeterministic.
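As a sketch of that recommendation, the query below numbers the rows within each class using a unique ordering column. The table name animals and the integer column animal_id are hypothetical; an integer column is used because the ORDER BY argument of a window function is limited to integer or TIMESTAMP expressions.

```sql
-- Row numbers restart at 1 for each class. Ordering by a unique
-- integer column makes the numbering repeatable from run to run.
SELECT animal, class,
       ROW_NUMBER() OVER (PARTITION BY class ORDER BY animal_id) AS row_num
    FROM animals;
```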

COUNT({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The COUNT() window function generates a sub-count of the number of rows within the current result set, where the PARTITION BY clause defines how the rows are grouped. The function argument is required.

SUM({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The SUM() window function generates a sub-total of the specified column within the current result set, where the PARTITION BY clause defines how the rows are grouped. The function argument is required.

MAX({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The MAX() window function reports the maximum value of a column within the current result set, where the PARTITION BY clause defines how the rows are grouped. If the ORDER BY clause is specified, the maximum value is calculated incrementally over the rows in the order specified. The function argument is required.

MIN({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The MIN() window function reports the minimum value of a column within the current result set, where the PARTITION BY clause defines how the rows are grouped. If the ORDER BY clause is specified, the minimum value is calculated incrementally over the rows in the order specified. The function argument is required.
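The effect of adding ORDER BY to an aggregate window function can be sketched with a pair of queries. The table readings and its columns sensor_id, reading_time (a TIMESTAMP), and temperature are hypothetical names chosen for illustration.

```sql
-- Without ORDER BY: every row for a sensor carries the same value,
-- the maximum temperature across all of that sensor's readings.
SELECT sensor_id, reading_time, temperature,
       MAX(temperature) OVER (PARTITION BY sensor_id) AS sensor_max
    FROM readings;

-- With ORDER BY: the maximum is calculated incrementally, so each row
-- carries the highest temperature seen so far in time order.
SELECT sensor_id, reading_time, temperature,
       MAX(temperature) OVER (PARTITION BY sensor_id
                              ORDER BY reading_time) AS max_so_far
    FROM readings;
```

The same pattern applies to MIN(), and to SUM() and COUNT() when a running total or running count is wanted.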

Subqueries

The SELECT statement can include subqueries. Subqueries are separate SELECT statements, enclosed in parentheses, where the results of the subquery are used as values, expressions, or arguments within the surrounding SELECT statement.

Subqueries, like any SELECT statement, are extremely flexible and can return a wide array of information. A subquery might return:

  • A single row with a single column — this is sometimes known as a scalar subquery and represents a single value

  • A single row with multiple columns — this is also known as a row value expression

  • Multiple rows with one or more columns

In general, VoltDB supports subqueries in the FROM clause, in the selection expression, and in boolean expressions in the WHERE clause or in CASE-WHEN-THEN-ELSE-END operations. However, different types of subqueries are allowed in different situations, depending on the type of data returned.

  • In the FROM clause, the SELECT statement supports all types of subquery as a table reference. The subquery must be enclosed in parentheses and must be assigned a table alias.

  • In the selection expression, scalar subqueries can be used in place of a single column reference.

  • In the WHERE clause and CASE operations, both scalar and non-scalar subqueries can be used as part of boolean expressions. Scalar subqueries can be used in place of any single-valued expression. Non-scalar subqueries can be used in the following situations:

    • Row value comparisons — Boolean expressions that compare one row value expression to another can use subqueries that resolve to one row with multiple columns. For example:

      select * from t1 
         where (a,c) > (select a, c from t2 where b=t1.b);
    • IN and EXISTS — Subqueries that return multiple rows can be used as an argument to the IN or EXISTS predicate to determine if a value (or set of values) exists within the rows returned by the subquery. For example:

      select * from t1 
         where a in (select a from t2);
      select * from t1
         where (a,c) in (select a, c from t2 where b=t1.b);
      select * from t1 where c > 3 and 
         exists (select a, b from t2 where a=t1.a);
    • ANY and ALL — Multi-row subqueries can also be used as the target of an ANY or ALL comparison, using either a scalar or row expression comparison. For example:

      select * from t1 
         where a > ALL (select a from t2);
      select * from t1
         where (a,c) = ANY (select a, c from t2 where b=t1.b);
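The first two cases in the list above, subqueries in the FROM clause and scalar subqueries in the selection expression, can be sketched in the same style using the t1 and t2 tables from the preceding examples. The aliases s, total, and max_c are illustrative, and t2 is assumed to be a replicated table.

```sql
-- Subquery as a table reference: enclosed in parentheses and
-- assigned a table alias (s).
SELECT s.a, s.total
    FROM (SELECT a, SUM(c) AS total FROM t2 GROUP BY a) AS s
    WHERE s.total > 100;

-- Scalar subquery in the selection expression: it must resolve to
-- a single row with a single column.
SELECT a, (SELECT MAX(c) FROM t2) AS max_c
    FROM t1;
```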

Note that VoltDB does not support subqueries in the HAVING, ORDER BY, or GROUP BY clauses. Subqueries are also not supported for any of the data manipulation language (DML) statements: DELETE, INSERT, UPDATE, and UPSERT.

For the initial release of subqueries in selection and boolean expressions, only replicated tables can be used in the subquery. Both replicated and partitioned tables can be used in subqueries in place of table references in the FROM clause.

Set Operations

VoltDB also supports the set operations UNION, INTERSECT, and EXCEPT. These keywords let you perform set operations on two or more SELECT statements. UNION includes the combined result sets from the two SELECT statements, INTERSECT includes only those rows that appear in both SELECT statement result sets, and EXCEPT includes only those rows that appear in the first result set but not in the second.

Normally, UNION and INTERSECT provide a set including unique rows. That is, if a row appears in both SELECT results, it only appears once in the combined result set. However, if you include the ALL modifier, all matching rows are included. For example, UNION ALL will result in single entries for the rows that appear in only one of the SELECT results, but two copies of any rows that appear in both.
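The following sketch shows the operators side by side. The tables members and subscribers and their email column are hypothetical, and both tables are assumed to be replicated.

```sql
-- Each distinct email address appears once, even if it is in both tables.
SELECT email FROM members
UNION
SELECT email FROM subscribers;

-- An address found in both tables appears twice.
SELECT email FROM members
UNION ALL
SELECT email FROM subscribers;

-- Only addresses found in both tables.
SELECT email FROM members
INTERSECT
SELECT email FROM subscribers;

-- Addresses returned by the first SELECT but not by the second.
SELECT email FROM members
EXCEPT
SELECT email FROM subscribers;
```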

The UNION, INTERSECT, and EXCEPT operations obey the same rules that apply to joins:

  • You cannot perform set operations on SELECT statements that reference the same table.

  • All tables in the SELECT statements must either be replicated tables or partitioned tables partitioned on the same column value, using equality of the partitioning column in the WHERE clause.

Common Table Expressions

Common table expressions let you declare a named subquery that can be used in the main query the same way regular tables and columns are used. Common expressions are useful for simplifying queries that use an expression multiple times or for separating out two distinct aspects of a larger query. You declare a common table expression by placing the WITH clause before the main SELECT query. The WITH clause:

  • Defines the name of the common table expression

  • Optionally, renames the resulting columns

  • Declares the expression itself using standard SELECT statement syntax

VoltDB supports two forms of common table expressions:

  • Basic common expressions, with a name, optional column names, and the expression itself

  • Recursive expressions, using the WITH RECURSIVE keywords and merging two expressions with a UNION ALL set operation

You can use the results of the common table expression in the subsequent SELECT statement the same way you would reference regular tables in the database. For example, the following common table expression determines how many members live in each city, then uses that information to return a list of members who live in a city with fewer than the specified number of members:

WITH city_count (city,membercount) AS (
  SELECT cityname, count(*) FROM members 
    GROUP BY cityname
)
SELECT m.fullname, m.cityname FROM members AS m 
  JOIN city_count AS cc ON m.cityname = cc.city
  WHERE membercount < ?
  ORDER BY m.cityname,m.fullname;

Recursive common expressions are like regular table expressions, except they are self-referencing, so you can iterate over the results in a recursive fashion. Recursive common expressions are particularly useful for evaluating tree or graph structures that cannot be natively represented in flat database records or queries.

You declare a recursive expression with the WITH RECURSIVE keywords followed by:

  • The table name and, optionally, alias names for the columns

  • A base query that defines the starting condition

  • A UNION ALL set operator

  • A second, recursive query that iterates over the common table expression results

For example, assume you wanted to know all the employees in a specific branch of the company's organizational structure. However, organizational charts are hierarchical. Each employee record may only record that employee's direct manager. Recursive common expressions let you start at the top of a branch of the organizational "tree" and iteratively look for any employee reporting to that manager, then employees reporting to that person, and so on. The common table expression might look like this.

WITH RECURSIVE org (id) AS (
   SELECT mgr_id AS mgr FROM department
     WHERE dept_name=?
   UNION ALL
     SELECT  emp_id FROM employee, org
         WHERE employee.mgr_id = org.id   
)
SELECT e.emp_id, e.emp_name, e.emp_address
   FROM employee AS e, org
     WHERE e.emp_id = org.id;

Warning

As with any recursive programming, you are responsible for ensuring the common table expression does not result in an infinite loop. VoltDB cannot determine at compile time whether the expression is sufficiently bounded. The preceding example succeeds because the application ensures all employee/manager relationships are hierarchical: no manager reports to an employee lower in the tree. If evaluation of a common table expression results in a loop, VoltDB will eventually exceed some limit (such as the query timeout or maximum temporary table space) and fail the transaction. In certain cases, an infinite loop could use up so much memory that it exceeds the resource limit and pauses the database.
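One common way to guard against runaway recursion, in SQL generally, is to carry a depth counter through the recursive query and stop when it reaches a fixed bound. The following is a sketch of that technique applied to the preceding example, not a documented VoltDB feature and not tested against VoltDB. The depth column and the limit of 20 levels are assumptions chosen for illustration.

```sql
WITH RECURSIVE org (id, depth) AS (
   -- Base query: the department manager is level 1.
   SELECT mgr_id, 1 FROM department
     WHERE dept_name=?
   UNION ALL
     -- Recursive query: each pass adds one level and stops
     -- descending once the (arbitrary) bound of 20 is reached.
     SELECT emp_id, org.depth + 1 FROM employee, org
         WHERE employee.mgr_id = org.id
           AND org.depth < 20
)
SELECT e.emp_id, e.emp_name, e.emp_address
   FROM employee AS e, org
     WHERE e.emp_id = org.id;
```

A bound of this kind limits how long a cyclic relationship can keep the recursion going, but it does not remove the duplicate rows that a cycle produces, so the data should still be kept hierarchical.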

Common table expressions in VoltDB have the following limitations:

  • There can be only one common table expression per query.

  • In multi-partition transactions, the common expression can reference replicated tables only.

  • In single-partition transactions, the common expression can reference both replicated and partitioned tables, with the caveat that, as in any single-partition transaction, queries on partitioned tables have access to only the data in the current partition.

  • For basic (non-recursive) common table expressions, the common expression cannot be self-referencing. That is, the SELECT statement within the WITH clause can reference actual database table and view names only; it cannot reference the common expression name itself.

Examples

The following example retrieves all of the columns from the EMPLOYEE table where the last name is "Smith":

SELECT * FROM employee WHERE lastname = 'Smith';

The following example retrieves selected columns for two tables at once, joined by the employee_id using an implicit inner join and sorted by last name:

SELECT lastname, firstname, salary 
    FROM employee AS e, compensation AS c
    WHERE e.employee_id = c.employee_id
    ORDER BY lastname DESC;
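Following the join-clause syntax in the synopsis, the same query could also be written with an explicit inner join. This is a sketch using the same tables and columns as the preceding example:

```sql
-- Explicit INNER JOIN with the join condition in an ON clause
-- instead of the WHERE clause.
SELECT lastname, firstname, salary
    FROM employee AS e
    INNER JOIN compensation AS c
        ON e.employee_id = c.employee_id
    ORDER BY lastname DESC;
```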

The following example includes both a simple stored procedure, defined in the schema, and a client application that calls the procedure repeatedly. This combination uses the LIMIT and OFFSET clauses to "page" through a large table, 500 rows at a time.

When retrieving very large volumes of data, it is a good idea to use LIMIT and OFFSET to constrain the amount of data in each transaction. However, to perform LIMIT OFFSET queries effectively, the database must include a tree index that encompasses all of the columns of the ORDER BY clause (in this example, the lastname and firstname columns).

Schema:

CREATE PROCEDURE EmpByLimit AS
       SELECT lastname, firstname FROM employee
       WHERE company = ?
       ORDER BY lastname ASC, firstname ASC
       LIMIT 500 OFFSET ?;

PARTITION PROCEDURE EmpByLimit ON TABLE Employee COLUMN Company;
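The tree index that this example depends on is not shown in the schema above. A minimal sketch of such a declaration, assuming the hypothetical index name employee_name_idx and assuming that CREATE INDEX produces a tree (ordered) index by default, might be:

```sql
-- Index covering both ORDER BY columns, in the same order as the
-- ORDER BY clause of the EmpByLimit procedure.
CREATE INDEX employee_name_idx ON employee (lastname, firstname);
```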

Java Client Application:

long offset = 0;
String company = "ACME Explosives";
boolean alldone = false;
while ( ! alldone ) {
   VoltTable results[] = client.callProcedure("EmpByLimit",
                         company,offset).getResults();
   if (results[0].getRowCount() < 1) {
        // No more records.
        alldone = true; 
   } else {
        // do something with the results.
   }
   offset += 500;
}


[1] Use of the keyword PARTITION is for compatibility with SQL syntax from other databases and is unrelated to the columns used to partition single-partitioned tables. You can use the RANK() functions with either partitioned or replicated tables and the ranking column does not need to be the same as the partitioning column for VoltDB partitioned tables.