SELECT

SELECT — Fetches the specified rows and columns from the database.

Synopsis

Select-statement [{set-operator} Select-statement ] ...

Select-statement:
SELECT [ TOP integer-value ]
{ * | [ ALL | DISTINCT ] { column-name | selection-expression } [AS alias] [,...] }
FROM { table-reference } [ join-clause ]...
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
[clause...]

table-reference:
{ table-name [AS alias] | view-name [AS alias] | sub-query AS alias }

sub-query:
(Select-statement)

join-clause:
, table-reference
[INNER | {LEFT | RIGHT | FULL } [OUTER]] JOIN [{table-reference}] [join-condition]

join-condition:
ON conditional-expression
USING (column-reference [,...])

clause:
ORDER BY { column-name | alias } [ ASC | DESC ] [,...]
GROUP BY { column-name | alias } [,...]
HAVING boolean-expression
LIMIT integer-value [OFFSET row-count]

set-operator:
UNION [ALL]
INTERSECT [ALL]
EXCEPT

Description

The SELECT statement retrieves the specified rows and columns from the database, filtered and sorted by any clauses that are included in the statement. In its simplest form, the SELECT statement retrieves the values associated with individual columns. However, the selection expression can also be a function such as COUNT or SUM.

The following features and limitations are important to note when using the SELECT statement with VoltDB:

  • See Appendix C, SQL Functions for a full list of the SQL functions that VoltDB supports.

  • VoltDB supports the following operators in expressions: addition (+), subtraction (-), multiplication (*), division (/) and string concatenation (||).

  • TOP n is a synonym for LIMIT n.

  • The WHERE expression supports the boolean operators: equals (=), not equals (!= or <>), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), LIKE, IS NULL, IS DISTINCT FROM, IS NOT DISTINCT FROM, AND, OR, and NOT. Note, however, that although OR is supported syntactically, VoltDB does not optimize these operations, and use of OR may impact the performance of your queries.

  • The boolean expression LIKE provides text pattern matching in a VARCHAR column. The syntax of the LIKE expression is {string-expression} LIKE '{pattern}' where the pattern can contain text and wildcards, including the underscore (_) for matching a single character and the percent sign (%) for matching zero or more characters. The string comparison is case sensitive.

    Where an index exists on the column being scanned and the pattern starts with a text prefix (rather than starting with a wildcard), VoltDB will attempt to use the index to maximize performance. For example, a query limiting the results to rows from the EMPLOYEE table where the primary index, the JOB_CODE column, begins with the characters "Temp" looks like this:

    SELECT * from EMPLOYEE where JOB_CODE like 'Temp%';
  • The boolean expression IN determines if a given value is found within a list of alternatives. For example, in the following code fragment the IN expression looks to see if a record is part of Hispaniola by evaluating whether the column COUNTRY is equal to either "Dominican Republic" or "Haiti":

    WHERE Country IN ('Dominican Republic', 'Haiti')

    Note that the list of alternatives must be enclosed in parentheses. The result of an IN expression is equivalent to a sequence of equality conditions separated by OR. So the preceding code fragment produces the same boolean result as:

    WHERE Country='Dominican Republic' OR Country='Haiti'

    The advantages are that the IN syntax provides more compact and readable code and can provide improved performance by using an index on the initial expression where available.

  • The boolean expression BETWEEN determines if a value falls within a given range. The evaluation is inclusive of the end points. In this way BETWEEN is a convenient alias for two boolean expressions determining if a value is greater than or equal to (>=) the starting value and less than or equal to (<=) the end value. For example, the following two WHERE clauses are equivalent:

    WHERE salary BETWEEN ? AND ?
    WHERE salary >= ? AND salary <= ?
  • The boolean expressions IS DISTINCT FROM and IS NOT DISTINCT FROM are similar to the not equals ("<>") and equals ("=") operators respectively, except when evaluating null operands. If either or both operands are null, the equals and not equals operators return a boolean null value, or false. IS DISTINCT FROM and IS NOT DISTINCT FROM consider null a valid operand. So if only one operand is null, IS DISTINCT FROM returns true and IS NOT DISTINCT FROM returns false. If both operands are null, IS DISTINCT FROM returns false and IS NOT DISTINCT FROM returns true.

  • When using placeholders in SQL statements involving the IN list expression, you can either do replacement of individual values within the list or replace the list as a whole. For example, consider the following statements:

    SELECT * from EMPLOYEE where STATUS IN (?, ?, ?);
    SELECT * from EMPLOYEE where STATUS IN ?;

    In the first statement, there are three parameters that replace individual values in the IN list, allowing you to specify exactly three selection values. In the second statement the placeholder replaces the entire list, including the parentheses. In this case the parameter to the procedure call must be an array and allows you to change not only the values of the alternatives but the number of criteria considered.

    The following Java code fragment demonstrates how these two queries can be used in a stored procedure, resulting in equivalent SQL statements being executed:

    String arg1 = "Salary";
    String arg2 = "Hourly";
    String arg3 = "Parttime";
    voltQueueSQL( query1, arg1, arg2, arg3);
    
    String listargs[] = new String[3];
    listargs[0] = arg1;
    listargs[1] = arg2;
    listargs[2] = arg3;
    voltQueueSQL( query2, (Object) listargs);

    Note that when passing arrays as parameters in Java, it is a good practice to explicitly cast them as an object to avoid the array being implicitly expanded into individual call parameters.

  • VoltDB supports the use of CASE-WHEN-THEN-ELSE-END for conditional operations. For example, the following SELECT expression uses a CASE statement to return different values based on the contents of the price column:

    SELECT Prod_name, 
        CASE WHEN price > 100.00 
              THEN 'Expensive'
              ELSE 'Cheap'
        END 
    FROM products ORDER BY Prod_name;                      

    For more complex conditional operations with multiple alternatives, use of the DECODE() function is recommended.

  • VoltDB supports both inner and outer joins.

  • The SELECT statement supports subqueries as a table reference in the FROM clause. Subqueries must be enclosed in parentheses and must be assigned a table alias. Note that subqueries are only supported in the SELECT statement; they cannot be used in data manipulation statements such as UPDATE or DELETE.

  • You can only join two or more partitioned tables if those tables are partitioned on the same value and joined on equality of the partitioning column. Joining two partitioned tables on non-partitioned columns or on a range of values is not supported. However, there are no limitations on joining to replicated tables.

  • Extremely large result sets (greater than 50 megabytes in size) are not supported. If you execute a SELECT statement that generates a result set of more than 50 megabytes, VoltDB will return an error.
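
The earlier note about casting an array to an object when passing it as a procedure parameter follows from how Java resolves calls to methods with variable-length argument lists. The following is a minimal plain-Java sketch, not VoltDB code: countParams is a hypothetical stand-in for any method declared with an Object... parameter list, on the assumption that the call receiving the array is declared that way.

```java
// Plain-Java illustration only. countParams is a hypothetical stand-in
// for a method that accepts a variable-length parameter list; it is not
// part of the VoltDB API.
class VarargsDemo {
    // Returns how many separate parameters the callee actually receives.
    static int countParams(Object... params) {
        return params.length;
    }

    public static void main(String[] args) {
        String[] listargs = { "Salary", "Hourly", "Parttime" };

        // Without a cast, the String[] is used as the varargs array itself,
        // so the callee sees three separate parameters.
        // (javac may warn about an inexact argument type on this line.)
        System.out.println(countParams(listargs));          // prints 3

        // With the cast, the whole array travels as a single parameter,
        // which is what a query such as "STATUS IN ?" expects.
        System.out.println(countParams((Object) listargs)); // prints 1
    }
}
```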

Window Functions

Window functions, which can appear in the selection list, allow you to perform more selective calculations on the statement results than you can do with plain aggregation functions such as COUNT() or SUM(). Window functions execute the specified operation on a subset of the total selection results, controlled by the PARTITION BY and ORDER BY clauses. The overall syntax for a window function is as follows:

function-name( [expression] ) OVER ( [ PARTITION BY {expression [,...]} ] [ORDER BY { expression [,...]} ] )

Where:

  • The PARTITION BY[1] clause defines how the selection results are grouped.

  • The ORDER BY clause defines the order in which the rows are evaluated within each group.

An example may help explain the behavior of the two clauses. Say you have a database table that lists the population of individual cities and includes columns for country and state. You can use the window function COUNT(city) OVER (PARTITION BY state) to include a count of all of the cities within each state as part of each city record. You can also control the order in which the records are evaluated using the ORDER BY clause. Note, however, that when you use the ORDER BY clause the window function results are calculated sequentially. So rather than showing the count of all cities in the state each time, the window function returns the count of cities incrementally up to the current record in the group. In that case, using RANK() instead of COUNT() indicates more accurately the values being returned. For example, RANK() OVER (PARTITION BY state ORDER BY city_population) lists the cities for each state with a rank value showing their ranking in order of their population.
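
To make the difference between the two forms of COUNT() concrete, the following plain-Java sketch (not VoltDB code) imitates both for a list of state values, one per city row. The class and method names are hypothetical, the rows are assumed to arrive already sorted in the ORDER BY order, and the running version assumes no two cities in a state share the same ORDER BY value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of window-function semantics in plain Java.
class WindowCountDemo {
    // Imitates COUNT(city) OVER (PARTITION BY state):
    // every row receives the total number of rows in its state.
    static List<Integer> partitionCount(List<String> states) {
        Map<String, Integer> totals = new HashMap<>();
        for (String s : states) {
            totals.merge(s, 1, Integer::sum);
        }
        List<Integer> out = new ArrayList<>();
        for (String s : states) {
            out.add(totals.get(s));
        }
        return out;
    }

    // Imitates the same function with an ORDER BY clause added:
    // each row receives the number of rows seen so far in its state.
    static List<Integer> runningCount(List<String> states) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (String s : states) {
            out.add(seen.merge(s, 1, Integer::sum)); // merge returns the updated count
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> states = Arrays.asList("MA", "MA", "NH", "MA");
        System.out.println(partitionCount(states)); // [3, 3, 1, 3]
        System.out.println(runningCount(states));   // [1, 2, 1, 3]
    }
}
```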

Please be aware of the following limitations when using the window functions:

  • There can be only one window function per SELECT statement.

  • You cannot use a window function and GROUP BY in the same SELECT statement.

  • The argument(s) to the ORDER BY clause can be either integer or TIMESTAMP expressions only.

The following list describes the operation and constraints for each window function separately.

RANK() OVER ( [ PARTITION BY {expression [,...]} ] ORDER BY {expression [,...]} )

The RANK() window function generates a BIGINT value (starting at 1) representing the ranking of the current result within the group defined by the PARTITION BY expression(s) or of the entire result set if PARTITION BY is not specified. No function argument is allowed and the ORDER BY clause is required.

For example, if you rank a column (say, city_population) and use the country column as the partitioning column for the ranking, the cities of each country will be ranked separately. If you use both state and country as partitioning columns, then the cities for each state in each country will be ranked separately.

DENSE_RANK() OVER ( [ PARTITION BY {expression [,...]} ] ORDER BY {expression [,...]} )

The DENSE_RANK() window function generates a BIGINT value (starting at 1) representing the ranking of the current result, in the same way the RANK() window function does. The difference between RANK() and DENSE_RANK() is how they handle ranking when there is more than one row with the same ORDER BY value.

If more than one row has the same ORDER BY value, those rows receive the same rank value in both cases. However, with the RANK() function, the rank of the next distinct value skips ahead by the number of tied rows, so it is always one more than the number of rows that precede it. For example, if the ORDER BY values of four rows are 100, 98, 98, and 73, the respective rank values using RANK() will be 1, 2, 2, and 4. With the DENSE_RANK() function, by contrast, the next rank value is always incremented by only one. So, if the ORDER BY values are 100, 98, 98, and 73, the respective rank values using DENSE_RANK() will be 1, 2, 2, and 3.

As with the RANK() window function, no function argument is allowed for the DENSE_RANK() function and the ORDER BY clause is required.
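
The tie handling described above can be reproduced in a few lines of plain Java. This is only a sketch of the ranking arithmetic, not VoltDB code: the class and method names are hypothetical, and the input is assumed to be a single group whose values are already sorted in the ORDER BY order.

```java
import java.util.Arrays;

// Hypothetical illustration of RANK() versus DENSE_RANK() tie handling.
class RankDemo {
    // RANK(): tied rows share a rank; otherwise the rank is the number
    // of preceding rows plus one, which leaves gaps after ties.
    static long[] rank(int[] sorted) {
        long[] r = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            boolean tie = i > 0 && sorted[i] == sorted[i - 1];
            r[i] = tie ? r[i - 1] : i + 1;
        }
        return r;
    }

    // DENSE_RANK(): tied rows share a rank; otherwise the rank is the
    // previous rank plus one, so no gaps appear.
    static long[] denseRank(int[] sorted) {
        long[] r = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                r[i] = 1;
            } else {
                boolean tie = sorted[i] == sorted[i - 1];
                r[i] = tie ? r[i - 1] : r[i - 1] + 1;
            }
        }
        return r;
    }

    public static void main(String[] args) {
        int[] values = { 100, 98, 98, 73 };
        System.out.println(Arrays.toString(rank(values)));      // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(values))); // [1, 2, 2, 3]
    }
}
```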

COUNT({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The COUNT() window function generates a sub-count of the number of rows within the current result set, where the PARTITION BY clause defines how the rows are grouped. The function argument is required.

SUM({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The SUM() window function generates a sub-total of the specified column within the current result set, where the PARTITION BY clause defines how the rows are grouped. The function argument is required.

MAX({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The MAX() window function reports the maximum value of a column within the current result set, where the PARTITION BY clause defines how the rows are grouped. If the ORDER BY clause is specified, the maximum value is calculated incrementally over the rows in the order specified. The function argument is required.

MIN({expression}) OVER ( [PARTITION BY {expression [,...]}] [ ORDER BY {expression [,...]} ] )

The MIN() window function reports the minimum value of a column within the current result set, where the PARTITION BY clause defines how the rows are grouped. If the ORDER BY clause is specified, the minimum value is calculated incrementally over the rows in the order specified. The function argument is required.

Subqueries

The SELECT statement can include subqueries. Subqueries are separate SELECT statements, enclosed in parentheses, where the results of the subquery are used as values, expressions, or arguments within the surrounding SELECT statement.

Subqueries, like any SELECT statement, are extremely flexible and can return a wide array of information. A subquery might return:

  • A single row with a single column — this is sometimes known as a scalar subquery and represents a single value

  • A single row with multiple columns — this is also known as a row value expression

  • Multiple rows with one or more columns

In general, VoltDB supports subqueries in the FROM clause, in the selection expression, and in boolean expressions in the WHERE clause or in CASE-WHEN-THEN-ELSE-END operations. However, different types of subqueries are allowed in different situations, depending on the type of data returned.

  • In the FROM clause, the SELECT statement supports all types of subquery as a table reference. The subquery must be enclosed in parentheses and must be assigned a table alias.

  • In the selection expression, scalar subqueries can be used in place of a single column reference.

  • In the WHERE clause and CASE operations, both scalar and non-scalar subqueries can be used as part of boolean expressions. Scalar subqueries can be used in place of any single-valued expression. Non-scalar subqueries can be used in the following situations:

    • Row value comparisons — Boolean expressions that compare one row value expression to another can use subqueries that resolve to one row with multiple columns. For example:

      select * from t1 
         where (a,c) > (select a, c from t2 where b=t1.b);
    • IN and EXISTS — Subqueries that return multiple rows can be used as an argument to the IN or EXISTS predicate to determine if a value (or set of values) exists within the rows returned by the subquery. For example:

      select * from t1 
         where a in (select a from t2);
      select * from t1
         where (a,c) in (select a, c from t2 where b=t1.b);
      select * from t1 where c > 3 and 
         exists (select a, b from t2 where a=t1.a);
    • ANY and ALL — Multi-row subqueries can also be used as the target of an ANY or ALL comparison, using either a scalar or row expression comparison. For example:

      select * from t1 
         where a > ALL (select a from t2);
      select * from t1
         where (a,c) = ANY (select a, c from t2 where b=t1.b);
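
As a rough guide to how the ANY and ALL comparisons above evaluate, the following plain-Java sketch treats the rows returned by a single-column subquery as a list. It is not VoltDB code: the class and method names are hypothetical, and the sketch assumes the subquery returns no null values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of quantified comparisons over subquery results.
class QuantifierDemo {
    // Imitates: a > ALL (select a from t2)
    // True only if a exceeds every value the subquery returns.
    static boolean greaterThanAll(int a, List<Integer> subquery) {
        return subquery.stream().allMatch(v -> a > v);
    }

    // Imitates: a = ANY (select a from t2)
    // True if a matches at least one value the subquery returns.
    static boolean equalsAny(int a, List<Integer> subquery) {
        return subquery.stream().anyMatch(v -> a == v);
    }

    public static void main(String[] args) {
        List<Integer> t2 = Arrays.asList(3, 5, 8);
        System.out.println(greaterThanAll(9, t2)); // true
        System.out.println(greaterThanAll(8, t2)); // false
        System.out.println(equalsAny(5, t2));      // true

        // With no rows, ALL is true and ANY is false.
        List<Integer> none = Collections.emptyList();
        System.out.println(greaterThanAll(1, none)); // true
        System.out.println(equalsAny(1, none));      // false
    }
}
```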

Note that subqueries are only supported in the SELECT statement; they cannot be used in data manipulation statements such as UPDATE or DELETE, in CREATE VIEW statements, or in index definitions. Also, VoltDB does not support subqueries in the HAVING, ORDER BY, or GROUP BY clauses.

For the initial release of subqueries in selection and boolean expressions, only replicated tables can be used in the subquery. Both replicated and partitioned tables can be used in subqueries in place of table references in the FROM clause.

Set Operations

VoltDB also supports the set operations UNION, INTERSECT, and EXCEPT. These keywords let you perform set operations on two or more SELECT statements. UNION includes the combined result sets from the two SELECT statements, INTERSECT includes only those rows that appear in both SELECT statement result sets, and EXCEPT includes only those rows that appear in the first result set but not the second.

Normally, UNION and INTERSECT provide a set including unique rows. That is, if a row appears in both SELECT results, it only appears once in the combined result set. However, if you include the ALL modifier, all matching rows are included. For example, UNION ALL will result in single entries for the rows that appear in only one of the SELECT results, but two copies of any rows that appear in both.
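
The effect of each operator, and of the ALL modifier on UNION, can be sketched with ordinary Java collections. This is an illustration of the set semantics only, not VoltDB code: the class and method names are hypothetical, each string stands for one result row, and EXCEPT is taken here to mean rows of the first result that do not appear in the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of SQL set operations on two result sets.
class SetOpsDemo {
    // UNION: rows from both results, duplicates removed.
    static List<String> union(List<String> a, List<String> b) {
        Set<String> s = new LinkedHashSet<>(a);
        s.addAll(b);
        return new ArrayList<>(s);
    }

    // UNION ALL: rows from both results, duplicates kept.
    static List<String> unionAll(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // INTERSECT: only rows present in both results.
    static List<String> intersect(List<String> a, List<String> b) {
        Set<String> s = new LinkedHashSet<>(a);
        s.retainAll(b);
        return new ArrayList<>(s);
    }

    // EXCEPT: rows of the first result that are absent from the second.
    static List<String> except(List<String> a, List<String> b) {
        Set<String> s = new LinkedHashSet<>(a);
        s.removeAll(b);
        return new ArrayList<>(s);
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("x", "y", "z");
        List<String> second = Arrays.asList("y", "z", "w");
        System.out.println(union(first, second));     // [x, y, z, w]
        System.out.println(unionAll(first, second));  // [x, y, z, y, z, w]
        System.out.println(intersect(first, second)); // [y, z]
        System.out.println(except(first, second));    // [x]
    }
}
```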

The UNION, INTERSECT, and EXCEPT operations obey the same rules that apply to joins:

  • You cannot perform set operations on SELECT statements that reference the same table.

  • All tables in the SELECT statements must either be replicated tables or partitioned tables partitioned on the same column value, using equality of the partitioning column in the WHERE clause.

Examples

The following example retrieves all of the columns from the EMPLOYEE table where the last name is "Smith":

SELECT * FROM employee WHERE lastname = 'Smith';

The following example retrieves selected columns from two tables at once, joined by the employee_id using an implicit inner join and sorted by last name in descending order:

SELECT lastname, firstname, salary 
    FROM employee AS e, compensation AS c
    WHERE e.employee_id = c.employee_id
    ORDER BY lastname DESC;

The following example includes both a simple SQL query defined in the schema and a client application to call the procedure repeatedly. This combination uses the LIMIT and OFFSET clauses to "page" through a large table, 500 rows at a time.

When retrieving very large volumes of data, it is a good idea to use LIMIT and OFFSET to constrain the amount of data in each transaction. However, to perform LIMIT OFFSET queries effectively, the database must include a tree index that encompasses all of the columns of the ORDER BY clause (in this example, the lastname and firstname columns).

Schema:

CREATE PROCEDURE EmpByLimit AS
       SELECT lastname, firstname FROM employee
       WHERE company = ?
       ORDER BY lastname ASC, firstname ASC
       LIMIT 500 OFFSET ?;

PARTITION PROCEDURE EmpByLimit ON TABLE Employee COLUMN Company;

Java Client Application:

long offset = 0;
String company = "ACME Explosives";
boolean alldone = false;
while ( ! alldone ) {
   VoltTable results[] = client.callProcedure("EmpByLimit",
                         company,offset).getResults();
   if (results[0].getRowCount() < 1) {
        // No more records.
        alldone = true; 
   } else {
        // do something with the results.
   }
   offset += 500;
}
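
To see how the paging loop above behaves without a running database, the following plain-Java sketch replaces the procedure call with a hypothetical page method that imitates LIMIT and OFFSET over a list already sorted in the ORDER BY order. None of these names come from the VoltDB client API. The sketch only models which rows each call returns and how many calls the loop makes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of LIMIT/OFFSET paging in plain Java.
class PagingDemo {
    // Imitates "LIMIT limit OFFSET offset" over rows in ORDER BY order.
    static List<String> page(List<String> sortedRows, int limit, int offset) {
        if (offset >= sortedRows.size()) {
            return new ArrayList<>(); // past the end: no rows
        }
        int end = Math.min(sortedRows.size(), offset + limit);
        return new ArrayList<>(sortedRows.subList(offset, end));
    }

    // Same loop shape as the client above: fetch pages until one is empty.
    // Returns the number of calls made, including the final empty one.
    static int countCalls(List<String> sortedRows, int limit) {
        int calls = 0;
        int offset = 0;
        boolean alldone = false;
        while (!alldone) {
            calls++;
            if (page(sortedRows, limit, offset).isEmpty()) {
                alldone = true; // no more records
            }
            offset += limit;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add("row" + i);
        }
        System.out.println(page(rows, 2, 4));    // [row4]
        System.out.println(countCalls(rows, 2)); // 4
    }
}
```

One consequence visible in the sketch is that a loop that stops only on an empty page always makes one final call that returns nothing. Stopping as soon as a page comes back shorter than the limit would avoid that call; whether the saving matters depends on the application.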


[1] Use of the keyword PARTITION is for compatibility with SQL syntax from other databases and is unrelated to the columns used to partition single-partitioned tables. You can use the RANK() functions with either partitioned or replicated tables and the ranking column does not need to be the same as the partitioning column for VoltDB partitioned tables.