CREATE STREAM — Creates an output stream in the database.
CREATE STREAM stream-name
[PARTITION ON COLUMN column-name]
[EXPORT TO TARGET export-target-name] (
column-definition [,...]
);
column-definition: column-name datatype [DEFAULT value ] [ NOT NULL ]
The CREATE STREAM statement defines a stream and its associated columns in the database. A stream can be thought of as a virtual table. It has the same structure as a table, consisting of a list of columns and supporting all the same datatypes (Table A.1, “Supported SQL Datatypes”) as tables. The columns have the same rules in terms of naming and size. You can also use the INSERT statement to insert data into the stream once it is defined.
The three differences between streams and tables are:
No data is stored in the database for a stream, it is only used as a passthrough.
Because no data is stored, you cannot SELECT, UPDATE, or DELETE the stream contents.
No indexes or constraints (such as primary keys) are allowed on a stream.
Data inserted into the stream is not stored in the database. The stream is an ephemeral container used only for analysis and/or passing data through VoltDB to other systems via the export function.
Combining streams with views lets you perform summary analysis on data passing through VoltDB without having to store all of the underlying data. For example, you might want to know how many times users access a website and their most recent visit. But you do not need to store a record for each visit. In this case, you can create a stream, visits, to capture the event and a view, visit_by_user, to capture the cumulative data:
CREATE STREAM visits PARTITION ON COLUMN user_id ( user_id BIGINT NOT NULL, login TIMESTAMP ); CREATE VIEW visit_by_user ( user_id, total_visits, last_visit ) AS SELECT user_id, COUNT(*), MAX(login) FROM visits GROUP BY user_id;
When creating a view on a stream, the stream must be partitioned and the partition column must appear in the view. Another special feature of views on streams is that, because there is no underlying data stored for the view, VoltDB lets you modify the views content manually by issuing UPDATE and DELETE statements on the view. (This ability to manipulate the view is only available for views on streams. You cannot UPDATE or DELETE a view on a table; you must modify the data in the underlying table instead.)
For example, if you only care about a daily rollup of visits, you can use DELETE with the stream name to clear the data at midnight every night:
DELETE FROM visit_by_user;
Or if you need to adjust the cumulative analysis to, say, "reset" the entry for a specific user, you can use UPDATE:
UPDATE visit_by_user SET total_visits = 0, last_visit = NULL WHERE user_id = ?;
Streams can also be used to export data out of VoltDB into other systems, such as Kafka, CSV files, and so on. To export data into another system, you start by declaring one or more streams defining the data that will be sent to the external system. In the CREATE STREAM statement you also specify the named target for the export:
CREATE STREAM visits
EXPORT TO TARGET archive (
user_id BIGINT NOT NULL,
login TIMESTAMP
);
As soon as you declare the EXPORT TO TARGET clause for a stream, any data inserted into the stream is queued for export. If the export target is not defined in the database configuration, then the data waits in the queue. Once the export target is configured, the export connector begins sending the queued data to the configured destination. See Chapter 15, Streaming Data: Import, Export, and Migration for more information on configuring export targets.
Finally, you can combine analysis with export by creating a stream with an export target and also creating a view on that stream. So in our earlier example, if we want to warehouse data about each visit but use VoltDB to perform the real-time summary analysis, we would add an export definition, along with the partitioning clause, to the CREATE STREAM statement for the visits stream:
CREATE STREAM visits PARTITION ON COLUMN user_id EXPORT TO TARGET warehouse ( user_id BIGINT NOT NULL, login TIMESTAMP );
The following example defines a stream and a view on that stream. Note the use of the PARTITION ON clause to ensure the stream is partitioned, since it is being used in a view.
CREATE STREAM flightdata PARTITION ON COLUMN airport ( flight_id BIGINT NOT NULL, airport VARCHAR(3) NOT NULL, passengers INTEGER, eta TIMESTAMP ); CREATE VIEW all_flights (airport, flight_count, passenger_count) AS SELECT airport, count(*),sum(passengers) FROM flightdata GROUP BY airport;