Ingest additional fields with INCLUDE clause

Syntax

To add additional columns, use the INCLUDE clause.

INCLUDE { header | key | offset | partition | timestamp } [AS <column_name>]

If <column_name> is not specified, a default one will be generated in the format _rw_{connector}_{col}, where connector is the name of the source connector used (Kafka, Pulsar, Kinesis, etc.), and col is the type of column being generated (key, offset, timestamp, etc.). For instance, if an offset column is added to a Kafka source, the default column name would be _rw_kafka_offset.

This clause should be included in your CREATE SOURCE or CREATE TABLE command after the schema is defined.

CREATE { SOURCE | TABLE } name [(
    schema_definition )]
[INCLUDE ... [AS ...]]
WITH ( ... )
FORMAT data_format ENCODE data_encode (
    ...
);

Note that for UPSERT types of sources and tables, INCLUDE KEY is required as RisingWave will use this column to perform upsert semantics. A primary key cannot be defined as multiple columns in this case.

Supported connectors

The INCLUDE clause can be used with the following source connectors.

Kafka

When ingesting data from Kafka, the following additional fields can be included.

Allowed components	Default type	Note
key	`bytea`	Can be overwritten by `ENCODE` and `KEY ENCODE`.
timestamp	`timestamp with time zone` (i64 in millis)	Refer to `CreateTime` rather than `LogAppendTime`.
partition	`varchar`	The partition the message is from.
offset	`varchar`	The offset in the partition.
headers	`struct<varchar, bytea>[]`	Key-value pairs along with the message.

In the case of headers, there are two ways to define it.

You can choose to generate headers with type List[Struct<Varchar, Bytea>].

INCLUDE header [AS kafka_header]

Or you can generate a type bytea header, where the column content will be specified as the value associated with the specified key, header_col. The header_col field can only be defined when including a header. In this case, the generated column name will have the format _rw_kafka_header_{header col name}_{col type}, where col type is the data type of the header column.

INCLUDE header 'header_col' [AS kafka_header]

Kinesis

When ingesting data from Kinesis, here are some things to note when including the following fields.

Allowed components	Default type	Note
key	bytea	Can be overwritten by encode and key encode.
timestamp	timestamp with time zone	See the approximate_arrival_timestamp field at Struct aws_sdk_kinesis::types::Record.
partition	varchar	The partition the message is from.
offset	varchar	The offset in the partition, which corresponds to Kinesis sequence numbers.

For more components, see Struct aws_sdk_kinesis::types::Record.

Pulsar

When ingesting data from Pulsar, here are some things to note when including the following fields.

Allowed components	Default type	Note
key	bytea	Can be overwritten by ENCODE and KEY ENCODE.
partition	varchar	The partition the message is from.
offset	varchar	The offset in the partition.

For more components, see Struct pulsar::message::proto::MessageMetadata.

S3 and GCS

When ingesting data from AWS S3 or GCS, the following additional fields can be included.

Allowed components	Default type	Note
file	varchar	The file the record is from.
offset	varchar	The offset in the file.

MQTT

When ingesting data from MQTT, the following additional fields can be included.

Allowed components	Default type	Note
partition	varchar	The topic the record is from.

Examples

Here we create a table, additional_columns, that ingests data from a Kafka broker. Aside from the a column, which is part of the message payload, the additional fields key, partition, offset, timestamp, and header, are also added to the table.

CREATE TABLE additional_columns (
  a int,
  primary key (key_col)
)
INCLUDE key AS key_col
INCLUDE partition AS partition_col
INCLUDE offset AS offset_col
INCLUDE timestamp AS timestamp_col
INCLUDE header AS header_col
WITH (
	connector = 'kafka',
  properties.bootstrap.server = 'message_queue:29092',
	topic = 'kafka_additional_columns')
FORMAT UPSERT ENCODE JSON;

Get started

Manage data

Deploy & operate

Optimize performance

Troubleshooting

Reference

Cloud

Ingest additional fields with INCLUDE clause

Syntax

Supported connectors

Kafka

Kinesis

Pulsar

S3 and GCS

MQTT

Examples

Get started

Manage data

Deploy & operate

Optimize performance

Troubleshooting

Reference

Cloud

​Syntax

​Supported connectors

​Kafka

​Kinesis

​Pulsar

​S3 and GCS

​MQTT

​Examples

Syntax

Supported connectors

Kafka

Kinesis

Pulsar

S3 and GCS

MQTT

Examples