Prerequisites
- Ensure you have existing Iceberg tables that you can sink data to, or the ability to create them via external systems.
- Access credentials for the underlying object storage (e.g., S3 access key and secret key).
- Appropriate permissions to write to the target Iceberg catalog and storage.
- An upstream source, table, or materialized view in RisingWave to output data from.
Basic connection example
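A minimal sketch of an Iceberg sink is shown below. The source table name, warehouse path, credentials, and catalog settings are illustrative placeholders; see the storage and catalog sections later on this page for the full parameter lists.

```sql
CREATE SINK my_iceberg_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    -- Storage and catalog settings below are illustrative placeholders.
    warehouse.path = 's3://my-bucket/warehouse',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-east-1',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```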
Data update modes
Append-only
Use append-only mode when you only want to add new records to the target table:
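A sketch of an append-only sink, assuming an upstream table named events (storage and catalog parameters are omitted for brevity):

```sql
CREATE SINK events_sink FROM events
WITH (
    connector = 'iceberg',
    type = 'append-only',
    database.name = 'demo_db',
    table.name = 'events'
    -- plus object storage and catalog parameters (see sections below)
);
```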
Upsert

Use upsert mode when you need to handle updates and deletes:
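A sketch of an upsert sink, assuming an upstream table named users with primary key user_id; upsert sinks require a primary_key:

```sql
CREATE SINK users_sink FROM users
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    database.name = 'demo_db',
    table.name = 'users'
    -- plus object storage and catalog parameters (see sections below)
);
```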
Force append-only

Convert upsert streams to append-only by ignoring deletes and converting updates to inserts:
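A sketch of forcing append-only output from an upsert source (the source name orders is illustrative); force_append_only is listed in the parameter tables below:

```sql
CREATE SINK orders_audit_sink FROM orders
WITH (
    connector = 'iceberg',
    type = 'append-only',
    -- ignore deletes and convert updates to inserts
    force_append_only = 'true',
    database.name = 'demo_db',
    table.name = 'orders_audit'
    -- plus object storage and catalog parameters (see sections below)
);
```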
Catalog configurations

AWS Glue catalog
For tables managed by AWS Glue:
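A sketch of a Glue-backed sink; parameter names follow the catalog configuration page and values are illustrative:

```sql
CREATE SINK glue_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'glue',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-east-1',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```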
REST catalog

For REST catalog services, including AWS S3 Tables:
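A sketch of a REST-catalog sink; the catalog URI and warehouse path are illustrative placeholders:

```sql
CREATE SINK rest_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'demo_db',
    table.name = 'demo_table'
    -- plus object storage credentials (see sections below)
);
```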
JDBC catalog

For JDBC-based catalogs:
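A sketch of a JDBC-catalog sink; the connection string, credentials, and catalog name are illustrative placeholders:

```sql
CREATE SINK jdbc_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'jdbc',
    catalog.uri = 'jdbc:postgresql://catalog-db:5432/iceberg_catalog',
    catalog.jdbc.user = 'catalog_user',
    catalog.jdbc.password = 'catalog_password',
    catalog.name = 'demo',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'demo_db',
    table.name = 'demo_table'
    -- plus object storage credentials (see sections below)
);
```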
Advanced features

Exactly-once delivery
Enable exactly-once delivery semantics for critical data pipelines:

Enabling exactly-once delivery provides stronger consistency guarantees but may impact performance due to additional coordination overhead.
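A sketch of a sink with exactly-once delivery enabled; is_exactly_once is listed in the parameter tables below, and the other names are illustrative:

```sql
CREATE SINK payments_sink FROM payments
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'payment_id',
    is_exactly_once = 'true',
    database.name = 'demo_db',
    table.name = 'payments'
    -- plus object storage and catalog parameters (see sections below)
);
```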
Commit configuration
Control commit frequency and retry behavior:
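A sketch using the commit parameters from the tables below (commit_checkpoint_interval defaults to 60 checkpoints, commit_retry_num to 8):

```sql
CREATE SINK metrics_sink FROM metrics
WITH (
    connector = 'iceberg',
    type = 'append-only',
    -- commit every 10 checkpoints instead of the default 60
    commit_checkpoint_interval = '10',
    -- retry a failed commit up to 8 times
    commit_retry_num = '8',
    database.name = 'demo_db',
    table.name = 'metrics'
    -- plus object storage and catalog parameters (see sections below)
);
```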
Compaction for Iceberg sink

Added in v2.5.0.
PREMIUM FEATURE: This is a premium feature. For a comprehensive overview of all premium features and their usage, see RisingWave premium features.
To enable compaction, set the following parameters in the WITH clause:
Parameter | Description |
---|---|
enable_compaction | Whether to enable Iceberg compaction (true/false). |
compaction_interval_sec | Interval (in seconds) between two compaction runs. Defaults to 3600 seconds. |
enable_snapshot_expiration | Whether to enable snapshot expiration. By default, it removes snapshots older than 5 days. |
Example
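A sketch combining the compaction parameters above (other parameter values are illustrative):

```sql
CREATE SINK compacted_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    enable_compaction = 'true',
    compaction_interval_sec = '3600',
    enable_snapshot_expiration = 'true',
    database.name = 'demo_db',
    table.name = 'demo_table'
    -- plus object storage and catalog parameters (see sections below)
);
```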
Iceberg compaction also requires a dedicated Iceberg compactor. Currently, please contact us via the RisingWave Slack workspace to allocate the necessary resources. We are working on a self-service feature that will let you allocate Iceberg compactors directly from the cloud portal.
Use with different storage backends
Amazon S3
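A sketch of the S3 storage parameters; names follow the object storage configuration page and values are illustrative:

```sql
CREATE SINK s3_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.endpoint = 'https://s3.us-east-1.amazonaws.com',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-east-1',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```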
Google Cloud Storage
gcs.credential is the base64-encoded credential key obtained from the GCS service account key JSON file. To get this JSON file, refer to the GCS documentation.

- To encode the key in base64, run the following command, and then paste the output as the value for this parameter:

```
cat ~/Downloads/rwc-byoc-test-464bdd851bce.json | base64 -b 0 | pbcopy
```

- If this field is not specified, ADC (application default credentials) will be used.
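A sketch of a GCS-backed sink; the warehouse path and credential value are illustrative placeholders:

```sql
CREATE SINK gcs_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 'gs://my-bucket/warehouse',
    -- base64-encoded service account key; omit to fall back to ADC
    gcs.credential = 'base64-encoded-service-account-key',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```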
Azure Blob Storage
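A sketch of an Azure Blob Storage-backed sink. The azblob.* parameter names here are assumptions based on the object storage configuration page, and all values are illustrative:

```sql
CREATE SINK azblob_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 'azblob://my-container/warehouse',
    azblob.account_name = 'your-account-name',
    azblob.account_key = 'your-account-key',
    azblob.endpoint_url = 'https://your-account-name.blob.core.windows.net',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```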
Amazon S3 Tables integration
AWS S3 Tables provides automatic compaction and optimization:
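S3 Tables exposes an Iceberg REST endpoint, so a sink can target it through the REST catalog. The catalog.rest.* signing parameters and the table-bucket ARN below are assumptions based on the catalog configuration page; all values are illustrative:

```sql
CREATE SINK s3_tables_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
    -- SigV4 signing settings for the S3 Tables REST endpoint (illustrative)
    catalog.rest.sigv4_enabled = 'true',
    catalog.rest.signing_region = 'us-east-1',
    catalog.rest.signing_name = 's3tables',
    warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```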
Data type mapping

RisingWave data types map to Iceberg types as follows:

RisingWave Type | Iceberg Type | Notes |
---|---|---|
BOOLEAN | boolean | |
SMALLINT | int | |
INT | int | |
BIGINT | long | |
REAL | float | |
DOUBLE PRECISION | double | |
VARCHAR | string | |
BYTEA | binary | |
DECIMAL(p,s) | decimal(p,s) | |
TIME | time | |
DATE | date | |
TIMESTAMP | timestamp | |
TIMESTAMPTZ | timestamptz | |
INTERVAL | string | Serialized as string |
JSONB | string | Serialized as JSON string |
ARRAY | list | |
STRUCT | struct | |
MAP | map | |
Configuration parameters
Required parameters
Parameter | Description |
---|---|
connector | Must be 'iceberg' |
type | Sink mode: 'append-only' or 'upsert' |
database.name | Target Iceberg database name |
table.name | Target Iceberg table name |
Optional parameters
Parameter | Description | Default |
---|---|---|
primary_key | Primary key for upsert sinks | None |
force_append_only | Force append-only mode from upsert source | false |
is_exactly_once | Enable exactly-once delivery | false |
commit_checkpoint_interval | Commit interval in checkpoints | 60 |
commit_retry_num | Number of commit retries | 8 |
For storage- and catalog-specific parameters, see:
- Object storage: Object storage configuration
- Catalogs: Catalog configuration
Integration patterns
Real-time analytics pipeline
Stream aggregated results to analytics tables:
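A sketch of this pattern: aggregate in a materialized view, then sink the view. All object names are illustrative, and TUMBLE is RisingWave's tumbling-window function:

```sql
-- Aggregate events in RisingWave, then stream the results to Iceberg.
CREATE MATERIALIZED VIEW hourly_page_views AS
SELECT
    window_start,
    page_id,
    COUNT(*) AS views
FROM TUMBLE(page_events, event_time, INTERVAL '1 hour')
GROUP BY window_start, page_id;

CREATE SINK hourly_page_views_sink FROM hourly_page_views
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start,page_id',
    database.name = 'analytics',
    table.name = 'hourly_page_views'
    -- plus object storage and catalog parameters (see sections above)
);
```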
Change data capture

Stream database changes to the data lake:
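A sketch of this pattern, assuming an upstream CDC table (e.g., ingested via a CDC source) named orders_cdc; upsert mode preserves the upstream updates and deletes:

```sql
CREATE SINK orders_lake_sink FROM orders_cdc
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    database.name = 'lake_db',
    table.name = 'orders'
    -- plus object storage and catalog parameters (see sections above)
);
```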
Best practices

- Choose appropriate sink mode: Use append-only for event logs, upsert for dimensional data.
- Configure commit intervals: Balance latency vs file size based on your requirements.
- Enable exactly-once for critical data: Use for financial transactions or other critical data.
- Monitor sink lag: Track how far behind your sink is from the source data.
- Design proper partitioning: Ensure target tables are properly partitioned for query performance.
- Handle backpressure: Monitor sink performance and adjust resources as needed.
Monitoring and troubleshooting
Monitor sink performance
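A minimal starting point, assuming RisingWave's rw_catalog system schema is available in your deployment:

```sql
-- List the sinks defined in the cluster.
SHOW SINKS;

-- Inspect sink metadata via the system catalog.
SELECT * FROM rw_catalog.rw_sinks;
```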
Limitations
- Schema evolution: Limited support for automatic schema changes.
- Concurrent writers: Coordinate with other systems writing to the same tables.
Next steps
- Ingest from Iceberg: Set up sources with Ingest from Iceberg.
- Configure catalogs: Review Catalog configuration for your setup.
- Storage setup: Configure your object storage in Object storage configuration.