
This guide explains how to deliver processed data from RisingWave into existing Iceberg tables. Use it when you have Iceberg tables managed by external systems and want RisingWave to deliver processed results into them.

Prerequisites

  • An upstream source, table, or materialized view in RisingWave to output data from.
  • Existing Iceberg tables that you can deliver to, or the ability to create them via external systems.
  • Appropriate permissions to deliver to the target Iceberg catalog and storage.
  • Access credentials for the underlying object storage (e.g., S3 access key and secret key).

Create an Iceberg sink

To write data to an external Iceberg table, create a sink. The CREATE SINK statement defines how data from an upstream object (a source, table, or materialized view) is formatted and delivered to the target Iceberg table.

```sql
CREATE SINK my_iceberg_sink FROM processed_events
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'analytics',
    table.name = 'processed_user_events',
    create_table_if_not_exists = 'true',
    catalog.type = 'glue',
    catalog.name = 'my_glue_catalog',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2',
    partition_by = 'partition_by_column_name'
);
```

Configuration parameters

| Parameter | Required | Description |
|---|---|---|
| connector | Yes | Must be 'iceberg'. |
| type | Yes | Sink mode: 'append-only' for new records only; 'upsert' to handle updates and deletes. |
| database.name | Yes | The name of the target Iceberg database. |
| table.name | Yes | The name of the target Iceberg table. |
| primary_key | Yes, if type is 'upsert' | A comma-separated list of columns that form the primary key. |
| write_mode | No | Write mode for the sink: 'merge-on-read' (default) or 'copy-on-write'. Copy-on-write is only supported for upsert sinks; append-only sinks must use merge-on-read. See Write modes for details. |
| force_append_only | No | If true, converts an upsert stream to append-only: updates become inserts and deletes are ignored. Default: false. |
| is_exactly_once | No | Set to true to enable exactly-once delivery semantics. This provides stronger consistency but may impact performance. Default: true. |
| commit_checkpoint_interval | No | Controls how often RisingWave commits to Iceberg. Default: 60 (about every 60 seconds in the default configuration). |
| commit_checkpoint_size_threshold_mb | No | The buffered write size threshold in MB that triggers an early commit at the next checkpoint barrier. When uncommitted data exceeds this value, RisingWave commits on the next checkpoint instead of waiting for commit_checkpoint_interval. Must be greater than 0. Default: 128. Useful for large backfills to avoid accumulating too much uncommitted data. |
| commit_retry_num | No | The number of times to retry a failed commit. Default: 8. |
| create_table_if_not_exists | No | If true, RisingWave creates the external Iceberg table automatically when it does not exist. Default: false. |
| partition_by | No | Specify partitioning using column names or transformations. Supported transformations: identity, truncate(n), bucket(n), year, month, day, hour, and void. Separate multiple columns with commas, for example partition_by = 'truncate(4,v2),bucket(5,v1)'. For more details on Iceberg partitioning, see Partition transforms. |
| auto.schema.change | No | Set to true to enable automatic schema evolution: RisingWave propagates upstream schema changes (for example, adding columns) to the Iceberg table. Requires exactly-once semantics to be active (is_exactly_once = 'true' and sink decoupling enabled); otherwise this option has no effect and schema changes are not propagated. Currently only ADD COLUMN is supported. Default: false. |

Note: Exactly-once delivery requires sink decoupling to be enabled (the default behavior). If you run SET sink_decouple = false;, exactly-once semantics are automatically disabled for the sink.

You can configure commit_checkpoint_interval, commit_checkpoint_size_threshold_mb, and commit_retry_num to manage commit frequency and retry behavior. For detailed storage and catalog configuration options, refer to the object storage and catalog configuration guides.
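For reference, an upsert sink combines primary_key (required for upsert) with an optional write_mode. The sketch below illustrates this; the object names, bucket path, and credentials are placeholders, not values from this guide:

```sql
-- Hypothetical upsert sink: all names and credentials are placeholders.
CREATE SINK user_profile_sink FROM user_profiles_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    -- primary_key is required when type = 'upsert'
    primary_key = 'user_id',
    -- copy-on-write is only supported for upsert sinks;
    -- append-only sinks must use merge-on-read
    write_mode = 'copy-on-write',
    warehouse.path = 's3://my-data-lake/warehouse',
    database.name = 'analytics',
    table.name = 'user_profiles',
    catalog.type = 'glue',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-west-2'
);
```

With this configuration, upstream updates and deletes on the user_id key are applied to the Iceberg table rather than appended as new rows.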

Schema evolution

RisingWave supports automatic schema evolution for Iceberg sinks, allowing schema changes from upstream sources to be automatically propagated to the target Iceberg table. This feature works with exactly-once semantics. To enable schema evolution, set auto.schema.change = 'true' when creating the sink:

```sql
CREATE SINK my_iceberg_sink FROM upstream_source
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    catalog.type = 'glue',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    auto.schema.change = 'true',
    is_exactly_once = 'true'
);
```

Supported operations

Currently, RisingWave supports the following schema change operations:
  • ADD COLUMN: Automatically adds new columns to the Iceberg table when they are added to the upstream source.
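
As an illustration, if the sink's upstream object is a RisingWave table (the table and column names below are hypothetical), adding a column upstream is propagated to the Iceberg table at a subsequent coordinated commit:

```sql
-- Hypothetical upstream table and column: with auto.schema.change = 'true',
-- the new column is added to the target Iceberg table automatically.
ALTER TABLE upstream_events ADD COLUMN referrer VARCHAR;
```

No change to the sink definition is needed; the sink picks up the new column as part of its schema-commit coordination.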

How it works

When schema evolution is enabled:
  1. RisingWave detects schema changes in the upstream source
  2. Schema changes are applied to the Iceberg table as atomic schema updates in the catalog
  3. The sink coordinates schema commits with data commits to ensure consistency
  4. In exactly-once mode, RisingWave ensures that schema changes are applied exactly once, even in the presence of failures
Schema changes use separate commit operations from data writes, but the sink coordinates them so that the required schema updates are durably committed and not duplicated during recovery. The system checks whether schema changes have already been applied to avoid duplicate schema updates during recovery.

Table maintenance

When you continuously sink data to an Iceberg table, it is important to perform periodic maintenance, including compaction and snapshot expiration, to maintain good query performance and manage storage costs. RisingWave provides both automatic and manual maintenance options. For complete details, see the Iceberg table maintenance guide.

Monitoring

Use the following statements to check the state of an Iceberg sink:

```sql
-- Check sink status
SHOW SINKS;

-- View sink details
DESCRIBE SINK my_iceberg_sink;
```
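
If your deployment exposes the rw_catalog system schema (an assumption; availability and the exact column set vary by RisingWave version), you can also query sink metadata directly:

```sql
-- Assumes the rw_catalog.rw_sinks system table is available;
-- its columns differ across RisingWave versions.
SELECT * FROM rw_catalog.rw_sinks;
```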