Prerequisites
- Ensure you have existing Iceberg tables that you can sink data to, or the ability to create them via external systems.
- Access credentials for the underlying object storage (e.g., S3 access key and secret key).
- Appropriate permissions to write to the target Iceberg catalog and storage.
- An upstream source, table, or materialized view in RisingWave to output data from.
Basic connection example
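A minimal sketch of an Iceberg sink is shown below. The source table name, warehouse path, credentials, and catalog settings are illustrative placeholders; see the storage and catalog sections later on this page for the full parameter lists.

```sql
CREATE SINK my_iceberg_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    -- Storage and catalog settings below are illustrative placeholders.
    warehouse.path = 's3://my-bucket/warehouse',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-east-1',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```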
Data update modes
Append-only
Use append-only mode when you only want to add new records to the target table:
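A sketch of an append-only sink, assuming an upstream table named events (storage and catalog parameters are omitted for brevity):

```sql
CREATE SINK events_sink FROM events
WITH (
    connector = 'iceberg',
    type = 'append-only',
    database.name = 'demo_db',
    table.name = 'events'
    -- plus object storage and catalog parameters (see sections below)
);
```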
Upsert

Use upsert mode when you need to handle updates and deletes:
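A sketch of an upsert sink, assuming an upstream table named users with primary key user_id; upsert sinks require a primary_key:

```sql
CREATE SINK users_sink FROM users
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    database.name = 'demo_db',
    table.name = 'users'
    -- plus object storage and catalog parameters (see sections below)
);
```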
Force append-only

Convert upsert streams to append-only by ignoring deletes and converting updates to inserts:
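A sketch of forcing append-only output from an upsert source (the source name orders is illustrative); force_append_only is listed in the parameter tables below:

```sql
CREATE SINK orders_audit_sink FROM orders
WITH (
    connector = 'iceberg',
    type = 'append-only',
    -- ignore deletes and convert updates to inserts
    force_append_only = 'true',
    database.name = 'demo_db',
    table.name = 'orders_audit'
    -- plus object storage and catalog parameters (see sections below)
);
```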
Catalog configurations

AWS Glue catalog
For tables managed by AWS Glue:
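A sketch of a Glue-backed sink; parameter names follow the catalog configuration page and values are illustrative:

```sql
CREATE SINK glue_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'glue',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-east-1',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```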
REST catalog

For REST catalog services, including AWS S3 Tables:
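A sketch of a REST-catalog sink; the catalog URI and warehouse path are illustrative placeholders:

```sql
CREATE SINK rest_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'http://rest-catalog:8181',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'demo_db',
    table.name = 'demo_table'
    -- plus object storage credentials (see sections below)
);
```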
JDBC catalog

For JDBC-based catalogs:
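A sketch of a JDBC-catalog sink; the connection string, credentials, and catalog name are illustrative placeholders:

```sql
CREATE SINK jdbc_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'jdbc',
    catalog.uri = 'jdbc:postgresql://catalog-db:5432/iceberg_catalog',
    catalog.jdbc.user = 'catalog_user',
    catalog.jdbc.password = 'catalog_password',
    catalog.name = 'demo',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'demo_db',
    table.name = 'demo_table'
    -- plus object storage credentials (see sections below)
);
```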
Advanced features

Exactly-once delivery
Enable exactly-once delivery semantics for critical data pipelines:

Enabling exactly-once delivery provides stronger consistency guarantees but may impact performance due to additional coordination overhead.
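A sketch of a sink with exactly-once delivery enabled; is_exactly_once is listed in the parameter tables below, and the other names are illustrative:

```sql
CREATE SINK payments_sink FROM payments
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'payment_id',
    is_exactly_once = 'true',
    database.name = 'demo_db',
    table.name = 'payments'
    -- plus object storage and catalog parameters (see sections below)
);
```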
Commit configuration
Control commit frequency and retry behavior:
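A sketch using the commit parameters from the tables below (commit_checkpoint_interval defaults to 60 checkpoints, commit_retry_num to 8):

```sql
CREATE SINK metrics_sink FROM metrics
WITH (
    connector = 'iceberg',
    type = 'append-only',
    -- commit every 10 checkpoints instead of the default 60
    commit_checkpoint_interval = '10',
    -- retry a failed commit up to 8 times
    commit_retry_num = '8',
    database.name = 'demo_db',
    table.name = 'metrics'
    -- plus object storage and catalog parameters (see sections below)
);
```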
Compaction for Iceberg sink

Added in v2.5.0.
PREMIUM FEATURE: This is a premium feature. For a comprehensive overview of all premium features and their usage, see RisingWave premium features.
To enable compaction, set the following parameters in the WITH clause:
Parameter | Description |
---|---|
enable_compaction | Whether to enable Iceberg compaction (true/false). |
compaction_interval_sec | Interval (in seconds) between two compaction runs. Defaults to 3600 seconds. |
enable_snapshot_expiration | Whether to enable snapshot expiration. By default, it removes snapshots older than 5 days. |
Example
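A sketch combining the compaction parameters above (other parameter values are illustrative):

```sql
CREATE SINK compacted_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    enable_compaction = 'true',
    compaction_interval_sec = '3600',
    enable_snapshot_expiration = 'true',
    database.name = 'demo_db',
    table.name = 'demo_table'
    -- plus object storage and catalog parameters (see sections below)
);
```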
Iceberg compaction also requires a dedicated Iceberg compactor. Currently, please contact us via the RisingWave Slack workspace to allocate the necessary resources. We are working on a self-service feature that will let you allocate Iceberg compactors directly from the cloud portal.
Use with different storage backends
Amazon S3
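A sketch of the S3 storage parameters; names follow the object storage configuration page and values are illustrative:

```sql
CREATE SINK s3_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.endpoint = 'https://s3.us-east-1.amazonaws.com',
    s3.access.key = 'your-access-key',
    s3.secret.key = 'your-secret-key',
    s3.region = 'us-east-1',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```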
Google Cloud Storage
gcs.credential is the base64-encoded credential key obtained from the GCS service account key JSON file. To get this JSON file, refer to the GCS documentation.

- To encode the key in base64, run the following command, and then paste the output as the value for this parameter:

```
cat ~/Downloads/rwc-byoc-test-464bdd851bce.json | base64 -b 0 | pbcopy
```

- If this field is not specified, ADC (application default credentials) will be used.
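A sketch of a GCS-backed sink; the warehouse path and credential value are illustrative placeholders:

```sql
CREATE SINK gcs_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 'gs://my-bucket/warehouse',
    -- base64-encoded service account key; omit to fall back to ADC
    gcs.credential = 'base64-encoded-service-account-key',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```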
Azure Blob Storage
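A sketch of an Azure Blob Storage-backed sink. The azblob.* parameter names here are assumptions based on the object storage configuration page, and all values are illustrative:

```sql
CREATE SINK azblob_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 'azblob://my-container/warehouse',
    azblob.account_name = 'your-account-name',
    azblob.account_key = 'your-account-key',
    azblob.endpoint_url = 'https://your-account-name.blob.core.windows.net',
    catalog.type = 'storage',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```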
Amazon S3 Tables integration
AWS S3 Tables provides automatic compaction and optimization:
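S3 Tables exposes an Iceberg REST endpoint, so a sink can target it through the REST catalog. The catalog.rest.* signing parameters and the table-bucket ARN below are assumptions based on the catalog configuration page; all values are illustrative:

```sql
CREATE SINK s3_tables_sink FROM t1
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'rest',
    catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
    -- SigV4 signing settings for the S3 Tables REST endpoint (illustrative)
    catalog.rest.sigv4_enabled = 'true',
    catalog.rest.signing_region = 'us-east-1',
    catalog.rest.signing_name = 's3tables',
    warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket',
    database.name = 'demo_db',
    table.name = 'demo_table'
);
```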
Data type mapping

RisingWave data types map to Iceberg types as follows:

RisingWave Type | Iceberg Type | Notes |
---|---|---|
BOOLEAN | boolean | |
SMALLINT | int | |
INT | int | |
BIGINT | long | |
REAL | float | |
DOUBLE PRECISION | double | |
VARCHAR | string | |
BYTEA | binary | |
DECIMAL(p,s) | decimal(p,s) | |
TIME | time | |
DATE | date | |
TIMESTAMP | timestamp | |
TIMESTAMPTZ | timestamptz | |
INTERVAL | string | Serialized as string |
JSONB | string | Serialized as JSON string |
ARRAY | list | |
STRUCT | struct | |
MAP | map | |
Configuration parameters
Required parameters
Parameter | Description |
---|---|
connector | Must be 'iceberg' |
type | Sink mode: 'append-only' or 'upsert' |
database.name | Target Iceberg database name |
table.name | Target Iceberg table name |
Optional parameters
Parameter | Description | Default |
---|---|---|
primary_key | Primary key for upsert sinks | None |
force_append_only | Force append-only mode from upsert source | false |
is_exactly_once | Enable exactly-once delivery | false |
commit_checkpoint_interval | Commit interval in checkpoints | 60 |
commit_retry_num | Number of commit retries | 8 |
For storage- and catalog-specific parameters, see:
- Object storage: Object storage configuration
- Catalogs: Catalog configuration
Integration patterns
Real-time analytics pipeline
Stream aggregated results to analytics tables:
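A sketch of this pattern: aggregate in a materialized view, then sink the view. All object names are illustrative, and TUMBLE is RisingWave's tumbling-window function:

```sql
-- Aggregate events in RisingWave, then stream the results to Iceberg.
CREATE MATERIALIZED VIEW hourly_page_views AS
SELECT
    window_start,
    page_id,
    COUNT(*) AS views
FROM TUMBLE(page_events, event_time, INTERVAL '1 hour')
GROUP BY window_start, page_id;

CREATE SINK hourly_page_views_sink FROM hourly_page_views
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'window_start,page_id',
    database.name = 'analytics',
    table.name = 'hourly_page_views'
    -- plus object storage and catalog parameters (see sections above)
);
```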
Change data capture

Stream database changes to the data lake:
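A sketch of this pattern, assuming an upstream CDC table (e.g., ingested via a CDC source) named orders_cdc; upsert mode preserves the upstream updates and deletes:

```sql
CREATE SINK orders_lake_sink FROM orders_cdc
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'order_id',
    database.name = 'lake_db',
    table.name = 'orders'
    -- plus object storage and catalog parameters (see sections above)
);
```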
Best practices

- Choose appropriate sink mode: Use append-only for event logs, upsert for dimensional data.
- Configure commit intervals: Balance latency vs file size based on your requirements.
- Enable exactly-once for critical data: Use for financial transactions or other critical data.
- Monitor sink lag: Track how far behind your sink is from the source data.
- Design proper partitioning: Ensure target tables are properly partitioned for query performance.
- Handle backpressure: Monitor sink performance and adjust resources as needed.
Monitoring and troubleshooting
Monitor sink performance
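A minimal starting point, assuming RisingWave's rw_catalog system schema is available in your deployment:

```sql
-- List the sinks defined in the cluster.
SHOW SINKS;

-- Inspect sink metadata via the system catalog.
SELECT * FROM rw_catalog.rw_sinks;
```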
Limitations
- Schema evolution: Limited support for automatic schema changes.
- Concurrent writers: Coordinate with other systems writing to the same tables.
Next steps
- Ingest from Iceberg: Set up sources with Ingest from Iceberg.
- Configure catalogs: Review Catalog configuration for your setup.
- Storage setup: Configure your object storage in Object storage configuration.