- Native management: RisingWave manages the table’s lifecycle. You can interact with it like any other RisingWave table (querying, inserting, and using it in materialized views).
- Open storage format: Data is physically stored according to the Iceberg specification, ensuring compatibility with the broader Iceberg ecosystem.
- Simplified pipelines: You don’t need a separate `CREATE SINK` step to export data into Iceberg format. Data ingested or computed can land directly in these Iceberg tables.
- Interoperability: Tables you create are standard Iceberg tables and can be read by external Iceberg-compatible query engines (like Spark, Trino, Flink, and Dremio) using the same catalog and storage configuration.
## Setup and usage
### 1. Create an Iceberg connection
The Iceberg connection contains information about the catalog and object storage. For syntax and properties, see `CREATE CONNECTION`.
The following examples show how to create an Iceberg connection using different catalog types.
These examples use S3 for object storage. You can also use Google Cloud Storage (GCS) or Azure Blob Storage by replacing the S3 parameters with the appropriate parameters for your chosen storage backend. See the Object storage guide for details.
For enhanced security, you can store credentials like access keys as secrets instead of providing them directly. If you wish to use this feature, see Manage secrets.
#### JDBC catalog
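For example, a JDBC-catalog connection might look like the following sketch. The connection name, database, bucket, and credential values are placeholders, and the exact property names should be verified against the `CREATE CONNECTION` reference:

```sql
CREATE CONNECTION my_iceberg_conn WITH (
    type = 'iceberg',
    catalog.type = 'jdbc',
    catalog.uri = 'jdbc:postgresql://127.0.0.1:5432/iceberg_catalog',
    catalog.jdbc.user = 'postgres',
    catalog.jdbc.password = '<password>',
    catalog.name = 'dev',
    warehouse.path = 's3://my-bucket/warehouse',
    s3.access.key = '<access-key>',
    s3.secret.key = '<secret-key>',
    s3.region = 'us-east-1'
);
```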
### 2. Set connection as default (optional)
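For example, assuming a connection named `my_iceberg_conn` exists in the `public` schema (the `iceberg_engine_connection` session variable name should be verified against your RisingWave version):

```sql
SET iceberg_engine_connection = 'public.my_iceberg_conn';
```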
To simplify table creation, you can set a default connection for your session. This allows you to create Iceberg tables without specifying the connection each time.

### 3. Create an Iceberg table
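As an illustrative sketch (the table name and columns are placeholders; `my_iceberg_conn` is an assumed connection from step 1, and the `connection` property can be omitted if a default connection is set):

```sql
CREATE TABLE user_events (
    user_id INT,
    event_type VARCHAR,
    event_time TIMESTAMPTZ,
    PRIMARY KEY (user_id, event_time)
)
WITH (connection = 'my_iceberg_conn')
ENGINE = iceberg;
```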
Create a table using the Iceberg engine by adding `ENGINE = iceberg` to your `CREATE TABLE` statement.

### 4. Work with your table
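For example, using the hypothetical `user_events` table from the previous step, ordinary DML and materialized views work as usual:

```sql
-- Insert and query like a regular table
INSERT INTO user_events VALUES (1, 'login', now());
SELECT count(*) FROM user_events;

-- Use it as input to a materialized view
CREATE MATERIALIZED VIEW logins_per_user AS
SELECT user_id, count(*) AS logins
FROM user_events
WHERE event_type = 'login'
GROUP BY user_id;
```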
Once created, Iceberg tables work like any other RisingWave table.

#### Stream data into Iceberg tables
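One way to do this is to declare the source connector directly on an Iceberg-engine table, so ingested rows land in Iceberg without a separate sink. A sketch with an assumed Kafka topic and broker address:

```sql
CREATE TABLE clicks (
    user_id INT,
    url VARCHAR,
    ts TIMESTAMPTZ
)
WITH (
    connector = 'kafka',
    topic = 'clicks',
    properties.bootstrap.server = 'broker:9092'
)
FORMAT PLAIN ENCODE JSON
ENGINE = iceberg;
```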
You can stream data directly from sources into Iceberg tables.

#### Time travel
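For instance, a snapshot read with `FOR SYSTEM_TIME AS OF` (the table name and timestamp are placeholders; check the exact time-travel syntax supported by your version):

```sql
-- Read the table as of a wall-clock time
SELECT * FROM user_events FOR SYSTEM_TIME AS OF '2025-01-01 00:00:00+00:00';
```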
Query historical snapshots of your Iceberg tables.

#### External access
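For example, once a Spark session is configured with an Iceberg catalog (here named `dev`, an assumption) pointing at the same JDBC catalog and S3 warehouse, the table can be queried directly:

```sql
-- Spark SQL, with spark.sql.catalog.dev configured to the shared catalog
SELECT * FROM dev.public.user_events;
```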
Tables created with the Iceberg engine are standard Iceberg tables that can be accessed by external tools such as Spark.

## Partition strategy
RisingWave’s Iceberg table engine supports table partitioning using the `partition_by` option when creating tables. Partitioning helps organize data for efficient storage and query performance.

The supported `partition_by` formats and examples are as follows:
- `'column'` — single column
- `'column1,column2'` — multiple columns
- `'bucket(n, column), column2'` — bucket partitioning
- `'column1, truncate(n, column2)'` — truncate partitioning
### Example
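A sketch combining `partition_by` with a compatible primary key, where the partition columns form a prefix of the primary key (names are illustrative):

```sql
CREATE TABLE sensor_readings (
    c1 INT,
    c2 VARCHAR,
    c3 TIMESTAMPTZ,
    reading DOUBLE PRECISION,
    PRIMARY KEY (c1, c2, c3)
)
WITH (partition_by = 'c1,c2')
ENGINE = iceberg;
```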
The partition key must be a prefix of the primary key. For example, `partition_by = 'c2,c3'` with `PRIMARY KEY(c1, c2, c3)` will fail.

## Compaction for native Iceberg tables
Added in v2.5.0.
**Premium feature**: This is a premium feature. For a comprehensive overview of all premium features and their usage, please see RisingWave premium features.
You can configure compaction and snapshot expiration for native Iceberg tables with the following parameters in the `WITH` clause:
| Parameter | Description |
| --- | --- |
| `enable_compaction` | Whether to enable Iceberg compaction (`true`/`false`). |
| `compaction_interval_sec` | Interval (in seconds) between two compaction runs. Defaults to 3600 seconds. |
| `enable_snapshot_expiration` | Whether to enable snapshot expiration. By default, it removes snapshots older than 5 days. |
### Example
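An illustrative sketch using the parameters above (table name and schema are placeholders):

```sql
CREATE TABLE events (
    id BIGINT PRIMARY KEY,
    payload VARCHAR
)
WITH (
    enable_compaction = true,
    compaction_interval_sec = 1800,
    enable_snapshot_expiration = true
)
ENGINE = iceberg;
```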
Iceberg compaction also requires a dedicated Iceberg compactor. Currently, please contact us via the RisingWave Slack workspace to allocate the necessary resources. We are working on a self-service feature that will let you allocate Iceberg compactors directly from the cloud portal.
## Use Amazon S3 Tables with native Iceberg tables
Amazon S3 Tables provides an AWS-native Iceberg catalog service. When using S3 Tables as the catalog for your native Iceberg tables, you get the benefit of automatic compaction.

### Create S3 Tables connection
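A hedged sketch of a REST-catalog connection to S3 Tables. The table-bucket ARN is a placeholder, and the SigV4-related property names are assumptions to be checked against the connector reference:

```sql
CREATE CONNECTION s3_tables_conn WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'https://s3tables.us-east-1.amazonaws.com/iceberg',
    warehouse.path = 'arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket',
    catalog.rest.sigv4_enabled = true,
    catalog.rest.signing_name = 's3tables',
    catalog.rest.signing_region = 'us-east-1'
);
```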
### Create Iceberg table with S3 Tables
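Then create the table against that connection (the connection and table names below are placeholders):

```sql
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    amount DOUBLE PRECISION
)
WITH (connection = 's3_tables_conn')
ENGINE = iceberg;
```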
## Configuration options
### Commit intervals

Control how frequently data is committed to the Iceberg table. `barrier_interval_ms` and `checkpoint_frequency` are system parameters that define the base checkpointing rate.
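For example, the base checkpointing rate can be tuned cluster-wide, and a per-table commit interval (an assumed parameter mirroring the Iceberg sink's `commit_checkpoint_interval`) can stretch Iceberg commits across several checkpoints:

```sql
-- System-wide: checkpoint every 5 barrier intervals
ALTER SYSTEM SET checkpoint_frequency = 5;

-- Per-table (assumed parameter name): commit to Iceberg
-- once every 10 checkpoints
CREATE TABLE t (v INT PRIMARY KEY)
WITH (commit_checkpoint_interval = 10)
ENGINE = iceberg;
```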
## Limitations
Current limitations of creating native Iceberg tables:

- Limited DDL operations: Some schema changes may require recreating the table.
- Single writer: Only RisingWave should write to tables created with this engine to ensure data consistency.
## Best practices
- Use hosted catalog for simple setups: Start with `hosted_catalog = true` for quick development.
- Configure appropriate commit intervals: Balance between latency and file size.
- Consider S3 Tables for production: Automatic compaction and AWS-native management.
- Design proper partitioning: Plan your partition strategy for query performance.
- Monitor file sizes: Be aware of small file accumulation and plan a compaction strategy.
## Next steps
- Learn about the hosted catalog: See Hosted Iceberg Catalog for the simplest setup.
- Set up an external catalog: Review Catalog configuration for production deployments.
- Configure storage: See Object storage for details on configuring your object store.