When RisingWave streams data into Iceberg, it generates many small data files and creates frequent snapshots. Over time, this can degrade query performance and increase storage costs. To maintain healthy tables, RisingWave provides both automatic and manual maintenance features for compaction and snapshot expiration.
- Compaction: Merges small data files and delete files into larger, optimized files to improve read performance.
- Snapshot Expiration: Removes old, unneeded snapshots and their associated data files to reclaim storage space.
Automatic maintenance
Version notes
- RisingWave introduced Iceberg automatic maintenance (default) in v2.5.0 and added compaction.type with the small-files and files-with-delete compaction types in v2.7.0.
- compaction.write_parquet_compression and compaction.write_parquet_max_row_group_rows were added in v2.9.0. The compaction.target_file_size_mb parameter now also controls the output file size for sink writes (previously it applied only to compaction).
- Parameters prefixed with compaction are currently in technical preview stage and may change in future releases.
Compaction types
RisingWave supports three compaction types for Iceberg tables. You can specify the type using the compaction.type parameter.
| Compaction type | Description |
|---|---|
| full | Rewrites all data files. This is the default type. |
| small-files | Only compacts files smaller than a specified threshold. Use the compaction.small_files_threshold_mb parameter to set the threshold. |
| files-with-delete | Only compacts data files that have associated delete files. Use the compaction.delete_files_count_threshold parameter to set the minimum number of delete files to trigger compaction. |
The small-files and files-with-delete compaction types are only supported in Merge-on-Read mode. Copy-on-Write mode only supports the full compaction type.
Parameters
Configure automatic maintenance by specifying the following parameters in the WITH clause of a CREATE SINK or CREATE TABLE ... ENGINE = iceberg statement.
General parameters
| Parameter | Description |
|---|---|
| enable_compaction | Required. Set to true to enable automatic compaction and snapshot expiration. |
| compaction_interval_sec | Optional. The interval in seconds between maintenance runs. Default: 3600. |
| enable_snapshot_expiration | Optional. Set to true to enable snapshot expiration. By default, it removes snapshots older than 5 days. |
| snapshot_expiration_max_age_millis | Optional. The maximum age (in milliseconds) for a snapshot to be retained. To keep only the latest snapshot, set this to 0. |
| snapshot_expiration_retain_last | Optional. The minimum number of snapshots to retain, regardless of their age. |
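As an illustration of how the snapshot expiration settings combine, the sketch below keeps snapshots for one day (24 × 60 × 60 × 1000 = 86,400,000 ms) while always retaining at least five. The sink name, source name, and connection settings are hypothetical placeholders, and catalog and credential options are omitted; only the maintenance parameters are taken from the table above.

```sql
-- Sketch only: my_iceberg_sink, my_source, and the connection values are placeholders.
CREATE SINK my_iceberg_sink FROM my_source
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    -- Maintenance settings
    enable_compaction = true,
    enable_snapshot_expiration = true,
    snapshot_expiration_max_age_millis = 86400000,  -- 1 day
    snapshot_expiration_retain_last = 5             -- never drop below 5 snapshots
);
```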
Compaction parameters
| Parameter | Description |
|---|---|
| compaction.type | Optional. The compaction strategy: full, small-files, or files-with-delete. Default: full. |
| compaction.max_snapshots_num | Optional. The maximum number of snapshots allowed since the last rewrite operation. If set, the sink will pause if this number is exceeded until compaction completes. |
| compaction.trigger_snapshot_count | Optional. The minimum number of snapshots since the last compaction required to trigger a new compaction. Both this threshold and the time interval must be met. |
| compaction.target_file_size_mb | Optional. The target output file size in MB. Applies to both sink writes and compaction. Default: 1024. |
| compaction.write_parquet_compression | Optional. The Parquet compression codec for output files. Accepted values: uncompressed, snappy, gzip, lzo, brotli, lz4, zstd. Default: snappy. Supports dynamic updates via ALTER SINK. |
| compaction.write_parquet_max_row_group_rows | Optional. The maximum number of rows per Parquet row group. Must be greater than 0. Default: 122880. Supports dynamic updates via ALTER SINK. |
| compaction.small_files_threshold_mb | Optional. For the small-files compaction type, the threshold size in MB below which files will be compacted. |
| compaction.delete_files_count_threshold | Optional. For the files-with-delete compaction type, the minimum number of delete files associated with a data file required to trigger compaction. |
Examples
Full compaction (default)
The following example enables automatic compaction with the default full compaction type:
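A minimal sketch of such a statement, assuming a hypothetical append-only source named my_source and placeholder connection settings (catalog and credential options are omitted). Only the maintenance parameters come from the tables above.

```sql
-- Sketch only: names and connection values are placeholders.
CREATE SINK my_iceberg_sink FROM my_source
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    -- compaction.type is not set, so the default (full) applies
    enable_compaction = true,
    compaction_interval_sec = 3600,
    enable_snapshot_expiration = true
);
```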
Small files compaction
For Merge-on-Read tables with many small files, use the small-files compaction type to only compact files smaller than a threshold:
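A sketch under the same assumptions as before: the sink name, the materialized view my_mv, the primary key column, and the connection values are hypothetical, and the sink is assumed to run in Merge-on-Read mode. The 32 MB threshold is an illustrative value, not a recommendation.

```sql
-- Sketch only: names, key column, and connection values are placeholders.
CREATE SINK my_small_files_sink FROM my_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    enable_compaction = true,
    compaction.type = 'small-files',
    compaction.small_files_threshold_mb = 32  -- compact only files below 32 MB
);
```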
Files with delete compaction
For Merge-on-Read tables with accumulated delete files, use the files-with-delete compaction type to only compact data files that have associated delete files:
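A sketch with hypothetical names and placeholder connection values, again assuming the sink runs in Merge-on-Read mode. The threshold of 5 delete files is illustrative only.

```sql
-- Sketch only: names, key column, and connection values are placeholders.
CREATE SINK my_delete_files_sink FROM my_mv
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'my_database',
    table.name = 'my_table',
    enable_compaction = true,
    compaction.type = 'files-with-delete',
    compaction.delete_files_count_threshold = 5  -- compact data files with at least 5 delete files
);
```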
Manual maintenance
In addition to automatic background maintenance, you can trigger compaction and snapshot expiration manually at any time using the VACUUM command.
This gives you on-demand control over table optimization and storage cleanup.
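A short sketch of manual maintenance, assuming a hypothetical Iceberg table named my_iceberg_table. It also assumes that plain VACUUM expires snapshots and that VACUUM FULL runs compaction before expiring snapshots; confirm the exact behavior and accepted object types in the VACUUM command reference for your version.

```sql
-- Sketch only: my_iceberg_table is a placeholder name.
-- Assumed behavior: expire old snapshots.
VACUUM my_iceberg_table;

-- Assumed behavior: compact files first, then expire snapshots.
VACUUM FULL my_iceberg_table;
```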