Interact with Iceberg
Learn about different ways to use Iceberg with RisingWave
Apache Iceberg is an open table format for huge analytic tables stored typically on object storage. RisingWave offers comprehensive support for working with Iceberg tables, enabling you to leverage Iceberg’s capabilities within your streaming pipelines.
RisingWave provides two distinct approaches for working with Iceberg, each designed for different use cases and architectural patterns.
Choose your approach
RisingWave Managed Iceberg
RisingWave creates and manages Iceberg tables natively
Choose this when you want RisingWave to be the primary owner of your Iceberg tables. RisingWave handles table creation, schema management, and the complete lifecycle while storing data in the standard Iceberg format.
Key benefits:
- Simplified architecture: No external catalog setup required with hosted catalog option
- Streaming-first: Direct path from streaming sources to Iceberg format
- Native management: Tables work like any other RisingWave table for queries and operations
- Ecosystem compatibility: Standard Iceberg tables readable by Spark, Trino, Flink, etc
Use cases:
- New streaming applications where RisingWave is the primary data platform.
- Quick start with Iceberg without external infrastructure.
- Streaming data directly into analytical storage format.
➡️ Get started with RisingWave Managed Iceberg →
Bring Your Own Iceberg (BYOI)
Connect to existing Iceberg tables managed by external systems
Choose this when you have existing Iceberg tables (created by Spark, Flink, batch jobs, etc.) and want RisingWave to read from or write to them as part of a larger data ecosystem.
Key benefits:
- Integration: Work with existing data lakes and multi-system architectures
- Flexibility: Support for various external catalog systems (AWS Glue, JDBC, REST, etc.)
- Hybrid processing: Combine batch and stream processing on the same tables
- Data lake patterns: Implement lambda/kappa architectures with existing infrastructure
Use cases:
- Existing Iceberg data lakes that need streaming capabilities.
- Multi-engine environments where different systems share Iceberg tables.
- Integration into existing ETL/ELT pipelines.
- Adding real-time processing to batch-oriented data workflows.
➡️ Get started with Bring Your Own Iceberg →
Understanding RisingWave’s Iceberg integration
Storage architecture
It’s important to understand that RisingWave’s own internal storage system (Hummock) also uses object storage (like S3) to persist data, but it uses a row-based format optimized for RisingWave’s internal operations.
When working with Iceberg, you are storing or accessing data in the columnar Iceberg format on object storage, which is designed for analytical workloads and ecosystem interoperability.
Interaction methods
RisingWave provides three core methods for interacting with Iceberg:
- Iceberg Source: Read data from existing Iceberg tables managed externally
- Iceberg Sink: Write data to existing Iceberg tables managed externally
- Iceberg Table Engine: Create and manage tables natively in RisingWave using Iceberg format
The first two methods are part of the Bring Your Own Iceberg approach, while the third is part of RisingWave Managed Iceberg.
Quick decision guide
Choose RisingWave Managed Iceberg if:
- ✅ You’re building new applications with RisingWave as the primary platform.
- ✅ You want the simplest setup possible.
- ✅ You don’t have existing Iceberg infrastructure.
- ✅ You want streaming data to flow directly into analytical storage.
Choose Bring Your Own Iceberg if:
- ✅ You have existing Iceberg tables from other systems.
- ✅ Multiple applications need to share the same Iceberg tables.
- ✅ You have existing catalog infrastructure (AWS Glue, etc.).
- ✅ You’re adding streaming capabilities to existing batch workflows.
Advanced features
Both approaches support advanced Iceberg features:
- Time travel: Query historical snapshots of your data
- Schema evolution: Handle changing table schemas over time
- Partitioning: Optimize query performance with table partitioning
- Multiple storage backends: S3, Google Cloud Storage, Azure Blob Storage
- Various catalog types: Hosted, JDBC, AWS Glue, REST, Storage, Hive, Snowflake
Next steps
- Identify your use case: Review the descriptions above to choose your approach.
- Follow the appropriate guide:
- RisingWave Managed Iceberg for native table management
- Bring Your Own Iceberg for external table integration
- Explore advanced configurations: Both approaches share common configuration patterns for storage and catalogs.