What gets stored (and what doesn’t)
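RisingWave persists data for:
- Tables
- Materialized views (MVs)

RisingWave does not persist data for sources created with CREATE SOURCE. To store ingested data durably, use a connector-backed table (CREATE TABLE ... WITH (connector=...)).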
For a practical comparison, see CREATE SOURCE vs. CREATE TABLE and Source, Table, MV, and Sink.
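
The difference between the two connector objects can be sketched as follows. This is a minimal illustration, assuming a Kafka topic named `orders` on a broker at `localhost:9092` that carries JSON messages; the topic, broker address, object names, and columns are all hypothetical.

```sql
-- Source: declares the connection and schema only.
-- Rows read through it are not persisted by RisingWave.
CREATE SOURCE orders_src (
    order_id   BIGINT,
    customer   VARCHAR,
    amount     DOUBLE PRECISION,
    created_at TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',                                  -- hypothetical topic
    properties.bootstrap.server = 'localhost:9092',    -- hypothetical broker
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Connector-backed table: same connector settings,
-- but ingested rows are stored durably.
CREATE TABLE orders (
    order_id   BIGINT,
    customer   VARCHAR,
    amount     DOUBLE PRECISION,
    created_at TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```

The two statements differ only in the object type, which makes it cheap to start with a source and move to a table later.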
Where the data is persisted
When you create tables or MVs, RisingWave persists their internal state in your configured object store (for example, Amazon S3). Compute nodes may cache hot data locally for performance, but the durable copy is stored in the object store.
Two storage options
RisingWave offers two ways to persist data:
- Row-based storage (default) via the Hummock storage engine
- Columnar storage using Apache Iceberg
Quick guide: which one should I use?
| Workload / requirement | Recommended storage |
|---|---|
| Low-latency point lookups, “latest state”, frequent updates/deletes | Row-based (Hummock) |
| Streaming pipelines with MVs that need fast incremental maintenance | Row-based (Hummock) |
| Large scans, long-range analytics, and lakehouse interoperability | Iceberg (columnar) |
| Need to query the same dataset from other engines (Spark/Trino/etc.) | Iceberg (columnar) |
Row-based storage (Hummock)
By default, tables and materialized views are stored in row-based storage using Hummock, a storage engine designed for streaming updates.
Best for:
- Serving up-to-date results with low latency.
- Workloads dominated by point queries and short-range scans.
- Pipelines with frequent incremental updates (CDC, upserts, streaming aggregations).
Trade-off:
- Not optimized for very large full-table scans compared with columnar formats.
Columnar storage (Apache Iceberg)
RisingWave can store analytical datasets in Apache Iceberg, a widely adopted columnar table format in the lakehouse ecosystem.
Best for:
- Analytical queries that scan lots of data (reporting, dashboards, ad-hoc BI).
- Sharing the same tables with external engines that understand Iceberg.
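
As a rough sketch of what an Iceberg-backed table might look like: the statements below assume RisingWave's Iceberg table engine syntax (`ENGINE = iceberg` plus the `iceberg_engine_connection` session setting) and assume that an Iceberg connection named `public.my_iceberg_conn` already exists. The connection name, table name, and columns are hypothetical, and the exact syntax should be checked against the documentation for your RisingWave version.

```sql
-- Assumption: public.my_iceberg_conn was created beforehand with
-- CREATE CONNECTION; its catalog and storage settings are
-- deployment-specific and are omitted here.
SET iceberg_engine_connection = 'public.my_iceberg_conn';

-- Hypothetical analytical table stored in Iceberg instead of Hummock.
CREATE TABLE events_archive (
    event_id   BIGINT PRIMARY KEY,
    event_type VARCHAR,
    event_time TIMESTAMP
) ENGINE = iceberg;

-- A scan-heavy aggregate of the kind columnar storage is suited to.
SELECT event_type, COUNT(*) AS event_count
FROM events_archive
GROUP BY event_type;
```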
Common patterns
- Retain raw data + compute derived results: Ingest into a table for durable retention, then build MVs for continuously maintained aggregations.
- Explore first, persist later: Start with CREATE SOURCE for quick exploration; switch to a connector-backed table when you need durability or performance.
- Hybrid analytics: Keep operational/serving state in Hummock, and keep large analytical datasets (or shared lakehouse tables) in Iceberg.
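
The "retain raw data + compute derived results" pattern can be sketched with a plain table and one MV. The table, view, and column names here are hypothetical, and the rows are made-up sample data.

```sql
-- Raw events are retained in a table (row-based Hummock storage by default).
CREATE TABLE page_views (
    user_id   BIGINT,
    page      VARCHAR,
    viewed_at TIMESTAMP
);

-- Derived result: an aggregation that is maintained incrementally
-- as new rows arrive in page_views.
CREATE MATERIALIZED VIEW views_per_page AS
SELECT page, COUNT(*) AS view_count
FROM page_views
GROUP BY page;

-- Made-up sample rows.
INSERT INTO page_views VALUES
    (1, '/home',    '2024-01-01 10:00:00'),
    (2, '/home',    '2024-01-01 10:00:05'),
    (1, '/pricing', '2024-01-01 10:01:00');

-- Commit pending changes so the following read sees them.
FLUSH;

-- Lookup against the maintained result rather than the raw table.
SELECT page, view_count
FROM views_per_page
WHERE page = '/home';
```

Querying the MV instead of re-aggregating `page_views` is what keeps serving reads cheap: the aggregation work happens as data arrives, not at query time.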
What’s next?
- If you’re choosing between connector objects, start with CREATE SOURCE vs. CREATE TABLE.
- If you want to expose results to tools and applications, see Access overview.