This page explains what it means to “store data” in RisingWave, where the persisted data lives, and how to choose between RisingWave’s two storage options.

What gets stored (and what doesn’t)

RisingWave persists data for:
  • Tables
  • Materialized views (MVs)
By contrast, a Source is just a connection to an external system and does not store data inside RisingWave. If you want RisingWave to keep a durable copy of ingested data, use a connector-backed table (CREATE TABLE ... WITH (connector=...)). For a practical comparison, see "CREATE SOURCE vs. CREATE TABLE" and "Source, Table, MV, and Sink".
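
As a rough illustration of the difference, the sketch below declares the same Kafka topic twice: once as a source (nothing is persisted in RisingWave) and once as a connector-backed table (ingested rows are persisted). The topic name, broker address, and column names are hypothetical, and the exact connector parameters depend on your Kafka setup and RisingWave version.

```sql
-- Source: only a connection definition; RisingWave does not store the data.
-- (user_events, broker:9092, and the columns are placeholder assumptions.)
CREATE SOURCE user_events_src (
    user_id    BIGINT,
    event_type VARCHAR,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Connector-backed table: same connection settings, but ingested rows
-- are persisted by RisingWave.
CREATE TABLE user_events (
    user_id    BIGINT,
    event_type VARCHAR,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```

The only structural change is the keyword (SOURCE vs. TABLE); the durability behavior follows from that choice.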

Where the data is persisted

When you create tables or MVs, RisingWave persists their internal state in your configured object store (for example, Amazon S3). Compute nodes may cache hot data locally for performance, but the durable copy is stored in the object store.

Two storage options

RisingWave offers two ways to persist data:
  • Row-based storage (default) via the Hummock storage engine
  • Columnar storage using Apache Iceberg

Quick guide: which one should I use?

Workload / requirement | Recommended storage
Low-latency point lookups, “latest state”, frequent updates/deletes | Row-based (Hummock)
Streaming pipelines with MVs that need fast incremental maintenance | Row-based (Hummock)
Large scans, long-range analytics, and lakehouse interoperability | Iceberg (columnar)
Need to query the same dataset from other engines (Spark/Trino/etc.) | Iceberg (columnar)

Row-based storage (Hummock)

By default, tables and materialized views are stored in row-based storage using Hummock, a storage engine designed for streaming updates.
Best for:
  • Serving up-to-date results with low latency.
  • Workloads dominated by point queries and short-range scans.
  • Pipelines with frequent incremental updates (CDC, upserts, streaming aggregations).
Trade-offs:
  • Not optimized for very large full-table scans compared with columnar formats.
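
To make the "latest state" use case concrete, here is a minimal sketch of a default (row-based) table that is updated in place and then read with a point lookup on its primary key. The table and column names are hypothetical, chosen only for illustration.

```sql
-- A default table is stored in row-based storage; no extra options are needed.
CREATE TABLE user_profiles (
    user_id    BIGINT PRIMARY KEY,
    plan       VARCHAR,
    updated_at TIMESTAMP
);

INSERT INTO user_profiles VALUES (42, 'free', '2024-01-01 00:00:00');

-- Frequent updates to existing keys are the expected access pattern here.
UPDATE user_profiles
SET plan = 'pro', updated_at = '2024-02-01 00:00:00'
WHERE user_id = 42;

-- Make the preceding writes visible before reading them back.
FLUSH;

-- Point lookup by primary key: the kind of query row-based storage favors.
SELECT plan, updated_at FROM user_profiles WHERE user_id = 42;
```

A query such as SELECT COUNT(*) over a very large table of this kind would still work, but it is the full-scan pattern that the trade-off above refers to.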

Columnar storage (Apache Iceberg)

RisingWave can store analytical datasets in Apache Iceberg, an open table format widely adopted in the lakehouse ecosystem, whose data files are typically written in a columnar format such as Parquet.
Best for:
  • Analytical queries that scan lots of data (reporting, dashboards, ad-hoc BI).
  • Sharing the same tables with external engines that understand Iceberg.
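
A minimal sketch of what an Iceberg-backed table might look like is shown below. It assumes that an Iceberg connection (catalog and object store credentials) has already been configured for the session, which is not shown here, and the table and column names are hypothetical. Treat the exact clause as an assumption to verify against the Iceberg documentation for your RisingWave version.

```sql
-- Assumption: an Iceberg connection has already been set up for this session.
-- The ENGINE clause asks RisingWave to store this table in Iceberg
-- instead of the default row-based storage.
CREATE TABLE page_views_history (
    view_id   BIGINT PRIMARY KEY,
    user_id   BIGINT,
    url       VARCHAR,
    viewed_at TIMESTAMP
) ENGINE = iceberg;

-- A scan-heavy analytical query of the kind columnar storage is suited to.
SELECT url, COUNT(*) AS views
FROM page_views_history
GROUP BY url;
```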
To learn how Iceberg storage works in RisingWave and how to manage it, see the Iceberg section of the RisingWave documentation.

Common patterns

  • Retain raw data + compute derived results: Ingest into a table for durable retention, then build MVs for continuously maintained aggregations.
  • Explore first, persist later: Start with CREATE SOURCE for quick exploration; switch to a connector-backed table when you need durability or performance.
  • Hybrid analytics: Keep operational/serving state in Hummock, and keep large analytical datasets (or shared lakehouse tables) in Iceberg.
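
The first pattern (retain raw data, then compute derived results) can be sketched as follows. The orders table and its columns are hypothetical; in practice the table would often be connector-backed rather than populated by manual inserts.

```sql
-- Durable retention of raw data in a table.
CREATE TABLE orders (
    order_id    BIGINT PRIMARY KEY,
    customer_id BIGINT,
    amount      DOUBLE PRECISION,
    created_at  TIMESTAMP
);

-- Continuously maintained aggregation over the raw data.
-- RisingWave updates this result incrementally as orders change.
CREATE MATERIALIZED VIEW revenue_per_customer AS
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id;

-- Serving query against the precomputed result.
SELECT * FROM revenue_per_customer WHERE customer_id = 1001;
```

Both the table and the materialized view are persisted, so the raw rows remain available for new MVs defined later.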

What’s next?