Conceptual comparison: Table, Source, MV, and Sink
To quickly grasp the differences, the following table shows how these four objects vary in storage, queryability, updates, and typical use cases.

Object | Stores Data | Directly Queryable | Continuously Updated | Typical Use Case |
---|---|---|---|---|
Table | ✅ Stored within RisingWave | ✅ Directly queryable | Supports updates (from inserts or external sources) | Persisting data, querying, serving as input for MVs or Sinks |
Source | ❌ Does not store; only connects to external systems | Depends on the connector (Kafka/S3/Iceberg are queryable, CDC is not) | Reads from upstream in real-time | Data ingress, ad-hoc exploration, building Tables or MVs |
Materialized View (MV) | ✅ Stores query results | ✅ Directly queryable | Refreshes automatically | Real-time aggregation, cleaning, analysis; recommended as Sink input |
Sink | ❌ Does not store; writes results to external systems | ❌ Not directly queryable | Outputs automatically | Writing results back to Kafka, databases, or object storage |
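
The four rows above can be sketched side by side in SQL. This is a minimal illustration, not a reference setup: the object names (page_views, click_events, clicks_per_user, clicks_per_user_sink), the topic names, and the broker address localhost:9092 are all hypothetical placeholders.

```sql
-- Table: rows are stored inside RisingWave and can be inserted and queried.
-- (page_views is a hypothetical name.)
CREATE TABLE page_views (
    user_id INT,
    url VARCHAR,
    viewed_at TIMESTAMP
);
INSERT INTO page_views VALUES (1, '/home', '2024-01-01 00:00:00');

-- Source: only describes how to reach Kafka; nothing is stored in RisingWave.
-- (Topic and broker address are assumptions.)
CREATE SOURCE click_events (
    user_id INT,
    url VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'clicks',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- A Kafka Source can be explored with an ad-hoc query.
SELECT * FROM click_events LIMIT 10;

-- Materialized View: the query result is stored and kept up to date.
CREATE MATERIALIZED VIEW clicks_per_user AS
SELECT user_id, COUNT(*) AS clicks
FROM click_events
GROUP BY user_id;

-- Sink: pushes the MV's changes to an external system; it cannot be queried.
CREATE SINK clicks_per_user_sink FROM clicks_per_user
WITH (
    connector = 'kafka',
    topic = 'clicks_per_user',
    properties.bootstrap.server = 'localhost:9092',
    primary_key = 'user_id'
) FORMAT UPSERT ENCODE JSON;
```

The MV is used as the Sink input here because the comparison table recommends it. The upsert format is chosen because an aggregation emits updates rather than append-only rows.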

Connector support matrix
A common question from users is, “Can this Source be queried directly? Does a specific CDC connector require creating a table?” The matrix below summarizes support for mainstream connectors.

Connector Type | Direct Source Query | Direct Table Creation | Primary Key Required | Can Create MV | Can Be Sink Input |
---|---|---|---|---|---|
Kafka / Pulsar / Kinesis / NATS / MQTT / PubSub | ✅ | ✅ | ❌ | ✅ | ✅ |
S3 / GCS / Azure Blob | ✅ | ✅ | ❌ | ✅ | ✅ |
PostgreSQL CDC | ❌ (Requires Table) | ✅ | ✅ | ✅ | ✅ |
MySQL CDC | ❌ (Requires Table) | ✅ | ✅ | ✅ | ✅ |
SQL Server CDC | ❌ | ❌ (Must create Source then Table) | ✅ | ✅ | ✅ |
MongoDB CDC | ❌ | ❌ (Must create Source then Table) | ✅ | ✅ | ✅ |
Iceberg | ✅ | ✅ | ❌ (Unless declared for writes) | ✅ | ✅ |

Table: the foundation of storage in RisingWave
A Table in RisingWave is a persistent data object that stores a collection of rows, much like a table in a traditional database. It can store, query, and update data. There are two ways to use a Table. First, as an internal table into which users insert data manually. Second, as a table with a connector, which continuously ingests data from an external system and persists it in RisingWave.

Source: a lightweight ingress definition
Unlike a Table, a Source only defines how to connect to an external system and does not store any data within RisingWave. Sources for Kafka, S3, and Iceberg can be queried directly with SELECT.

CDC scenarios: why a table is mandatory
What makes a CDC (Change Data Capture) stream special is that it includes updates and deletes. RisingWave must rely on a Table to process these events correctly, for three reasons. First, if a user executes UPDATE users SET name='Bob' WHERE id=1 in PostgreSQL, RisingWave needs to know which row to update. Only a Table, which stores the data’s state and has a primary key, can support these semantics. Second, a CDC stream is a continuous log of changes. With only a Source, consistency could not be restored after a task interruption, whereas a Table can resume from the last checkpoint thanks to its persistence and offset tracking. Third, a single database transaction might update multiple tables, and RisingWave relies on Tables to apply transactional boundaries correctly.
Therefore, for CDC, you must create a table. You can either create a CDC table directly or create a Source first and then derive a Table from it.
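The Source-then-Table route can be sketched as follows for PostgreSQL CDC. The connection values, the slot name, and the object names (pg_src, users) are hypothetical placeholders. The users table is assumed to have the id and name columns implied by the UPDATE example above.

```sql
-- Step 1: a CDC Source that only holds the connection to PostgreSQL.
-- (All connection values below are assumptions for illustration.)
CREATE SOURCE pg_src WITH (
    connector = 'postgres-cdc',
    hostname = '127.0.0.1',
    port = '5432',
    username = 'postgres',
    password = 'postgres',
    database.name = 'dev',
    schema.name = 'public',
    slot.name = 'rw_users_slot'
);

-- Step 2: a Table derived from the Source. The primary key tells
-- RisingWave which stored row an upstream UPDATE or DELETE refers to.
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR
) FROM pg_src TABLE 'public.users';

-- After the upstream UPDATE users SET name='Bob' WHERE id=1 is captured,
-- the change is applied to the stored row with id = 1.
SELECT name FROM users WHERE id = 1;
```

The SELECT runs against the Table, not the Source, which matches the connector matrix: a PostgreSQL CDC Source is not directly queryable.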
Shared Sources and consistency
For CDC scenarios, RisingWave provides a shared source mechanism: a user can create a single CDC Source and then derive multiple Tables from it.
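As a rough sketch of the shared source pattern, the example below uses MySQL CDC with one Source and two derived Tables. The connection values, the shop database, and the table and column names are hypothetical.

```sql
-- One shared CDC Source holds the connection to the upstream MySQL instance.
-- (Connection values are assumptions for illustration.)
CREATE SOURCE mysql_src WITH (
    connector = 'mysql-cdc',
    hostname = '127.0.0.1',
    port = '3306',
    username = 'root',
    password = 'secret',
    database.name = 'shop'
);

-- Each upstream table becomes its own RisingWave Table on the same Source.
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    amount DECIMAL
) FROM mysql_src TABLE 'shop.orders';

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR
) FROM mysql_src TABLE 'shop.customers';
```

Both Tables read from the single Source definition, so the connection settings are declared only once.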
Iceberg: queries, MVs, and time travel
Iceberg is a key table format supported by RisingWave. An Iceberg Source can be queried directly or used to build an MV.

S3 and object storage
RisingWave supports reading data from object storage services like S3, GCS, and Azure Blob. Similar to Kafka or Iceberg, you can create a Source for direct querying or create a Table to persistently import the data. However, S3 Sources have a specific constraint: they do not guarantee the order of reads, nor do they guarantee resuming from the same position upon recovery. The system only ensures that every file will eventually be read completely. Therefore, if your application relies on strict ordering, you will need to handle it either in the upstream writing process or in the downstream consumption logic.

Materialized View: the result table of real-time computation
The Materialized View (MV) is the core of real-time computation in RisingWave. It is defined by a SQL query, and the system automatically maintains the result table, refreshing it in real time as the underlying data changes.

Sink: multiple ways to output data
A sink is the egress point of RisingWave, used to write data to external systems. The input for a sink can be:
- An existing object such as a source, table, or materialized view, specified by the FROM clause of a CREATE SINK statement.
- A query, specified with the CREATE SINK AS SELECT <select_query> syntax.

A sink can also write into a table inside RisingWave with the CREATE SINK INTO command. A common use case for this is to union multiple sources and write them into a single table.
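
The three sink forms can be sketched as follows. All names here are hypothetical (web_events, app_events, all_events, and the sink names), as are the topics and the broker address. The input tables are declared APPEND ONLY so that the plain JSON format can be used without handling updates.

```sql
-- Two hypothetical append-only input tables and one target table.
CREATE TABLE web_events (user_id INT, action VARCHAR) APPEND ONLY;
CREATE TABLE app_events (user_id INT, action VARCHAR) APPEND ONLY;
CREATE TABLE all_events (user_id INT, action VARCHAR);

-- Form 1: sink an existing object, named in the FROM clause.
CREATE SINK web_events_sink FROM web_events
WITH (
    connector = 'kafka',
    topic = 'web_events_out',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;

-- Form 2: sink the result of a query with AS SELECT.
CREATE SINK login_events_sink AS
SELECT user_id, action FROM app_events WHERE action = 'login'
WITH (
    connector = 'kafka',
    topic = 'login_events_out',
    properties.bootstrap.server = 'localhost:9092'
) FORMAT PLAIN ENCODE JSON;

-- Form 3: CREATE SINK INTO writes into a RisingWave table.
-- Two sinks into the same target act as a union of both inputs.
CREATE SINK web_into_all INTO all_events FROM web_events;
CREATE SINK app_into_all INTO all_events FROM app_events;
```

With both INTO sinks in place, rows inserted into either input table should appear in all_events, which can then be queried or used as input for an MV.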