RisingWave provides two ways to connect to an external data source: CREATE SOURCE and CREATE TABLE ... WITH (connector=...). While their syntax is similar, they serve fundamentally different purposes related to data storage and availability.
For a high-level comparison of all core RisingWave objects, see the Source, Table, MV, and Sink guide.
CREATE SOURCE: A connection without storage
The CREATE SOURCE command establishes a direct, non-persistent connection to an external data source.
- Data Storage: It does not store the ingested data within RisingWave’s internal storage. It simply defines how to access the data from the external system.
- Use Cases:
  - Quick data exploration: Inspecting data from a source without the overhead of storing it.
  - Pure streaming pipelines: Building real-time materialized views or sinks where you only need to process data as it arrives, without needing to retain the raw data.
  - Ad-hoc queries: Running one-off queries directly against a source like Kafka or S3.
Data ingested via CREATE SOURCE is not stored in RisingWave. The persistence of the data depends entirely on the external source’s retention policies.
Syntax
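The general shape of the statement can be sketched as follows. This is a simplified template rather than the full grammar: the lowercase names are placeholders, square brackets mark optional parts, and the available connector parameters and format/encode combinations depend on the connector you use.

```sql
-- Simplified template; placeholders are not literal values.
CREATE SOURCE [ IF NOT EXISTS ] source_name (
    column_name data_type, ...
)
WITH (
    connector = 'connector_name',
    connector_parameter = 'value', ...
)
FORMAT data_format ENCODE data_encode;
```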
Example (Kafka)
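A minimal sketch of such a source is shown here. The source name, column names, topic (user_activity), and broker address (localhost:9092) are hypothetical, and the messages are assumed to be JSON-encoded.

```sql
-- Define how to read the topic; no rows are stored in RisingWave.
CREATE SOURCE user_activity_source (
    user_id INT,
    product_id VARCHAR,
    event_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'user_activity',                          -- hypothetical topic
    properties.bootstrap.server = 'localhost:9092'    -- hypothetical broker
) FORMAT PLAIN ENCODE JSON;

-- One-off query: rows are read from Kafka at query time.
SELECT user_id, product_id, event_time
FROM user_activity_source
LIMIT 10;
```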
This example creates a direct connection to a Kafka topic. You can query it, but the data is not stored in RisingWave.
CREATE TABLE: A connection with persistent storage
The CREATE TABLE command, when used with a WITH clause specifying a connector, connects to an external source and continuously ingests its data into RisingWave’s internal storage.
- Data Storage: It stores the data within RisingWave, making it durable and persistently available.
- Use Cases:
  - Change Data Capture (CDC): CREATE TABLE is mandatory for CDC sources (e.g., Postgres, MySQL) so that updates and deletes are applied correctly.
  - Data Retention: When you need to keep a durable copy of the source data for historical analysis, regardless of the source’s retention policy.
  - Improved Query Performance: Querying a local table is generally faster because it avoids network latency to the external system.
  - Indexes and Primary Keys: You can create indexes on tables to speed up queries and define primary keys to enforce uniqueness.
Syntax
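The general shape of the statement can be sketched as follows. This is a simplified template rather than the full grammar: the lowercase names are placeholders, square brackets mark optional parts, and the available connector parameters and format/encode combinations depend on the connector you use.

```sql
-- Simplified template; placeholders are not literal values.
CREATE TABLE [ IF NOT EXISTS ] table_name (
    column_name data_type, ...
    [ PRIMARY KEY ( column_name, ... ) ]
)
WITH (
    connector = 'connector_name',
    connector_parameter = 'value', ...
)
FORMAT data_format ENCODE data_encode;
```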
Example (Kafka)
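A minimal sketch of such a table is shown here. The table name, column names, topic (user_activity), and broker address (localhost:9092) are hypothetical, and the messages are assumed to be JSON-encoded. The optional index at the end illustrates a feature that is available on tables but not on sources.

```sql
-- Continuously ingest the topic into RisingWave's own storage.
CREATE TABLE user_activity_table (
    user_id INT,
    product_id VARCHAR,
    event_time TIMESTAMP,
    PRIMARY KEY (user_id)
) WITH (
    connector = 'kafka',
    topic = 'user_activity',                          -- hypothetical topic
    properties.bootstrap.server = 'localhost:9092'    -- hypothetical broker
) FORMAT PLAIN ENCODE JSON;

-- Optional: index a column that is frequently used in filters.
CREATE INDEX idx_user_activity_product ON user_activity_table (product_id);
```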
This creates a table that is continuously populated from a Kafka topic. The data is stored in RisingWave and a primary key is defined.
Side-by-side comparison
Feature | CREATE SOURCE | CREATE TABLE ... WITH (connector=...) |
---|---|---|
Data Storage | ❌ No (Data remains in the external source) | ✅ Yes (Data is durably stored in RisingWave) |
Primary Use Case | Ad-hoc queries, pure streaming pipelines | CDC, data retention, performance-sensitive queries |
Required for CDC? | ❌ No (Cannot handle updates/deletes) | ✅ Yes (Mandatory for CDC sources) |
Query Performance | Dependent on the external source and network | Generally faster due to local data access |
Indexes | ❌ Not supported | ✅ Supported |
Primary Keys | Semantic meaning only, no enforcement | ✅ Supported and enforced |