What is change data capture?
Change data capture (CDC) is a technique for tracking row-level changes (inserts, updates, and deletes) in a database and delivering them as a real-time event stream to downstream systems. Instead of periodic batch ETL jobs that scan entire tables to find what changed, CDC captures each change as it happens and delivers it immediately.
How CDC works in RisingWave
RisingWave provides native CDC connectors that connect directly to source databases without Kafka, Debezium, or any other middleware. This simplifies your architecture and reduces operational overhead. A native CDC connector:
- Performs an initial snapshot of the existing data in the source table.
- Streams ongoing changes in real time using the database’s replication protocol (WAL for PostgreSQL, binlog for MySQL).
- Maintains transactional consistency: changes are applied in the same order in which they occurred in the source database.
- Recovers automatically from failures using checkpoint-based restoration.
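The four behaviors above can be modeled with a small in-memory sketch. It is not RisingWave's implementation: `Change`, `CdcReplica`, and the integer `lsn` are hypothetical stand-ins for a connector's event format and for a WAL LSN or binlog offset. The sketch loads a snapshot tagged with a log position, applies later changes in log order, checkpoints its state together with that position, and, after a simulated failure, restores the checkpoint and skips events it has already applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    lsn: int                    # hypothetical log position (stands in for a WAL LSN / binlog offset)
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


class CdcReplica:
    def __init__(self) -> None:
        self.rows: dict = {}
        self.applied_lsn = 0
        self._saved = ({}, 0)   # last checkpoint: (rows, lsn)

    def load_snapshot(self, snapshot: dict, snapshot_lsn: int) -> None:
        # 1. Initial snapshot: copy existing rows and note the log position they reflect.
        self.rows = dict(snapshot)
        self.applied_lsn = snapshot_lsn

    def apply(self, change: Change) -> None:
        # 2./3. Apply changes in log order; ignore anything already reflected in state.
        if change.lsn <= self.applied_lsn:
            return
        if change.op == "delete":
            self.rows.pop(change.key, None)
        else:  # insert and update both write the new row image
            self.rows[change.key] = change.row
        self.applied_lsn = change.lsn

    def checkpoint(self) -> None:
        # 4. Save state together with the log position it corresponds to.
        self._saved = (dict(self.rows), self.applied_lsn)

    def recover(self) -> int:
        # After a failure, restore the checkpoint and return where to resume reading.
        rows, lsn = self._saved
        self.rows, self.applied_lsn = dict(rows), lsn
        return lsn


log = [
    Change(11, "insert", 3, {"id": 3, "qty": 1}),
    Change(12, "update", 1, {"id": 1, "qty": 5}),
    Change(13, "delete", 2),
]

replica = CdcReplica()
replica.load_snapshot({1: {"id": 1, "qty": 2}, 2: {"id": 2, "qty": 7}}, snapshot_lsn=10)
for ch in log[:2]:
    replica.apply(ch)
replica.checkpoint()             # state as of lsn 12 is saved
replica.apply(log[2])            # applied, then the process "crashes"
resume_from = replica.recover()  # back to the lsn-12 state
for ch in log:
    replica.apply(ch)            # lsn 11 and 12 are skipped, 13 is reapplied
print(resume_from, replica.rows)  # → 12 {1: {'id': 1, 'qty': 5}, 3: {'id': 3, 'qty': 1}}
```

Tying each checkpoint to a log position is what lets recovery stay consistent: the restored state and the point where reading resumes always agree, so no change is lost or applied twice.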
Supported CDC sources
| Database | Connector | Replication method |
|---|---|---|
| PostgreSQL | postgres-cdc | Logical replication (WAL) |
| MySQL | mysql-cdc | Binlog replication |
| SQL Server | sqlserver-cdc | SQL Server CDC tables |
| MongoDB | mongodb-cdc | Change streams |
Shared source for multi-table CDC
When you need to replicate multiple tables from the same database, RisingWave supports shared sources, which let all of those tables use a single replication connection instead of opening a separate one for each table.
CDC vs. Kafka-based ingestion
You can also ingest CDC events through Kafka using Debezium or other CDC tools; RisingWave can read Debezium JSON and Maxwell JSON formats from Kafka topics. However, native CDC has several advantages:
| | Native CDC | Kafka + Debezium |
|---|---|---|
| Infrastructure | RisingWave + source database only | Kafka + Debezium + source database |
| Operational complexity | Low: no middleware to manage | High: you must manage Kafka, Kafka Connect, and Debezium |
| Latency | Lower: direct connection | Higher: additional hop through Kafka |
| Shared source support | Yes | No (requires separate Kafka topics per table) |
| Multi-table ingestion | Single replication slot | Separate connector per table |
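The shared-source and single-replication-slot rows can be pictured as a fan-out: one connection delivers changes for every captured table, and each event carries the name of the table it belongs to. The sketch below is hypothetical and does not reflect RisingWave's internal data structures; the `fan_out` function, the `(table, op, key, row)` tuple format, and the table names are assumptions made for illustration.

```python
from typing import Iterable


def fan_out(stream: Iterable[tuple], tables: set) -> dict:
    """Route one multiplexed change stream into per-table replicas.

    Each event is a hypothetical (table, op, key, row) tuple read from a single
    replication connection. Events for tables that were not requested are dropped.
    """
    replicas: dict = {t: {} for t in tables}  # one replica per requested table
    for table, op, key, row in stream:
        if table not in replicas:
            continue
        if op == "delete":
            replicas[table].pop(key, None)
        else:  # insert or update
            replicas[table][key] = row
    return replicas


# All events arrive over the same (simulated) connection.
events = [
    ("public.orders", "insert", 1, {"id": 1, "customer_id": 7}),
    ("public.customers", "insert", 7, {"id": 7, "name": "Ada"}),
    ("public.audit_log", "insert", 99, {"id": 99}),
    ("public.orders", "delete", 1, None),
]
replicas = fan_out(events, {"public.orders", "public.customers"})
print(replicas["public.customers"])  # → {7: {'id': 7, 'name': 'Ada'}}
```

Adding another table here only adds a routing entry; it does not add another connection to the source database.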
Common CDC use cases
- Real-time analytics: Replicate operational database changes into RisingWave for real-time dashboards and monitoring.
- Streaming ETL: Transform and enrich CDC data with materialized views, then sink results to a data warehouse or data lake.
- Cache invalidation: Track database changes and update caches or search indexes in real time.
- Event-driven architectures: Convert database changes into events for downstream microservices.
- Data synchronization: Keep multiple systems in sync by replicating changes across databases.
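For the cache invalidation use case, one common pattern is to evict a cached entry whenever a change event for its key arrives and let the next read reload it. The sketch below assumes a hypothetical `InvalidatingCache` class and a plain dict standing in for the source database. In a real deployment, the `on_change` calls would be driven by the CDC event stream.

```python
from typing import Callable


class InvalidatingCache:
    """Read-through cache whose entries are evicted when a change event arrives.

    `load` is a hypothetical function that reads the current row from the source.
    """

    def __init__(self, load: Callable[[int], dict]) -> None:
        self._load = load
        self._entries: dict = {}
        self.misses = 0

    def get(self, key: int) -> dict:
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._load(key)
        return self._entries[key]

    def on_change(self, op: str, key: int) -> None:
        # Any insert, update, or delete makes the cached copy stale, so evict it
        # regardless of `op`; the next read reloads the current row.
        self._entries.pop(key, None)


db = {1: {"id": 1, "price": 10}}
cache = InvalidatingCache(lambda k: dict(db[k]))
cache.get(1)                      # miss: loads price 10
db[1] = {"id": 1, "price": 12}    # the source row changes...
cache.on_change("update", 1)      # ...and the change event evicts the stale entry
print(cache.get(1)["price"], cache.misses)  # → 12 2
```

Evicting instead of writing the new value into the cache keeps the handler independent of the row format. The trade-off is one extra read from the source per changed key.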
Related topics
- CDC with RisingWave — Connector setup guides
- PostgreSQL CDC — PostgreSQL-specific setup
- MySQL CDC — MySQL-specific setup
- Source, Table, MV, and Sink — Why CDC requires tables