Monitoring progress
RisingWave provides two main ways to monitor CDC progress:- Checking backfill progress (initial snapshot): When you first create a CDC table, RisingWave performs an initial “backfill” or “snapshot” to load existing data from the source table. You can monitor the progress of this backfill.
- Checking real-time replication progress: After the initial backfill, RisingWave continuously streams changes from the source database. You can monitor the progress of this real-time replication.
Checking backfill progress (initial snapshot)
To check the progress of the initial data backfill:-
Identify the internal table: RisingWave creates internal tables to manage the backfill process. Use the
SHOW INTERNAL TABLES;
command to list all internal tables. The relevant table will have a name like__internal_{source_name}_{source_id}_streamcdcscan_{executor_id}
.Look for a table name that includes “streamcdcscan” and corresponds to your source name and ID. -
Query the internal table: Use a
SELECT
statement to query the internal table. The specific columns and their meanings may vary slightly depending on the source, but generally, you’ll find information about:split_id
: An identifier for the backfill task.backfill_finished
: A boolean indicating whether the backfill is complete (t
for true,f
for false).row_count
: The number of rows processed so far.cdc_offset
: Information about the current position in the source database’s change log (e.g., LSN for PostgreSQL, binlog position for MySQL).
This might return something like:This indicates that the backfill is complete (backfill_finished = t
), 4 rows have been processed, and the current LSN (Log Sequence Number) in the PostgreSQL WAL is 4558482960. Example (MySQL):This might return:
Checking real-time replication progress
To monitor the progress of real-time CDC replication:-
Identify the internal table: Use
SHOW INTERNAL TABLES;
to find the internal table associated with your source. This table will have a name like__internal_{source_name}_{source_id}_source_{executor_id}
. -
Query the internal table: Use a
SELECT
statement to query the internal table. This table contains information about the current offset being consumed from the source database. Example (PostgreSQL):This shows the current LSN.Example (MySQL):This might return something like:This output indicates that the CDC process is reading from the binlog filemysql-bin.000003
at position154
. -
Check for replication lag (optional):
You may also want to check the upstream source to confirm the log status.
-
PostgreSQL: You can check for replication lag by querying the
pg_replication_slots
view on your PostgreSQL server. This shows the difference between the current WAL position and the position of each replication slot. A large lag indicates that RisingWave is falling behind. - MySQL: You can check for replication lag on your MySQL Server.
The value ofSeconds_Behind_Master
indicates the replication lag. -
PostgreSQL: You can check for replication lag by querying the
Monitoring parallelized CDC backfill progress
Added in v2.6.0.
backfill.parallelism > 0
), you can monitor the overall progress using rw_cdc_progress
system catalog. It provides a high-level overview of CDC backfill progress:
job_id
: The CDC table’s internal job IDsplit_total_count
: Total number of splits created for this tablesplit_backfilled_count
: Number of splits that have completed snapshot backfillsplit_completed_count
: Number of splits that have completed backfill and synchronized with the latest CDC offset
cdc_offset_low
: The earliest CDC offset encountered during backfillcdc_offset_high
: The latest CDC offset encountered during backfill
RisingWave commits offsets to the upstream database, allowing the database to free up space used by log files (WAL files for PostgreSQL, binlog files for MySQL). This offset commitment happens during checkpoint commits in the CDC source. If there is high checkpoint latency, log files may accumulate on the upstream server.