This topic explains how to monitor Kafka source consumption progress and lag in RisingWave. Monitoring helps you identify performance bottlenecks and track data freshness.
Overview
RisingWave provides two system catalogs to monitor Kafka consumer lag:
- rw_kafka_source_metrics (table): Raw Kafka metrics for each source partition
- rw_kafka_job_lag (view): Computed lag per job, source, fragment, and partition
System catalog: rw_kafka_source_metrics
The rw_kafka_source_metrics table exposes raw Kafka metrics for each source partition.
Schema
| Column | Type | Description |
|---|---|---|
| source_id | int | ID of the Kafka source |
| partition_id | varchar | Kafka partition ID |
| high_watermark | bigint | Latest offset available in the Kafka topic partition (nullable) |
| latest_offset | bigint | Latest offset processed by RisingWave (nullable) |
Example usage
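For instance, to list the raw metrics for every partition of every Kafka source, a minimal query (a sketch; run it from any SQL client connected to your RisingWave cluster):

```sql
SELECT source_id, partition_id, high_watermark, latest_offset
FROM rw_kafka_source_metrics
ORDER BY source_id, partition_id;
```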
View all Kafka source metrics by selecting directly from rw_kafka_source_metrics.

System catalog: rw_kafka_job_lag
The rw_kafka_job_lag view provides a summary of Kafka consumption lag, designed to help diagnose consumption issues across materialized views and sinks.
Schema
| Column | Type | Description |
|---|---|---|
| job_id | int | ID of the streaming job (materialized view or sink) |
| source_id | int | ID of the Kafka source |
| fragment_id | int | ID of the fragment processing this partition |
| partition_id | varchar | Kafka partition ID |
| lag_phase | varchar | Phase of consumption: BACKFILL (processing historical data) or LIVE (processing real-time data) |
| high_watermark | bigint | Latest offset available in the Kafka topic partition (nullable) |
| consumer_offset | bigint | Current consumer offset; during BACKFILL this is the backfill progress offset, during LIVE the latest offset processed by RisingWave (nullable) |
| lag | bigint | Number of unprocessed messages, calculated as greatest(high_watermark - consumer_offset - 1, 0) (nullable) |
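As a worked example of the lag formula: with high_watermark = 1000 and consumer_offset = 899, the lag is greatest(1000 - 899 - 1, 0) = 100 unprocessed messages. The arithmetic can be checked with a scalar query (a sketch; the literal values are made up for illustration):

```sql
-- greatest(high_watermark - consumer_offset - 1, 0)
SELECT greatest(1000 - 899 - 1, 0) AS lag;  -- returns 100
```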
Lag phase
The lag_phase column indicates the current phase of Kafka consumption:
- BACKFILL: The consumer is processing historical data during initial job creation. In this phase, the consumer_offset reflects the backfill progress offset.
- LIVE: The backfill has completed (or no backfill state exists) and the consumer is processing real-time streaming data. In this phase, the consumer_offset reflects the latest offset reported by the source reader.
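To see at a glance how many partitions of each job are still backfilling versus live, you can group the view by phase (a minimal sketch):

```sql
SELECT job_id, lag_phase, count(*) AS partitions
FROM rw_kafka_job_lag
GROUP BY job_id, lag_phase
ORDER BY job_id;
```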
Example usage
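A sketch that surfaces the laggiest partitions first (the 1000-message threshold is an arbitrary example; tune it for your workload):

```sql
SELECT job_id, source_id, fragment_id, partition_id, lag_phase, lag
FROM rw_kafka_job_lag
WHERE lag > 1000
ORDER BY lag DESC;
```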
Check for jobs with significant consumer lag by filtering the view on the lag column.

Best practices
- Monitor during initial creation: When creating a new materialized view or sink from a Kafka source, monitor the backfill progress to ensure it completes successfully.
- Track lag trends: Regularly query rw_kafka_job_lag to identify partitions that consistently show high lag, which may indicate performance issues.
- Alert on high lag: Set up monitoring alerts when lag exceeds acceptable thresholds for your use case.
- Check lag phase transitions: Monitor when jobs transition from BACKFILL to LIVE phase to understand when real-time processing begins.
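The alerting practice above can be sketched as a per-job aggregate that an external monitor polls periodically (the threshold and the polling mechanism are assumptions, not part of RisingWave):

```sql
-- Worst-case lag per job; alert when max_lag exceeds your threshold
SELECT job_id, max(lag) AS max_lag
FROM rw_kafka_job_lag
GROUP BY job_id
HAVING max(lag) > 10000;
```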
Troubleshooting high lag
If you observe high consumer lag:
- Check resource utilization: High lag may indicate that compute nodes are under-provisioned or overloaded.
- Review parallelism settings: Consider adjusting the parallelism of your streaming jobs to increase processing capacity.
- Inspect Kafka broker health: Verify that Kafka brokers are operating normally and not experiencing performance issues.
- Examine query complexity: Complex transformations in materialized views may slow down consumption.
- Check NULL values: If high_watermark or consumer_offset is NULL, metrics may not yet be available. This can happen shortly after source creation or if the Prometheus endpoint is not configured.
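To find partitions whose metrics are not yet available, a minimal sketch:

```sql
SELECT job_id, source_id, partition_id
FROM rw_kafka_job_lag
WHERE high_watermark IS NULL OR consumer_offset IS NULL;
```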