Service Health
Out of memory error
A RisingWave container in the project was terminated due to an Out of Memory (OOM) condition.
Triggers
- Large backfill operations.
- Sudden spikes in workload.
Streaming
Barrier pending for too long
No barrier has been committed in this project for more than 15 minutes.
Triggers
- Streaming graph bottlenecks. Typical causes include join amplification, insufficient resources, and suboptimal streaming queries (e.g., OverWindow, joins).
- Compaction write stalls, which lengthen barrier sync duration.
Diagnosis
- Check CPU and memory utilization for all nodes. If they are maxed out, the cluster likely has insufficient resources.
- Check whether any jobs are still being created and backfilled by running SHOW JOBS. Backfilling can put additional pressure on the cluster.
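The utilization check above can be sketched as a small helper. This is only an illustration: the node names, the sample numbers, and the 90% cutoff for "maxed out" are assumptions, not values defined by RisingWave, and the samples would in practice come from your monitoring dashboard.

```python
# Assumption: treat utilization at or above 90% of allocation as "maxed out".
SATURATION_THRESHOLD = 0.90


def saturated_nodes(samples, threshold=SATURATION_THRESHOLD):
    """Return the names of nodes whose CPU or memory utilization meets the threshold."""
    return sorted(
        name
        for name, usage in samples.items()
        if usage["cpu"] >= threshold or usage["memory"] >= threshold
    )


# Hypothetical samples, expressed as fractions of allocated capacity.
samples = {
    "compute-0": {"cpu": 0.97, "memory": 0.71},
    "compute-1": {"cpu": 0.55, "memory": 0.93},
    "compactor-0": {"cpu": 0.40, "memory": 0.35},
}

print(saturated_nodes(samples))  # ['compute-0', 'compute-1']
```

If the returned list covers most of the nodes, insufficient resources is the more likely cause; if it is empty, a streaming graph bottleneck or an ongoing backfill is worth checking next.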
Sink lag too large
Data for a particular sink has been pending in RisingWave’s internal log store for more than 30 minutes.
Triggers
- Slow external sink processing.
- Insufficient sink parallelism.
Compaction
Compaction back pressure
Back pressure from compaction has been detected in your cluster.
Triggers
- Insufficient compaction resources.
Diagnosis
- Check compaction CPU usage.
- Check the CPU ratio of Compute Nodes (including Streaming and Serving Nodes) to Compactor Nodes.
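As a rough illustration of the CPU ratio check, the sketch below divides the total CPU cores allocated to Compute Nodes by the total allocated to Compactor Nodes. The function name and the core counts are hypothetical, and no particular target ratio is implied; compare the result against the guidance for your deployment.

```python
def compute_to_compactor_cpu_ratio(compute_cores, compactor_cores):
    """Total Compute Node CPU cores divided by total Compactor Node CPU cores.

    compute_cores should include both Streaming and Serving Nodes.
    """
    total_compactor = sum(compactor_cores)
    if total_compactor == 0:
        raise ValueError("no Compactor Node CPU is allocated")
    return sum(compute_cores) / total_compactor


# Hypothetical cluster: three Compute Nodes and two Compactor Nodes.
ratio = compute_to_compactor_cpu_ratio([8, 8, 4], [2, 2])
print(f"{ratio:.1f}:1")  # 5.0:1
```

A high ratio combined with high compaction CPU usage points toward adding compaction resources.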