RisingWave Cloud provides detailed alert playbooks to help you quickly diagnose and address issues. Each playbook entry includes the alert’s name, description, common triggers, diagnostic steps, and immediate remediation actions. These alerts are organized by their function categories. This guide is regularly updated to address emerging scenarios.
Service Health
Out of memory error
A RisingWave container was terminated due to an Out of Memory (OOM) condition in the project.
Triggers
- Large backfill operations.
- Sudden spikes in workload.
Streaming
Barrier pending for too long
No barrier has been committed in this project for more than 15 minutes.
Triggers
- Streaming graph bottlenecks. Typical causes include join amplification, insufficient resources, and suboptimal streaming queries (e.g., OverWindow, joins).
- Compaction write stalls result in longer barrier sync duration.
Diagnosis
- Check CPU and memory utilization for all nodes. If they are maxed out, the cluster likely has insufficient resources.
- Check whether any jobs are still being created and backfilled by running SHOW JOBS. Backfilling can put additional pressure on the cluster.
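The utilization check above can be sketched as a small helper over per-node samples. This is an illustrative Python sketch, not part of RisingWave Cloud: the function name saturated_nodes, the input shape (node name mapped to CPU and memory fractions), the sample node names, and the 0.9 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold: treat a node as "maxed out" at or above 90% utilization.
SATURATION_THRESHOLD = 0.9


def saturated_nodes(
    utilization: Dict[str, Tuple[float, float]],
    threshold: float = SATURATION_THRESHOLD,
) -> List[str]:
    """Return the sorted names of nodes whose CPU or memory fraction is at or above threshold.

    utilization maps a node name to (cpu_fraction, memory_fraction), each in [0, 1].
    """
    return sorted(
        name
        for name, (cpu, mem) in utilization.items()
        if cpu >= threshold or mem >= threshold
    )


# Made-up sample values, for illustration only.
sample = {
    "compute-0": (0.97, 0.62),
    "compute-1": (0.41, 0.93),
    "compactor-0": (0.35, 0.40),
}
print(saturated_nodes(sample))
```

If the returned list covers most of the nodes, insufficient resources is a plausible explanation for the stalled barrier; an empty list points toward the other triggers.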
Sink lag too large
Data for a particular sink has been pending in RisingWave’s internal log store for more than 30 minutes.
Triggers
- Slow external sink processing.
- Insufficient sink parallelism.
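The alert condition itself is a simple age check on the oldest pending data. The following is a minimal Python sketch of that condition only, using the 30-minute window from the description above. The function name sink_lag_exceeded and its parameters are hypothetical, and the sketch does not reflect how RisingWave Cloud evaluates the alert internally.

```python
from datetime import datetime, timedelta, timezone

# 30-minute window taken from the alert description.
SINK_LAG_LIMIT = timedelta(minutes=30)


def sink_lag_exceeded(
    oldest_pending_at: datetime,
    now: datetime,
    limit: timedelta = SINK_LAG_LIMIT,
) -> bool:
    """Return True when the oldest pending data has waited longer than limit."""
    return now - oldest_pending_at > limit


# Made-up timestamps, for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(sink_lag_exceeded(now - timedelta(minutes=35), now))  # pending for 35 minutes
print(sink_lag_exceeded(now - timedelta(minutes=15), now))  # pending for 15 minutes
```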
Compaction
Compaction back pressure
Back pressure from compaction has been detected in your cluster.
Triggers
- Insufficient compaction resources.
Diagnosis
- Check compaction CPU usage.
- Check the CPU ratio of Compute Nodes (including Streaming and Serving Nodes) and Compactor Nodes.
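The CPU ratio in the last step is the total CPU allocated to Compute Nodes divided by the total allocated to Compactor Nodes. The Python sketch below shows only that arithmetic. The function name, the core counts, and the input shape are assumptions for illustration, and this section does not state a target ratio, so none is encoded here.

```python
from typing import Iterable


def compute_to_compactor_cpu_ratio(
    compute_node_cores: Iterable[float],
    compactor_node_cores: Iterable[float],
) -> float:
    """Return total compute cores divided by total compactor cores.

    compute_node_cores should include both Streaming and Serving Nodes.
    """
    compactor_total = sum(compactor_node_cores)
    if compactor_total <= 0:
        raise ValueError("at least one compactor core is required")
    return sum(compute_node_cores) / compactor_total


# Made-up cluster: two 8-core Compute Nodes and one 4-core Compactor Node.
print(compute_to_compactor_cpu_ratio([8, 8], [4]))
```

A ratio that grows as Compute Nodes are scaled up while Compactor Nodes stay fixed is one sign that compaction resources have not kept pace with the write load.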