Symptoms
If you see the following logs in the meta node, you are likely encountering this issue:
lease timeout
ERROR risingwave_meta::rpc::election_client: keep alive failed, stopping main loop
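To check a larger log file for these symptoms, something like the following minimal sketch may help. It is only an illustration: the helper name find_symptoms and the sample text are hypothetical, and the patterns are simply substrings of the two log lines quoted above.

```python
import re

# Substrings of the two log lines quoted above; matching is deliberately loose.
SYMPTOM_PATTERNS = [
    re.compile(r"lease timeout"),
    re.compile(r"election_client: keep alive failed"),
]


def find_symptoms(log_text):
    """Return (line_number, line) pairs for lines matching any symptom pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SYMPTOM_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample: the first line is filler, not real meta node output.
sample = """some unrelated line
lease timeout
ERROR risingwave_meta::rpc::election_client: keep alive failed, stopping main loop
"""

for number, line in find_symptoms(sample):
    print(number, line)
```

In practice you would pass the contents of the meta node log instead of the sample string.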
Possible causes
This issue is most likely caused by etcd performance fluctuations, which can result from running etcd on a slow disk or from sending it excessively large requests.
Solutions
- Check the etcd configuration, in particular whether --auto-compaction-mode and --max-request-bytes are set properly.
- If only one meta node is deployed, you can set the parameter meta_leader_lease_secs to 86400 so that disk performance does not affect leader election. For multi-node deployments, you can also increase the value of this parameter.
- For better performance and stability of the cluster, use higher-performance disks and configure etcd correctly.
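As a rough aid for the first item, the sketch below reports whether the two etcd flags appear explicitly on an etcd command line. It is a minimal illustration under stated assumptions: the helper explicit_flags is hypothetical, the example values are placeholders rather than recommendations, and the sketch does not judge whether a value is appropriate for your workload.

```python
FLAGS_TO_CHECK = ("auto-compaction-mode", "max-request-bytes")


def explicit_flags(argv):
    """Map each flag in FLAGS_TO_CHECK to its value in argv, or None if absent.

    Accepts "--flag=value" and "--flag value", with one or two leading dashes.
    """
    found = {name: None for name in FLAGS_TO_CHECK}
    for i, arg in enumerate(argv):
        if not arg.startswith("-"):
            continue
        name, sep, value = arg.lstrip("-").partition("=")
        if name not in found:
            continue
        if sep:
            # "--flag=value" form
            found[name] = value
        elif i + 1 < len(argv):
            # "--flag value" form: the value is the next argument
            found[name] = argv[i + 1]
    return found


# Placeholder command line; the values are illustrative only.
example = ["etcd", "--auto-compaction-mode=periodic", "--max-request-bytes", "10485760"]

for name, value in explicit_flags(example).items():
    print(name, "=", value if value is not None else "(not set explicitly)")
```

A flag reported as not set explicitly means etcd falls back to its built-in default, which is worth reviewing against the etcd documentation for your version.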