Dedicated Compute Nodes

To decouple streaming and serving, you need at least one Compute Node with the streaming role (Streaming Node) and one Compute Node with the serving role (Serving Node). When launching a Compute Node, you can specify its role with either the --role command-line argument or the RW_COMPUTE_NODE_ROLE environment variable. Changing the role requires restarting the node. A role can be one of:

hybrid: The default role when none is specified. Indicates that the Compute Node is available for both streaming and serving.
serving: Indicates that the Compute Node is read-only and executes batch queries only.
streaming: Indicates that the Compute Node is only available for streaming.
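The role semantics above can be summarized in a small model. This is a hypothetical Python sketch, not RisingWave code: the `Role` enum and the `parse_role`, `capabilities`, and `can_decouple` helpers are illustrative names. It encodes which workloads each role accepts, treats an unset value as hybrid, and checks the requirement stated above that decoupling needs at least one streaming-role node and one serving-role node.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class Role(Enum):
    """Compute Node roles described above (illustrative model, not RisingWave code)."""
    HYBRID = "hybrid"
    SERVING = "serving"
    STREAMING = "streaming"


def parse_role(value: Optional[str]) -> Role:
    """Interpret a --role / RW_COMPUTE_NODE_ROLE value; hybrid when unset."""
    if value is None:
        return Role.HYBRID
    return Role(value)  # raises ValueError for an unknown role name


def capabilities(role: Role) -> set[str]:
    """Return the workload kinds a node with this role accepts."""
    if role is Role.HYBRID:
        return {"streaming", "serving"}
    if role is Role.SERVING:
        return {"serving"}  # read-only: batch queries only
    return {"streaming"}


def can_decouple(roles: list[Role]) -> bool:
    """Decoupling needs at least one streaming node and one serving node."""
    return Role.STREAMING in roles and Role.SERVING in roles


cluster = [parse_role("streaming"), parse_role("serving"), parse_role(None)]
print([r.value for r in cluster])  # ['streaming', 'serving', 'hybrid']
print(can_decouple(cluster))       # True
```

A cluster made only of hybrid nodes, or of hybrid and serving nodes, is not decoupled under this model, because no node is dedicated to streaming.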

In a production environment, we recommend using separate nodes for serving (batch) and streaming workloads. The hybrid role, which lets a node handle both batch and streaming queries, is better suited to testing. Although a node can execute batch and streaming queries concurrently, avoid running resource-intensive batch and streaming queries at the same time.

Enable decoupling with Kubernetes operator/Helm

Kubernetes operator

To enable decoupling of streaming and serving nodes, set spec.enableEmbeddedServingMode to true in your RisingWave custom resource:

spec:
  enableEmbeddedServingMode: true

The enableEmbeddedServingMode field is available only in RisingWave operator v0.7.1 or later.

Apply the changes to your Kubernetes cluster:

kubectl apply -f your-risingwave-custom-resource.yaml
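If you prefer not to edit the manifest file, the same field can be set with a JSON merge patch. The Python sketch below only builds the patch and renders a `kubectl patch --type merge` command; the helper names and the resource name `my-risingwave` are hypothetical, and addressing the custom resource as `risingwave` in kubectl is an assumption about the CRD's singular name.

```python
import json
import shlex


def build_embedded_serving_patch(enabled: bool = True) -> dict:
    """JSON merge patch that sets spec.enableEmbeddedServingMode."""
    return {"spec": {"enableEmbeddedServingMode": enabled}}


def kubectl_patch_command(resource_name: str, namespace: str, patch: dict) -> str:
    """Render a `kubectl patch --type merge` command line for the given patch."""
    args = [
        "kubectl", "patch",
        "risingwave", resource_name,  # assumption: CRD singular name is "risingwave"
        "--namespace", namespace,
        "--type", "merge",
        "--patch", json.dumps(patch, separators=(",", ":")),
    ]
    return shlex.join(args)  # shell-quotes the JSON argument


if __name__ == "__main__":
    patch = build_embedded_serving_patch()
    print(kubectl_patch_command("my-risingwave", "your-namespace", patch))
```

A merge patch only touches the keys it names, so the rest of the custom resource stays as it is.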

After you enable embedded serving mode, the frontend component also acts as a serving Compute Node, while the compute component is dedicated to streaming. This means:

The frontend pods will now handle both frontend tasks and serving (batch) queries.
The compute pods will exclusively handle streaming tasks.
You can scale the frontend and compute components independently based on your serving and streaming workload requirements.

This architecture provides better isolation between streaming and serving workloads while maintaining efficient resource utilization.

Helm

To enable decoupling of streaming and serving nodes, set frontendComponent.embeddedServing to true in your RisingWave Helm chart values:

frontendComponent:
  embeddedServing: true

The frontendComponent.embeddedServing field is available only in RisingWave Helm chart 0.1.58 or later.

Apply the changes to your Kubernetes cluster:

helm upgrade your-risingwave-release-name risingwavelabs/risingwave \
  --namespace your-namespace \
  --values your-values.yaml \
  --reuse-values
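With --reuse-values, Helm starts from the values of the existing release and merges the file passed with --values on top, so settings you do not mention are kept. The Python sketch below approximates that behavior with a recursive dictionary merge (nested maps are merged key by key, and the new value wins on conflicts). It is an illustration, not Helm's exact algorithm, and the `replicas` values in `existing` are hypothetical.

```python
from __future__ import annotations

import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with override merged into base; override wins on conflicts."""
    merged = copy.deepcopy(base)  # leave the caller's base values untouched
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested maps
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists are replaced
    return merged


existing = {  # hypothetical values from the current release
    "frontendComponent": {"replicas": 2},
    "computeComponent": {"replicas": 3},
}
override = {"frontendComponent": {"embeddedServing": True}}  # contents of your-values.yaml

result = deep_merge(existing, override)
print(result["frontendComponent"])  # {'replicas': 2, 'embeddedServing': True}
```

Without --reuse-values, values from the previous release that are absent from your-values.yaml would fall back to the chart defaults instead.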

After you enable embedded serving, the frontend and compute components split the work as follows:

The frontend pods will now handle both frontend tasks and serving (batch) queries.
The compute pods will exclusively handle streaming tasks.
You can scale the frontend and compute components independently based on your serving and streaming workload requirements.

This architecture provides better isolation between streaming and serving workloads while maintaining efficient resource utilization.

Configure a Serving Node for batch queries

You can use a TOML configuration file to configure a serving Compute Node (Serving Node). For detailed instructions, see Node-specific configurations. Unlike a general-purpose hybrid Compute Node, a Serving Node doesn't need memory allocated or reserved for the shared buffer and operator caches. Instead, it's more efficient to increase the sizes of the block cache and meta cache. However, the larger these caches are, the less memory remains for executing batch queries, which limits how much data a query can process. Here's an example configuration for a Serving Node with 16GB of memory, which you can find in /risingwave/src/config/serving.toml:

[storage]
# Shared buffer is not needed for a serving-only Compute Node.
shared_buffer_capacity_mb = 1

# Compactor is irrelevant to a serving-only Compute Node.
compactor_memory_limit_mb = 1

# Allocate 30% of total memory to block cache: 16GB * 0.3 = 4.8GB
block_cache_capacity_mb = 4800

# Allocate 10% of total memory to meta cache: 16GB * 0.1 = 1.6GB
meta_cache_capacity_mb = 1600

The remaining memory, 16GB - 4.8GB - 1.6GB - 4.8GB (reserved memory, 16GB * 0.3) = 4.8GB, is used for executing serving queries. We call it “compute memory”. If a resource-intensive batch query's runtime memory consumption exceeds the available compute memory, the query terminates itself before it can trigger an out-of-memory (OOM) error.
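The compute-memory arithmetic for the 16GB example can be checked with a short Python sketch. The cache sizes come from the configuration above and the 30% reserved-memory fraction from the text; the function name is hypothetical, and the calculation uses the same decimal conversion (16GB = 16,000MB) as the comments in the TOML file. Like the formula above, it ignores the 1MB shared buffer and compactor settings.

```python
def compute_memory_mb(total_mb: int, block_cache_mb: int, meta_cache_mb: int,
                      reserved_fraction: float = 0.3) -> int:
    """Memory left for executing batch queries on a Serving Node (hypothetical helper)."""
    reserved_mb = round(total_mb * reserved_fraction)  # 16000 * 0.3 -> 4800
    return total_mb - block_cache_mb - meta_cache_mb - reserved_mb


# Values from serving.toml: 4800MB block cache, 1600MB meta cache.
print(compute_memory_mb(16000, 4800, 1600))  # 4800

# Halving both caches leaves more room for query execution.
print(compute_memory_mb(16000, 2400, 800))   # 8000
```

The second call shows the trade-off described below: every megabyte taken from the block or meta cache becomes available as compute memory.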

Spilling behavior

Before terminating, batch executors will attempt to spill intermediate results to disk when memory pressure is high. Operations that support spilling include:

Hash joins
Hash aggregations
Large sorting operations

Spilling temporarily writes data to the directory specified by RW_BATCH_SPILL_DIR (default: /tmp/). Although spilling prevents OOM errors, the extra disk I/O significantly slows down the query. If spilling alone cannot keep memory usage within limits, the query terminates. To avoid this, either reduce the cache sizes to increase compute memory, or distribute the query across multiple nodes with SET QUERY_MODE TO distributed.

We don't recommend running OLAP-style batch queries that read large amounts of input data. If you do need such a query and the default configuration leaves too little compute memory, allocate less memory to the block cache and meta cache to increase compute memory.
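To illustrate the idea behind spilling, here is a generic external merge sort in Python: rows are buffered in memory, and whenever the buffer reaches a limit it is sorted and written to a temporary file in the spill directory; the sorted runs are then merged lazily. This is a textbook sketch, not RisingWave's batch executor. Reading RW_BATCH_SPILL_DIR with a /tmp/ fallback mirrors the text, while the row-count limit (`max_rows_in_memory`) is an assumption standing in for a real memory budget.

```python
from __future__ import annotations

import heapq
import os
import tempfile
from typing import Iterable, Iterator


def _write_run(rows: list[int], spill_dir: str) -> str:
    """Sort a full buffer and spill it to disk as one sorted run; return its path."""
    rows.sort()
    fd, path = tempfile.mkstemp(prefix="spill-run-", dir=spill_dir)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{r}\n" for r in rows)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a spilled run back from disk one row at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(rows: Iterable[int], max_rows_in_memory: int) -> Iterator[int]:
    spill_dir = os.environ.get("RW_BATCH_SPILL_DIR", "/tmp/")
    buffer: list[int] = []
    runs: list[str] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_rows_in_memory:  # "memory pressure": spill
            runs.append(_write_run(buffer, spill_dir))
            buffer = []
    if not runs:
        yield from sorted(buffer)  # everything fit in memory: no disk I/O
        return
    buffer.sort()
    try:
        # k-way merge of the in-memory tail and all spilled runs
        yield from heapq.merge(buffer, *(_read_run(p) for p in runs))
    finally:
        for p in runs:
            os.remove(p)  # clean up temporary spill files


print(list(external_sort([5, 3, 9, 1, 7, 2], max_rows_in_memory=2)))  # [1, 2, 3, 5, 7, 9]
```

Each spill and each read during the merge is disk I/O the in-memory path avoids, which is why spilled queries run noticeably slower.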
