This section describes best practices for optimizing serving (batch query) performance in RisingWave.

SQL Optimizations

Leverage predicate pushdown

Predicate pushdown allows filtering operations (predicates) to be applied as early as possible in the query pipeline. This means that instead of retrieving all data and then filtering it, RisingWave filters the data at the storage level. Examples:
CREATE TABLE t(k1 INT, k2 INT, v1 INT, v2 INT, PRIMARY KEY(k1,k2));

-- The predicate k1 = 1 is pushed down into the BatchScan.
EXPLAIN SELECT * FROM t WHERE k1=1;
 BatchExchange { order: [], dist: Single }
 └─BatchScan { table: t, columns: [k1, k2, v1, v2], scan_ranges: [k1 = Int32(1)] }

-- The predicate k2 = 1 cannot be pushed down into the BatchScan,
-- because k2 is not a prefix of the primary key. Create an index on k2 to push it down.
EXPLAIN SELECT * FROM t WHERE k2=1;
 BatchExchange { order: [], dist: Single }
 └─BatchFilter { predicate: (t.k2 = 1:Int32) }
   └─BatchScan { table: t, columns: [k1, k2, v1, v2] }

Use indexes to accelerate your queries

Indexes in RisingWave are used to accelerate batch queries. They are incrementally maintained, similar to materialized views, but require minimal computation, so they are cost-effective to create. We encourage you to identify recurring patterns in your batch queries and create indexes when a pattern occurs frequently and/or a batch query is slow. See the Indexes page for details on how to create and use indexes.
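Continuing the predicate-pushdown example above, an index on k2 lets the previously non-pushable predicate be served from an index scan. A sketch (the index name idx_t_k2 is illustrative):

```sql
-- Create an index on k2 so that predicates on k2 can be pushed down.
CREATE INDEX idx_t_k2 ON t(k2);

-- The plan should now show a scan on idx_t_k2 with a scan range on k2,
-- instead of a BatchFilter over a full scan of t.
EXPLAIN SELECT * FROM t WHERE k2 = 1;
```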

Accelerate batch queries with automatic MV selection

Added in v2.8.0.
When enable_mv_selection is enabled, the query optimizer automatically rewrites a batch query to scan a matching materialized view instead of the base tables. If the query’s SELECT list, WHERE clause, and GROUP BY match an existing MV (an additional ORDER BY on top is allowed), RisingWave reads the pre-computed result directly instead of rescanning and re-aggregating raw data. This is most useful for recurring dashboard or reporting queries that are already backed by an incrementally maintained MV.
-- A base table and a materialized view maintaining daily order totals
CREATE TABLE orders (order_id BIGINT, amount DECIMAL, order_time TIMESTAMPTZ);
CREATE MATERIALIZED VIEW daily_order_totals AS
  SELECT date_trunc('day', order_time) AS day, SUM(amount) AS total
  FROM orders
  GROUP BY 1;
With enable_mv_selection enabled, the following query is automatically rewritten to scan daily_order_totals instead of orders:
SET enable_mv_selection = true;

SELECT date_trunc('day', order_time) AS day, SUM(amount) AS total
FROM orders
GROUP BY 1
ORDER BY 1;
Use EXPLAIN to verify the optimizer chose the MV:
EXPLAIN
SELECT date_trunc('day', order_time) AS day, SUM(amount) AS total
FROM orders
GROUP BY 1
ORDER BY 1;

 BatchExchange { order: [daily_order_totals.day ASC], dist: Single }
 └─BatchScan { table: daily_order_totals, columns: [day, total] }
The plan shows a BatchScan on daily_order_totals, confirming that the optimizer rewrote the query to read from the MV instead of scanning and aggregating orders. Note that enable_mv_selection is disabled by default; enable it per session:
SET enable_mv_selection = true;
Currently, MV selection requires an exact structural match between the query and the MV definition. For example, reordering columns in the SELECT list or querying only a subset of the MV’s columns will not trigger a rewrite. Create an MV that precisely mirrors your most frequent query pattern for best results. We are actively working on broadening the matching to cover more cases in future releases.
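For instance, given daily_order_totals above, the following query selects only a subset of the MV's columns, so it does not structurally match the MV definition and is not rewritten; it falls back to scanning and aggregating orders:

```sql
-- Only `day` is selected; the MV defines (day, total).
-- No exact structural match, so no MV rewrite occurs.
SELECT date_trunc('day', order_time) AS day
FROM orders
GROUP BY 1;
```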

Memory management for batch queries

Spilling to disk

When batch queries process large amounts of data that exceed available memory, RisingWave automatically spills intermediate results to disk to prevent out-of-memory errors. This spilling behavior applies to memory-intensive operations like:
  • Large hash joins
  • Hash aggregations with high cardinality
  • Sorting operations on large datasets
The spill directory can be configured using the RW_BATCH_SPILL_DIR environment variable (default: /tmp/). Spilling trades memory for disk I/O, so queries may slow down when spilling occurs.
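A minimal sketch of pointing the spill directory at a disk with more free space (the path is an example; the variable must be present in the compute node's environment when it starts):

```shell
# Example spill location; substitute a local disk with enough free space.
SPILL_DIR=/tmp/rw-spill-example
mkdir -p "$SPILL_DIR"

# The compute node reads RW_BATCH_SPILL_DIR from its environment at startup,
# so export it before launching the node process.
export RW_BATCH_SPILL_DIR="$SPILL_DIR"
echo "$RW_BATCH_SPILL_DIR"
```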

Compute memory limits

On serving-only Compute Nodes, the available memory for query execution (compute memory) is calculated as:
Compute Memory = Total Memory - Block Cache - Meta Cache - Reserved Memory (30% of Total Memory)
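As a worked example of the formula above (all numbers are illustrative, not defaults), a 16 GiB serving node with a 4 GiB block cache and a 1 GiB meta cache leaves roughly 6.2 GiB for query execution:

```python
total = 16.0             # GiB, total node memory (illustrative)
block_cache = 4.0        # GiB, block cache size (illustrative)
meta_cache = 1.0         # GiB, meta cache size (illustrative)
reserved = 0.3 * total   # 30% of total memory is reserved

compute_memory = total - block_cache - meta_cache - reserved
print(f"{compute_memory:.1f} GiB")  # 6.2 GiB
```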
If a batch query’s memory consumption approaches this limit and spilling is insufficient, the query will automatically terminate to prevent node-level OOM. To handle larger queries:
  1. Reduce block cache and meta cache sizes in node configuration
  2. Switch to distributed query mode (SET QUERY_MODE TO distributed) to spread memory load across compute nodes
  3. Break complex queries into smaller steps using materialized views
  4. Add appropriate indexes to reduce data scanned
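For example, step 2 can be applied per session before running a heavy query. A sketch reusing the orders table from earlier:

```sql
-- Run subsequent batch queries in distributed mode so memory-intensive
-- operators execute across all compute nodes instead of a single node.
SET QUERY_MODE TO distributed;

-- A potentially memory-heavy aggregation now runs across the cluster.
SELECT order_id, SUM(amount) FROM orders GROUP BY order_id;
```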