SQL Optimizations
Leverage predicate pushdown
Predicate pushdown allows filtering operations (predicates) to be applied as early as possible in the query pipeline. This means that instead of retrieving all data and then filtering it, RisingWave filters the data at the storage level. Examples:Use indexes to accelerate your queries
Indexes in RisingWave are used to accelerate batch queries. They are incrementally maintained, similar to materialized views but with minimal computation. Therefore, they are cost-effective to create. We encourage users to detect the patterns in batch queries and create indexes if the pattern occurs frequently or/and a batch query is slow. You can check the Indexes page for more details on how to create and use indexes.Accelerate batch queries with automatic MV selection
Added in v2.8.0.
enable_mv_selection is enabled, the query optimizer automatically rewrites a batch query to scan a matching materialized view instead of the base tables. If the query’s SELECT list, WHERE clause, and GROUP BY match an existing MV (an additional ORDER BY on top is allowed), RisingWave reads the pre-computed result directly instead of rescanning and re-aggregating raw data.
This is most useful for recurring dashboard or reporting queries that are already backed by an incrementally maintained MV.
enable_mv_selection enabled, the following query is automatically rewritten to scan daily_order_totals instead of orders:
EXPLAIN to verify the optimizer chose the MV:
BatchScan on daily_order_totals — confirming the optimizer rewrote the query to read from the MV instead of scanning and aggregating orders.
enable_mv_selection is disabled by default. Enable it per session:
Memory management for batch queries
Spilling to disk
When batch queries process large amounts of data that exceed available memory, RisingWave automatically spills intermediate results to disk to prevent out-of-memory errors. This spilling behavior applies to memory-intensive operations like:- Large hash joins
- Hash aggregations with high cardinality
- Sorting operations on large datasets
RW_BATCH_SPILL_DIR environment variable (default: /tmp/). Spilling trades memory for disk I/O, so queries may slow down when spilling occurs.
Compute memory limits
On serving-only Compute Nodes, the available memory for query execution (compute memory) is calculated as:- Reduce block cache and meta cache sizes in node configuration
- Use query modes (
SET QUERY_MODE TO distributed) to distribute memory load - Break complex queries into smaller steps using materialized views
- Add appropriate indexes to reduce data scanned