End-to-end latency
Explore the factors that affect end-to-end latency in RisingWave and learn strategies for optimizing performance.
Streaming applications often prioritize real-time insights, making end-to-end (E2E) latency a critical factor for success. High E2E latency can significantly impact real-time use cases such as financial transactions, anomaly detection, and real-time analytics, where even milliseconds matter. This guide will introduce the key considerations for optimizing E2E latency and discuss how to handle the trade-off between lower latency and higher throughput.
Understanding E2E latency
In RisingWave, E2E latency refers to the total time taken for a data event to be ingested, processed, and delivered to its final destination. It consists of multiple stages:
- Source latency: The time taken to ingest new data from upstream sources like Kafka, databases, or object storage.
- Processing latency: The time RisingWave takes to transform and compute results on the ingested data.
- Sink latency: The time taken to write computed results to a target system (e.g., Kafka, PostgreSQL, or MySQL).
Optimizing all of these stages together is essential for achieving low E2E latency.
Key considerations
Below are some key considerations to help you get started with optimizing E2E latency.
Materialized views vs. sink latency
It’s important to distinguish between E2E latency for sinks versus for materialized views. Updates to materialized views are not immediately available for batch queries because RisingWave enforces strong consistency for materialized views. As a result, updates must wait for the nearest barrier to be globally committed, with the commit interval controlled by the barrier_interval_ms system parameter.
In contrast, sink outputs are eagerly written to the downstream system without waiting for barriers to commit.
If your application requires latency ≤ 500 ms, we recommend using a sink to trigger event-driven business logic downstream rather than periodically polling a materialized view. This makes results available to downstream consumers sooner.
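For reference, here is a minimal sketch of inspecting and lowering the barrier interval. The 500 ms value is purely illustrative; shorter intervals increase checkpointing overhead, so benchmark before changing it in production.

```sql
-- List system parameters, including barrier_interval_ms.
SHOW PARAMETERS;

-- Shorten the barrier interval so materialized-view updates are committed
-- (and become queryable) more frequently. The value is illustrative.
ALTER SYSTEM SET barrier_interval_ms = 500;
```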
Source latency and sink latency
When aiming for very low E2E latency, many users opt for message queues like Kafka as both the source and sink. With proper configuration, these systems can achieve source and sink latencies as low as 1 to 5 milliseconds. Consequently, the primary driver of E2E latency often becomes the processing latency within RisingWave itself.
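As an illustration, the sketch below wires a Kafka topic into RisingWave and writes filtered results back out to another topic. The topic names, broker address, and schema are placeholders, and the exact WITH options may vary with your RisingWave and Kafka setup.

```sql
-- Ingest events from a Kafka topic (placeholder broker, topic, and schema).
CREATE SOURCE orders_src (
    order_id    BIGINT,
    customer_id BIGINT,
    product_id  BIGINT,
    amount      DOUBLE PRECISION,
    order_ts    TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'orders',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'latest'
) FORMAT PLAIN ENCODE JSON;

-- Emit filtered results eagerly to a downstream topic; sink output does not
-- wait for barriers to commit, so it reaches consumers with minimal delay.
CREATE SINK large_orders_sink AS
    SELECT order_id, customer_id, amount, order_ts
    FROM orders_src
    WHERE amount > 1000
WITH (
    connector = 'kafka',
    topic = 'large_orders',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;
```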
Query complexity and processing latency
Query complexity has a direct impact on processing latency. The simpler the query and data flow, the lower the processing latency. Computational overhead is largely determined by:
- The number and type of operators (e.g., joins, aggregations).
- The volume of intermediate results (e.g., join amplification).
Even with optimal configurations, achieving ultra-low latency may not be feasible if a query requires significant computation. Therefore, if your query is computationally intensive, consider rewriting it or adjusting the business goal to better align with the desired latency constraints.
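For example, the two hypothetical materialized views below sit at opposite ends of this spectrum. They reuse the orders_src source sketched earlier, and the customers and products tables exist only for illustration.

```sql
-- Hypothetical dimension tables, for illustration only.
CREATE TABLE customers (customer_id BIGINT PRIMARY KEY, region VARCHAR);
CREATE TABLE products  (product_id  BIGINT PRIMARY KEY, category VARCHAR);

-- Lightweight: a single windowed aggregation over one source keeps
-- intermediate state small and processing latency low.
CREATE MATERIALIZED VIEW orders_per_minute AS
SELECT window_start, COUNT(*) AS order_cnt
FROM TUMBLE(orders_src, order_ts, INTERVAL '1 minute')
GROUP BY window_start;

-- Heavier: two hash joins feeding an aggregation. Each join shuffles data,
-- maintains join state, and can amplify intermediate results, all of which
-- add to processing latency.
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT c.region, SUM(o.amount) AS total_amount
FROM orders_src AS o
JOIN customers  AS c ON o.customer_id = c.customer_id
JOIN products   AS p ON o.product_id  = p.product_id
GROUP BY c.region;
```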
Trade-off between lower latency and higher throughput
Of the design choices and factors discussed above, the one with the greatest influence on processing latency is the trade-off between lower latency and higher throughput.
There is an inherent trade-off between lower latency and higher throughput in stream processing systems. Achieving ultra-low latency often requires processing events as soon as they arrive, which limits batching optimizations and reduces overall system efficiency. Conversely, higher throughput is typically achieved by batching data, which reduces per-event overhead but introduces additional delay. The key is to understand how configuration parameters and system design choices affect this trade-off. This section explores the impact of the aggregation operator configuration and of parallelism on query latency and throughput.
Aggregation operator configuration
One notable configuration parameter in the aggregation operator is stream_hash_agg_max_dirty_groups_heap_size. It controls how many changes the aggregation operator buffers before consolidating them and sending the results to the next operator in the query plan. The trade-off here is between latency and computation efficiency.
Example
Consider a scenario where we receive a series of updates for key K:
(K, -1) → (K, +1) → (K, -1) → (K, +1)
If the aggregation operator buffers and consolidates these updates before emitting them downstream, the net effect is zero change—no output needs to be sent.
However, if we immediately propagate each individual change as it arrives, the system outputs four updates. This can have significant downstream effects, especially if multiple join operators follow, potentially leading to a large number of intermediate results.
Tuning tip
- Lower values of stream_hash_agg_max_dirty_groups_heap_size prioritize low latency, ensuring changes are sent downstream as soon as they arrive. The second-order effect is that this can increase the load on downstream operators, potentially slowing them down due to a higher volume of intermediate results.
- Higher values of stream_hash_agg_max_dirty_groups_heap_size buffer more updates before consolidating them, which reduces redundant computation and minimizes intermediate results, at the cost of slightly increased latency.
When tuning this parameter, you should consider the overall query structure and whether downstream operations, such as joins, may amplify unnecessary updates if changes are emitted too frequently.
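How this parameter is set depends on your deployment: in some RisingWave versions it is exposed as a session or system setting, while in others it lives in the node-level configuration file under the streaming developer options. Treat the statement below as a sketch of the session-variable form with a purely illustrative value.

```sql
-- Illustrative only: buffer up to ~64 MB of dirty aggregation groups before
-- consolidating and emitting them downstream. Larger values favor throughput;
-- smaller values favor latency. Verify the setting mechanism for your version.
SET stream_hash_agg_max_dirty_groups_heap_size = 67108864;
```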
Parallelism configuration
Another trade-off between lower latency and higher throughput lies in the parallelism configuration for each streaming job. To achieve higher throughput, it is natural to increase parallelism and execute the job in a distributed manner. However, higher parallelism does not always lead to lower latency, as it introduces additional overhead.
Example
For example, in a hash join, data from one upstream source must be shuffled to the corresponding hash partitions before being joined with data from the other upstream. If there are many upstream sources, this shuffling stage can add more latency. The impact becomes even more pronounced when a query involves multiple shuffle stages, such as hash joins or aggregations on different keys, where each stage introduces additional delays.
That said, if a query can be executed in an embarrassingly parallel manner where tasks run independently without shuffling, this concern does not apply.
Tuning tip
It is often more efficient to use a single large physical machine for computation rather than multiple smaller machines, as this helps minimize shuffling overhead and reduce network latency.
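As a rough sketch, parallelism can be set per session before creating a streaming job and, in recent RisingWave versions, adjusted on an existing job. The values and object names below are illustrative.

```sql
-- Default parallelism for streaming jobs created in this session
-- (illustrative value; leaving it unset lets RisingWave decide).
SET streaming_parallelism = 4;

CREATE MATERIALIZED VIEW order_count AS
SELECT COUNT(*) AS order_cnt
FROM orders_src;

-- Lower the parallelism of an existing job if shuffle overhead, rather than
-- throughput, is the bottleneck (syntax may vary by version).
ALTER MATERIALIZED VIEW order_count SET PARALLELISM = 2;
```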
What’s next
Finally, if you experience higher-than-expected latency, refer to this guide for initial debugging steps. If the issue persists, feel free to join our Slack community, and we will help diagnose it in more detail.