RisingWave supports vector indexes to enable efficient similarity search operations. Vector indexes are specialized data structures that optimize queries involving vector distance calculations.

Creating vector indexes

Use the CREATE INDEX command with vector-specific syntax to create vector indexes. For more details, see CREATE INDEX.
Syntax
CREATE INDEX index_name ON table_name 
USING { FLAT | HNSW } (vector_column | expression) 
[ INCLUDE ( include_column [, ...] ) ]
[ WITH ( option = value [, ...] ) ];

Index types

Before creating a vector index, first create a sample table items to use in the examples below. Currently, vector indexes can only be created on append-only inputs, such as append-only tables or materialized views, so the table must be declared as append-only:
CREATE TABLE items (id INT PRIMARY KEY, name STRING, embedding VECTOR(128)) APPEND ONLY;
RisingWave supports two index types:
  • FLAT index: Provides exact results by comparing the query vector against all stored vectors.
    -- Create a FLAT vector index
    CREATE INDEX idx_embedding ON items 
    USING FLAT (embedding) 
    INCLUDE (name) 
    WITH (distance_type = 'l2');
    
  • HNSW index: Hierarchical Navigable Small World (HNSW) index that provides approximate nearest neighbor search with better performance for large datasets.
    -- Create an HNSW vector index
    CREATE INDEX idx_embedding_hnsw ON items 
    USING HNSW (embedding) 
    INCLUDE (name) 
    WITH (
        distance_type = 'inner_product', 
        m = 32, 
        ef_construction = 40, 
        max_level = 5
    );
    
For HNSW indexes, you can also tune the query-time parameter ef_search by setting the session variable batch_hnsw_ef_search (default: 40).
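For example, to widen the search at query time (a larger ef_search generally improves recall at the cost of latency; the value 100 here is just an illustration):

```sql
-- Increase ef_search for the current session
SET batch_hnsw_ef_search = 100;
```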

Parameters

| Parameter | Description | Valid for |
| --- | --- | --- |
| `distance_type` | Distance metric to use: `l2`, `cosine`, `l1`, or `inner_product` | FLAT, HNSW |
| `m` | Optional. Maximum number of connections per node | HNSW |
| `ef_construction` | Optional. Size of the dynamic candidate list during construction | HNSW |
| `max_level` | Optional. Maximum level of the HNSW graph | HNSW |

Vector distance operators

RisingWave provides specialized operators for calculating vector distances:
| Operator | Function | Description |
| --- | --- | --- |
| `<->` | `l2_distance()` | Euclidean (L2) distance |
| `<=>` | `cosine_distance()` | Cosine distance |
| `<+>` | `l1_distance()` | Manhattan (L1) distance |
| `<#>` | Negative inner product | Negative inner product distance |
Use vector distance operators with ORDER BY and LIMIT to perform similarity search:
-- Find the 5 most similar items using L2 distance
SELECT * FROM items 
ORDER BY embedding <-> '[3,1,2]' 
LIMIT 5;

-- Find similar items using cosine distance
SELECT id, name FROM items 
ORDER BY embedding <=> '[0.5, 0.3, 0.2]' 
LIMIT 10;
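For reference, the four distance metrics above correspond to their standard mathematical definitions. The following Python sketch shows those definitions (it illustrates the math only, not RisingWave's implementation):

```python
import math

def l2_distance(a, b):
    # Euclidean (L2) distance: sqrt(sum((a_i - b_i)^2))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1_distance(a, b):
    # Manhattan (L1) distance: sum(|a_i - b_i|)
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # Cosine distance: 1 - (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def neg_inner_product(a, b):
    # Negative inner product: -(a . b); smaller means more similar
    return -sum(x * y for x, y in zip(a, b))
```

Note that for the negative inner product, more similar vectors produce smaller (more negative) values, which is why it sorts correctly under ORDER BY ... LIMIT N.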

Vector indexes on function expressions

You can create vector indexes on function expressions instead of raw columns. This allows you to avoid storing a separate vector column, saving storage and reducing maintenance costs.
  1. Create the table to include the input column
CREATE TABLE items (
    id INT PRIMARY KEY,
    description STRING
    -- embedding column is optional if using function expression
);
Normally, an embedding column would store the embedding generated from the description column. If you instead create the vector index on a function expression over the description column, you don't have to store the raw embedding in the table.
  2. Define the user-defined function (UDF)
CREATE FUNCTION get_embedding(string) RETURNS VECTOR(128) LANGUAGE SQL AS $$
SELECT openai_embedding('{"model": <EMBEDDING_MODEL_NAME>, "api_key": <API_KEY>}'::jsonb, $1)::vector(128);
$$;
  3. Create the vector index on the function expression
CREATE INDEX idx_embedding_func ON items
USING FLAT (get_embedding(description))
INCLUDE(description)
WITH (distance_type = 'l2');
In this example, get_embedding(description) is used as the index expression. This approach avoids materializing a separate vector column in the table, which reduces storage costs and keeps the table schema simpler.

Streaming vector index lookup

Added in v2.7.0.
After creating a vector index, you can use array subqueries to define streaming jobs that look up the vector index, and expand the input with an additional array column containing the Top-N nearest rows. The query syntax is as follows:
Syntax
SELECT 
    <input_table_col1>, 
    <input_table_col2>, 
    ..., 
    array(
        SELECT row(<col1>, <col2>, .., <col_n>)
        FROM <indexed_table> FOR SYSTEM_TIME AS OF PROCTIME()
        ORDER BY <input_embedding_col> <=> <indexed_table_embedding_col>
        LIMIT N
    ) AS top_n_related
FROM <input>;
This query expands the input with an extra column. The data type of the expanded column is:
struct<col1, col2, ..., col_n>[]
You can use this query to create a materialized view, a sink, or a sink into a table, thereby defining a streaming job that performs vector index lookups. When creating such streaming jobs, FOR SYSTEM_TIME AS OF must be set to PROCTIME(); see the example in "Vector index lookup in streaming sinks" below. The same syntax is also supported in ad-hoc queries. In that case, FOR SYSTEM_TIME AS OF is optional, and you may specify a different system time, similar to how time travel queries are defined.
  • Currently, vector index lookups are supported only for append-only inputs.
  • All columns returned by the lookup from the indexed table must be included in the vector index.
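As a sketch, an ad-hoc lookup against the items table from the earlier examples (whose index includes the name column) might look like the following, assuming a hypothetical input table queries(query_id INT, embedding VECTOR(128)):

```sql
-- Ad-hoc Top-3 lookup; FOR SYSTEM_TIME AS OF is optional in ad-hoc queries
SELECT
    q.query_id,
    array(
        SELECT row(i.name)
        FROM items AS i
        ORDER BY q.embedding <-> i.embedding
        LIMIT 3
    ) AS top_3_items
FROM queries AS q;
```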

Examples

-- Create table with vector data
CREATE TABLE products (
    id INT PRIMARY KEY,
    name STRING,
    description STRING,
    embedding vector(128)
) APPEND ONLY;

-- Create vector index
CREATE INDEX idx_embedding ON products
USING HNSW (embedding)
WITH (distance_type = 'cosine');

-- Insert sample data
INSERT INTO products (id, name, description, embedding) VALUES
(1, 'Product A', 'Description for Product A', '[0.1, 0.2, ...]'),
(2, 'Product B', 'Description for Product B', '[0.3, 0.4, ...]');

-- Find similar products
SELECT id, name
FROM products
ORDER BY embedding <=> '[0.2, 0.3, ...]'
LIMIT 5;

Using cosine distance type

The form of the similarity query depends on how the vector index was created:
  • If the vector index is built on a raw embedding column, use the raw column in your ORDER BY clause.
-- Query on the raw embedding column
SELECT * FROM items
ORDER BY embedding <=> '[0.5, 0.5, 0.0]' 
LIMIT 3;
  • If the vector index is built using a function expression, use the same function expression in your ORDER BY clause.
-- Query on a function expression
SELECT * FROM items
ORDER BY get_embedding(description) <=> '[1.0, 2.0, 3.0]'
LIMIT 3;

Vector index lookup in streaming sinks

Suppose you have two append-only tables, items and events:
CREATE TABLE items (
    id INT PRIMARY KEY,
    text STRING,
    embedding VECTOR(3)
) APPEND ONLY;

CREATE TABLE events (
    event_id INT PRIMARY KEY,
    time TIMESTAMP,
    embedding VECTOR(3)
);
Create a vector index on the items table:
CREATE INDEX raw_column_index
ON items USING HNSW (embedding)
INCLUDE (text)
WITH (distance_type = 'l2');
You can then create a sink, either into a table or to an external system, that uses the embedding from the events table to look up the vector index on items:
CREATE SINK related_events AS
SELECT
    event_id,
    array(
        SELECT row(text)
        FROM items FOR SYSTEM_TIME AS OF PROCTIME()
        ORDER BY events.embedding <-> items.embedding
        LIMIT 3
    ) AS related_info,
    time
FROM events
WITH (...);
For each row in events, this query expands the output with a new column. The row's embedding is used to query the vector index, retrieve the Top-N nearest text values, and store the result in an array column. These Top-N nearest texts can then be processed further in downstream scenarios such as real-time RAG.