Creating vector indexes
Use theCREATE INDEX command with vector-specific syntax to create vector indexes. For more details, see CREATE INDEX.
Syntax
Index types
Before creating a vector index, you may create a sample tableitem to reference the table name and column names. Currently, we only support creating vector indexes on append-only inputs, such as append-only tables or materialized views. Therefore, we have to specify the table as append-only here:
-
FLAT index: Provides exact results by comparing the query vector against all stored vectors.
-
HNSW index: Hierarchical Navigable Small World (HNSW) index that provides approximate nearest neighbor search with better performance for large datasets.
ef_search by setting the session variable batch_hnsw_ef_search (the default value is 40).
Parameters
| Parameter | Description | Valid for |
|---|---|---|
distance_type | Distance metric to use: l2, cosine, l1, or inner_product | FLAT, HNSW |
m | Optional. Maximum number of connections per node | HNSW |
ef_construction | Optional. Size of dynamic candidate list during construction | HNSW |
max_level | Optional. Maximum level of the HNSW graph | HNSW |
Vector distance operators
RisingWave provides specialized operators for calculating vector distances:| Operator | Function | Description |
|---|---|---|
<-> | l2_distance() | Euclidean (L2) distance |
<=> | cosine_distance() | Cosine distance |
<+> | l1_distance() | Manhattan (L1) distance |
<#> | Negative inner product | Negative inner product distance |
Vector similarity search
Use vector distance operators withORDER BY and LIMIT to perform similarity search:
Vector indexes on function expressions
You can create vector indexes on function expressions instead of raw columns. This allows you to avoid storing a separate vector column, saving storage and reducing maintenance costs.- Create the table to include the input column
embedding column is used to store the embedding generated from the description column. If you create the vector index directly from description column with function expression, you don’t have to store raw embedding in the table.
- Define the user-defined function (UDF)
- Create the vector index on the function expression
get_embedding(description) is used as the index expression.
This approach avoids materializing a separate vector column in the table, which reduces storage costs and keeps the table schema simpler.
Streaming vector index lookup
Added in v2.7.0.
Syntax
FOR SYSTEM_TIME AS OF must be set to PROCTIME(). See example.
The same syntax is also supported in ad-hoc queries. In that case, FOR SYSTEM_TIME AS OF is optional, and you may specify a different system time, similar to how time travel queries are defined.
- Currently, vector index lookups are supported only for append-only inputs.
- All columns returned by the lookup from the indexed table must be included in the vector index.
Examples
Basic vector similarity search
Using cosine distance type
The SQL query depends on the type of vector index you created:- If the vector index is built on a raw embedding column, use the raw column in your
ORDER BYclause.
- If the vector index is built using a function expression, use the same function expression in your
ORDER BYclause.
Vector index lookup in streaming sinks
Suppose you have two append-only tables,items and events:
items table:
events table to look up the vector index on items:
events, this query expands the output with a new column. The row embedding in events is used to query the vector index, retrieve a list of the Top-N nearest text values, and store the result in an array column. With these Top-N nearest texts, the related content can be further processed for downstream scenarios such as real-time RAG.