Changed in v2.1.5, v2.2.4, v2.3.0: Creation of embedded Python UDFs is disabled by default. See UDF configurations for more information.
Define your functions
You can create Python UDFs using theCREATE FUNCTION
command. Refer to the syntax below:
gcd
can be defined as follows:
Create function
CREATE FUNCTION
statement. The function’s parameters and return type must match those declared in the CREATE FUNCTION
statement, otherwise, an error may occur when the function is called.
See the correspondence between SQL types and Python types in the Data type mapping.
Due to the nature of Python, the correctness of the source code cannot be verified when creating a function. It is recommended to make sure your implementation is correct through batch query before using UDFs in materialized views. If an error occurs when executing UDF in materialized views, all output results will be NULL.
Call function
yield
statement. For example, to generate a sequence from 0 to n-1:
Create function
Call function
Create function
Create function
Define your aggregate functions
You can create aggregate functions using the CREATE AGGREGATE command. Refer to the syntax below:create_state() -> state
: Create a new state.accumulate(state, *args) -> state
: Accumulate a new value into the state, returning the updated state.
finish(state) -> value
: Get the result of the aggregate function. If not defined, the state is returned as the result.retract(state, *args) -> state
: Retract a value from the state, returning the updated state. If not defined, the state can not be updated incrementally in materialized views and performance may be affected.
weighted_avg
to calculate the weighted average.
Python UDAF
Limitations
Currently, embedded Python UDFs are only allowed to use the following standard libraries:json
, decimal
, re
, math
, datetime
. Other third-party libraries are not supported. Embedded Python UDFs cannot access external resources, and the following built-in functions are also not allowed: breakpoint
, exit
, eval
, help
, input
, open
, print
.
Data type mapping
The following table shows the data type mapping between SQL and Python:SQL Type | Python Type | Notes |
---|---|---|
BOOLEAN | bool | |
SMALLINT | int | |
INT | int | |
BIGINT | int | |
REAL | float | |
DOUBLE PRECISION | float | |
DECIMAL | decimal.Decimal | |
DATE | datetime.date | Not supported yet. |
TIME | datetime.time | Not supported yet. |
TIMESTAMP | datetime.datetime | Not supported yet. |
TIMESTAMPTZ | datetime.datetime | Not supported yet. |
INTERVAL | MonthDayNano / (int, int, int) | Not supported yet. |
VARCHAR | str | |
BYTEA | bytes | |
JSONB | bool, int, float, list, dict | |
T[] | list[T] | |
STRUCT<> | class or dict | |
…others | Not supported yet. |