As an alternative to connecting to HDFS directly, WebHDFS exposes HDFS through an HTTP REST API, so external clients can perform Hadoop file system operations without running on the Hadoop cluster itself. This reduces the dependency on a local Hadoop environment when writing data to HDFS.

Syntax

CREATE SINK [ IF NOT EXISTS ] sink_name
[FROM sink_from | AS select_query]
WITH (
   connector='webhdfs',
   connector_parameter = 'value', ...
);

Parameters

Parameter name      Description
connector           Required. Must be set to 'webhdfs'. Only the WebHDFS connector is supported.
webhdfs.endpoint    Required. The endpoint of the WebHDFS service.
webhdfs.path        Required. The directory to which the sink files are written.
type                Required. The type of the sink. Options are append-only and upsert.

Example

CREATE SINK webhdfs_sink AS SELECT v1
FROM t1
WITH (
    connector='webhdfs',
    webhdfs.path = '<test_path>',
    webhdfs.endpoint = '<test_endpoint>',
    type = 'append-only'
) FORMAT PLAIN ENCODE PARQUET(force_append_only=true);
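The syntax also accepts the FROM sink_from form, which sinks an existing table or materialized view directly instead of a SELECT query. The sketch below assumes a hypothetical materialized view named mv1 built over the same table t1, and reuses the placeholder path and endpoint from the example; only the source clause differs.

```sql
-- Hypothetical materialized view, created only for this illustration.
CREATE MATERIALIZED VIEW mv1 AS SELECT v1 FROM t1;

-- Sink the materialized view directly using the FROM form of CREATE SINK.
CREATE SINK webhdfs_sink_from_mv
FROM mv1
WITH (
    connector = 'webhdfs',
    webhdfs.path = '<test_path>',          -- placeholder: target directory
    webhdfs.endpoint = '<test_endpoint>',  -- placeholder: WebHDFS service endpoint
    type = 'append-only'
) FORMAT PLAIN ENCODE PARQUET(force_append_only=true);
```

Using FROM avoids repeating the query in the sink definition when the view already exists.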

For more information about the Parquet and JSON encodes, see Sink data in parquet or json encode.
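As a sketch of the JSON case, assuming the WebHDFS sink accepts ENCODE JSON with the same force_append_only option (confirm this on the linked page before relying on it), a JSON variant of the example differs only in the encode clause. The sink name webhdfs_json_sink is hypothetical.

```sql
-- Same sink as the Parquet example, but writing JSON files instead.
-- Assumption: ENCODE JSON and its force_append_only option are supported here.
CREATE SINK webhdfs_json_sink AS SELECT v1
FROM t1
WITH (
    connector = 'webhdfs',
    webhdfs.path = '<test_path>',
    webhdfs.endpoint = '<test_endpoint>',
    type = 'append-only'
) FORMAT PLAIN ENCODE JSON(force_append_only='true');
```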