- Create a connection: Tell RisingWave where to store your Iceberg data and metadata.
- Create a table: Define your table with the `ENGINE = iceberg` clause.
- Stream and query data: Insert data and query it in real time.
Hands-on Tutorial: Streaming Iceberg Quickstart
This end-to-end tutorial provides a Docker Compose file to instantly set up the environment and includes all the code you need to run the examples below.
Step 1: Create a connection with the hosted catalog
First, you need to tell RisingWave where to store the table files and metadata. For the simplest setup, you can use RisingWave's built-in hosted catalog, which manages the metadata for you without requiring any external services like AWS Glue or a separate database.

While you can also use external catalogs like AWS Glue or a JDBC database to create native Iceberg tables, this tutorial uses the hosted catalog because it requires no additional setup. For details on all available options, see the Catalogs guide.
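A minimal sketch of this step is shown below. The connection name `lakehouse_conn`, the S3 bucket, region, and credentials are placeholders to replace with your own values, and the option names are assumed to match RisingWave's Iceberg connection parameters; check the connection reference if your version differs.

```sql
-- Sketch: an Iceberg connection that uses RisingWave's hosted catalog.
-- The warehouse path, region, and credentials below are placeholders.
CREATE CONNECTION lakehouse_conn WITH (
    type = 'iceberg',
    warehouse.path = 's3://my-bucket/warehouse/',  -- placeholder bucket
    s3.access.key = 'xxxxxxxx',                    -- placeholder credentials
    s3.secret.key = 'xxxxxxxx',
    s3.region = 'us-east-1',
    hosted_catalog = true                          -- use the built-in hosted catalog
);

-- Assumed session setting that points the Iceberg table engine at this connection.
SET iceberg_engine_connection = 'public.lakehouse_conn';
```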
Step 2: Create a native Iceberg table
Next, create the table using the `ENGINE = iceberg` clause. This tells RisingWave to store the data in the Iceberg format. You can set a low `commit_checkpoint_interval` to enable low-latency commits, which is ideal for streaming workloads.
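Here is a sketch of such a table, assuming an illustrative `user_events` schema and the connection from Step 1. Setting `commit_checkpoint_interval = 1` commits to Iceberg on every checkpoint, the lowest-latency option.

```sql
-- Sketch: a native Iceberg table; the columns and names are illustrative.
CREATE TABLE user_events (
    user_id    INT,
    event_type VARCHAR,
    event_ts   TIMESTAMP,
    PRIMARY KEY (user_id, event_ts)
)
WITH (commit_checkpoint_interval = 1)  -- commit to Iceberg on every checkpoint
ENGINE = iceberg;
```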
Step 3: Stream data in and query it
The table is now ready to accept streaming data. You can insert data into the table, and it will be committed to Iceberg in near real time.
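A rough sketch, using the illustrative `user_events` table from Step 2:

```sql
-- Insert a few rows; they are committed to the Iceberg table at the next checkpoint.
INSERT INTO user_events VALUES
    (1, 'login', TIMESTAMP '2024-01-01 12:00:00'),
    (2, 'click', TIMESTAMP '2024-01-01 12:00:05');

-- Optionally force a checkpoint so the rows are visible immediately.
FLUSH;

-- Query the Iceberg table directly from RisingWave.
SELECT * FROM user_events ORDER BY user_id;
```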
Next steps
- Run the full tutorial: To run these examples in a pre-configured environment, head over to the Streaming Iceberg quickstart demo.
- Connect to an existing lake: If you already have Iceberg tables, see the Quickstart: Read from and write to existing Iceberg tables to learn how to connect to them.
- Dive deeper: For more detailed information, explore the guides on catalogs and how to create and manage native tables.