Skip to main content
Version: Next

Multi-Table Transform Capability Boundary

Overview

SeaTunnel's multi-table transform feature allows a single transform node to process multiple tables flowing from an upstream source (typically a CDC connector) in one pipeline. This page documents precisely what is supported, what is not, and what alternatives to use when you hit a capability boundary.


1. What Is a Multi-Table Transform?

In a standard single-table pipeline, one Source feeds one Transform chain feeds one Sink. In a multi-table pipeline, a single Source (e.g., MySQL-CDC) emits records from many tables simultaneously, and each downstream Transform or Sink must declare which table(s) it applies to.

MySQL-CDC ──► FieldMapper (orders table)  ──► Kafka Sink (orders topic)

├──► FieldMapper (users table) ──► Kafka Sink (users topic)

└──► (unmatched tables pass through) ──► Elasticsearch Sink

2. Capability Boundary Table

CapabilitySupportedNotes
Per-table field rename / map✅ YesUse FieldMapper with plugin_input and table_match_regex
Per-table column filtering✅ YesUse Filter with plugin_input and table_match_regex
Per-table type casting✅ YesUse FieldMapper with define_sink_type option
Per-table SQL transform (single table)✅ YesUse SQL transform with plugin_input; scope it with table_match_regex when needed
TableMerge — merge multiple tables into one✅ YesTables must share compatible schema
TableRename — rename tables in the stream✅ YesWorks well for routing to Sink by table name
Row-level filtering (filter by rowkind)✅ YesUse FilterRowKind transform
Cross-table SQL JOIN❌ Not supportedSee Section 4 for alternatives
Aggregation across multiple CDC tables❌ Not supportedAggregate downstream in OLAP engine
Generating new tables from a JOIN result❌ Not supportedUse a dedicated SQL engine
Applying one transform to ALL tables wildcard⚠️ PartialTableMerge then single transform; schema must be compatible
Changing schema mid-stream (DDL events)⚠️ LimitedDepends on sink; some sinks handle schema evolution; transforms do not
Nested JSON field extraction per table✅ YesUse JsonPath transform with plugin_input and table_match_regex

3. TableMerge vs SQL Join

These two are the most commonly confused features.

3.1 TableMerge Transform

TableMerge merges the row streams of multiple tables into a single result table. All input tables must have the same (or compatible) schema. Use it to route all tables to one Sink.

{
"plugin_name": "TableMerge",
"plugin_input": ["orders_2023", "orders_2024"],
"plugin_output": "all_orders",
"merge_by_field": true
}

When to use: Fan-in from multiple source tables that have the same structure (e.g., sharded tables, multi-year partitions, or multi-database tables with identical schema).

When NOT to use: When tables have different schemas that you need to correlate or enrich from each other — that requires a JOIN.

3.2 SQL JOIN (not natively supported in multi-table pipelines)

A SQL JOIN correlates rows from two different tables based on a key. SeaTunnel's SQL transform does NOT support cross-table JOIN inside a multi-table streaming pipeline.

Attempting to JOIN records from two upstream tables inside a single SQL transform is not supported and will result in a configuration error.

Recommended alternatives:

  • Write both tables to a shared data lake or warehouse (e.g., Hudi, Iceberg, ClickHouse), then run the JOIN there
  • Use Apache Flink with SeaTunnel's Flink connector for stateful JOIN operations
  • Materialise the "dimension" table into a lookup cache (e.g., Redis, RocksDB) and use a custom transform for enrichment

4. Cross-Source JOIN Limitation

SeaTunnel does not support streaming JOINs where the two input sides come from different sources (e.g., joining MySQL-CDC with a PostgreSQL-CDC stream).

ScenarioSupported
Single-source multi-table pass-through
Single-source TableMerge (same schema)
Cross-source JOIN (MySQL-CDC + PG-CDC)
Cross-source JOIN (CDC + JDBC batch)
Same-source JOIN on two different tables

Workaround: Write both sources to a common sink (Kafka, Iceberg, etc.) and perform the JOIN downstream in a dedicated SQL engine (Flink, Spark, ClickHouse, etc.).


5. Per-Table Transform Configuration Pattern

When you need different transforms for different tables from the same source, declare separate transform blocks that share the same plugin_input and use different table_match_regex rules:

{
"env": {
"job.name": "cdc-multi-table",
"job.mode": "STREAMING"
},
"source": [
{
"plugin_name": "MySQL-CDC",
"plugin_output": "cdc_stream",
"base-url": "jdbc:mysql://localhost:3306/mydb",
"username": "cdc_user",
"password": "password",
"database-names": ["mydb"],
"table-names": ["mydb.orders", "mydb.users", "mydb.products"]
}
],
"transform": [
{
"plugin_name": "FieldMapper",
"plugin_input": ["cdc_stream"],
"plugin_output": "orders_mapped",
"field_mapper": { "order_id": "id", "order_amount": "amount" },
"table_match_regex": "mydb\\.orders"
},
{
"plugin_name": "FieldMapper",
"plugin_input": ["cdc_stream"],
"plugin_output": "users_mapped",
"field_mapper": { "user_id": "id", "user_email": "email" },
"table_match_regex": "mydb\\.users"
}
],
"sink": [
{
"plugin_name": "Kafka",
"plugin_input": ["orders_mapped"],
"topic": "orders"
},
{
"plugin_name": "Kafka",
"plugin_input": ["users_mapped"],
"topic": "users"
},
{
"plugin_name": "Kafka",
"plugin_input": ["cdc_stream"],
"topic": "products",
"table_match_regex": "mydb\\.products"
}
]
}

6. Common Fields Example (Shared Schema)

If multiple tables share a common set of fields, you can use TableMerge to combine them and apply a single transform:

"transform": [
{
"plugin_name": "TableMerge",
"plugin_input": ["cdc_stream"],
"plugin_output": "all_events",
"table_match_regex": "mydb\\.(orders|payments|refunds)"
},
{
"plugin_name": "FieldMapper",
"plugin_input": ["all_events"],
"plugin_output": "all_events_mapped",
"field_mapper": { "created_at": "event_time", "event_type": "type" }
}
]

This works only when all three tables (orders, payments, refunds) share created_at and event_type fields. If schemas differ, TableMerge will fail at runtime.


7. EtLT Patterns with Multi-Table Transform

EtLT (Extract, light-transform, Load, then Transform in the warehouse) is the recommended pattern when SeaTunnel's transform layer cannot fulfil the full transformation requirement:

CDC Source


Light transforms (field rename, type cast, row filter)


Data Lake / Warehouse (Hudi / Iceberg / ClickHouse)


Heavy transforms (JOINs, aggregations, complex SQL)
in dbt / Flink SQL / Spark SQL

Use SeaTunnel's transform layer for:

  • Field rename / filtering
  • Type normalisation
  • Row-level filtering
  • Schema routing (different tables → different topics/tables)

Offload to downstream:

  • Cross-table JOINs
  • Aggregations
  • Pivot / unpivot
  • Enrichment from dimension tables

8. FAQ

Q: Can I apply one transform to ALL tables without specifying each one?

Not directly. Use TableMerge to combine tables with compatible schemas first, then apply a single transform to the merged result. If schemas differ, you must use separate transform blocks, typically with different table_match_regex rules.

Q: Does the SQL transform support JOIN?

No. The SQL transform only supports single-table queries (SELECT, WHERE, expressions). For JOINs, use an external SQL engine after loading the data into a sink.

Q: What happens to tables that are not matched by any transform?

Unmatched tables continue flowing through the pipeline and can be captured by a Sink that uses the right plugin_input and table_match_regex.

Q: Can I add a new table to an existing CDC pipeline without downtime?

This depends on the connector. MySQL-CDC supports dynamic table discovery in some configurations, but adding a transform for a new table requires a pipeline restart. Use stop-with-savepoint to minimise data loss (see REST API v2 Reference).


See Also