Core API Design
Why This Page Exists
SeaTunnel already has separate pages for source architecture, sink architecture, catalog metadata, and translation. What is still missing is one page that explains how those APIs fit together as a single design.
This page provides that bridge.
Design Goal
SeaTunnel's core API design has one primary goal:
connector developers should express data integration logic once, while engines handle execution differences underneath.
To make that work, the API layer has to do three things at the same time:
- provide stable contracts for connector authors
- carry enough metadata for validation and planning
- stay independent from Flink, Spark, and Zeta-specific runtime details
The API Stack
At a high level, SeaTunnel's API layer is organized like this:
User Config
|
v
Option / OptionRule / ReadonlyConfig
|
v
Factory Layer
- TableSourceFactory
- TableSinkFactory
- Transform factory
|
v
Runtime Contracts
- SeaTunnelSource
- SeaTunnelSink
- SeaTunnelTransform
|
v
Metadata Contracts
- CatalogTable
- TableSchema
- SeaTunnelDataType
- SchemaChangeEvent
|
v
Translation / Engine Runtime
The important point is that connector logic sits in the middle: high enough to avoid engine coupling, but rich enough to describe real-world data pipelines.
The Five Core API Areas
1. Configuration Contract
Before a connector can run, SeaTunnel needs a stable way to describe user-facing options and validate them.
The configuration contract is centered on:
- Option
- OptionRule
- ReadonlyConfig
This part of the API matters because it connects:
- documentation
- runtime validation
- plugin discovery metadata
- UI or REST-driven configuration workflows
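To make the configuration contract concrete, here is a minimal Java sketch of a typed option, a required-option rule, and a read-only typed view over user config. ConfigOption, RequiredRule, and SimpleConfig are hypothetical stand-ins written for this page. They play the roles of Option, OptionRule, and ReadonlyConfig, but they do not reproduce SeaTunnel's actual classes or method signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a typed, documented option (plays the role of Option).
final class ConfigOption<T> {
    final String key;
    final Class<T> type;
    final T defaultValue;
    final String description; // the same metadata could feed docs or a UI form

    ConfigOption(String key, Class<T> type, T defaultValue, String description) {
        this.key = key;
        this.type = type;
        this.defaultValue = defaultValue;
        this.description = description;
    }
}

// Hypothetical stand-in for an option rule: the options a connector requires.
final class RequiredRule {
    private final List<ConfigOption<?>> required;

    RequiredRule(List<ConfigOption<?>> required) {
        this.required = required;
    }

    // Returns one error message per missing or mistyped required option.
    List<String> validate(Map<String, Object> raw) {
        List<String> errors = new ArrayList<>();
        for (ConfigOption<?> option : required) {
            Object value = raw.get(option.key);
            if (value == null) {
                errors.add("missing required option: " + option.key);
            } else if (!option.type.isInstance(value)) {
                errors.add("option " + option.key + " expects " + option.type.getSimpleName());
            }
        }
        return errors;
    }
}

// Hypothetical stand-in for a read-only, typed view over the user config.
final class SimpleConfig {
    private final Map<String, Object> raw;

    SimpleConfig(Map<String, Object> raw) {
        this.raw = Collections.unmodifiableMap(raw);
    }

    // Falls back to the option's default when the user did not set it.
    <T> T get(ConfigOption<T> option) {
        Object value = raw.get(option.key);
        return value == null ? option.defaultValue : option.type.cast(value);
    }
}
```

Because each option carries its own key, type, default, and description, the same definitions could in principle serve validation, generated documentation, and form-driven configuration at once.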
2. Source Contract
The source side is responsible for turning data from an external system into a stream of SeaTunnelRow records, plus schema and state metadata when needed.
Core interfaces:
- SeaTunnelSource
- SourceSplitEnumerator
- SourceReader
- SourceSplit
The design separates coordination from execution:
- the enumerator manages discovery and assignment
- the reader performs real data fetching on workers
This makes parallelism, failover, and checkpoint recovery possible without forcing each connector to invent its own execution model.
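The split between coordination and execution can be sketched in a few lines of Java. RangeSplit, RangeEnumerator, and RangeReader are hypothetical simplifications, not SeaTunnel's SourceSplitEnumerator or SourceReader interfaces. In this sketch the enumerator cuts a table of row ids into ranges and assigns them round-robin, and each reader fetches only its own splits and can snapshot the splits it has not yet read.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split: a range of row ids in some external table.
final class RangeSplit {
    final String id;
    final int start;
    final int end; // exclusive

    RangeSplit(String id, int start, int end) {
        this.id = id;
        this.start = start;
        this.end = end;
    }
}

// Coordinator side: discovers splits and assigns them to readers (round-robin here).
final class RangeEnumerator {
    private final int totalRows;
    private final int splitSize;

    RangeEnumerator(int totalRows, int splitSize) {
        this.totalRows = totalRows;
        this.splitSize = splitSize;
    }

    Map<Integer, List<RangeSplit>> assign(int readerCount) {
        Map<Integer, List<RangeSplit>> assignment = new HashMap<>();
        int index = 0;
        for (int start = 0; start < totalRows; start += splitSize) {
            int end = Math.min(start + splitSize, totalRows);
            RangeSplit split = new RangeSplit("split-" + index, start, end);
            assignment.computeIfAbsent(index % readerCount, k -> new ArrayList<>()).add(split);
            index++;
        }
        return assignment;
    }
}

// Worker side: fetches the rows of its assigned splits and can snapshot pending work.
final class RangeReader {
    private final Deque<RangeSplit> pending = new ArrayDeque<>();

    void addSplits(List<RangeSplit> splits) {
        pending.addAll(splits);
    }

    // Reads one whole split; returns its row ids, or an empty list when idle.
    List<Integer> pollNext() {
        RangeSplit split = pending.poll();
        List<Integer> rows = new ArrayList<>();
        if (split != null) {
            for (int i = split.start; i < split.end; i++) {
                rows.add(i);
            }
        }
        return rows;
    }

    // Checkpoint state: ids of splits not yet read, so a restarted reader can resume them.
    List<String> snapshotState() {
        List<String> ids = new ArrayList<>();
        for (RangeSplit split : pending) {
            ids.add(split.id);
        }
        return ids;
    }
}
```

Real connectors usually also track offsets inside a split so that recovery does not re-read a partially consumed split; this sketch omits that to keep the coordinator/worker roles visible.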
3. Sink Contract
The sink side turns processed rows into externally visible side effects.
Core interfaces:
- SeaTunnelSink
- SinkWriter
- SinkCommitter
- SinkAggregatedCommitter
The design goal is not only to write data out, but also to let a sink clearly define:
- writer-side buffering and prepare logic
- commit coordination
- retry and idempotency behavior
- compatibility with checkpoint-driven recovery
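The writer/committer division can be illustrated with a small Java sketch, assuming a sink that stages rows at each checkpoint and publishes them in a separate commit step. StagingWriter, StagedBatch, and IdempotentCommitter are hypothetical names for this page, not the real SinkWriter or SinkCommitter signatures. Commits are keyed by batch id, so replaying the same commit after a failure does not duplicate data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical commit info produced by a writer at checkpoint time.
final class StagedBatch {
    final String batchId;
    final List<String> rows;

    StagedBatch(String batchId, List<String> rows) {
        this.batchId = batchId;
        this.rows = rows;
    }
}

// Writer side: buffers rows and stages them when a checkpoint asks for commit info.
final class StagingWriter {
    private final String writerId;
    private final List<String> buffer = new ArrayList<>();
    private long checkpointId = 0;

    StagingWriter(String writerId) {
        this.writerId = writerId;
    }

    void write(String row) {
        buffer.add(row);
    }

    // Nothing is externally visible yet; the staged batch is handed to the committer.
    StagedBatch prepareCommit() {
        checkpointId++;
        StagedBatch batch = new StagedBatch(writerId + "-" + checkpointId, new ArrayList<>(buffer));
        buffer.clear();
        return batch;
    }
}

// Committer side: makes staged batches visible; keyed by batch id so retries are idempotent.
final class IdempotentCommitter {
    private final Map<String, List<String>> visible = new LinkedHashMap<>();

    void commit(List<StagedBatch> batches) {
        for (StagedBatch batch : batches) {
            visible.putIfAbsent(batch.batchId, batch.rows); // a replayed batch is ignored
        }
    }

    int visibleRowCount() {
        int count = 0;
        for (List<String> rows : visible.values()) {
            count += rows.size();
        }
        return count;
    }
}
```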
4. Transform Contract
Transforms are the middle layer between source and sink. They operate on SeaTunnel's row and table model rather than on engine-native records.
This gives SeaTunnel a consistent contract for:
- field-level mapping
- filtering
- SQL-like logical projection
- metadata enrichment
- multi-table routing and transformation
The transform contract is what allows a job to remain declarative even when the physical runtime changes underneath.
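A projection transform shows why a transform has to handle both rows and schema. ProjectTransform is a hypothetical example written for this page, with a schema simplified to a list of field names and a row simplified to an Object[]. It resolves field positions once from the input schema, exposes the resulting output schema, and applies the same mapping to every row.

```java
import java.util.List;

// Hypothetical projection transform: keeps selected fields, in order, for both schema and rows.
final class ProjectTransform {
    private final int[] keptIndexes;
    private final List<String> outputFields;

    ProjectTransform(List<String> inputFields, List<String> keep) {
        this.outputFields = List.copyOf(keep);
        this.keptIndexes = new int[keep.size()];
        // Resolve positions once, from metadata, instead of per row.
        for (int i = 0; i < keep.size(); i++) {
            int idx = inputFields.indexOf(keep.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown field: " + keep.get(i));
            }
            keptIndexes[i] = idx;
        }
    }

    // Metadata side: the schema that downstream operators (and the sink) will see.
    List<String> outputSchema() {
        return outputFields;
    }

    // Row side: applied to every row using the indexes resolved in the constructor.
    Object[] map(Object[] row) {
        Object[] out = new Object[keptIndexes.length];
        for (int i = 0; i < keptIndexes.length; i++) {
            out[i] = row[keptIndexes[i]];
        }
        return out;
    }
}
```

Failing fast on an unknown field at construction time means a bad job definition is rejected during planning rather than on the first row.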
5. Metadata Contract
Row processing alone is not enough for a serious integration system. SeaTunnel also needs a portable metadata model that can describe:
- table identity
- schema
- types
- constraints
- partition keys
- schema change events
That is the role of:
- CatalogTable
- TableSchema
- Column
- SeaTunnelDataType
- SchemaChangeEvent
This metadata layer is essential for:
- sink-side schema validation
- multi-table jobs
- schema evolution
- engine-independent planning
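A compact model of the metadata contract might look like the following Java sketch. TableMeta, ColumnMeta, and AddColumnEvent are hypothetical simplifications of CatalogTable, Column, and SchemaChangeEvent, and column types are plain strings here rather than SeaTunnelDataType. The sketch shows table identity, a primary key, schema evolution driven by an event, and a sink-side check that relies only on the metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column description: name, logical type name, nullability.
final class ColumnMeta {
    final String name;
    final String type;
    final boolean nullable;

    ColumnMeta(String name, String type, boolean nullable) {
        this.name = name;
        this.type = type;
        this.nullable = nullable;
    }
}

// Hypothetical schema change event: append one column to a table.
final class AddColumnEvent {
    final String tablePath;
    final ColumnMeta column;

    AddColumnEvent(String tablePath, ColumnMeta column) {
        this.tablePath = tablePath;
        this.column = column;
    }
}

// Hypothetical engine-independent table metadata: identity, columns, primary key.
final class TableMeta {
    final String tablePath; // e.g. "database.table"
    final List<ColumnMeta> columns;
    final List<String> primaryKey;

    TableMeta(String tablePath, List<ColumnMeta> columns, List<String> primaryKey) {
        this.tablePath = tablePath;
        this.columns = List.copyOf(columns);
        this.primaryKey = List.copyOf(primaryKey);
    }

    // Schema evolution: returns a new immutable version instead of mutating this one.
    TableMeta apply(AddColumnEvent event) {
        if (!event.tablePath.equals(tablePath)) {
            throw new IllegalArgumentException("event targets " + event.tablePath);
        }
        List<ColumnMeta> next = new ArrayList<>(columns);
        next.add(event.column);
        return new TableMeta(tablePath, next, primaryKey);
    }

    // Sink-side validation: one value per column, and no nulls in NOT NULL columns.
    boolean accepts(Object[] row) {
        if (row.length != columns.size()) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            if (row[i] == null && !columns.get(i).nullable) {
                return false;
            }
        }
        return true;
    }
}
```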
How the Pieces Work Together
In a typical SeaTunnel job, the API contracts interact in this order:
- factories parse config and validate options
- source, transform, and sink instances are created
- source publishes CatalogTable metadata
- transforms preserve or reshape row and schema information
- sink validates or derives the write-side table contract
- translation or native runtime adapts those contracts to the execution engine
This separation is why SeaTunnel can keep connector logic reusable across multiple engines without forcing connector authors to depend on engine-specific APIs.
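The order in which these contracts interact can be sketched as a single-threaded loop. PipelineSource, PipelineTransform, PipelineSink, and PipelineRunner are hypothetical interfaces for this page, with schemas reduced to field-name lists. The sketch starts from instances a factory would already have created, and the loop merely stands in for what translation or a native engine would do.

```java
import java.util.List;

// Hypothetical, heavily simplified contracts; schemas are just field-name lists.
interface PipelineSource {
    List<String> producedSchema();
    List<Object[]> read();
}

interface PipelineTransform {
    List<String> transformSchema(List<String> input);
    Object[] map(Object[] row);
}

interface PipelineSink {
    void validateSchema(List<String> schema); // throws if the write-side contract is not met
    void write(Object[] row);
}

final class PipelineRunner {
    // Metadata flows first (source -> transforms -> sink check), then rows flow the same path.
    static void run(PipelineSource source, List<PipelineTransform> transforms, PipelineSink sink) {
        List<String> schema = source.producedSchema();
        for (PipelineTransform t : transforms) {
            schema = t.transformSchema(schema);
        }
        sink.validateSchema(schema);
        for (Object[] row : source.read()) {
            Object[] current = row;
            for (PipelineTransform t : transforms) {
                current = t.map(current);
            }
            sink.write(current);
        }
    }
}
```

Validating the derived schema before any row moves is the point of carrying metadata separately: an incompatible sink is rejected before data is written.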
Why the API Is Split This Way
Separation of User Contract and Runtime Contract
User configuration changes more slowly than internal runtime details. By separating Option and OptionRule from reader and writer logic, SeaTunnel can keep user-facing configuration stable while evolving execution internals.
Separation of Runtime Contract and Metadata Contract
Rows, schema, and schema changes have different lifecycles. A connector may read rows continuously, while metadata changes only occasionally. Keeping those contracts distinct makes the system easier to reason about and extend.
Separation of Logical API and Engine Translation
If connector implementations were written directly against Flink or Spark APIs, every connector would need multiple engine-specific versions. The SeaTunnel API layer avoids that duplication.
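To illustrate the translation boundary, the following Java sketch writes one connector against an engine-neutral interface and adapts it to two execution styles. NeutralReader, ListReader, PushAdapter, and PullAdapter are hypothetical; the push-style and pull-style engines are invented for illustration and do not model the actual Flink, Spark, or Zeta APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Engine-neutral connector contract, written once by the connector author (hypothetical).
interface NeutralReader {
    // Emits rows through the callback; returns false once the source is exhausted.
    boolean pollNext(Consumer<String> output);
}

// A connector implementation that knows nothing about any engine.
final class ListReader implements NeutralReader {
    private final Deque<String> rows;

    ListReader(List<String> rows) {
        this.rows = new ArrayDeque<>(rows);
    }

    public boolean pollNext(Consumer<String> output) {
        String row = rows.poll();
        if (row == null) {
            return false;
        }
        output.accept(row);
        return true;
    }
}

// Adapter for a hypothetical push-style engine that drives sources with a callback.
final class PushAdapter {
    static void drive(NeutralReader reader, Consumer<String> engineEmitter) {
        while (reader.pollNext(engineEmitter)) {
            // keep polling until the connector reports it is exhausted
        }
    }
}

// Adapter for a hypothetical pull-style engine that consumes sources as an Iterator.
final class PullAdapter implements Iterator<String> {
    private final NeutralReader reader;
    private final Deque<String> buffered = new ArrayDeque<>();
    private boolean exhausted = false;

    PullAdapter(NeutralReader reader) {
        this.reader = reader;
    }

    public boolean hasNext() {
        while (buffered.isEmpty() && !exhausted) {
            exhausted = !reader.pollNext(buffered::add);
        }
        return !buffered.isEmpty();
    }

    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffered.poll();
    }
}
```

ListReader is written once; only the two adapters know how each engine wants to consume data, which is the duplication the API layer is meant to avoid.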
Design Questions for Contributors
When adding or reviewing an API-related change, check these questions first:
- Is this change user-facing or only runtime-facing?
- Does it belong in OptionRule, runtime APIs, or metadata APIs?
- Will it affect all engines equally, or only one engine adapter?
- Is the option name stable enough to become public contract?
- Does the change preserve backward compatibility for connector authors and users?
These questions matter because API drift is harder to unwind than implementation drift.
Common Misunderstandings
"Source and sink APIs are enough"
Not really. Without CatalogTable, SeaTunnelDataType, and schema change events, connectors would have no engine-independent way to express table metadata and schema evolution.
"Transform is only row-level mapping"
Not always. In SeaTunnel, transform logic may also need to preserve or reshape metadata, especially in multi-table and CDC pipelines.
"Translation is just an adapter layer"
It is an adapter layer, but it is also a design boundary. It keeps connector authors from depending on engine internals and limits how much engine-specific behavior leaks into connector code.