Version: Next

Source Connector Development

Goal

This page is the practical entry point for developing a SeaTunnel source connector. It does not replace the low-level API design pages. Instead, it helps contributors translate those APIs into an implementation plan.

If you are building a source connector, read this page first, then move to the deeper architecture references linked below.

What a Source Connector Must Do

A source connector must solve four problems:

define and validate its user-facing options
describe the output schema
read data in batch, streaming, or both
support split assignment and state recovery where parallelism is required

In SeaTunnel, this usually means implementing:

a source factory
a SeaTunnelSource
one or more SourceReader implementations
split and enumerator classes when the source is parallel

Recommended Development Flow

1. Start From the User Contract

Before writing any runtime code, define:

plugin name
required options
optional options
default values
sample job config

If you cannot explain the connector in a minimal config snippet, the implementation is usually not ready either.

2. Implement the Factory

The factory is the user-facing entry point of the connector. It should:

expose a stable identifier
define OptionRule
create the source instance

In practice, the factory is also the bridge between docs, runtime validation, REST metadata exposure, and UI-driven config generation.
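The three factory responsibilities above can be sketched as follows. This is a minimal illustration that uses hypothetical stand-in types (`SketchOptionRule`, `SketchSource`, `SketchSourceFactory`) and made-up option names (`url`, `batch_size`); it does not use the real SeaTunnel factory or `OptionRule` classes, whose signatures should be taken from the API reference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an option rule: required keys plus optional keys with defaults.
final class SketchOptionRule {
    final List<String> required = new ArrayList<>();
    final Map<String, String> optionalDefaults = new LinkedHashMap<>();

    SketchOptionRule required(String key) {
        required.add(key);
        return this;
    }

    SketchOptionRule optional(String key, String defaultValue) {
        optionalDefaults.put(key, defaultValue);
        return this;
    }

    // Fails fast on a missing required key, then overlays user values on the defaults.
    Map<String, String> resolve(Map<String, String> userConfig) {
        for (String key : required) {
            if (!userConfig.containsKey(key)) {
                throw new IllegalArgumentException("Missing required option: " + key);
            }
        }
        Map<String, String> resolved = new HashMap<>(optionalDefaults);
        resolved.putAll(userConfig);
        return resolved;
    }
}

// Hypothetical source that only remembers its resolved configuration.
final class SketchSource {
    final Map<String, String> config;

    SketchSource(Map<String, String> config) {
        this.config = config;
    }
}

// Hypothetical factory: stable identifier, option rule, and source creation in one place.
final class SketchSourceFactory {
    String factoryIdentifier() {
        return "SketchSource"; // illustrative plugin name, not a real connector
    }

    SketchOptionRule optionRule() {
        return new SketchOptionRule()
                .required("url")
                .optional("batch_size", "1000");
    }

    SketchSource createSource(Map<String, String> userConfig) {
        return new SketchSource(optionRule().resolve(userConfig));
    }
}
```

Keeping validation inside the rule object, rather than scattered through the source constructor, is what lets the same definition feed docs, runtime checks, and generated config forms.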

3. Implement the Source Runtime

For simple sources, a reader may be enough. For scalable or fault-tolerant sources, you also need split and enumerator abstractions.

Typical responsibilities:

SeaTunnelSource: top-level source definition
SourceSplitEnumerator: discover and assign work
SourceReader: read data on workers
serializers: encode split and enumerator state for network transfer and checkpointing
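The division of work between enumerator and reader can be sketched with a few simplified classes. `DemoSplit`, `DemoEnumerator`, and `DemoReader` are hypothetical names invented for this illustration; the real `SourceSplitEnumerator` and `SourceReader` interfaces have more methods and run on different nodes, so treat this only as a model of the pull-based handshake.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical split: one named unit of work, such as a file or a table range.
final class DemoSplit {
    final String id;

    DemoSplit(String id) {
        this.id = id;
    }
}

// Hypothetical enumerator: owns the discovered splits and hands them out on request.
final class DemoEnumerator {
    private final Deque<DemoSplit> pending = new ArrayDeque<>();

    DemoEnumerator(List<String> splitIds) {
        for (String id : splitIds) {
            pending.add(new DemoSplit(id));
        }
    }

    // Returns the next unassigned split, or null when no work is left.
    DemoSplit nextSplitFor(int readerId) {
        return pending.poll();
    }
}

// Hypothetical reader: keeps requesting splits until the enumerator has none left.
final class DemoReader {
    final int readerId;
    final List<String> emitted = new ArrayList<>();

    DemoReader(int readerId) {
        this.readerId = readerId;
    }

    void run(DemoEnumerator enumerator) {
        DemoSplit split;
        while ((split = enumerator.nextSplitFor(readerId)) != null) {
            emitted.add("row-from-" + split.id); // stands in for reading real records
        }
    }
}
```

In this model the reader asks for work instead of being pushed a fixed assignment, which is one possible answer to the question of how a reader requests more work.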

4. Add Packaging and Discovery Metadata

A connector is not complete just because the Java code compiles. You also need:

SPI registration
plugin mapping
packaging changes so the connector jar is present in the binary distribution
plugin dependency layout if isolated dependencies are required

5. Document and Test It

A user-visible connector is not considered complete unless:

docs/en and docs/zh are updated
example config matches the code exactly
unit or E2E tests cover the main reading path

Design Checklist

Before implementation, answer these questions:

Is the source bounded, unbounded, or both?
What is the split unit: file, shard, partition, table range, or something else?
How does the reader request more work?
What state is required for recovery?
How is schema discovered or configured?
Does the source emit single-table or multi-table output?
Does the source emit CDC semantics or append-only data?

These answers should drive the class structure, not the other way around.

Typical Class Layout

For a parallel source, the minimum useful structure often looks like this:

connector-<name>/
  src/main/java/.../source/
    <Name>SourceFactory.java
    <Name>Source.java
    <Name>SourceReader.java
    <Name>SourceSplit.java
    <Name>SourceSplitEnumerator.java
    <Name>SourceConfig.java

Depending on complexity, you may also need:

dialect or client abstraction
split serializer
enumerator state class
reader state helper
schema discoverer
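As an illustration of the split serializer item above, here is a minimal sketch built only on `java.io` streams. `RangeSplit` and `RangeSplitSerializer` are hypothetical classes, and the leading version integer is an assumption of this sketch (a common way to keep old checkpoints readable), not a documented SeaTunnel requirement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical split describing an inclusive key range of one table.
final class RangeSplit {
    final String table;
    final long start;
    final long end;

    RangeSplit(String table, long start, long end) {
        this.table = table;
        this.start = start;
        this.end = end;
    }
}

// Hypothetical serializer: writes a version tag followed by the split fields.
final class RangeSplitSerializer {
    static final int VERSION = 1;

    static byte[] serialize(RangeSplit split) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(VERSION);
            out.writeUTF(split.table);
            out.writeLong(split.start);
            out.writeLong(split.end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static RangeSplit deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("Unsupported split version: " + version);
            }
            String table = in.readUTF();
            long start = in.readLong();
            long end = in.readLong();
            return new RangeSplit(table, start, end);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round trip such as `deserialize(serialize(split))` should return an equal split, which makes a cheap unit test for every split and state class.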

Decision Guide

When a Simple Reader Is Enough

Use a simpler design when:

the source is single-threaded by nature
parallelism is not needed
there is no meaningful split model

When You Need Splits and an Enumerator

Use the full split-based model when:

the source can read partitions or ranges in parallel
failover should reassign unfinished work
initial discovery and worker-side reading should be separated

This is the default expectation for scalable database, file, queue, and CDC sources.
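The failover point above can be sketched as an enumerator that remembers which splits each reader holds and returns the unfinished ones to the queue when that reader fails. `ReassigningEnumerator` and its method names are hypothetical; the real enumerator interface exposes its own callbacks for this, so check the API reference before mapping this sketch onto it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical enumerator that tracks in-flight splits per reader.
final class ReassigningEnumerator {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Map<Integer, List<String>> inFlight = new HashMap<>();

    ReassigningEnumerator(List<String> splitIds) {
        pending.addAll(splitIds);
    }

    // Assigns the next pending split to the reader, or returns null if none is left.
    String assignNext(int readerId) {
        String split = pending.poll();
        if (split != null) {
            inFlight.computeIfAbsent(readerId, k -> new ArrayList<>()).add(split);
        }
        return split;
    }

    // Called when a reader reports that it has fully processed a split.
    void markFinished(int readerId, String splitId) {
        List<String> splits = inFlight.get(readerId);
        if (splits != null) {
            splits.remove(splitId);
        }
    }

    // On reader failure, unfinished splits go back to the queue for other readers.
    void onReaderFailed(int readerId) {
        List<String> unfinished = inFlight.remove(readerId);
        if (unfinished != null) {
            pending.addAll(unfinished);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

This sketch reassigns whole splits; a real connector would normally also restore the reader's position inside the split from the last checkpoint.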

Common Source Patterns

File / Object Storage Source

Common split units:

file
block range
partition directory

Typical concerns:

file discovery
schema inference
checkpointing current file position
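Checkpointing the current file position can be sketched as follows. The example reads from an in-memory list of lines so that it stays self-contained; `FilePositionState` and `CheckpointedLineReader` are hypothetical names, and a real reader would track a byte offset or row-group index against the storage client instead.

```java
import java.util.List;

// Hypothetical checkpoint state: which file, and the index of the next unread line.
final class FilePositionState {
    final String file;
    final int nextLine;

    FilePositionState(String file, int nextLine) {
        this.file = file;
        this.nextLine = nextLine;
    }
}

// Hypothetical reader that can snapshot its position and resume from a snapshot.
final class CheckpointedLineReader {
    private final String file;
    private final List<String> lines;
    private int nextLine;

    CheckpointedLineReader(String file, List<String> lines, int startLine) {
        this.file = file;
        this.lines = lines;
        this.nextLine = startLine;
    }

    // Returns the next line, or null at end of file.
    String poll() {
        return nextLine < lines.size() ? lines.get(nextLine++) : null;
    }

    // Captures enough state to continue reading after a restart.
    FilePositionState snapshotState() {
        return new FilePositionState(file, nextLine);
    }

    // Rebuilds a reader that continues from the snapshotted position.
    static CheckpointedLineReader restore(FilePositionState state, List<String> lines) {
        return new CheckpointedLineReader(state.file, lines, state.nextLine);
    }
}
```

Records emitted after the snapshot but before a failure are read again after restore, so the position must be captured consistently with what has been handed downstream.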

Database Snapshot Source

Common split units:

primary key range
partition
shard

Typical concerns:

chunk sizing
query pushdown
transaction or consistency boundary
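Chunk sizing over a numeric primary key can be sketched as below. `KeyRangeChunker` is a hypothetical helper that assumes a dense `long` key and ignores overflow near `Long.MAX_VALUE`; real connectors often sample the table or adapt to skewed key distributions instead of using fixed-width ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that cuts an inclusive key range into fixed-size chunks.
final class KeyRangeChunker {

    // Each element is {startInclusive, endInclusive}; empty when minKey > maxKey.
    static List<long[]> split(long minKey, long maxKey, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<long[]> chunks = new ArrayList<>();
        long start = minKey;
        while (start <= maxKey) {
            long end = Math.min(maxKey, start + chunkSize - 1);
            chunks.add(new long[] {start, end});
            if (end == maxKey) {
                break; // last chunk reached; avoids stepping past maxKey
            }
            start = end + 1;
        }
        return chunks;
    }

    // Builds the pushed-down query for one chunk (identifiers are assumed trusted here).
    static String toQuery(String table, String keyColumn, long[] chunk) {
        return "SELECT * FROM " + table
                + " WHERE " + keyColumn + " BETWEEN " + chunk[0] + " AND " + chunk[1];
    }
}
```

For example, `split(1, 10, 4)` yields the ranges 1 to 4, 5 to 8, and 9 to 10, and each range becomes one split that a reader can query independently.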

Message Queue Source

Common split units:

topic partition
subscription shard

Typical concerns:

offset management
watermark or event time
dynamic partition discovery
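Offset management and dynamic partition discovery can be sketched together with one small tracker. `PartitionOffsetTracker` is a hypothetical class, and starting new partitions at offset 0 is an assumption of this sketch; real queue clients usually let the user choose earliest, latest, or a timestamp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker of the next offset to read for each known partition.
final class PartitionOffsetTracker {
    private final Map<String, Long> nextOffsets = new HashMap<>();

    // Registers partitions not seen before and returns only the newly added ones.
    List<String> discover(List<String> currentPartitions) {
        List<String> added = new ArrayList<>();
        for (String partition : currentPartitions) {
            if (nextOffsets.putIfAbsent(partition, 0L) == null) {
                added.add(partition);
            }
        }
        return added;
    }

    // Records a consumed offset; the stored value never moves backwards.
    void recordConsumed(String partition, long offset) {
        nextOffsets.merge(partition, offset + 1, Long::max);
    }

    // Returns a copy so that later reads do not mutate an already taken checkpoint.
    Map<String, Long> snapshotState() {
        return new HashMap<>(nextOffsets);
    }
}
```

Calling `discover` periodically with the broker's current partition list yields only the new partitions, which can then be turned into new splits without disturbing offsets already tracked.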

CDC Source

Common split units:

snapshot chunk
incremental log split

Typical concerns:

snapshot to incremental handoff
source metadata
schema evolution
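The snapshot to incremental handoff can be sketched in a deliberately simplified form: record the log offset before the snapshot starts, hand out snapshot chunks, and release a single incremental split from that offset only after every chunk has finished. `HandoffPlanner` is a hypothetical class, and this model omits the per-chunk watermark reconciliation that production CDC sources typically need to deduplicate changes made during the snapshot.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical planner for a two-phase CDC read: snapshot chunks first, then the log.
final class HandoffPlanner {
    private final long startLogOffset; // captured before the snapshot begins
    private final Deque<String> pendingChunks = new ArrayDeque<>();
    private final Set<String> unfinished = new HashSet<>();
    private boolean incrementalAssigned;

    HandoffPlanner(List<String> snapshotChunks, long logOffsetBeforeSnapshot) {
        this.startLogOffset = logOffsetBeforeSnapshot;
        pendingChunks.addAll(snapshotChunks);
        unfinished.addAll(snapshotChunks);
    }

    // Returns "snapshot:<chunk>", "incremental:<offset>", or null when nothing can be assigned yet.
    String next() {
        if (!pendingChunks.isEmpty()) {
            return "snapshot:" + pendingChunks.poll();
        }
        if (unfinished.isEmpty() && !incrementalAssigned) {
            incrementalAssigned = true;
            return "incremental:" + startLogOffset;
        }
        return null;
    }

    // Called when a reader reports a snapshot chunk as fully read.
    void chunkFinished(String chunk) {
        unfinished.remove(chunk);
    }
}
```

Starting the log read from the offset captured before the snapshot means changes made during the snapshot are replayed rather than lost, at the cost of handling duplicates downstream or in the reader.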

Related docs:

CDC Pipeline Architecture

Testing Strategy

At minimum, test these layers:

option validation
split generation or discovery
reader behavior with normal data
checkpoint or state snapshot behavior
recovery or split reassignment if the connector is parallel

If the source touches an external system, add or extend E2E coverage when possible.

Packaging Checklist

Before opening a PR, verify:

factory registration exists
connector module is included in build and distribution
plugin-mapping.properties is updated when needed
doc examples use the exact runtime plugin name
docs are added in both English and Chinese
