Job Configuration Guide
SeaTunnel jobs are defined declaratively. Instead of writing code for most integrations, you describe the execution environment, the source side, optional transforms, and the sink side in a configuration file.
This guide explains the structure of a SeaTunnel job, how data flows between plugins, and how to move from the built-in sample to a real pipeline.
Configuration Anatomy
Most SeaTunnel jobs follow the same top-level structure:
```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "renamed"
    field_mapper = {
      name = user_name
      age = age
    }
  }
}

sink {
  Console {
    plugin_input = "renamed"
  }
}
```
At a high level:
- `env` controls how the job executes
- `source` defines where data comes from
- `transform` changes data in-flight
- `sink` defines where data goes
The env Block
The env block contains execution-level settings. Some keys are common across engines, while others are engine-specific.
Common settings usually include:
| Key | Meaning |
|---|---|
| `job.mode` | `BATCH` or `STREAMING` |
| `parallelism` | Default parallelism for the job |
| `job.name` | Optional display name for the job |
| `checkpoint.interval` | Checkpoint interval for streaming jobs and exactly-once workflows |
If you use Flink or Spark, engine-specific settings are also configured in env. See JobEnvConfig for the detailed engine parameter rules.
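As an illustration, an `env` block for a streaming job might combine these keys as follows. The `job.name` value is a made-up placeholder, and the unit of `checkpoint.interval` is assumed to be milliseconds; confirm both against JobEnvConfig for your engine:

```hocon
env {
  job.mode = "STREAMING"
  job.name = "orders-sync"      # optional display name (placeholder value)
  parallelism = 2
  checkpoint.interval = 10000   # assumed milliseconds; verify in JobEnvConfig
}
```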
The source Block
The source block describes how SeaTunnel reads data from an external system.
A source usually includes:
- connector name
- connection parameters
- read scope, such as table, topic, path, or query
- schema or format-related parameters
- `plugin_output`, so downstream plugins can refer to this output explicitly
If you use multiple sources in one job, naming each source output clearly will make the pipeline easier to read and maintain.
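Putting those parts together, a database source might be sketched like this. The connector is assumed to be the JDBC source; all option names and values here are illustrative placeholders, so check the connector document for the exact parameters:

```hocon
source {
  Jdbc {
    # connection parameters (placeholder values)
    url = "jdbc:mysql://localhost:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "reader"
    password = "secret"
    # read scope: a query (could also be a table, topic, or path for other connectors)
    query = "select id, name, age from users"
    # name the output so downstream plugins can refer to it explicitly
    plugin_output = "users"
  }
}
```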
The transform Block
The transform block is optional. Use it when the data needs to be filtered, renamed, enriched, mapped, or validated before it reaches the sink.
Typical use cases:
- rename or map fields
- filter rows
- convert row kinds
- run SQL transforms
- validate data before writing
SeaTunnel supports going directly from source to sink. If no transform is needed, you can omit this block completely.
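As a sketch of the filter and rename cases, an SQL-style transform between the sample source and sink might look like the following. The `Sql` transform name, its `query` option, and the way the query references the upstream stream are assumptions to verify against the Transforms reference:

```hocon
transform {
  Sql {
    plugin_input = "fake"
    plugin_output = "adults"
    # rename a field and filter rows in one SQL statement
    query = "select name as user_name, age from fake where age >= 18"
  }
}
```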
The sink Block
The sink block defines how SeaTunnel writes data to the target system.
A sink usually includes:
- connector name
- connection parameters
- target table, topic, or path
- write semantics or batching parameters
- `plugin_input`, pointing to the upstream source or transform output
Different sinks expose different options. Use the relevant connector document for exact option names, defaults, and examples.
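For example, a JDBC-style sink consuming the sample's transformed stream might be sketched as below. The option names (`database`, `table`, connection parameters) are illustrative and vary per connector; treat this as a shape, not exact syntax:

```hocon
sink {
  Jdbc {
    # consume the output of the FieldMapper transform
    plugin_input = "renamed"
    # connection parameters (placeholder values)
    url = "jdbc:mysql://localhost:3306/warehouse"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "writer"
    password = "secret"
    # target table; exact option names differ between sinks
    database = "warehouse"
    table = "users"
  }
}
```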
plugin_input And plugin_output
These two keys are the most important conventions for understanding how data moves through a SeaTunnel job.
- `plugin_output` names the data stream produced by a source or transform
- `plugin_input` tells a transform or sink which upstream stream to consume
This is especially useful when:
- one job reads from multiple sources
- one transform fans out into multiple sinks
- different branches of a job should remain easy to understand
If a job has only one upstream path, SeaTunnel can often follow the default convention without requiring both fields. Explicit naming is still recommended for readability.
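A fan-out sketch may make the convention concrete: one source stream is consumed by two sinks, each selecting it by name. Whether a given engine accepts two instances of the same sink plugin in one block is an assumption here; the naming pattern itself is the point:

```hocon
source {
  FakeSource {
    plugin_output = "events"
    row.num = 16
  }
}

sink {
  # both sinks consume the same upstream stream by name
  Console {
    plugin_input = "events"
  }
  Console {
    plugin_input = "events"
  }
}
```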
Supported Configuration Formats
SeaTunnel supports multiple configuration styles:
- HOCON: the default and most commonly used format
- JSON: useful when configuration is generated by another system
- SQL: useful when expressing jobs in SQL-oriented workflows
Use these references for format-specific details:
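Because HOCON is a superset of JSON, the sample job maps naturally onto a JSON configuration. The array-of-plugins shape with a `plugin_name` key shown below is an assumption about SeaTunnel's JSON layout; verify it against the JSON format reference before relying on it:

```json
{
  "env": { "parallelism": 1, "job.mode": "BATCH" },
  "source": [
    { "plugin_name": "FakeSource", "plugin_output": "fake", "row.num": 16 }
  ],
  "sink": [
    { "plugin_name": "Console", "plugin_input": "fake" }
  ]
}
```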
Moving From The Sample To A Real Job
The fastest way to build a real job is to replace the sample plugins gradually:
- Keep the `env` block from the sample.
- Replace `FakeSource` with a real source connector.
- Replace `Console` with the target sink connector.
- Add transforms only when the source schema and target schema do not align directly.
- Add connector-specific jars or drivers if required.
For example:
- MySQL to Doris
- Kafka to Iceberg
- S3File to StarRocks
- PostgreSQL CDC to Kafka
Validation Checklist
Before running a job, verify these points:
- Java and `JAVA_HOME` are set correctly
- required connector plugins are installed
- third-party drivers are present if required
- source credentials and network access are valid
- target table, topic, or path already exists when required
- `job.mode` matches the connector capabilities you intend to use
Next Steps
- Need a runnable first example: Quick Start With SeaTunnel Engine
- Need connector parameters: Source Connectors and Sink Connectors
- Need transform capabilities: Transforms
- Need engine-level details: Engine Overview