Version: Next

Job Configuration Guide

SeaTunnel jobs are defined declaratively. Instead of writing code for most integrations, you describe the execution environment, the source side, optional transforms, and the sink side in a configuration file.

This guide explains the structure of a SeaTunnel job, how data flows between plugins, and how to move from the built-in sample to a real pipeline.

Configuration Anatomy

Most SeaTunnel jobs follow the same top-level structure:

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "renamed"
    field_mapper = {
      name = user_name
      age = age
    }
  }
}

sink {
  Console {
    plugin_input = "renamed"
  }
}

At a high level:

  • env controls how the job executes
  • source defines where data comes from
  • transform changes data in-flight
  • sink defines where data goes

The env Block

The env block contains execution-level settings. Some keys are common across engines, while others are engine-specific.

Common settings usually include:

| Key | Meaning |
| --- | --- |
| job.mode | BATCH or STREAMING |
| parallelism | Default parallelism for the job |
| job.name | Optional display name for the job |
| checkpoint.interval | Checkpoint interval for streaming jobs and exactly-once workflows |

If you use Flink or Spark, engine-specific settings are also configured in env. See JobEnvConfig for the detailed engine parameter rules.
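As an illustration, an env block for a streaming job might look like the following sketch. The interval value is commonly given in milliseconds, but confirm the unit and key names against JobEnvConfig for your release:

```hocon
env {
  # Run continuously rather than as a one-shot batch
  job.mode = "STREAMING"
  parallelism = 2
  job.name = "orders-sync"       # optional display name
  checkpoint.interval = 10000    # checkpoint interval (commonly milliseconds; verify per version)
}
```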

The source Block

The source block describes how SeaTunnel reads data from an external system.

A source usually includes:

  • connector name
  • connection parameters
  • read scope, such as table, topic, path, or query
  • schema or format-related parameters
  • plugin_output so downstream plugins can refer to this output explicitly

If you use multiple sources in one job, naming each source output clearly will make the pipeline easier to read and maintain.
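For instance, a JDBC source shows all of these pieces together. The URL, credentials, and query below are placeholders; exact option names and defaults are in the Jdbc connector document:

```hocon
source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "reader"
    password = "secret"
    # Read scope expressed as a query
    query = "SELECT id, name, price FROM products"
    # Named output so downstream plugins can refer to this stream explicitly
    plugin_output = "products"
  }
}
```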

The transform Block

The transform block is optional. Use it when the data needs to be filtered, renamed, enriched, mapped, or validated before it reaches the sink.

Typical use cases:

  • rename or map fields
  • filter rows
  • convert row kinds
  • run SQL transforms
  • validate data before writing

SeaTunnel supports going directly from source to sink. If no transform is needed, you can omit this block completely.
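As an example of an SQL-style transform, the sketch below filters rows from the sample source. The convention of referencing the upstream stream by its plugin_output name inside the query should be verified against the Sql transform document:

```hocon
transform {
  Sql {
    plugin_input = "fake"
    plugin_output = "adults"
    # The input stream is referenced by its name in the query
    query = "SELECT name, age FROM fake WHERE age >= 18"
  }
}
```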

The sink Block

The sink block defines how SeaTunnel writes data to the target system.

A sink usually includes:

  • connector name
  • connection parameters
  • target table, topic, or path
  • write semantics or batching parameters
  • plugin_input pointing to the upstream source or transform output

Different sinks expose different options. Use the relevant connector document for exact option names, defaults, and examples.
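For example, a JDBC sink writing the transform output from the sample job might look like this sketch. The connection values are placeholders, and options such as generate_sink_sql should be checked against the Jdbc sink document for your version:

```hocon
sink {
  Jdbc {
    plugin_input = "renamed"    # consume the FieldMapper output
    url = "jdbc:postgresql://localhost:5432/warehouse"
    driver = "org.postgresql.Driver"
    user = "writer"
    password = "secret"
    generate_sink_sql = true    # let the connector build the write statements
    database = "warehouse"
    table = "users"
  }
}
```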

plugin_input and plugin_output

These two keys are the most important conventions for understanding how data moves through a SeaTunnel job.

  • plugin_output names the data stream produced by a source or transform
  • plugin_input tells a transform or sink which upstream stream to consume

This is especially useful when:

  • one job reads from multiple sources
  • one transform fans out into multiple sinks
  • different branches of a job should remain easy to understand

If a job has only one upstream path, SeaTunnel can often follow the default convention without requiring both fields. Explicit naming is still recommended for readability.
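A fan-out sketch makes the convention concrete: two sinks consume the same named stream. The LocalFile path here is a placeholder:

```hocon
source {
  FakeSource {
    plugin_output = "events"
    row.num = 16
  }
}

sink {
  # Both sinks consume the same upstream stream by name
  Console {
    plugin_input = "events"
  }
  LocalFile {
    plugin_input = "events"
    path = "/tmp/events"
  }
}
```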

Supported Configuration Formats

SeaTunnel supports multiple configuration styles:

  • HOCON: the default and most commonly used format
  • JSON: useful when configuration is generated by another system
  • SQL: useful when expressing jobs in SQL-oriented workflows
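To illustrate the JSON style, the sample job's skeleton can be expressed roughly as follows. The array-of-objects layout with plugin_name keys mirrors the HOCON sample, but the exact top-level shape should be confirmed against the JSON format reference:

```json
{
  "env": {
    "job.mode": "BATCH",
    "parallelism": 1
  },
  "source": [
    {
      "plugin_name": "FakeSource",
      "plugin_output": "fake",
      "row.num": 16
    }
  ],
  "sink": [
    {
      "plugin_name": "Console",
      "plugin_input": "fake"
    }
  ]
}
```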

See the format-specific reference pages for exact syntax rules and examples.

Moving From The Sample To A Real Job

The fastest way to build a real job is to replace the sample plugins gradually:

  1. Keep the env block from the sample.
  2. Replace FakeSource with a real source connector.
  3. Replace Console with the target sink connector.
  4. Add transforms only when the source schema and target schema do not align directly.
  5. Add connector-specific jars or drivers if required.

For example:

  • MySQL to Doris
  • Kafka to Iceberg
  • S3File to StarRocks
  • PostgreSQL CDC to Kafka
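Following those steps, a MySQL-to-Doris job might look like this sketch. Connection values are placeholders, and Doris sink option names have changed between releases, so check the Doris connector document before use:

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "reader"
    password = "secret"
    query = "SELECT id, name, price FROM products"
    plugin_output = "products"
  }
}

sink {
  Doris {
    plugin_input = "products"
    fenodes = "doris-fe:8030"
    username = "root"
    password = ""
    database = "ods"
    table = "products"
  }
}
```

The MySQL JDBC driver jar must also be placed where SeaTunnel can load it, per step 5 above.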

Validation Checklist

Before running a job, verify these points:

  • Java and JAVA_HOME are set correctly
  • required connector plugins are installed
  • third-party drivers are present if required
  • source credentials and network access are valid
  • target table, topic, or path already exists when required
  • job.mode matches the connector capabilities you intend to use

Next Steps