Version: Next

Job Configuration Guide

SeaTunnel jobs are defined declaratively. Instead of writing code for most integrations, you describe the execution environment, the source side, optional transforms, and the sink side in a configuration file.

This guide explains the structure of a SeaTunnel job, how data flows between plugins, and how to move from the built-in sample to a real pipeline.

Configuration Anatomy

Most SeaTunnel jobs follow the same top-level structure:

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  FieldMapper {
    plugin_input = "fake"
    plugin_output = "renamed"
    field_mapper = {
      name = user_name
      age = age
    }
  }
}

sink {
  Console {
    plugin_input = "renamed"
  }
}

At a high level:

  • env controls how the job executes
  • source defines where data comes from
  • transform changes data in-flight
  • sink defines where data goes

The env Block

The env block contains execution-level settings. Some keys are common across engines, while others are engine-specific.

Common settings usually include:

| Key | Meaning |
| --- | --- |
| job.mode | BATCH or STREAMING |
| parallelism | Default parallelism for the job |
| job.name | Optional display name for the job |
| checkpoint.interval | Checkpoint interval for streaming jobs and exactly-once workflows |

If you use Flink or Spark, engine-specific settings are also configured in env. See JobEnvConfig for the detailed engine parameter rules.
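As an illustration, an env block for a streaming job might look like the following sketch. The interval value is commonly given in milliseconds, but confirm the unit and key names against JobEnvConfig for your release:

```hocon
env {
  # Run continuously rather than as a one-shot batch
  job.mode = "STREAMING"
  parallelism = 2
  job.name = "orders-sync"       # optional display name
  checkpoint.interval = 10000    # checkpoint interval (commonly milliseconds; verify per version)
}
```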

The source Block

The source block describes how SeaTunnel reads data from an external system.

A source usually includes:

  • connector name
  • connection parameters
  • read scope, such as table, topic, path, or query
  • schema or format-related parameters
  • plugin_output so downstream plugins can refer to this output explicitly

If you use multiple sources in one job, naming each source output clearly will make the pipeline easier to read and maintain.
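For instance, a JDBC source shows all of these pieces together. The URL, credentials, and query below are placeholders; exact option names and defaults are in the Jdbc connector document:

```hocon
source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "reader"
    password = "secret"
    # Read scope expressed as a query
    query = "SELECT id, name, price FROM products"
    # Named output so downstream plugins can refer to this stream explicitly
    plugin_output = "products"
  }
}
```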

The transform Block

The transform block is optional. Use it when the data needs to be filtered, renamed, enriched, mapped, or validated before it reaches the sink.

Typical use cases:

  • rename or map fields
  • filter rows
  • convert row kinds
  • run SQL transforms
  • validate data before writing

SeaTunnel supports going directly from source to sink. If no transform is needed, you can omit this block completely.
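As an example of an SQL-style transform, the sketch below filters rows from the sample source. The convention of referencing the upstream stream by its plugin_output name inside the query should be verified against the Sql transform document:

```hocon
transform {
  Sql {
    plugin_input = "fake"
    plugin_output = "adults"
    # The input stream is referenced by its name in the query
    query = "SELECT name, age FROM fake WHERE age >= 18"
  }
}
```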

The sink Block

The sink block defines how SeaTunnel writes data to the target system.

A sink usually includes:

  • connector name
  • connection parameters
  • target table, topic, or path
  • write semantics or batching parameters
  • plugin_input pointing to the upstream source or transform output

Different sinks expose different options. Use the relevant connector document for exact option names, defaults, and examples.
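For example, a JDBC sink writing the transform output from the sample job might look like this sketch. The connection values are placeholders, and options such as generate_sink_sql should be checked against the Jdbc sink document for your version:

```hocon
sink {
  Jdbc {
    plugin_input = "renamed"    # consume the FieldMapper output
    url = "jdbc:postgresql://localhost:5432/warehouse"
    driver = "org.postgresql.Driver"
    user = "writer"
    password = "secret"
    generate_sink_sql = true    # let the connector build the write statements
    database = "warehouse"
    table = "users"
  }
}
```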

plugin_input and plugin_output

These two keys are the most important conventions for understanding how data moves through a SeaTunnel job.

  • plugin_output names the data stream produced by a source or transform
  • plugin_input tells a transform or sink which upstream stream to consume

This is especially useful when:

  • one job reads from multiple sources
  • one transform fans out into multiple sinks
  • different branches of a job should remain easy to understand

If a job has only one upstream path, SeaTunnel can often follow the default convention without requiring both fields. Explicit naming is still recommended for readability.
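A fan-out sketch makes the convention concrete: two sinks consume the same named stream. The LocalFile path here is a placeholder:

```hocon
source {
  FakeSource {
    plugin_output = "events"
    row.num = 16
  }
}

sink {
  # Both sinks consume the same upstream stream by name
  Console {
    plugin_input = "events"
  }
  LocalFile {
    plugin_input = "events"
    path = "/tmp/events"
  }
}
```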

Supported Configuration Formats

SeaTunnel supports multiple configuration styles:

  • HOCON: the default and most commonly used format
  • JSON: useful when configuration is generated by another system
  • SQL: useful when expressing jobs in SQL-oriented workflows
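To illustrate the JSON style, the sample job's skeleton can be expressed roughly as follows. The array-of-objects layout with plugin_name keys mirrors the HOCON sample, but the exact top-level shape should be confirmed against the JSON format reference:

```json
{
  "env": {
    "job.mode": "BATCH",
    "parallelism": 1
  },
  "source": [
    {
      "plugin_name": "FakeSource",
      "plugin_output": "fake",
      "row.num": 16
    }
  ],
  "sink": [
    {
      "plugin_name": "Console",
      "plugin_input": "fake"
    }
  ]
}
```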

See the format-specific reference pages for exact syntax rules and examples.

Moving From The Sample To A Real Job

The fastest way to build a real job is to replace the sample plugins gradually:

  1. Keep the env block from the sample.
  2. Replace FakeSource with a real source connector.
  3. Replace Console with the target sink connector.
  4. Add transforms only when the source schema and target schema do not align directly.
  5. Add connector-specific jars or drivers if required.

For example:

  • MySQL to Doris
  • Kafka to Iceberg
  • S3File to StarRocks
  • PostgreSQL CDC to Kafka
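Following those steps, a MySQL-to-Doris job might look like this sketch. Connection values are placeholders, and Doris sink option names have changed between releases, so check the Doris connector document before use:

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/shop"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "reader"
    password = "secret"
    query = "SELECT id, name, price FROM products"
    plugin_output = "products"
  }
}

sink {
  Doris {
    plugin_input = "products"
    fenodes = "doris-fe:8030"
    username = "root"
    password = ""
    database = "ods"
    table = "products"
  }
}
```

The MySQL JDBC driver jar must also be placed where SeaTunnel can load it, per step 5 above.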

Validation Checklist

Before running a job, verify these points:

  • Java and JAVA_HOME are set correctly
  • required connector plugins are installed
  • third-party drivers are present if required
  • source credentials and network access are valid
  • target table, topic, or path already exists when required
  • job.mode matches the connector capabilities you intend to use

Next Steps