v2.3.13 live / Apache Top-Level Project

High-performance
data integration
for every workload.

A production-grade integration layer for distributed data systems, with fault tolerance, schema evolution, and multi-engine execution across batch, streaming, CDC, multimodal, and AI workloads.


Zeta Engine · Apache Flink · Apache Spark · CDC / Exactly-Once · Schema Evolution

9.4k+ GitHub Stars

~200 Data Connectors

3 Execution Engines

2.3k+ GitHub Forks

Architecture Pattern

Not ETL. EtLT.

SeaTunnel handles the E, t, and L: extract, lightweight transform, and load. The heavy T, downstream transformation, stays in your warehouse or lakehouse, while SeaTunnel moves reliable, structured data between systems.

01 Upstream Sources

Database (OLTP / CDC): MySQL, PostgreSQL, Oracle, SQL Server, TiDB

Streaming (Pub / Sub): Apache Kafka, Apache Pulsar, RocketMQ, RabbitMQ

Files & Lake (Object Storage): Amazon S3, Alibaba OSS, HDFS, LocalFile

02 EtLT Pattern: SeaTunnel

Engines: Zeta / Flink / Spark

Modes: Batch / Streaming / CDC

Guarantees: Exactly-once / Schema evolution / Multi-engine scale

03 Downstream Targets

Data Warehouse (OLAP): ClickHouse, Apache Doris, StarRocks, Snowflake

Lakehouse (Open Tables): Apache Iceberg, Apache Hudi, Apache Paimon, Delta Lake

Search & Vectors (Retrieval): Elasticsearch, Typesense, Milvus, Qdrant
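To make the split concrete, here is a minimal batch sketch that does the E, the lowercase t, and the L, and leaves joins and modelling to the warehouse. It is an illustration, not a reference config: the hosts, credentials, and the `customers` / `customers_clean` table names are hypothetical, and the option names assume the 2.3.x Jdbc source, Sql transform, and ClickHouse sink (`plugin_output` / `plugin_input` are the names used since 2.3.9; older releases call them `result_table_name` / `source_table_name`).

```hocon
# Hypothetical batch EtLT job: PostgreSQL -> light cleanup -> ClickHouse
env {
  parallelism = 2
  job.mode = "BATCH"
}

source {
  # E: extract rows with a plain JDBC query (host and table are placeholders)
  Jdbc {
    url = "jdbc:postgresql://pg.prod.internal:5432/shop"
    driver = "org.postgresql.Driver"
    user = "reader"
    password = "${PG_PASS}"
    query = "SELECT id, name, email, status FROM public.customers"
    plugin_output = "customers"
  }
}

transform {
  # t: row-level cleanup only; joins and aggregates stay in the warehouse
  Sql {
    plugin_input = "customers"
    plugin_output = "customers_clean"
    query = "SELECT id, TRIM(name) AS name, LOWER(email) AS email FROM customers WHERE status = 'active'"
  }
}

sink {
  # L: load into a staging table; the heavy T runs downstream later
  ClickHouse {
    plugin_input = "customers_clean"
    host = "ch.analytics:8123"
    database = "staging"
    table = "customers"
    username = "writer"
    password = "${CH_PASS}"
  }
}
```

The transform deliberately stays per-row: anything that needs state across rows or tables is the warehouse's job in the EtLT pattern.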

Why SeaTunnel

Production-grade from day one.

Fault tolerance, schema evolution, and multi-engine scale, built for production pipelines.

Schema changes? Handled automatically.

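For CDC sources with schema evolution enabled, SeaTunnel detects upstream DDL changes, such as added, dropped, renamed, or modified columns, and propagates them to sinks that support schema evolution, so teams do not need to pause the pipeline for every column change.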

# Before -> After (detected automatically)

-- v1 schema -------------------------
id       BIGINT
name     VARCHAR(255)

-- v2 schema (auto-propagated) ------
id       BIGINT
name     VARCHAR(255)
email    VARCHAR(512)    ADDED
phone    VARCHAR(32)     ADDED
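Schema evolution is opt-in on the source side. A minimal sketch of turning it on, assuming the 2.3.x MySQL-CDC option `schema-changes.enabled` (default `false`) and a MySQL target written through the Jdbc sink; the hosts, credentials, and `crm.users` table are hypothetical. Which DDL statements and which sinks are supported varies by release, so treat this as a starting point and check the connector pages.

```hocon
# Sketch: opt in to schema evolution on a MySQL CDC source
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://db.prod.internal:3306/crm"
    username = "reader"
    password = "${DB_PASS}"
    table-names = ["crm.users"]
    # Disabled by default; when enabled, supported DDL such as ADD COLUMN
    # is forwarded downstream as a schema-change event
    schema-changes.enabled = true
  }
}

sink {
  # The sink must support schema evolution as well; a MySQL target via
  # the Jdbc sink is assumed here
  Jdbc {
    url = "jdbc:mysql://replica.internal:3306/crm"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "writer"
    password = "${SINK_PASS}"
    generate_sink_sql = true
    database = "crm"
    table = "users"
    primary_keys = ["id"]
  }
}
```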

Real-Time CDC

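Low-latency change capture across major databases: SeaTunnel reads a snapshot of existing rows, then hands off seamlessly to the incremental change log (such as the MySQL binlog), without taking table locks for the cutover.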

Low latency · Batch to stream · No lock cutover
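The snapshot-then-stream handoff is chosen by the source's startup mode. A minimal source-only fragment, assuming the 2.3.x MySQL-CDC options `startup.mode` and `snapshot.split.size`; the host, credentials, and table are placeholders.

```hocon
source {
  MySQL-CDC {
    base-url = "jdbc:mysql://db.prod.internal:3306/orders"
    username = "reader"
    password = "${DB_PASS}"
    table-names = ["orders.events"]
    # initial: snapshot existing rows first, then continue from the binlog
    # (other modes such as latest skip the snapshot)
    startup.mode = "initial"
    # rows per snapshot chunk; the snapshot is read chunk by chunk
    snapshot.split.size = 8096
  }
}
```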

Exactly-Once Semantics

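Checkpoint-backed fault tolerance restores source offsets and sink state after failures, retries, and restarts, so records stay consistent. End-to-end exactly-once delivery also needs a sink that supports it, for example through transactional or idempotent writes.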

Checkpoint backed · Failure safe · Restart consistent
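A sketch of the pieces that have to line up for exactly-once into Kafka: a checkpoint interval, a source that tracks its offsets in checkpoints, and a transactional sink. It assumes the 2.3.x options `exactly_once` on MySQL-CDC (default `false`) and `semantics` / `transaction_prefix` on the Kafka sink; the brokers, topic, and prefix are hypothetical.

```hocon
env {
  job.mode = "STREAMING"
  # Snapshot state every 10 s; after a failure the job resumes from the
  # last completed checkpoint and its recorded source offsets
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://db.prod.internal:3306/orders"
    username = "reader"
    password = "${DB_PASS}"
    table-names = ["orders.events"]
    # disabled by default
    exactly_once = true
  }
}

sink {
  Kafka {
    bootstrap.servers = "kafka.internal:9092"
    topic = "orders_events"
    format = "json"
    # writes go into Kafka transactions tied to checkpoint completion
    semantics = "EXACTLY_ONCE"
    transaction_prefix = "seatunnel-orders"
  }
}
```

Downstream Kafka consumers should read with `isolation.level=read_committed`; otherwise they can also see records from transactions that are later aborted.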

Multi-Engine Support

Use the same pipeline definition on Zeta, Flink, or Spark without rewriting connector logic or delivery paths.

One definition · Zeta / Flink / Spark · No connector rewrites

~200 Native Connectors

If your data lives there,
SeaTunnel connects to it.

Native connectors across databases, streams, lakehouses, search systems, and object stores.

OLTP Databases: MySQL / MySQL CDC, PostgreSQL, Oracle, SQL Server, TiDB / MariaDB, +25 via JDBC

Streaming & Messaging: Apache Kafka, Apache Pulsar, RabbitMQ, RocketMQ, AWS SQS, ActiveMQ

OLAP & Analytics: ClickHouse, Apache Doris, StarRocks, Snowflake, Amazon Redshift, Cloudberry

Data Lakes & Storage: Amazon S3 / Hudi, Alibaba OSS, HDFS / LocalFile, Apache Iceberg, Delta Lake, Apache Paimon

MySQL, PostgreSQL, Oracle, SQL Server, TiDB, MariaDB, MongoDB, DynamoDB, Cassandra, HBase, Neo4j, DB2, Greenplum, OceanBase, Apache Kafka, Apache Pulsar, RabbitMQ, RocketMQ, ActiveMQ, AWS SQS, Elasticsearch, Apache Druid, Typesense, ClickHouse, Apache Doris, StarRocks, Snowflake, Cloudberry, Amazon Redshift, Amazon S3, HDFS, Alibaba OSS, LocalFile, Apache Iceberg, Apache Hudi, Delta Lake, Apache Paimon, Apache Kudu, Apache Hive, InfluxDB, Apache IoTDB, TDengine, Redis, Aerospike, FTP, SFTP, HTTP, GraphQL, Google Sheets, Google Firestore, Slack, DingTalk, Feishu, Email, MaxCompute, TableStore, SelectDB Cloud, Milvus, Qdrant, Lance, Apache Fluss, HugeGraph, Prometheus, SLS, Sentry, SensorsData, Web3j

Also: MongoDB / Redis / Elasticsearch / Neo4j / Cassandra / HBase / Druid / HugeGraph / IoTDB / InfluxDB / DynamoDB / Milvus / Qdrant ...

Simple by design

A config file.
That's all it takes.

Declare your source, transform, and sink in a plain HOCON config file, then deploy it on any supported engine without rewriting the pipeline.



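# Real-time: MySQL CDC -> ClickHouse

env {
  parallelism = 4
  job.mode = "STREAMING"
  # checkpoint every 10 s (value is in milliseconds)
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    # host and port are taken from the JDBC base-url
    base-url = "jdbc:mysql://db.prod.internal:3306"
    username = "reader"
    # substituted from a variable when the job is submitted
    password = "${DB_PASS}"
    database-names = ["orders"]
    # tables are qualified as database.table
    table-names = ["orders.events"]
  }
}

transform {
  # lightweight t: stamp each row and drop soft-deleted records
  Sql {
    query = """
      SELECT *, NOW() AS synced_at
      FROM events WHERE status != 'deleted'
    """
  }
}

sink {
  ClickHouse {
    host = "ch.analytics:8123"
    database = "analytics"
    table = "events_realtime"
    username = "writer"
    password = "${CH_PASS}"
    # single key column (a string, not a list); with support_upsert,
    # CDC updates are written as upserts on this key
    primary_key = "id"
    support_upsert = true
  }
}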

One pipeline definition.
Run it on Zeta, Flink, or Spark.

SeaTunnel unifies CDC, schema evolution, multimodal data movement, and production-grade reliability in one Apache-licensed integration layer.

