v2.3.13 live / Apache Top-Level Project

High-performance
data integration
for every workload.

A production-grade integration layer for distributed data systems, with fault tolerance, schema evolution, and multi-engine execution across batch, streaming, CDC, multimodal, and AI workloads.

Zeta Engine | Apache Flink | Apache Spark | CDC / Exactly-Once | Schema Evolution

9.4k+ GitHub Stars
~200 Data Connectors
3 Execution Engines
2.3k+ GitHub Forks

Architecture Pattern

Not ETL. EtLT.

SeaTunnel handles extract, lightweight transform, and load. Your warehouse or lakehouse can keep the heavy downstream transformation while SeaTunnel moves reliable, structured data between systems.

01 Upstream Sources
Database (OLTP / CDC): MySQL, PostgreSQL, Oracle, SQL Server, TiDB
Streaming (Pub / Sub): Apache Kafka, Apache Pulsar, RocketMQ, RabbitMQ
Files & Lake (Object Storage): Amazon S3, Alibaba OSS, HDFS, LocalFile

02 EtLT Pattern
SeaTunnel EtLT
Engines: Zeta / Flink / Spark
Modes: Batch / Streaming / CDC
Guarantees: Exactly-once / Schema evolution / Multi-engine scale

03 Downstream Targets
Data Warehouse (OLAP): ClickHouse, Apache Doris, StarRocks, Snowflake
Lakehouse (Open Tables): Apache Iceberg, Apache Hudi, Apache Paimon, Delta Lake
Search & Vectors (Retrieval): Elasticsearch, Typesense, Milvus, Qdrant

Why SeaTunnel

Production-grade from day one.

Real fault tolerance, schema evolution, and multi-engine scale for production pipelines.

01

Schema changes? Handled automatically.

SeaTunnel detects upstream schema changes and propagates them downstream in real time, so teams do not need to pause the pipeline for every column change.

# Before -> After (detected automatically)

-- v1 schema -------------------------
id       BIGINT
name     VARCHAR(255)

-- v2 schema (auto-propagated) ------
id       BIGINT
name     VARCHAR(255)
email    VARCHAR(512)    NEW
phone    VARCHAR(32)     NEW
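
The v1 -> v2 change above can be illustrated with a small schema diff. This is a conceptual Python sketch, not SeaTunnel's implementation: the Column type, the helper functions, and the target table name "events" are all hypothetical, and the generated statements use generic SQL rather than any particular sink's dialect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str


def added_columns(old, new):
    """Return columns present in `new` but not in `old`, in `new` order."""
    known = {c.name for c in old}
    return [c for c in new if c.name not in known]


def to_alter_statements(table, columns):
    """Render one generic ADD COLUMN statement per new column."""
    return [f"ALTER TABLE {table} ADD COLUMN {c.name} {c.type}" for c in columns]


# The two schema versions from the example above.
v1 = [Column("id", "BIGINT"), Column("name", "VARCHAR(255)")]
v2 = v1 + [Column("email", "VARCHAR(512)"), Column("phone", "VARCHAR(32)")]

new_cols = added_columns(v1, v2)
statements = to_alter_statements("events", new_cols)  # "events" is a hypothetical table
for statement in statements:
    print(statement)
```

A real pipeline would also have to handle dropped, renamed, and retyped columns; the sketch covers only the additive case shown in the example.
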
02

Real-Time CDC

Low-latency change capture across major databases, with seamless batch-to-stream handoff and no lock-based cutover.

Low latency | Batch to stream | No lock cutover
03

Exactly-Once Semantics

Checkpoint-backed fault tolerance keeps records consistent across failures, retries, and restarts.

Checkpoint backed | Failure safe | Restart consistent
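
To make the checkpoint idea concrete, here is a minimal Python simulation, assuming a replayable source and a sink that commits its rows together with the source offset. The Sink class, the run function, and the fail_at parameter are hypothetical names for illustration only; they are not SeaTunnel APIs and do not describe how any engine implements checkpoints.

```python
class Sink:
    """Holds committed rows and the source offset as one piece of state."""

    def __init__(self):
        self.committed_rows = []
        self.committed_offset = 0

    def commit(self, rows, offset):
        # Rows and offset change together. A crash before this call
        # leaves both at the previous checkpoint.
        self.committed_rows = self.committed_rows + rows
        self.committed_offset = offset


def run(source, sink, checkpoint_every, fail_at=None):
    """Resume from the last committed offset; raise at `fail_at` to simulate a crash."""
    pending = []
    offset = sink.committed_offset
    while offset < len(source):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError("simulated crash")
        pending.append(source[offset])
        offset += 1
        if len(pending) == checkpoint_every:
            sink.commit(pending, offset)
            pending = []
    if pending:
        sink.commit(pending, offset)


source = list(range(10))
sink = Sink()

try:
    run(source, sink, checkpoint_every=3, fail_at=7)
except RuntimeError:
    pass  # record 6 was read but never committed

# Restart: uncommitted work is replayed from the last checkpoint.
run(source, sink, checkpoint_every=3)
print(sink.committed_rows)
```

Because the restart resumes from the committed offset rather than from where the crashed run stopped reading, the record that was read but not committed is replayed once and nothing is written twice.
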
04

Multi-Engine Support

Use the same pipeline definition on Zeta, Flink, or Spark without rewriting connector logic or delivery paths.

One definition | Zeta / Flink / Spark | No connector rewrites

~200 Native Connectors

If your data lives there,
SeaTunnel connects to it.

Native connectors across databases, streams, lakehouses, search systems, and object stores.

OLTP Databases: MySQL / MySQL CDC, PostgreSQL, Oracle, SQL Server, TiDB / MariaDB, +25 via JDBC
Streaming & Messaging: Apache Kafka, Apache Pulsar, RabbitMQ, RocketMQ, AWS SQS, ActiveMQ
OLAP & Analytics: ClickHouse, Apache Doris, StarRocks, Snowflake, Amazon Redshift, Cloudberry
Data Lakes & Storage: Amazon S3 / Hudi, Alibaba OSS, HDFS / LocalFile, Apache Iceberg, Delta Lake, Apache Paimon

MySQL, PostgreSQL, Oracle, SQL Server, TiDB, MariaDB, MongoDB, DynamoDB, Cassandra, HBase, Neo4j, DB2, Greenplum, OceanBase, Apache Kafka, Apache Pulsar, RabbitMQ, RocketMQ, ActiveMQ, AWS SQS, Elasticsearch, Apache Druid, Typesense, ClickHouse, Apache Doris, StarRocks, Snowflake, Cloudberry, Amazon Redshift, Amazon S3, HDFS, Alibaba OSS, LocalFile, Apache Iceberg, Apache Hudi, Delta Lake, Apache Paimon, Apache Kudu, Apache Hive, InfluxDB, Apache IoTDB, TDengine, Redis, Aerospike, FTP, SFTP, HTTP, GraphQL, Google Sheets, Google Firestore, Slack, DingTalk, Feishu, Email, MaxCompute, TableStore, SelectDB Cloud, Milvus, Qdrant, Lance, Apache Fluss, HugeGraph, Prometheus, SLS, Sentry, SensorsData, Web3j
Also: MongoDB / Redis / Elasticsearch / Neo4j / Cassandra / HBase / Druid / HugeGraph / IoTDB / InfluxDB / DynamoDB / Milvus / Qdrant ...

Simple by design

A config file.
That's all it takes.

Declare your source, transform, and sink in plain config, then deploy on any supported engine without rewriting the pipeline.

mysql-cdc-to-clickhouse.conf

# Real-time: MySQL CDC -> ClickHouse

env {
  parallelism = 4
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    hostname = "db.prod.internal"
    username = "reader"
    password = "${DB_PASS}"
    database-names = ["orders"]
    table-names = ["orders.events"]
    base-url = "jdbc:mysql://db.prod.internal:3306"
  }
}

transform {
  Sql {
    query = """
      SELECT *, NOW() AS synced_at
      FROM events WHERE status != 'deleted'
    """
  }
}

sink {
  ClickHouse {
    host = "ch.analytics:8123"
    database = "analytics"
    table = "events_realtime"
    primary_key = ["id"]
  }
}

One pipeline definition.
Run it on Zeta, Flink, or Spark.

SeaTunnel unifies CDC, schema evolution, multimodal movement, and production-grade reliability into one Apache-licensed integration layer.