# Paimon

> Paimon source connector

## Description

Read data from Apache Paimon.

## Key features

## Options
| name                     | type   | required | default value |
|--------------------------|--------|----------|---------------|
| warehouse                | String | Yes      | -             |
| catalog_type             | String | No       | filesystem    |
| catalog_uri              | String | No       | -             |
| database                 | String | Yes      | -             |
| table                    | String | Yes      | -             |
| hdfs_site_path           | String | No       | -             |
| query                    | String | No       | -             |
| paimon.hadoop.conf       | Map    | No       | -             |
| paimon.hadoop.conf-path  | String | No       | -             |
### warehouse [string]

The Paimon warehouse path.
### catalog_type [string]

The catalog type of Paimon. Only `filesystem` and `hive` are supported.
### catalog_uri [string]

The catalog URI of Paimon. Only required when `catalog_type` is `hive`.
### database [string]

The database you want to access.
### table [string]

The table you want to access.
### hdfs_site_path [string]

The file path of `hdfs-site.xml`.
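For example, a minimal sketch that points the connector at a cluster's `hdfs-site.xml` (the path below is a placeholder, not a real file):

```hocon
source {
  Paimon {
    warehouse = "hdfs:///tmp/paimon"
    database = "default"
    table = "st_test"
    # Hypothetical path; replace with the hdfs-site.xml of your cluster
    hdfs_site_path = "/etc/hadoop/conf/hdfs-site.xml"
  }
}
```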
### query [string]

The filter condition for the table read, for example: `select * from st_test where id > 100`. If not specified, all rows are read.

Currently, `where` conditions only support `<`, `<=`, `>`, `>=`, `=`, `!=`, `or`, `and`, `is null`, and `is not null`; other operators are not supported.

The `Having`, `Group By`, and `Order By` clauses are currently unsupported, because these clauses are not supported by Paimon. Projection and limit will be supported in the future.

Note: When the field after the `where` condition is a string or boolean value, its value must be enclosed in single quotes, otherwise an error will be reported. For example: `name='abc'` or `tag='true'` (see the Filter example below).

The field data types currently supported by `where` conditions are as follows:
The field data types currently supported by where conditions are as follows:
- string
- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- date
- timestamp
### paimon.hadoop.conf [Map]

Properties in Hadoop conf.
### paimon.hadoop.conf-path [string]

The specified loading path for the `core-site.xml`, `hdfs-site.xml`, `hive-site.xml` files.
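Alternatively, instead of inlining every Hadoop property, you can point the connector at a directory that already contains those configuration files. A minimal sketch (the directory below is a placeholder):

```hocon
source {
  Paimon {
    warehouse = "hdfs:///tmp/paimon"
    database = "default"
    table = "st_test"
    # Hypothetical directory; it should contain core-site.xml, hdfs-site.xml, hive-site.xml
    paimon.hadoop.conf-path = "/opt/hadoop/conf"
  }
}
```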
## Filesystems

The Paimon connector supports reading data from multiple file systems. Currently, the supported file systems are `hdfs` and `s3`.

If you use the `s3` filesystem, you can configure the `fs.s3a.access-key`, `fs.s3a.secret-key`, `fs.s3a.endpoint`, `fs.s3a.path.style.access`, and `fs.s3a.aws.credentials.provider` properties in the `paimon.hadoop.conf` option. Besides, the warehouse should start with `s3a://`.
## Examples

### Simple example

```hocon
source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "default"
    table = "st_test"
  }
}
```
### Filter example

```hocon
source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test"
    query = "select c_boolean, c_tinyint from st_test where c_boolean='true' and c_tinyint > 116 and c_smallint = 15987 or c_decimal='2924137191386439303744.39292213'"
  }
}
```
### S3 example

```hocon
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  Paimon {
    warehouse = "s3a://test/"
    database = "seatunnel_namespace11"
    table = "st_test"
    paimon.hadoop.conf = {
      fs.s3a.access-key=G52pnxg67819khOZ9ezX
      fs.s3a.secret-key=SHJuAQqHsLrgZWikvMa3lJf5T0NfM5LMFliJh9HF
      fs.s3a.endpoint="http://minio4:9000"
      fs.s3a.path.style.access=true
      fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
    }
  }
}

sink {
  Console {}
}
```
### Hadoop conf example

```hocon
source {
  Paimon {
    catalog_name = "seatunnel_test"
    warehouse = "hdfs:///tmp/paimon"
    database = "seatunnel_namespace1"
    table = "st_test"
    query = "select * from st_test where pk_id is not null and pk_id < 3"
    paimon.hadoop.conf = {
      fs.defaultFS = "hdfs://nameservice1"
      dfs.nameservices = "nameservice1"
      dfs.ha.namenodes.nameservice1 = "nn1,nn2"
      dfs.namenode.rpc-address.nameservice1.nn1 = "hadoop03:8020"
      dfs.namenode.rpc-address.nameservice1.nn2 = "hadoop04:8020"
      dfs.client.failover.proxy.provider.nameservice1 = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
      dfs.client.use.datanode.hostname = "true"
    }
  }
}
```
### Hive catalog example

```hocon
source {
  Paimon {
    catalog_name = "seatunnel_test"
    catalog_type = "hive"
    catalog_uri = "thrift://hadoop04:9083"
    warehouse = "hdfs:///tmp/seatunnel"
    database = "seatunnel_test"
    table = "st_test3"
    paimon.hadoop.conf = {
      fs.defaultFS = "hdfs://nameservice1"
      dfs.nameservices = "nameservice1"
      dfs.ha.namenodes.nameservice1 = "nn1,nn2"
      dfs.namenode.rpc-address.nameservice1.nn1 = "hadoop03:8020"
      dfs.namenode.rpc-address.nameservice1.nn2 = "hadoop04:8020"
      dfs.client.failover.proxy.provider.nameservice1 = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
      dfs.client.use.datanode.hostname = "true"
    }
  }
}
```
## Changelog

If you want to read the changelog of a Paimon table, first set the `changelog-producer` property for the Paimon source table, and then use a SeaTunnel streaming task to read it.
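If the source table is created by the SeaTunnel Paimon sink, one possible way to set this property is through the sink's `paimon.table.write-props` option, assuming your connector version passes these properties through to table creation (check the Paimon sink documentation). A hedged sketch:

```hocon
sink {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test"
    paimon.table.primary-keys = "c_tinyint"
    # Assumption: these properties are passed through to Paimon table creation;
    # "lookup" is one of Paimon's changelog-producer modes (none, input, lookup, full-compaction)
    paimon.table.write-props = {
      changelog-producer = "lookup"
    }
  }
}
```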
### Note

Currently, batch reads always read the latest snapshot. To read the full changelog data, you must therefore use streaming reads and start the streaming read before writing data to the Paimon table; to ensure ordering, set the parallelism of the streaming read task to 1.
### Streaming read example

```hocon
env {
  parallelism = 1
  job.mode = "Streaming"
}

source {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test"
  }
}

sink {
  Paimon {
    warehouse = "/tmp/paimon"
    database = "full_type"
    table = "st_test_sink"
    paimon.table.primary-keys = "c_tinyint"
  }
}
```