Version: Next

Cloudberry

JDBC Cloudberry Source Connector

Supported Engines

Spark
Flink
SeaTunnel Zeta

Using Dependency

For Spark/Flink Engine

Make sure the JDBC driver JAR has been placed in the ${SEATUNNEL_HOME}/plugins/ directory.

For SeaTunnel Zeta Engine

Make sure the JDBC driver JAR has been placed in the ${SEATUNNEL_HOME}/lib/ directory.

Key Features

Supports query SQL, which can be used to achieve column projection.
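As a sketch of column projection, the source below reads only two columns instead of the whole row. It reuses the connection settings from the task examples on this page; the table public.mytable and its columns id and name are hypothetical placeholders.

```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    # Hypothetical columns: only id and name are read, so downstream rows carry just these two fields
    query = "select id, name from public.mytable"
  }
}
```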

Description

Reads data from an external data source through JDBC. Cloudberry does not currently have its own native JDBC driver, so this connector uses PostgreSQL's driver and implementation.

Supported DataSource Info

Datasource	Supported Versions	Driver	Url	Maven
Cloudberry	Uses PostgreSQL driver implementation	org.postgresql.Driver	jdbc:postgresql://localhost:5432/test	Download

Database Dependency

Download the PostgreSQL JDBC driver JAR and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' directory.
For example: cp postgresql-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/

Data Type Mapping

Cloudberry uses PostgreSQL's data type implementation. For data type compatibility and mappings, refer to the PostgreSQL connector documentation.

Options

The Cloudberry connector uses the same options as the PostgreSQL connector. For detailed configuration options, refer to the PostgreSQL connector documentation.

Key options include:

url (required): The JDBC connection URL
driver (required): The driver class name (org.postgresql.Driver)
user/password: Authentication credentials
query or table_path: The data to read, given either as a SQL query or as a table path
partition options (such as partition_column and partition_num): Parallel reading
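The options listed above fit into a source block roughly as follows. This is a sketch that reuses the connection values from the task examples on this page; query and table_path are alternatives, so only one is set, and the commented-out line shows the other form. The table public.mytable is a placeholder.

```hocon
source {
  Jdbc {
    # Connection: PostgreSQL JDBC URL and driver class, since Cloudberry has no native driver
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    # Authentication
    user = "dbadmin"
    password = "password"
    # What to read: a SQL query (placeholder table) ...
    query = "select * from public.mytable"
    # ... or, instead of query, a table path
    # table_path = "public.mytable"
  }
}
```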

Parallel Reader

Cloudberry supports parallel reading following the same rules as the PostgreSQL connector. For detailed information on split strategies and parallel reading options, refer to the PostgreSQL connector documentation.
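Besides split.size, the SeaTunnel JDBC source accepts range-partition options. The fragment below is a sketch assuming a hypothetical numeric id column in the placeholder table public.mytable whose values lie between 1 and 100000; the bounds and the split count are illustrative, not tuning advice.

```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "select * from public.mytable"
    # Hypothetical numeric column used to divide the read into ranges
    partition_column = "id"
    # Number of splits to generate over that range
    partition_num = 4
    # Assumed bounds of the id values to scan
    partition_lower_bound = 1
    partition_upper_bound = 100000
  }
}
```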

Task Example

Simple

env {
  parallelism = 4
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "select * from mytable limit 100"
  }
}

sink {
  Console {}
}

Parallel reading with table_path

env {
  parallelism = 4
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    table_path = "public.mytable"
    split.size = 10000
  }
}

sink {
  Console {}
}

Multiple table read

env {
  job.mode = "BATCH"
  parallelism = 4
}

source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    "table_list" = [
      {
        "table_path" = "public.table1"
      },
      {
        "table_path" = "public.table2"
      }
    ]
    split.size = 10000
  }
}

sink {
  Console {}
}
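Entries in table_list can also carry their own query, for example to project and filter one table while reading another in full. This is a sketch assuming the per-entry query field of the SeaTunnel JDBC source table_list syntax; the columns id and name and the filter value are hypothetical.

```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    table_list = [
      {
        # Read the whole table
        table_path = "public.table1"
      },
      {
        table_path = "public.table2"
        # Hypothetical projection and filter applied to this table only
        query = "select id, name from public.table2 where id > 100"
      }
    ]
    split.size = 10000
  }
}
```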

For more detailed examples and configurations, please refer to the PostgreSQL connector documentation.

Changelog


Change	Commit	Version
[Feature][Connector] Add Apache Cloudberry Support (#8985)	https://github.com/apache/seatunnel/commit/b6f82c1	dev
