Cloudberry
JDBC Cloudberry Source Connector
Supported Engines
Spark
Flink
SeaTunnel Zeta
Using Dependency
For Spark/Flink Engine
- Ensure that the JDBC driver JAR has been placed in the ${SEATUNNEL_HOME}/plugins/ directory.
For SeaTunnel Zeta Engine
- Ensure that the JDBC driver JAR has been placed in the ${SEATUNNEL_HOME}/lib/ directory.
Key Features
Supports SQL queries, which makes column projection possible (reading only the columns the query selects).
Description
Reads data from an external data source through JDBC. Cloudberry does not currently have its own native JDBC driver, so this connector uses the PostgreSQL driver and implementation.
Supported DataSource Info
Datasource | Supported Versions | Driver | Url | Maven |
---|---|---|---|---|
Cloudberry | Uses PostgreSQL driver implementation | org.postgresql.Driver | jdbc:postgresql://localhost:5432/test | Download |
Database Dependency
Download the PostgreSQL driver JAR and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' directory.
For example: cp postgresql-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/
Data Type Mapping
Cloudberry uses PostgreSQL's data type implementation. Please refer to PostgreSQL documentation for data type compatibility and mappings.
Options
The Cloudberry connector uses the same options as the PostgreSQL connector. For detailed configuration options, please refer to the PostgreSQL connector documentation.
Key options include:
- url (required): The JDBC connection URL
- driver (required): The driver class name (org.postgresql.Driver)
- user/password: Authentication credentials
- query or table_path: What data to read
- partition options for parallel reading
Parallel Reader
Cloudberry supports parallel reading and follows the same rules as the PostgreSQL connector. For details on split strategies and parallel reading options, please refer to the PostgreSQL connector documentation.
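As an illustration of splitting a query-based read, here is a minimal sketch. It assumes that the `partition_column` option of the generic SeaTunnel JDBC source applies to Cloudberry unchanged, in line with the statement above that the PostgreSQL rules carry over. The table `mytable` and its numeric column `id` are hypothetical placeholders, and the connection values are copied from the examples on this page.

```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    # Hypothetical table; the query result is what gets split
    query = "select * from mytable"
    # Assumption: "id" is a numeric column suitable for range splitting
    partition_column = "id"
    # Target number of rows per split (same option as in the table_path example)
    split.size = 10000
  }
}
```

Whether a given column works well as a split key depends on its type and value distribution, so check the PostgreSQL connector documentation before relying on this layout.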
Task Example
Simple:
env {
  parallelism = 4
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    query = "select * from mytable limit 100"
  }
}
sink {
  Console {}
}
Parallel reading with table_path:
env {
  parallelism = 4
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    table_path = "public.mytable"
    split.size = 10000
  }
}
sink {
  Console {}
}
Multiple table read:
env {
  job.mode = "BATCH"
  parallelism = 4
}
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    "table_list" = [
      {
        "table_path" = "public.table1"
      },
      {
        "table_path" = "public.table2"
      }
    ]
    split.size = 10000
  }
}
sink {
  Console {}
}
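Multiple table read with a per-table query: the projection feature can be combined with a multi-table read. The sketch below assumes that, as in the generic SeaTunnel JDBC source, an entry in `table_list` may carry its own `query` next to `table_path`. The column names `id` and `name` are hypothetical.

```hocon
source {
  Jdbc {
    url = "jdbc:postgresql://localhost:5432/cloudberrydb"
    driver = "org.postgresql.Driver"
    user = "dbadmin"
    password = "password"
    table_list = [
      {
        # Read the whole table
        table_path = "public.table1"
      },
      {
        table_path = "public.table2"
        # Assumption: a per-table query restricts rows and columns;
        # "id" and "name" are placeholder column names
        query = "select id, name from public.table2 where id > 100"
      }
    ]
  }
}
```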
For more detailed examples and configurations, please refer to the PostgreSQL connector documentation.
Changelog
Change | Commit | Version |
---|---|---|
[Feature][Connector] Add Apache Cloudberry Support (#8985) | https://github.com/apache/seatunnel/commit/b6f82c1 | dev |