Redshift
JDBC Redshift Source Connector
Description
Read external data source data through JDBC.
Supported engines
Spark
Flink
SeaTunnel Zeta
For Spark/Flink Engine
- You need to ensure that the JDBC driver jar package has been placed in the directory ${SEATUNNEL_HOME}/plugins/.
For SeaTunnel Zeta Engine
- You need to ensure that the JDBC driver jar package has been placed in the directory ${SEATUNNEL_HOME}/lib/.
Key features
Supports query SQL, so a query can select only the columns it needs (projection).
Supported DataSource list
datasource | supported versions | driver | url | maven |
---|---|---|---|---|
redshift | The driver class depends on the dependency version. | com.amazon.redshift.jdbc.Driver | jdbc:redshift://localhost:5439/database | Download |
Database dependency
Download the driver listed in the 'Maven' column and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
For example, for the Redshift datasource: cp RedshiftJDBC42-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/
Data Type Mapping
Redshift Data type | SeaTunnel Data type |
---|---|
SMALLINT INT2 | SHORT |
INTEGER INT INT4 | INT |
BIGINT INT8 OID | LONG |
DECIMAL NUMERIC | DECIMAL(column size + 1, number of digits to the right of the decimal point) |
REAL FLOAT4 | FLOAT |
DOUBLE_PRECISION FLOAT8 FLOAT | DOUBLE |
BOOLEAN BOOL | BOOLEAN |
CHAR CHARACTER NCHAR BPCHAR VARCHAR CHARACTER_VARYING NVARCHAR TEXT SUPER | STRING |
VARBYTE BINARY_VARYING | BYTES |
TIME TIME_WITH_TIME_ZONE TIMETZ | LOCALTIME |
TIMESTAMP TIMESTAMP_WITH_OUT_TIME_ZONE TIMESTAMPTZ | LOCALDATETIME |
Example
Simple:
This example reads from public.table2 in the dev database with a parallelism of 2, using a query to keep only the id and name columns of rows where id > 100, and prints the result to the console. You can change the query to choose which fields and rows are output.
env {
  parallelism = 2
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"
    table_path = "public.table2"
    # Use query to filter rows & columns
    query = "select id, name from public.table2 where id > 100"
    #split.size = 8096
    #split.even-distribution.factor.upper-bound = 100
    #split.even-distribution.factor.lower-bound = 0.05
    #split.sample-sharding.threshold = 1000
    #split.inverse-sampling.rate = 1000
  }
}
sink {
  Console {}
}
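To read every column of a table rather than a projection, a configuration along these lines may be enough. This is a minimal sketch that assumes table_path can be used on its own without a query, as the first entry of table_list in the multiple table example does; the connection values are the same placeholders used in the simple example.

```hocon
env {
  parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"
    # Assumption: with no query set, all columns of this table are read
    table_path = "public.table2"
  }
}

sink {
  Console {}
}
```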
Multiple table read:
Configuring table_list turns on automatic splitting. You can set the split.* options to adjust the split strategy.
env {
  job.mode = "BATCH"
  parallelism = 2
}
source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"
    table_list = [
      {
        table_path = "public.table1"
      },
      {
        table_path = "public.table2"
        # Use query to filter rows & columns
        query = "select id, name from public.table2 where id > 100"
      }
    ]
    #split.size = 8096
    #split.even-distribution.factor.upper-bound = 100
    #split.even-distribution.factor.lower-bound = 0.05
    #split.sample-sharding.threshold = 1000
    #split.inverse-sampling.rate = 1000
  }
}
sink {
  Console {}
}
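The split.* lines in the examples above are commented out. The fragment below is a sketch of a source block with two of them enabled, using the same values that appear in the commented lines. The meanings given in the comments are assumptions inferred from the option names, so check them against the JDBC source option reference before relying on them.

```hocon
source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"
    table_list = [
      {
        table_path = "public.table1"
      }
    ]
    # Assumed meaning: target number of rows in each split
    split.size = 8096
    # Assumed meaning: estimated shard count above which sampling is used to build splits
    split.sample-sharding.threshold = 1000
  }
}
```

The remaining split.* options can be uncommented in the same way, one at a time, so that the effect of each change on the number and size of splits stays easy to observe.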