Version: 2.3.8

Redshift

JDBC Redshift Source Connector

Description

Reads data from an external data source through JDBC.

Supported engines

Spark
Flink
SeaTunnel Zeta

For Spark/Flink Engine

Ensure that the JDBC driver JAR has been placed in the ${SEATUNNEL_HOME}/plugins/ directory.

For SeaTunnel Zeta Engine

Ensure that the JDBC driver JAR has been placed in the ${SEATUNNEL_HOME}/lib/ directory.

Key features

Supports SQL queries, which can be used to project columns.

Supported DataSource list

Datasource	Supported versions	Driver	URL	Maven
redshift	The driver class differs between dependency versions.	com.amazon.redshift.jdbc.Driver	jdbc:redshift://localhost:5439/database	Download

Database dependency

Download the driver listed under 'Maven' in the supported data source list and copy it to the '$SEATUNNEL_HOME/plugins/jdbc/lib/' directory.
For example, for the Redshift data source: cp RedshiftJDBC42-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/

Data Type Mapping

Redshift data type	SeaTunnel data type
SMALLINT INT2	SHORT
INTEGER INT INT4	INT
BIGINT INT8 OID	LONG
DECIMAL NUMERIC	DECIMAL(column size + 1, number of digits to the right of the decimal point)
REAL FLOAT4	FLOAT
DOUBLE_PRECISION FLOAT8 FLOAT	DOUBLE
BOOLEAN BOOL	BOOLEAN
CHAR CHARACTER NCHAR BPCHAR VARCHAR CHARACTER_VARYING NVARCHAR TEXT SUPER	STRING
VARBYTE BINARY_VARYING	BYTES
TIME TIME_WITH_TIME_ZONE TIMETZ	LOCALTIME
TIMESTAMP TIMESTAMP_WITH_OUT_TIME_ZONE TIMESTAMPTZ	LOCALDATETIME

Example

Simple:

This example reads from the table public.table2 in the dev database with a parallelism of 2 and prints the result to the console. The query selects only the id and name fields of rows where id > 100; change it to control which fields and rows reach the output.

env {
  parallelism = 2
  job.mode = "BATCH"
}
source {
    Jdbc {
        url = "jdbc:redshift://localhost:5439/dev"
        driver = "com.amazon.redshift.jdbc.Driver"
        user = "root"
        password = "123456"
        
        table_path = "public.table2"
        # Use query to filter rows and columns
        query = "select id, name from public.table2 where id > 100"
        
        #split.size = 8096
        #split.even-distribution.factor.upper-bound = 100
        #split.even-distribution.factor.lower-bound = 0.05
        #split.sample-sharding.threshold = 1000
        #split.inverse-sampling.rate = 1000
    }
}

sink {
    Console {}
}
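
The source above narrows its output with a query. As a contrast, the following is a minimal sketch that reads every field of the table. It assumes that a table_path with no query is read in full, in the same way as the first table_list entry in the multiple-table example. The connection values are the same placeholders used above.

```hocon
env {
  parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"

    # Assumption: with no query set, all fields of the table are read
    table_path = "public.table2"
  }
}

sink {
  Console {}
}
```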

Multiple table read:

Configuring table_list turns on automatic splitting. You can set the `split.*` options to adjust the split strategy.

env {
  job.mode = "BATCH"
  parallelism = 2
}
source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"

    table_list = [
      {
        table_path = "public.table1"
      },
      {
        table_path = "public.table2"
        # Use query to filter rows and columns
        query = "select id, name from public.table2 where id > 100"
      }
    ]
    #split.size = 8096
    #split.even-distribution.factor.upper-bound = 100
    #split.even-distribution.factor.lower-bound = 0.05
    #split.sample-sharding.threshold = 1000
    #split.inverse-sampling.rate = 1000
  }
}

sink {
  Console {}
}
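
To show where the split options take effect, here is a sketch of the same multiple-table source with two of them enabled. The option names are the ones commented out in the example above. The values are copied from those comments rather than tuned, and their exact semantics are an assumption to check against the JDBC source option reference.

```hocon
env {
  job.mode = "BATCH"
  parallelism = 2
}

source {
  Jdbc {
    url = "jdbc:redshift://localhost:5439/dev"
    driver = "com.amazon.redshift.jdbc.Driver"
    user = "root"
    password = "123456"

    table_list = [
      { table_path = "public.table1" },
      { table_path = "public.table2" }
    ]

    # Values taken from the commented-out lines above, not recommendations
    split.size = 8096
    split.sample-sharding.threshold = 1000
  }
}

sink {
  Console {}
}
```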
