Version: Next

Oracle CDC

Oracle CDC source connector

Support Those Engines

SeaTunnel Zeta
Flink

Key features

Description

The Oracle CDC connector reads snapshot data and incremental data from an Oracle database. This document describes how to set up the Oracle CDC connector to run SQL queries against Oracle databases.

Notice

The Debezium Oracle connector does not rely on the continuous mining option. The connector itself detects log switches and automatically adjusts the set of logs being mined, which is what the continuous mining option used to do for you. Therefore, do not set the `log.mining.continuous.mine` property in the `debezium` options.
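As a rough sketch of what this means in a job file, the `debezium` pass-through option described under Source Options might be filled in as follows. The property `log.mining.strategy` and its value are Debezium Oracle connector settings chosen here purely for illustration, and the connection values are placeholders; the only point is that `log.mining.continuous.mine` does not appear.

```hocon
source {
  Oracle-CDC {
    # Placeholder connection values, adjust to your environment.
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]

    debezium {
      # Illustrative pass-through property (an assumption, not a requirement).
      log.mining.strategy = "online_catalog"
      # log.mining.continuous.mine is intentionally absent:
      # the connector handles log switches on its own.
    }
  }
}
```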

Supported DataSource Info

Datasource	Supported versions	Driver	Url	Maven
Oracle	Different dependency versions have different driver classes.	oracle.jdbc.OracleDriver	jdbc:oracle:thin:@datasource01:1523:xe	https://mvnrepository.com/artifact/com.oracle.database.jdbc/ojdbc8

Database Dependency

Install Jdbc Driver

For Spark/Flink Engine

Make sure the JDBC driver jar has been placed in the ${SEATUNNEL_HOME}/plugins/ directory.
To support the i18n character set, copy the orai18n.jar to the $SEATUNNEL_HOME/plugins/ directory.

For SeaTunnel Zeta Engine

Make sure the JDBC driver jar has been placed in the ${SEATUNNEL_HOME}/lib/ directory.
To support the i18n character set, copy the orai18n.jar to the $SEATUNNEL_HOME/lib/ directory.

Enable Oracle Logminer

To enable Oracle CDC (Change Data Capture) in SeaTunnel using Logminer, a built-in tool provided by Oracle, follow the steps below:

Enabling Logminer without CDB (Container Database) mode.

On the operating system, create empty directories to store the Oracle archived logs and user tablespaces.

mkdir -p /opt/oracle/oradata/recovery_area
mkdir -p /opt/oracle/oradata/ORCLCDB
chown -R oracle /opt/oracle/***

sqlplus /nolog;
connect sys as sysdba;
alter system set db_recovery_file_dest_size = 10G;
alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
shutdown immediate;
startup mount;
alter database archivelog;
alter database open;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
archive log list;

Log in as admin, create an account called logminer_user with the password "oracle", and grant it privileges to read tables and logs.

CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
CREATE USER logminer_user IDENTIFIED BY oracle DEFAULT TABLESPACE logminer_tbs QUOTA UNLIMITED ON logminer_tbs;

GRANT CREATE SESSION TO logminer_user;
GRANT SELECT ON V_$DATABASE to logminer_user;
GRANT SELECT ON V_$LOG TO logminer_user;
GRANT SELECT ON V_$LOGFILE TO logminer_user;
GRANT SELECT ON V_$LOGMNR_LOGS TO logminer_user;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO logminer_user;
GRANT SELECT ON V_$ARCHIVED_LOG TO logminer_user;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO logminer_user;
GRANT EXECUTE ON DBMS_LOGMNR TO logminer_user;
GRANT EXECUTE ON DBMS_LOGMNR_D TO logminer_user;
GRANT SELECT ANY TRANSACTION TO logminer_user;
GRANT SELECT ON V_$TRANSACTION TO logminer_user;

The following grant is not supported on Oracle 11g

GRANT LOGMINING TO logminer_user;

Grant privileges only to the tables that need to be collected

GRANT SELECT ANY TABLE TO logminer_user;
GRANT ANALYZE ANY TO logminer_user;

To enable Logminer in Oracle with CDB (Container Database) + PDB (Pluggable Database) mode

On the operating system, create empty directories to store the Oracle archived logs and user tablespaces.

mkdir -p /opt/oracle/oradata/recovery_area
mkdir -p /opt/oracle/oradata/ORCLCDB
mkdir -p /opt/oracle/oradata/ORCLCDB/ORCLPDB1
chown -R oracle /opt/oracle/***

sqlplus /nolog
connect sys as sysdba; # Password: oracle
alter system set db_recovery_file_dest_size = 10G;
alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
shutdown immediate
startup mount
alter database archivelog;
alter database open;
archive log list;

Executing in CDB

ALTER TABLE TEST.* ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER TABLE TEST.T2 ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Creating the Debezium account

Operating in CDB

sqlplus sys/top_secret@//localhost:1521/ORCLCDB as sysdba
CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/logminer_tbs.dbf'
 SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
exit;

Operating in PDB

sqlplus sys/top_secret@//localhost:1521/ORCLPDB1 as sysdba
 CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/ORCLPDB1/logminer_tbs.dbf'
   SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
 exit;

Operating in CDB

sqlplus sys/top_secret@//localhost:1521/ORCLCDB as sysdba

CREATE USER c##dbzuser IDENTIFIED BY dbz
DEFAULT TABLESPACE logminer_tbs
QUOTA UNLIMITED ON logminer_tbs
CONTAINER=ALL;

GRANT CREATE SESSION TO c##dbzuser CONTAINER=ALL;
GRANT SET CONTAINER TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$DATABASE to c##dbzuser CONTAINER=ALL;
GRANT FLASHBACK ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO c##dbzuser CONTAINER=ALL;
GRANT LOGMINING TO c##dbzuser CONTAINER=ALL;

GRANT CREATE TABLE TO c##dbzuser CONTAINER=ALL;
GRANT LOCK ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT CREATE SEQUENCE TO c##dbzuser CONTAINER=ALL;

GRANT EXECUTE ON DBMS_LOGMNR TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR_D TO c##dbzuser CONTAINER=ALL;

GRANT SELECT ON V_$LOG TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOG_HISTORY TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##dbzuser CONTAINER=ALL;
GRANT ANALYZE ANY TO c##dbzuser CONTAINER=ALL;

exit;
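
Once the common user exists, a source block for the CDB + PDB setup could be sketched as below. This is an assumption-laden example rather than a verified configuration: it assumes that Debezium's `database.pdb.name` property is forwarded through the `debezium` option, that the JDBC URL points at the CDB service `ORCLCDB`, and that `DEBEZIUM.FULL_TYPES` is a hypothetical table living in `ORCLPDB1`.

```hocon
source {
  Oracle-CDC {
    # Connect to the container database with the common user created above.
    base-url = "jdbc:oracle:thin:@//localhost:1521/ORCLCDB"
    username = "c##dbzuser"
    password = "dbz"

    # Hypothetical table inside the pluggable database ORCLPDB1.
    database-names = ["ORCLPDB1"]
    schema-names = ["DEBEZIUM"]
    table-names = ["ORCLPDB1.DEBEZIUM.FULL_TYPES"]

    debezium {
      # Assumption: tells Debezium which PDB holds the captured tables.
      database.pdb.name = "ORCLPDB1"
    }
  }
}
```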

Data Type Mapping

Oracle Data type	SeaTunnel Data type
INTEGER	INT
FLOAT	DECIMAL(38, 18)
NUMBER(precision <= 9, scale == 0)	INT
NUMBER(9 < precision <= 18, scale == 0)	BIGINT
NUMBER(18 < precision, scale == 0)	DECIMAL(38, 0)
NUMBER(precision == 0, scale == 0)	DECIMAL(38, 18)
NUMBER(scale != 0)	DECIMAL(38, 18)
BINARY_DOUBLE	DOUBLE
BINARY_FLOAT, REAL	FLOAT
CHAR, NCHAR, NVARCHAR2, VARCHAR2, LONG, ROWID, NCLOB, CLOB	STRING
DATE	DATE
TIMESTAMP, TIMESTAMP WITH LOCAL TIME ZONE	TIMESTAMP
BLOB, RAW, LONG RAW, BFILE	BYTES

Source Options

Name	Type	Required	Default	Description
base-url	String	Yes	-	The URL of the JDBC connection, for example: `jdbc:oracle:thin:@datasource01:1523:xe`.
username	String	Yes	-	Name of the database user to use when connecting to the database server.
password	String	Yes	-	Password to use when connecting to the database server.
database-names	List	No	-	Database name of the database to monitor.
schema-names	List	No	-	Schema name of the database to monitor.
table-names	List	Yes	-	Table name of the database to monitor. The table name needs to be qualified with the database and schema names, as in the task examples: `database_name.schema_name.table_name`
table-names-config	List	No	-	Table config list, for example: [{"table": "db1.schema1.table1","primaryKeys": ["key1"],"snapshotSplitColumn": "key2"}]
startup.mode	Enum	No	INITIAL	Optional startup mode for Oracle CDC consumer, valid enumerations are `initial`, `earliest`, `latest` and `specific`. `initial`: Synchronize historical data at startup, and then synchronize incremental data. `earliest`: Startup from the earliest offset possible. `latest`: Startup from the latest offset. `specific`: Startup from user-supplied specific offsets.
startup.specific-offset.file	String	No	-	Start from the specified binlog file name. Note: this option is required when the `startup.mode` option is set to `specific`.
startup.specific-offset.pos	Long	No	-	Start from the specified binlog file position. Note: this option is required when the `startup.mode` option is set to `specific`.
stop.mode	Enum	No	NEVER	Optional stop mode for Oracle CDC consumer, valid enumerations are `never`, `latest` or `specific`. `never`: A real-time job never stops the source. `latest`: Stop at the latest offset. `specific`: Stop at the user-supplied specific offset.
stop.specific-offset.file	String	No	-	Stop at the specified binlog file name. Note: this option is required when the `stop.mode` option is set to `specific`.
stop.specific-offset.pos	Long	No	-	Stop at the specified binlog file position. Note: this option is required when the `stop.mode` option is set to `specific`.
snapshot.split.size	Integer	No	8096	The split size (number of rows) of a table snapshot. Captured tables are divided into multiple splits when the table snapshot is read.
snapshot.fetch.size	Integer	No	1024	The maximum fetch size per poll when reading the table snapshot.
server-time-zone	String	No	UTC	The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone.
connect.timeout.ms	Duration	No	30000	The maximum time that the connector should wait after trying to connect to the database server before timing out.
connect.max-retries	Integer	No	3	The max retry times that the connector should retry to build database server connection.
connection.pool.size	Integer	No	20	The jdbc connection pool size.
chunk-key.even-distribution.factor.upper-bound	Double	No	100	The upper bound of the chunk key distribution factor. This factor is used to determine whether the table data is evenly distributed. If the distribution factor is calculated to be less than or equal to this upper bound (i.e., (MAX(id) - MIN(id) + 1) / row count), the table chunks would be optimized for even distribution. Otherwise, if the distribution factor is greater, the table will be considered as unevenly distributed and the sampling-based sharding strategy will be used if the estimated shard count exceeds the value specified by `sample-sharding.threshold`. The default value is 100.0.
chunk-key.even-distribution.factor.lower-bound	Double	No	0.05	The lower bound of the chunk key distribution factor. This factor is used to determine whether the table data is evenly distributed. If the distribution factor is calculated to be greater than or equal to this lower bound (i.e., (MAX(id) - MIN(id) + 1) / row count), the table chunks would be optimized for even distribution. Otherwise, if the distribution factor is less, the table will be considered as unevenly distributed and the sampling-based sharding strategy will be used if the estimated shard count exceeds the value specified by `sample-sharding.threshold`. The default value is 0.05.
sample-sharding.threshold	Integer	No	1000	This configuration specifies the threshold of estimated shard count to trigger the sample sharding strategy. When the distribution factor is outside the bounds specified by `chunk-key.even-distribution.factor.upper-bound` and `chunk-key.even-distribution.factor.lower-bound`, and the estimated shard count (calculated as approximate row count / chunk size) exceeds this threshold, the sample sharding strategy will be used. This can help to handle large datasets more efficiently. The default value is 1000 shards.
inverse-sampling.rate	Integer	No	1000	The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000.
exactly_once	Boolean	No	false	Enable exactly once semantic.
use_select_count	Boolean	No	false	Use `select count` to count table rows in the full stage rather than other methods. Use this when running select count directly is faster than updating the statistics with the analyze table SQL.
skip_analyze	Boolean	No	false	Skip the table analysis used to count rows in the full stage. Use this when you schedule the analyze table SQL to refresh the table statistics periodically, or when the table data does not change frequently.
format	Enum	No	DEFAULT	Optional output format for Oracle CDC, valid enumerations are `DEFAULT` and `COMPATIBLE_DEBEZIUM_JSON`.
schema-changes.enabled	Boolean	No	false	Schema evolution is disabled by default. Currently only `add column`, `drop column`, `rename column` and `modify column` are supported.
debezium	Config	No	-	Pass-through Debezium's properties to Debezium Embedded Engine which is used to capture data changes from Oracle server.
common-options		No	-	Source plugin common parameters; please refer to Source Common Options for details.
decimal_type_narrowing	Boolean	No	true	Decimal type narrowing. If true, a decimal type is narrowed to the int or long type when this causes no loss of precision. Currently supported only for Oracle. See the `decimal_type_narrowing` section below.
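
To make the snapshot and sharding options more concrete, here is a minimal sketch that sets them explicitly to the defaults listed in the table. The connection values are placeholders, and the arithmetic in the comments uses made-up numbers only to illustrate how the distribution factor is compared against the bounds.

```hocon
source {
  Oracle-CDC {
    # Placeholder connection values.
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]

    # Snapshot phase: rows per split and maximum rows fetched per poll.
    snapshot.split.size = 8096
    snapshot.fetch.size = 1024

    # Distribution factor = (MAX(id) - MIN(id) + 1) / row count.
    # Illustration: ids from 1 to 1000000 over 10000 rows give
    # 1000000 / 10000 = 100, which lies within [0.05, 100],
    # so the table is treated as evenly distributed.
    chunk-key.even-distribution.factor.lower-bound = 0.05
    chunk-key.even-distribution.factor.upper-bound = 100

    # Outside those bounds, sample sharding is used only when the estimated
    # shard count (approximate row count / chunk size) exceeds this threshold.
    sample-sharding.threshold = 1000

    # 1000 means a 1/1000 sampling rate.
    inverse-sampling.rate = 1000
  }
}
```

Because these are the documented defaults, leaving the options out should behave the same way; spelling them out is mainly useful as a starting point for tuning.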

decimal_type_narrowing

Decimal type narrowing. If true, a decimal type is narrowed to the int or long type when this causes no loss of precision. Currently supported only for Oracle.

For example:

decimal_type_narrowing = true

Oracle	SeaTunnel
NUMBER(1, 0)	Boolean
NUMBER(6, 0)	INT
NUMBER(10, 0)	BIGINT

decimal_type_narrowing = false

Oracle	SeaTunnel
NUMBER(1, 0)	Decimal(1, 0)
NUMBER(6, 0)	Decimal(6, 0)
NUMBER(10, 0)	Decimal(10, 0)
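
In a job file the option sits next to the other source options. A minimal sketch, reusing the placeholder connection values from the task examples in this document, that keeps `NUMBER` columns as decimals instead of narrowing them:

```hocon
source {
  Oracle-CDC {
    # Placeholder connection values.
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]

    # Default is true; false keeps NUMBER(p, 0) as Decimal(p, 0).
    decimal_type_narrowing = false
  }
}
```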

Task Example

Simple

Support multi-table reading

source {
  # This is an example source plugin, intended **only for testing and demonstrating the source plugin feature**
  Oracle-CDC {
    plugin_output = "customers"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES", "XE.DEBEZIUM.FULL_TYPES2"]
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"
    source.reader.close.timeout = 120000
  }
}
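
A variation on the job above, sketched under the assumption that only changes made after the job starts are of interest. Per the option table, `startup.mode = "latest"` starts from the latest offset instead of first synchronizing historical data; everything else is unchanged placeholder configuration.

```hocon
source {
  Oracle-CDC {
    plugin_output = "customers"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES", "XE.DEBEZIUM.FULL_TYPES2"]
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"

    # Valid values per the option table: initial (default), earliest, latest, specific.
    startup.mode = "latest"
  }
}
```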

Use `select count(*)` instead of analyzing the table to count table rows in the full stage

source {
# This is an example source plugin, intended **only for testing and demonstrating the source plugin feature**
  Oracle-CDC {
    plugin_output = "customers"
    use_select_count = true 
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]
    base-url = "jdbc:oracle:thin:system/oracle@oracle-host:1521:xe"
    source.reader.close.timeout = 120000
  }
}

Use `select NUM_ROWS from all_tables` to get the table row count, but skip the table analysis.

source {
# This is an example source plugin, intended **only for testing and demonstrating the source plugin feature**
  Oracle-CDC {
    plugin_output = "customers"
    skip_analyze = true 
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]
    base-url = "jdbc:oracle:thin:system/oracle@oracle-host:1521:xe"
    source.reader.close.timeout = 120000
  }
}

Support custom primary key for table

source {
  Oracle-CDC {
    plugin_output = "customers"
    base-url = "jdbc:oracle:thin:system/oracle@oracle-host:1521:xe"
    source.reader.close.timeout = 120000
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]
    table-names-config = [
      {
        table = "XE.DEBEZIUM.FULL_TYPES"
        primaryKeys = ["ID"]
      }
    ]
  }
}
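
The `table-names-config` entries also accept `snapshotSplitColumn`, as the example value in the option table shows. A minimal sketch, assuming a hypothetical column `ORDER_NO` that is a better candidate than the primary key for splitting the snapshot:

```hocon
source {
  Oracle-CDC {
    plugin_output = "customers"
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]
    table-names-config = [
      {
        table = "XE.DEBEZIUM.FULL_TYPES"
        primaryKeys = ["ID"]
        # ORDER_NO is a hypothetical column name used only for illustration.
        snapshotSplitColumn = "ORDER_NO"
      }
    ]
  }
}
```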

Support sending the Debezium-compatible format to Kafka

Must be used with the Kafka connector sink; see the compatible Debezium format documentation for details.
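
A hedged sketch of such a pipeline follows. The source side uses the `format` option from the table above. The sink option names (`plugin_input`, `bootstrap.servers`, `topic`, `format`) are assumptions based on the SeaTunnel Kafka sink and should be checked against its documentation; the broker address and topic name are placeholders.

```hocon
source {
  Oracle-CDC {
    plugin_output = "customers"
    base-url = "jdbc:oracle:thin:@oracle-host:1521:xe"
    username = "system"
    password = "oracle"
    database-names = ["XE"]
    schema-names = ["DEBEZIUM"]
    table-names = ["XE.DEBEZIUM.FULL_TYPES"]

    # Emit change events in the Debezium-compatible JSON format.
    format = "COMPATIBLE_DEBEZIUM_JSON"
  }
}

sink {
  Kafka {
    # Assumed sink options; verify against the Kafka sink documentation.
    plugin_input = "customers"
    bootstrap.servers = "localhost:9092"   # placeholder broker address
    topic = "oracle_cdc_full_types"        # hypothetical topic name
    format = "COMPATIBLE_DEBEZIUM_JSON"
  }
}
```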

Changelog

Change	Commit	Version
[Improve][Oracle-CDC] Remove duplicate load table names (#9357)	https://github.com/apache/seatunnel/commit/90e88cafc5	dev
[Feature][Connector-JDBC] Supprot read Oracle BLOB data as string instead of bytes (#9305)	https://github.com/apache/seatunnel/commit/454a88f81a	2.3.11
[Improve][CDC] Filter ddl for snapshot phase (#8911)	https://github.com/apache/seatunnel/commit/641cc72f2f	2.3.10
[Improve][Oracle-CDC] Support ReadOnlyLogWriterFlushStrategy (#8912)	https://github.com/apache/seatunnel/commit/6aebdc0384	2.3.10
[Improve][CDC] Extract duplicate code (#8906)	https://github.com/apache/seatunnel/commit/b922bb90e6	2.3.10
[Improve] restruct connector common options (#8634)	https://github.com/apache/seatunnel/commit/f3499a6eeb	2.3.10
[hotfix][connector-cdc-oracle ] support read partition table (#8265)	https://github.com/apache/seatunnel/commit/91b86b2faf	2.3.9
[Improve][E2E] improve oracle e2e (#8292)	https://github.com/apache/seatunnel/commit/9f761b9d32	2.3.9
[Feature][CDC] Add 'schema-changes.enabled' options (#8285)	https://github.com/apache/seatunnel/commit/8e29ecf54f	2.3.9
Revert "[Feature][Redis] Flush data when the time reaches checkpoint interval" and "[Feature][CDC] Add 'schema-changes.enabled' options" (#8278)	https://github.com/apache/seatunnel/commit/fcb2938286	2.3.9
[Feature][CDC] Add 'schema-changes.enabled' options (#8252)	https://github.com/apache/seatunnel/commit/d783f9447c	2.3.9
[Improve][dist]add shade check rule (#8136)	https://github.com/apache/seatunnel/commit/51ef800016	2.3.9
[Feature][Connector-V2]Jdbc chunk split add snapshotSplitColumn config #7794 (#7840)	https://github.com/apache/seatunnel/commit/b6c6dc0438	2.3.9
[Feature][Core] Support cdc task ddl restore for zeta (#7463)	https://github.com/apache/seatunnel/commit/8e322281ed	2.3.9
[Feature][Connector-v2] Support schema evolution for Oracle connector (#7908)	https://github.com/apache/seatunnel/commit/79406bcc2f	2.3.9
[Hotfix][CDC] Fix package name spelling mistake (#7415)	https://github.com/apache/seatunnel/commit/469112fa64	2.3.8
[Improve][Connector-v2] Optimize the count table rows for jdbc-oracle and oracle-cdc (#7248)	https://github.com/apache/seatunnel/commit/0d08b20061	2.3.6
[Improve][CDC] Bump the version of debezium to 1.9.8.Final (#6740)	https://github.com/apache/seatunnel/commit/c3ac953524	2.3.6
[Improve][CDC] Close idle subtasks gorup(reader/writer) in increment phase (#6526)	https://github.com/apache/seatunnel/commit/454c339b9c	2.3.6
[Improve][JDBC Source] Fix Split can not be cancel (#6825)	https://github.com/apache/seatunnel/commit/ee3b7c3723	2.3.6
[Fix] Fix ConnectorSpecificationCheckTest failed (#6828)	https://github.com/apache/seatunnel/commit/52d1020eb7	2.3.6
[Hotfix][Jdbc/CDC] Fix postgresql uuid type in jdbc read (#6684)	https://github.com/apache/seatunnel/commit/868ba4d7c7	2.3.6
[Improve] Improve read table schema in cdc connector (#6702)	https://github.com/apache/seatunnel/commit/a8c6cc6e0c	2.3.6
[Improve][Jdbc] Add quote identifier for sql (#6669)	https://github.com/apache/seatunnel/commit/849d748d3d	2.3.5
[Improve][CDC] Optimize split state memory allocation in increment phase (#6554)	https://github.com/apache/seatunnel/commit/fe33422161	2.3.5
[Improve][CDC-Connector]Fix CDC option rule. (#6454)	https://github.com/apache/seatunnel/commit/1ea27afa87	2.3.5
[Improve][CDC] Optimize memory allocation for snapshot split reading (#6281)	https://github.com/apache/seatunnel/commit/4856645837	2.3.5
[Improve][API] Unify type system api(data & type) (#5872)	https://github.com/apache/seatunnel/commit/b38c7edcc9	2.3.5
[Fix][Oracle-CDC] Fix invalid split key when no primary key (#6251)	https://github.com/apache/seatunnel/commit/b83c40a6f6	2.3.4
[Feature][Oracle-CDC] Support custom table primary key (#6216)	https://github.com/apache/seatunnel/commit/ae4240ca6b	2.3.4
[Improve][Oracle-CDC] Clean unused code (#6212)	https://github.com/apache/seatunnel/commit/919a91032a	2.3.4
[Hotfix][Oracle-CDC] Fix state recovery error when switching a single table to multiple tables (#6211)	https://github.com/apache/seatunnel/commit/74cfe1995f	2.3.4
[Hotfix][Oracle-CDC] Fix jdbc setFetchSize error (#6210)	https://github.com/apache/seatunnel/commit/b7f06ec6d9	2.3.4
[Feature][Oracle-CDC] Support read no primary key table (#6209)	https://github.com/apache/seatunnel/commit/3cb34c2b71	2.3.4
[Feature][Connector-V2][Oracle-cdc]Support for oracle cdc (#5196)	https://github.com/apache/seatunnel/commit/aaef22b31b	2.3.4
