Version: Next

JDBC to S3

Use this recipe when you want to export table data from a relational database to an S3-compatible object store.

Prerequisites

Finish Run your first job.
Install the plugins required by this recipe. Follow Deployment > Download The Connector Plugins, then keep only the plugins below in config/plugin_config:

--seatunnel-connectors--
connector-jdbc
connector-file-s3
--end--

cd "${SEATUNNEL_HOME}"
sh bin/install-plugin.sh
ls connectors | rg 'connector-(jdbc|file-s3)'

Put the source database JDBC driver into ${SEATUNNEL_HOME}/lib for SeaTunnel Zeta, then confirm the jar is visible:

ls "${SEATUNNEL_HOME}/lib" | rg 'mysql-connector'

Put hadoop-aws and the AWS SDK bundle required by the S3 connector into ${SEATUNNEL_HOME}/lib, then confirm both jars are present:

ls "${SEATUNNEL_HOME}/lib" | rg 'hadoop-aws|aws-java-sdk-bundle'

Prepare the source table in the relational database. This example uses MySQL and exports two rows from analytics.orders:

CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.orders (
  id BIGINT PRIMARY KEY,
  customer_id BIGINT,
  total_amount DECIMAL(16, 2),
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO analytics.orders (id, customer_id, total_amount, updated_at) VALUES
  (5001, 101, 19.99, NOW()),
  (5002, 102, 29.99, NOW());

Prepare an S3 bucket and credentials with write permission. The example below assumes the bucket already exists and that access_key and secret_key can write to s3://company-data-lake/seatunnel/orders/.

Minimal configuration

This example exports a query result from MySQL to S3 in JSON lines format so that the output is easy to inspect.

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Jdbc {
    plugin_output = "orders_jdbc"
    url = "jdbc:mysql://mysql:3306/analytics"
    driver = "com.mysql.cj.jdbc.Driver"
    username = "root"
    password = "password"
    query = "select id, customer_id, total_amount, updated_at from orders"
  }
}

sink {
  S3File {
    plugin_input = "orders_jdbc"
    bucket = "s3a://company-data-lake"
    path = "/seatunnel/orders/"
    fs.s3a.endpoint = "s3.us-east-1.amazonaws.com"
    fs.s3a.aws.credentials.provider = "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
    access_key = "your-access-key"
    secret_key = "your-secret-key"
    file_format_type = "json"
    row_delimiter = "\n"
    custom_filename = true
    file_name_expression = "orders"
    filename_extension = "json"
    single_file_mode = true
    is_enable_transaction = false
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
  }
}

Run the job

Save the config as config/jdbc-to-s3.conf, then run SeaTunnel in local mode:

cd "${SEATUNNEL_HOME}"
./bin/seatunnel.sh --config ./config/jdbc-to-s3.conf -m local

Validation result

Run the source query directly in the database and record the expected row count.
Start the SeaTunnel job.
Verify that new objects are written to the expected S3 prefix.

aws s3 ls s3://company-data-lake/seatunnel/orders/ --recursive
aws s3 cp s3://company-data-lake/seatunnel/orders/orders.json - | head

If objects are created under the target prefix and the exported content matches the source query, the pipeline is working.

Common pitfalls

The JDBC driver is available on your workstation but not under ${SEATUNNEL_HOME}/lib.
bucket and path are mixed up. Keep the bucket in bucket and the prefix in path.
The credential provider does not match the authentication method you configured.
Large tables are exported through one unbounded query without filtering or partitioning.
Fixed filenames are only safe for this single-file tutorial. If you enable transactions again, keep ${transactionId} in file_name_expression.
The target endpoint is S3-compatible, but the fs.s3a.endpoint value still points to AWS.

JDBC to S3

Prerequisites​

Minimal configuration​

Run the job​

Validation result​

Common pitfalls​

Related docs​

Prerequisites

Minimal configuration

Run the job

Validation result

Common pitfalls

Related docs