Deploy SeaTunnel Engine
1. Download
SeaTunnel Engine is the default engine of SeaTunnel. The installation package of SeaTunnel already contains all the contents of SeaTunnel Engine.
2. Config SEATUNNEL_HOME
You can configure SEATUNNEL_HOME by adding the file /etc/profile.d/seatunnel.sh with the following content:
export SEATUNNEL_HOME=${seatunnel install path}
export PATH=$PATH:$SEATUNNEL_HOME/bin
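As a minimal sketch of the step above, the profile file can be generated and loaded from a shell. The install path /opt/seatunnel is a hypothetical value, and the file is written to a temporary location so the sketch runs without root; on a real host the target would be /etc/profile.d/seatunnel.sh.

```shell
# Hypothetical install path; replace with your actual SeaTunnel directory.
install_path="/opt/seatunnel"

# Temporary file used so the sketch runs without root privileges.
# On a real host, write to /etc/profile.d/seatunnel.sh instead.
profile_file="$(mktemp)"

# Unquoted heredoc: ${install_path} is expanded now, while the escaped
# \$PATH and \$SEATUNNEL_HOME stay literal and are expanded when sourced.
cat > "$profile_file" <<EOF
export SEATUNNEL_HOME=${install_path}
export PATH=\$PATH:\$SEATUNNEL_HOME/bin
EOF

# Load the settings into the current shell and show the result.
. "$profile_file"
echo "SEATUNNEL_HOME=$SEATUNNEL_HOME"
rm -f "$profile_file"
```

New login shells pick up files in /etc/profile.d automatically; an already open shell needs to source the file as shown.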
3. Config SeaTunnel Engine JVM options
SeaTunnel Engine supports two ways to set JVM options.
Add JVM options to $SEATUNNEL_HOME/bin/seatunnel-cluster.sh. Modify the $SEATUNNEL_HOME/bin/seatunnel-cluster.sh file and add JAVA_OPTS="-Xms2G -Xmx2G" in the first line.
Add JVM options when starting SeaTunnel Engine. For example:
seatunnel-cluster.sh -DJvmOption="-Xms2G -Xmx2G"
4. Config SeaTunnel Engine
SeaTunnel Engine provides many functions, which need to be configured in seatunnel.yaml.
4.1 Backup count
SeaTunnel Engine implements cluster management based on Hazelcast IMDG. The state data of the cluster (job running state, resource state) is stored in Hazelcast IMap. The data saved in Hazelcast IMap is distributed and stored across all nodes of the cluster. Hazelcast partitions the data stored in IMap, and each partition can specify its number of backups. Therefore, SeaTunnel Engine can achieve cluster HA without using other services (for example, ZooKeeper).
The backup count defines the number of synchronous backups. For example, if it is set to 1, the backup of a partition is placed on one other member. If it is 2, it is placed on two other members.
We suggest setting backup-count to max(1, min(5, N/2)), where N is the number of cluster nodes.
seatunnel:
  engine:
    backup-count: 1
    # other config
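The suggested value can be read as N/2 clamped to the range 1 to 5. A minimal shell sketch of that reading follows; suggest_backup_count is a hypothetical helper, not part of SeaTunnel, and rounding N/2 down is an assumption, since the text does not say how to round.

```shell
# suggest_backup_count is a hypothetical helper, not part of SeaTunnel.
# It clamps N/2 to the range [1, 5], where N is the number of cluster nodes.
suggest_backup_count() {
  nodes="$1"
  # Integer division rounds down (an assumption about the intended rounding).
  count=$((nodes / 2))
  if [ "$count" -gt 5 ]; then
    count=5
  fi
  if [ "$count" -lt 1 ]; then
    count=1
  fi
  echo "$count"
}

suggest_backup_count 4   # prints 2
```

Under this reading, a cluster of one to three nodes gets a backup count of 1, and clusters of ten or more nodes get 5.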
4.2 Slot service
The number of Slots determines the number of TaskGroups the cluster node can run in parallel. SeaTunnel Engine is a data synchronization engine and most jobs are IO intensive.
Dynamic slots are recommended.
seatunnel:
  engine:
    slot-service:
      dynamic-slot: true
      # other config
4.3 Checkpoint Manager
Like Flink, SeaTunnel Engine supports the Chandy-Lamport algorithm. Therefore, SeaTunnel Engine can synchronize data without data loss or duplication.
interval
The interval between two checkpoints, in milliseconds. If the checkpoint.interval parameter is configured in the env of the job config file, that value overrides the one set here.
timeout
The timeout of a checkpoint. If a checkpoint cannot be completed within the timeout period, a checkpoint failure is triggered and the job is restored.
max-concurrent
The maximum number of checkpoints that can be performed simultaneously.
tolerable-failure
The maximum number of retries after a checkpoint failure.
Example
seatunnel:
  engine:
    backup-count: 1
    print-execution-info-interval: 10
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 10000
      max-concurrent: 1
      tolerable-failure: 2
checkpoint storage
For details about checkpoint storage, see checkpoint storage.
5. Config SeaTunnel Engine Server
All SeaTunnel Engine Server configuration is in the hazelcast.yaml file.
5.1 cluster-name
SeaTunnel Engine nodes use the cluster name to determine whether another node belongs to the same cluster. If two nodes have different cluster names, SeaTunnel Engine rejects the service request.
5.2 Network
Based on Hazelcast, a SeaTunnel Engine cluster is a network of cluster members that run SeaTunnel Engine Server. Cluster members automatically join together to form a cluster. This automatic joining uses various discovery mechanisms that let the cluster members find each other.
Please note that, after a cluster is formed, communication between cluster members is always via TCP/IP, regardless of the discovery mechanism used.
SeaTunnel Engine uses the following discovery mechanisms.
TCP
You can configure SeaTunnel Engine to be a full TCP/IP cluster. See the Discovering Members by TCP section for configuration details.
An example hazelcast.yaml looks like this:
hazelcast:
  cluster-name: seatunnel
  network:
    join:
      tcp-ip:
        enabled: true
        member-list:
          - hostname1
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.logging.type: log4j2
TCP is the recommended discovery mechanism for a standalone SeaTunnel Engine cluster.
Hazelcast also provides other service discovery methods. For details, please refer to hazelcast network.
6. Config SeaTunnel Engine Client
All SeaTunnel Engine Client configuration is in the hazelcast-client.yaml file.
6.1 cluster-name
The client must have the same cluster-name as the SeaTunnel Engine. Otherwise, SeaTunnel Engine will reject the client request.
6.2 Network
cluster-members
The addresses of all SeaTunnel Engine Server nodes need to be added here.
hazelcast-client:
  cluster-name: seatunnel
  properties:
    hazelcast.logging.type: log4j2
  network:
    cluster-members:
      - hostname1:5801
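For a cluster with several server nodes, the cluster-members entries can be generated rather than typed by hand. The sketch below is one possible way to do that; the host names are hypothetical, render_cluster_members is an illustrative helper rather than a SeaTunnel tool, and port 5801 is taken from the hazelcast.yaml example.

```shell
# Hypothetical host names; replace with your SeaTunnel Engine Server nodes.
server_hosts="hostname1 hostname2 hostname3"
# Port 5801 matches the port used in the hazelcast.yaml example.
server_port=5801

# render_cluster_members is an illustrative helper, not part of SeaTunnel.
# It prints one "- host:port" entry per server node, indented to sit under
# network.cluster-members in hazelcast-client.yaml.
render_cluster_members() {
  for host in $1; do
    printf '      - %s:%s\n' "$host" "$2"
  done
}

render_cluster_members "$server_hosts" "$server_port"
```

The printed lines can be pasted under cluster-members in hazelcast-client.yaml.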
7. Start SeaTunnel Engine Server Node
mkdir -p $SEATUNNEL_HOME/logs
nohup seatunnel-cluster.sh &
The logs are written to $SEATUNNEL_HOME/logs/seatunnel-server.log
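Because the server is started in the background, it can help to wait for the log file to appear before inspecting it. The following is a minimal sketch; wait_for_file is a hypothetical helper, not part of SeaTunnel, and the 30-second timeout is an arbitrary choice.

```shell
# wait_for_file is a hypothetical helper, not part of SeaTunnel.
# It polls once per second until the file exists or the timeout
# (in seconds, default 30) runs out, and returns non-zero on timeout.
wait_for_file() {
  file="$1"
  remaining="${2:-30}"
  while [ ! -f "$file" ]; do
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}

# Only runs when SEATUNNEL_HOME is set in the current shell.
if [ -n "${SEATUNNEL_HOME:-}" ]; then
  log_file="$SEATUNNEL_HOME/logs/seatunnel-server.log"
  if wait_for_file "$log_file" 30; then
    # Show the most recent log lines to confirm the node came up.
    tail -n 20 "$log_file"
  else
    echo "log file not found: $log_file" >&2
  fi
fi
```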
8. Install SeaTunnel Engine Client
You only need to copy the $SEATUNNEL_HOME directory from the SeaTunnel Engine node to the client node and configure SEATUNNEL_HOME in the same way as on the SeaTunnel Engine Server node.