Set Up Locally
As an example, we will run an application that randomly generates data in memory, processes it through SQL, and finally outputs it to the console.
Step 1: Prepare the environment
Before you start the local run, make sure you have installed the following software, which SeaTunnel requires:
- Java (Java 8 or 11; other versions greater than Java 8 can theoretically work as well) installed and JAVA_HOME set.
- Download the engine. You can choose and download either of the engines below. For more information, see why we need engine in SeaTunnel.
  - Spark: Please download Spark first (required version >= 2 and version < 3.x). For more information, see Getting Started: standalone
  - Flink: Please download Flink first (required version >= 1.12.0 and version < 1.14.x). For more information, see Getting Started: standalone
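The version requirements above can be checked with a small script before going further. This is a minimal sketch, not part of SeaTunnel: the function names are hypothetical, the ranges are read as [2.0.0, 3.0.0) for Spark and [1.12.0, 1.14.0) for Flink (an interpretation of "< 3.x" and "< 1.14.x"), and it relies on the `-V` option of GNU `sort`.

```shell
# version_ge A B: succeeds when version A >= version B (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# version_in_range VERSION MIN MAX: succeeds when MIN <= VERSION < MAX.
version_in_range() {
  version_ge "$1" "$2" && ! version_ge "$1" "$3"
}

# check_engine ENGINE VERSION: print a one-line verdict.
# The bounds below are an interpretation of the ranges in this guide.
check_engine() {
  engine="$1"
  version="$2"
  case "$engine" in
    spark) min="2.0.0"; max="3.0.0" ;;
    flink) min="1.12.0"; max="1.14.0" ;;
    *) echo "unknown engine: $engine"; return 2 ;;
  esac
  if version_in_range "$version" "$min" "$max"; then
    echo "$engine $version: supported"
  else
    echo "$engine $version: outside the range [$min, $max)"
    return 1
  fi
}

# JAVA_HOME must be set before running SeaTunnel; warn if it is not.
if [ -z "${JAVA_HOME:-}" ]; then
  echo "warning: JAVA_HOME is not set" >&2
fi
```

For example, `check_engine flink 1.13.6` reports the version as supported, while `check_engine flink 1.14.0` reports it as outside the range.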
Step 2: Download SeaTunnel
Go to the SeaTunnel download page and download the latest version of the distribution package, seatunnel-<version>-bin.tar.gz.
Alternatively, you can download it from the terminal:
export version="2.2.0-beta"
wget "https://archive.apache.org/dist/incubator/seatunnel/${version}/apache-seatunnel-incubating-${version}-bin.tar.gz"
tar -xzvf "apache-seatunnel-incubating-${version}-bin.tar.gz"
Step 3: Install connector plugins
Since 2.2.0-beta, the binary package does not include connector dependencies by default. You can download the connectors manually from the [Apache Maven Repository](https://repo.maven.apache.org/maven2/org/apache/seatunnel/) and move them to the corresponding subdirectory of the connectors directory: Flink plugins go in the flink directory, and Spark plugins go in the spark directory. Otherwise, when using the package for the first time, run the following command to install the connectors:
sh bin/install_plugin.sh 2.2.0-beta
If you need to specify the version of the connectors, taking 2.2.0-beta as an example, run:
sh bin/install_plugin.sh 2.2.0-beta
Usually you don't need all the connector plugins, so you can specify the ones you need in config/plugin_config. For example, if you only need the flink-console plugin, modify the file as follows:
--flink-connectors--
seatunnel-connector-flink-console
--end--
For our sample application to work properly, add the following plugins:
- Spark
--spark-connectors--
seatunnel-connector-spark-fake
seatunnel-connector-spark-console
--end--
- Flink
--flink-connectors--
seatunnel-connector-flink-fake
seatunnel-connector-flink-console
--end--
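The plugin sections shown above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below uses a hypothetical helper, `plugin_section`, and assumes every connector follows the `seatunnel-connector-<engine>-<name>` naming seen in these examples; plugins-mapping.properties remains the authoritative list of names.

```shell
# plugin_section ENGINE NAME...: print one plugin_config section.
# Assumption: connector names follow seatunnel-connector-<engine>-<name>,
# as in the examples in this guide.
plugin_section() {
  engine="$1"
  shift
  printf '%s\n' "--${engine}-connectors--"
  for name in "$@"; do
    printf '%s\n' "seatunnel-connector-${engine}-${name}"
  done
  printf '%s\n' "--end--"
}

# Example: the two connectors the sample application needs on Flink.
plugin_section flink fake console
```

Redirecting the output, for instance `plugin_section flink fake console > config/plugin_config`, would replace the file with just that section.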
You can find all supported connectors and their corresponding plugin_config names in ${SEATUNNEL_HOME}/connectors/plugins-mapping.properties.
If you install connector plugins by downloading them manually, pay special attention to the following:
The connectors directory contains the following subdirectories. If they do not exist, you need to create them manually:
flink
flink-sql
seatunnel
spark
To manually install connector plugins for the Flink engine, download the Flink connector plugins you need and put them in the flink directory. Similarly, to manually install connector plugins for the Spark engine, download the Spark connector plugins you need and put them in the spark directory.
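Creating the missing subdirectories can be scripted. This is a minimal sketch with a hypothetical helper name; it assumes the argument is the directory where SeaTunnel was unpacked.

```shell
# ensure_connector_dirs HOME_DIR: create the four connector subdirectories
# under HOME_DIR/connectors if they do not exist yet.
ensure_connector_dirs() {
  home_dir="$1"
  for sub in flink flink-sql seatunnel spark; do
    mkdir -p "${home_dir}/connectors/${sub}"
  done
}

# Usage (not executed here):
#   ensure_connector_dirs "${SEATUNNEL_HOME}"
# Then copy each downloaded connector jar into the matching engine directory.
```

`mkdir -p` leaves existing directories untouched, so the helper is safe to run repeatedly.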
Step 4: Configure SeaTunnel Application
Configure SeaTunnel: Change the settings in config/seatunnel-env.sh based on the path where you installed your engine during the preparation step. Change SPARK_HOME if you are using Spark as your engine, or change FLINK_HOME if you are using Flink.
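As a rough illustration, the two settings could look like the lines below. This is a sketch rather than the shipped file: /opt/spark and /opt/flink are placeholder paths, and the `${VAR:-default}` form is used so that a value already exported in your shell takes precedence.

```shell
# Sketch of the engine settings in config/seatunnel-env.sh.
# /opt/spark and /opt/flink are placeholders; replace them with the
# directories where you unpacked your engine.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
FLINK_HOME="${FLINK_HOME:-/opt/flink}"
```

Only the variable for the engine you actually use needs to point at a real installation.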
Edit config/flink.streaming.conf.template (or config/spark.streaming.conf.template if you use Spark), which determines how data is read, processed, and written after SeaTunnel starts.
The following is an example of the configuration file, which matches the example application described above.
######
###### This config file is a demonstration of streaming processing in SeaTunnel config
######
env {
  # You can set flink configuration here
  execution.parallelism = 1

  # For Spark
  #spark.app.name = "SeaTunnel"
  #spark.executor.instances = 2
  #spark.executor.cores = 1
  #spark.executor.memory = "1g"
  #spark.master = local
}

source {
  FakeSourceStream {
    result_table_name = "fake"
    field_name = "name,age"
  }
}

transform {
  sql {
    sql = "select name,age from fake"
  }
}

sink {
  ConsoleSink {}
}
For more information about the configuration, see config concept.
Step 5: Run SeaTunnel Application
You can start the application with the following commands:
- Spark
cd "apache-seatunnel-incubating-${version}"
./bin/start-seatunnel-spark.sh \
  --master local[4] \
  --deploy-mode client \
  --config ./config/spark.streaming.conf.template
- Flink
cd "apache-seatunnel-incubating-${version}"
./bin/start-seatunnel-flink.sh \
  --config ./config/flink.streaming.conf.template
See the output: When you run the command, you can see its output in your console or in the Flink UI. The output indicates whether the command ran successfully.
- Spark
Hello World, SeaTunnel
Hello World, SeaTunnel
Hello World, SeaTunnel
...
Hello World, SeaTunnel
- Flink
The content printed in the TaskManager Stdout log of the Flink WebUI is a two-column record like the one below (your content may differ because the fake source generates data randomly):
apache, 15
seatunnel, 30
incubator, 20
...
topLevel, 20
Explore More Built-in Examples
Our local quick start uses one of the built-in examples in the config directory, and we provide more than one out-of-the-box example that you can try out. All you have to do is change the value of the --config option in the start command to the configuration you want to run. We use the batch templates in config as examples:
- Spark
cd "apache-seatunnel-incubating-${version}"
./bin/start-seatunnel-spark.sh \
  --master local[4] \
  --deploy-mode client \
  --config ./config/spark.batch.conf.template
- Flink
cd "apache-seatunnel-incubating-${version}"
./bin/start-seatunnel-flink.sh \
  --config ./config/flink.batch.conf.template
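Since the streaming and batch commands differ only in the template path, a small helper can build that path from the engine and the mode. This is a sketch with a hypothetical function name, `config_for`; it only covers the four template names that appear in this guide.

```shell
# config_for ENGINE MODE: print the path of a bundled template.
# Only the engine/mode combinations shown in this guide are accepted.
config_for() {
  engine="$1"
  mode="$2"
  case "$engine" in
    spark|flink) ;;
    *) echo "unknown engine: $engine" >&2; return 1 ;;
  esac
  case "$mode" in
    streaming|batch) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  printf './config/%s.%s.conf.template\n' "$engine" "$mode"
}

# Usage (not executed here):
#   ./bin/start-seatunnel-flink.sh --config "$(config_for flink batch)"
```

Rejecting unknown values up front avoids handing the start script a path that does not exist.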
What's More
You have now taken a quick look at SeaTunnel. See connector to find all the sources and sinks SeaTunnel supports, or see deployment if you want to submit your application to another kind of engine cluster.