Version: 2.1.0

Quick start

This quick start shows how to use seatunnel with a simple example application: it receives data through a socket, splits each record into multiple fields, and prints the processed results.

Step 1: Prepare Spark runtime environment

If you are already familiar with Spark or have a working Spark environment, you can skip this step. Spark does not require any special configuration.

Download Spark first, choosing version 2.x.x or later. After downloading and extracting it, you can submit tasks in Spark local mode without any configuration. To run tasks on a Standalone, YARN, or Mesos cluster, refer to the deployment documentation on the official Spark website.
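As a quick sanity check before moving on, the sketch below tests whether a Spark installation is usable. It assumes the standard Spark layout, where spark-submit lives under bin/ in the extracted directory. The default path /opt/spark and the helper name check_spark are placeholders for illustration only, not part of seatunnel or Spark.

```shell
# Assumed install location; change this to wherever you extracted Spark.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Print "found" if an executable spark-submit exists under the given
# directory, otherwise print "missing". (Hypothetical helper.)
check_spark() {
  if [ -x "$1/bin/spark-submit" ]; then
    echo "found"
  else
    echo "missing"
  fi
}

if [ "$(check_spark "$SPARK_HOME")" = "found" ]; then
  # Prints the Spark version banner, so you can confirm it is 2.x.x or later.
  "$SPARK_HOME/bin/spark-submit" --version
else
  echo "spark-submit not found under $SPARK_HOME" >&2
fi
```

If the version banner appears, the local-mode environment is ready for the remaining steps.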

Step 2: Download seatunnel

Go to the seatunnel download page and download the latest seatunnel-<version>.zip package.

Or download a specific version directly (2.0.4 in this example):

wget https://github.com/apache/incubator-seatunnel/releases/download/v2.0.4/waterdrop-dist-2.0.4-2.11.8-release.zip -O seatunnel-2.0.4.zip

After downloading, unzip the package and create a symbolic link to it:

unzip seatunnel-<version>.zip
ln -s seatunnel-<version> seatunnel

Step 3: Configure seatunnel

Edit config/seatunnel-env.sh and set the required environment variables, such as SPARK_HOME (the directory where you extracted Spark in Step 1).
Create a new config/application.conf file. It defines how data is read, processed, and written once seatunnel starts.

env {
  # Streaming batch duration in seconds (a setting defined by seatunnel)
  spark.streaming.batchDuration = 5

  spark.app.name = "seatunnel"
  spark.ui.port = 13000
}

source {
  socketStream {}
}

transform {
  split {
    fields = ["msg", "name"]
    delimiter = ","
  }
}

sink {
  console {}
}
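Before starting seatunnel, it can help to confirm that the configuration file contains the four top-level blocks used above (env, source, transform, sink). The sketch below is only a rough text match with grep, not a real parser of the configuration format, and the function name check_conf_blocks is hypothetical.

```shell
# Report whether a config file contains the env, source, transform, and
# sink blocks. This only looks for lines such as "env {", so it cannot
# detect syntax errors inside a block. (Hypothetical helper.)
check_conf_blocks() {
  conf="$1"
  for block in env source transform sink; do
    if ! grep -Eq "^[[:space:]]*${block}[[:space:]]*\{" "$conf"; then
      echo "missing block: $block"
      return 1
    fi
  done
  echo "ok"
}

# Run the check only if the file from this step exists.
if [ -f config/application.conf ]; then
  check_conf_blocks config/application.conf
fi
```

Run it from the seatunnel directory. It prints ok when all four blocks are present, or names the first block it could not find.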

Step 4: Start the `netcat server` to send data

nc -lk 9999

Step 5: Start seatunnel

cd seatunnel
./bin/start-seatunnel-spark.sh \
--master 'local[4]' \
--deploy-mode client \
--config ./config/application.conf

Step 6: Enter data at the `nc` terminal

Hello World, seatunnel

The seatunnel log prints out:

+----------------------+-----------+---------+
|raw_message           |msg        |name     |
+----------------------+-----------+---------+
|Hello World, seatunnel|Hello World|seatunnel|
+----------------------+-----------+---------+
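To make the result above concrete, the sketch below imitates in plain shell what the split transform appears to do with this input: cut the raw message at the delimiter and assign the pieces to the msg and name fields. It is an illustration only, not seatunnel's implementation. The whitespace trimming is an assumption inferred from the output shown above, and the helper names trim and split_record are hypothetical.

```shell
# Remove leading and trailing whitespace from a string.
trim() {
  printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Split a record at the first occurrence of the delimiter and store the
# trimmed pieces in the variables msg and name. Assumes the delimiter is
# present; if it is not, both variables receive the whole record.
split_record() {
  record="$1"
  delimiter="$2"
  left=${record%%"$delimiter"*}   # text before the first delimiter
  right=${record#*"$delimiter"}   # text after the first delimiter
  msg=$(trim "$left")
  name=$(trim "$right")
}

split_record "Hello World, seatunnel" ","
printf 'msg=[%s] name=[%s]\n' "$msg" "$name"
# prints: msg=[Hello World] name=[seatunnel]
```

The two values match the msg and name columns in the table, while raw_message keeps the original line unchanged.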

Summary

seatunnel is simple to use, and it offers many more data processing features than this example covers. The data processing case shown here requires no code, compilation, or packaging, which makes it simpler than the official Quick Example.

For more seatunnel configuration examples, see:

Configuration example 2: Offline batch processing

The configuration above is the default [offline batch configuration template] and can be run directly with the following command:

cd seatunnel
./bin/start-seatunnel-spark.sh \
--master 'local[2]' \
--deploy-mode client \
--config ./config/spark.batch.conf.template
