General configuration
Core idea
A Row is one logical record in seatunnel and the basic unit of data processing. When a filter processes data, every record is mapped to a Row.
A Field is a field of a Row. A Row can contain nested levels of fields.
raw_message refers to the raw_message field of the Row, which holds the data read from the input.
root refers to the top level of the Row, that is, the level at which its top-level fields live. It is often used to specify that new fields generated during data processing should be stored at the top level of the Row.
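As a rough mental model (not seatunnel's actual implementation, which runs on Spark), a Row can be pictured as a nested mapping. The sketch below uses a plain Python dict. The helper `store_result` and the field names `parsed` and `level` are hypothetical and only illustrate the difference between storing new fields at the root level and nesting them under a named field.

```python
# Illustrative model only: a Row pictured as a nested dict.
# Nothing here is seatunnel API; all names are assumptions for the example.

def store_result(row, new_fields, target=None):
    """Return a copy of row with new_fields added.

    target=None -> merge the new fields at the root (top level) of the Row.
    target="x"  -> nest the new fields under the top-level field "x".
    """
    result = dict(row)
    if target is None:
        result.update(new_fields)
    else:
        result[target] = dict(new_fields)
    return result


# The data read by an input arrives in the raw_message field.
row = {"raw_message": "level=INFO"}

at_root = store_result(row, {"level": "INFO"})
nested = store_result(row, {"level": "INFO"}, target="parsed")

print(at_root)  # the new field sits beside raw_message
print(nested)   # the new field sits one level down, under "parsed"
```

In the first call the new field becomes a top-level field of the Row; in the second it becomes a nested field, which is what "nested levels of fields" means here.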
Config file
A complete seatunnel configuration consists of four sections, spark, input, filter, and output:
spark {
    ...
}
input {
    ...
}
filter {
    ...
}
output {
    ...
}
spark holds the Spark-related configuration. For the configurable Spark parameters, see Spark Configuration. Note that the master and deploy-mode parameters cannot be configured here; they must be specified in the seatunnel startup script.
input can configure any input plugin and its parameters; the specific parameters vary from one input plugin to another.
filter can configure any filter plugin and its parameters; the specific parameters vary from one filter plugin to another.
The plugins in filter form a data processing pipeline in the order in which they are configured: the output of each filter is the input of the next.
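The ordering rule can be sketched in a few lines of Python. This is only a model of the idea, not how seatunnel executes filters on Spark; the filters `trim` and `add_length` and the helper `run_pipeline` are hypothetical names invented for the example.

```python
# Illustrative model only: filters as functions applied in configuration order.
from functools import reduce


def trim(row):
    # Hypothetical filter: strip surrounding whitespace from raw_message.
    return {**row, "raw_message": row["raw_message"].strip()}


def add_length(row):
    # Hypothetical filter: record the length of raw_message in a new field.
    return {**row, "length": len(row["raw_message"])}


def run_pipeline(row, filters):
    # Feed the output of each filter into the next one, in list order.
    return reduce(lambda current, f: f(current), filters, row)


source = {"raw_message": " hello "}

trim_first = run_pipeline(source, [trim, add_length])
length_first = run_pipeline(source, [add_length, trim])

print(trim_first)    # length is measured after trimming
print(length_first)  # length is measured before trimming
```

The two runs use the same filters but produce different `length` values, which is why the order of plugins inside filter matters.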
output can configure any output plugin and its parameters; the specific parameters vary from one output plugin to another.
The data processed by filter is sent to every plugin configured in output.
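The fan-out to outputs can be pictured as follows. This is a simplified sketch under the assumption that each output behaves like an independent sink; `make_collector` and the sample rows are hypothetical and stand in for real output plugins.

```python
# Illustrative model only: every configured output receives every processed row.

def make_collector(sink):
    # Hypothetical output: appends each row it receives to the given list.
    def write(row):
        sink.append(row)
    return write


first_sink, second_sink = [], []
outputs = [make_collector(first_sink), make_collector(second_sink)]

processed_rows = [{"msg": "a"}, {"msg": "b"}]
for row in processed_rows:
    for write in outputs:
        write(row)

print(first_sink)
print(second_sink)
```

Both sinks end up with the same rows: outputs do not split the data between them, each one gets the full result of the filter stage.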
Configuration file example
An example is as follows. In the configuration file, lines beginning with # are comments.
spark {
    # You can set spark configuration here
    # seatunnel defined streaming batch duration in seconds
    spark.streaming.batchDuration = 5
    # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
    spark.app.name = "seatunnel"
    spark.executor.instances = 2
    spark.executor.cores = 1
    spark.executor.memory = "1g"
}
input {
    # This is an example input plugin **only for test and demonstrate the feature input plugin**
    fakestream {
        content = ["Hello World, InterestingLab"]
        rate = 1
    }
    # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
    # please go to https://interestinglab.github.io/seatunnel-docs/#/en-us/v1/configuration/base
}
filter {
    split {
        fields = ["msg", "name"]
        delimiter = ","
    }
    # If you would like to get more information about how to configure seatunnel and see full list of filter plugins,
    # please go to https://interestinglab.github.io/seatunnel-docs/#/en-us/v1/configuration/base
}
output {
    stdout {}
    # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
    # please go to https://interestinglab.github.io/seatunnel-docs/#/en-us/v1/configuration/base
}
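To make the data flow of this example concrete, the sketch below models in plain Python what splitting the sample content on "," and naming the pieces msg and name would produce. It is an approximation, not the plugin's code: the function `split_filter` and its `source_field` parameter are assumptions made for the illustration, and details such as whitespace trimming in the real split plugin are not covered.

```python
# Illustrative model only: splitting the sample content on the delimiter.
# split_filter and source_field are hypothetical, not seatunnel API.

def split_filter(row, fields, delimiter, source_field="raw_message"):
    parts = row[source_field].split(delimiter)
    # Pair each configured field name with the corresponding piece.
    return {**row, **dict(zip(fields, parts))}


# The input places the content string in raw_message.
row = {"raw_message": "Hello World, InterestingLab"}

result = split_filter(row, fields=["msg", "name"], delimiter=",")
print(result)
```

In this model the piece before the comma is stored under msg and the piece after it under name. Note that a plain string split keeps the space that follows the comma.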
For other configurations, please refer to:
Configuration example 1: Streaming (stream computing)
Configuration example 2: Batch (offline batch processing)
Configuration example 3: Flexible processing with multiple data flows