Version: 1.x

Introduction


Core Concepts

Event

Field Name

A valid field name must not contain ., @, or any other character that is not allowed by the ANSI SQL 2003 standard.

Reserved field names include:

  • __root__ refers to the top level of the event.
  • __metadata__ is the metadata field, reserved for internal use.

Metadata

Metadata can be set like ordinary fields, but all fields under metadata are invisible to the output; they exist for internal use only.

Field Reference

  • Single level: a
  • Multiple levels: a.b.c
  • Top-level (root) reference: __root__

[TODO] Note: this design should be compatible with Spark SQL.
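
For instance, field references can be used directly in SQL (a sketch, assuming Spark SQL-style dotted access for nested fields; the field names are illustrative):

select a, a.b.c from mytable where a.b.c is not null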


Input

Kafka


Filters

JSON

Split

Synopsis

Setting          Input type   Required   Default value
delimiter        string       no         " "
keys             array        yes        []
source_field     string       yes        ""
tag_on_failure   string       no         "_tag"
target_field     string       no         "__root__"

Details

  • delimiter

Regular expressions are supported.

  • keys

If the number of parts produced by splitting on delimiter is larger than the number of entries in keys, the extra parts on the right are ignored (see the configuration sketch after this list).

  • source_field

If source_field does not exist, nothing is done.

  • target_field
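
A minimal Split configuration might look like this (a sketch following the block syntax of the sql example below; the values are illustrative):

split {
    source_field = "message"
    delimiter = " "
    keys = ["method", "url"]
}

Splitting "GET /index.html 200" on " " with these keys yields method = "GET" and url = "/index.html"; the extra part "200" on the right is ignored.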

SQL

SQL can be used to filter and aggregate events; the underlying engine is Spark SQL.

For example, the following SQL filters events whose response_time is between 300 and 1200 milliseconds:

select * from mytable where response_time >= 300 and response_time <= 1200

And this SQL counts sales for each city:

select city, count(sales) from mytable group by city

You can also combine these two into a single SQL statement that does both filtering and aggregation:

select city, count(*) from mytable where response_time >= 300 and response_time <= 1200 group by city

Pipelining multiple SQL queries, where each query reads the result of the previous one (the statements are illustrative):

sql {
    query {
        table_name = "mytable1"
        sql = "select * from mytable1 where response_time <= 1200"
    }

    query {
        table_name = "mytable2"
        sql = "select city, count(*) from mytable2 group by city"
    }
}

Query

Synopsis

Setting      Input type   Required   Default value
table_name   string       no         "mytable"
sql          string       yes        -

[TODO] Maybe we could add a schema setting for explicitly defining the table schema. For now, the schema is auto-generated.

Details

  • table_name

Registers a temporary table under the given name; the default is "mytable". You can then reference it in sql, for example:

select * from mytable where http_status >= 500
  • sql

Executes a SQL query using the given sql string.
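
Combining the two settings, a complete sql filter configuration might look like this (a sketch; the table name and statement are illustrative):

sql {
    query {
        table_name = "access_log"
        sql = "select * from access_log where http_status >= 500"
    }
}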


Output

Kafka

Serializer

Raw

The default serializer is raw: if no serializer is configured for an input/output, raw is used.

Synopsis

Setting   Input type   Required   Default value
charset   string       no         "utf-8"

Details

  • charset

Serializes or deserializes using the given charset.

Available charsets are:

[TODO] list all supported charsets; refer to Logstash and these links:

  • https://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html
  • http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
  • http://www.iana.org/assignments/character-sets/character-sets.xhtml
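
A raw serializer configuration might look like this (a sketch, assuming the same block syntax used by filters; charset is the only documented setting):

raw {
    charset = "utf-8"
}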

JSON

Tar.gz

Compressed codec.

Contact Us

  • Mailing list: dev@seatunnel.apache.org. Mail to dev-subscribe@seatunnel.apache.org and follow the reply to subscribe to the mailing list.
  • Slack: To join the SeaTunnel Slack, send a request to the mailing list (dev@seatunnel.apache.org) and we will invite you.