Json
Transform plugin: Json [Spark]
Description
Parses the specified field of the original dataset as JSON.
Options
name | type | required | default value |
---|---|---|---|
source_field | string | no | raw_message |
target_field | string | no | __root__ |
schema_dir | string | no | - |
schema_file | string | no | - |
common-options | string | no | - |
source_field [string]
The source field. If not configured, the default is raw_message.
target_field [string]
The target field. If not configured, the default is __root__, and the result of Json parsing is placed at the top level of the Dataframe.
schema_dir [string]
The schema directory. If not configured, the default is $seatunnelRoot/plugins/json/files/schemas/.
schema_file [string]
The schema file name. If not configured, the default is empty, meaning no schema is specified and the structure is inferred automatically from the data source input.
common-options [string]
Common parameters of Transform plugins; see Transform Plugin for details.
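A combined configuration using these options might look like the following sketch; the directory and file name are illustrative values, not defaults:
json {
  source_field = "raw_message"
  target_field = "info"
  # illustrative values; adjust to your deployment
  schema_dir = "/opt/seatunnel/plugins/json/files/schemas/"
  schema_file = "demo.json"
}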
Schema Use cases
Json schema usage scenario
A single task may read from multiple data sources that contain JSON data with different structures. For example, the data from Kafka topicA has the form:
{
"A": "a_val",
"B": "b_val"
}
while the data from topicB has the form:
{
"C": "c_val",
"D": "d_val"
}
When running the Transform, you need to fuse the data of topicA and topicB into one wide table for calculation. To do this, you can specify a schema with the following content:
{
"A": "a_val",
"B": "b_val",
"C": "c_val",
"D": "d_val"
}
Then the fused output of topicA and topicB is:
+-----+-----+-----+-----+
|A |B |C |D |
+-----+-----+-----+-----+
|a_val|b_val|null |null |
|null |null |c_val|d_val|
+-----+-----+-----+-----+
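One way to achieve this fusion, using only the options described above, is to save the union schema as a file in the schema directory and point a single Json transform at it; a minimal sketch, where wide_table.json is an assumed file name:
json {
  source_field = "raw_message"
  # wide_table.json is an assumed file name containing the union schema shown above
  schema_file = "wide_table.json"
}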
Examples
Without target_field
json {
source_field = "message"
}
- Source
+----------------------------+
|message |
+----------------------------+
|{"name": "ricky", "age": 24}|
|{"name": "gary", "age": 28} |
+----------------------------+
- Sink
+----------------------------+---+-----+
|message |age|name |
+----------------------------+---+-----+
|{"name": "gary", "age": 28} |28 |gary |
|{"name": "ricky", "age": 23}|23 |ricky|
+----------------------------+---+-----+
With target_field
json {
source_field = "message"
target_field = "info"
}
- Source
+----------------------------+
|message |
+----------------------------+
|{"name": "ricky", "age": 24}|
|{"name": "gary", "age": 28} |
+----------------------------+
- Sink
+----------------------------+----------+
|message |info |
+----------------------------+----------+
|{"name": "gary", "age": 28} |[28,gary] |
|{"name": "ricky", "age": 23}|[23,ricky]|
+----------------------------+----------+
The result of Json parsing can be queried with SQL statements that reference the nested fields, for example:
select * from <table> where info.age = 24
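For instance, assuming a companion Sql transform plugin is configured in the same pipeline (the table name below is an assumption for illustration), the nested field could be queried like this sketch:
sql {
  # "t1" is an assumed registered table name; info.age addresses the nested field
  sql = "select name, info.age from t1 where info.age = 24"
}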
Use schema_file
json {
source_field = "message"
schema_file = "demo.json"
}
- Schema
Place the following content in ~/seatunnel/plugins/json/files/schemas/demo.json on the Driver node:
{
"name": "demo",
"age": 24,
"city": "LA"
}
- Source
+----------------------------+
|message |
+----------------------------+
|{"name": "ricky", "age": 24}|
|{"name": "gary", "age": 28} |
+----------------------------+
- Sink
+----------------------------+---+-----+-----+
|message |age|name |city |
+----------------------------+---+-----+-----+
|{"name": "gary", "age": 28} |28 |gary |null |
|{"name": "ricky", "age": 23}|23 |ricky|null |
+----------------------------+---+-----+-----+
If you deploy in cluster mode, make sure the json schemas directory is packaged into plugins.tar.gz.