Json
Description
Parses the specified field of the original dataset as JSON.
This transform is ONLY supported by Spark.
Options
| name | type | required | default value |
|---|---|---|---|
| source_field | string | no | raw_message |
| target_field | string | no | __root__ |
| schema_dir | string | no | - |
| schema_file | string | no | - |
| common-options | string | no | - |
source_field [string]
The source field; if not configured, it defaults to raw_message.
target_field [string]
The target field; if not configured, it defaults to __root__ , and the result of JSON parsing is placed at the top level of the Dataframe.
schema_dir [string]
The schema directory; if not configured, it defaults to $seatunnelRoot/plugins/json/files/schemas/.
schema_file [string]
The schema file name; if not configured, it defaults to empty, i.e. no schema is specified and the system infers the structure from the input data.
common options [string]
Transform plugin common parameters; please refer to Transform Plugin for details.
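A configuration sketch combining these options; the directory path and file name below are illustrative values, not defaults:
json {
    source_field = "message"
    target_field = "info"
    schema_dir = "/opt/seatunnel/plugins/json/files/schemas/"
    schema_file = "demo.json"
}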
Schema Use cases
A single task may read multiple data sources containing JSON data with different schemas. For example, the data from Kafka topicA has the following schema:
{
"A": "a_val",
"B": "b_val"
}
while the data from topicB has the schema:
{
"C": "c_val",
"D": "d_val"
}
If the Transform needs to merge the data of topicA and topicB into one wide table for computation, you can specify a schema whose content is:
{
"A": "a_val",
"B": "b_val",
"C": "c_val",
"D": "d_val"
}
Then the merged output of topicA and topicB is:
+-----+-----+-----+-----+
|A |B |C |D |
+-----+-----+-----+-----+
|a_val|b_val|null |null |
|null |null |c_val|d_val|
+-----+-----+-----+-----+
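A minimal sketch of the corresponding transform configuration, assuming the wide-table schema above is saved as wide_table.json (a hypothetical file name) under the default schema directory:
json {
    source_field = "raw_message"
    schema_file = "wide_table.json"
}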
Examples
Not using target_field
json {
source_field = "message"
}
- Source
+----------------------------+
|message |
+----------------------------+
|{"name": "ricky", "age": 24}|
|{"name": "gary", "age": 28} |
+----------------------------+
- Sink
+----------------------------+---+-----+
|message |age|name |
+----------------------------+---+-----+
|{"name": "gary", "age": 28} |28 |gary |
|{"name": "ricky", "age": 23}|23 |ricky|
+----------------------------+---+-----+
Using target_field
json {
source_field = "message"
target_field = "info"
}
- Source
+----------------------------+
|message |
+----------------------------+
|{"name": "ricky", "age": 24}|
|{"name": "gary", "age": 28} |
+----------------------------+
- Sink
+----------------------------+----------+
|message |info |
+----------------------------+----------+
|{"name": "gary", "age": 28} |[28,gary] |
|{"name": "ricky", "age": 23}|[23,ricky]|
+----------------------------+----------+
The result of JSON processing supports SQL statements that reference the nested fields, for example a filter such as where info.age = 24.
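As a rough sketch of how such a filter might be applied downstream, assuming a sql transform is available and that the upstream data is registered under the hypothetical table name user_info (check your SeaTunnel version for the exact options):
sql {
    # "user_info" is a hypothetical table name for illustration only
    sql = "select * from user_info where info.age = 24"
}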
Use schema_file
json {
source_field = "message"
schema_file = "demo.json"
}
- Schema
Place the following content in ~/seatunnel/plugins/json/files/schemas/demo.json on the Driver node:
{
"name": "demo",
"age": 24,
"city": "LA"
}
- Source
+----------------------------+
|message |
+----------------------------+
|{"name": "ricky", "age": 24}|
|{"name": "gary", "age": 28} |
+----------------------------+
- Sink
+----------------------------+---+-----+-----+
|message |age|name |city |
+----------------------------+---+-----+-----+
|{"name": "gary", "age": 28} |28 |gary |null |
|{"name": "ricky", "age": 23}|23 |ricky|null |
+----------------------------+---+-----+-----+
If you deploy in cluster mode, make sure the JSON schemas directory is packaged in plugins.tar.gz.