File
File sink connector
Description
Writes data to a local or HDFS file.
Engine Supported and plugin name
- Spark: File
- Flink: File
Options
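- Spark: the first options table below and the option descriptions that follow it
- Flink: the second options table and the option descriptions that follow it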
name | type | required | default value |
---|---|---|---|
options | object | no | - |
partition_by | array | no | - |
path | string | yes | - |
path_time_format | string | no | yyyyMMddHHmmss |
save_mode | string | no | error |
serializer | string | no | json |
common-options | string | no | - |
options [object]
Custom parameters
partition_by [array]
Partition data based on selected fields
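As a minimal sketch of partition_by in a Spark job, the block below assumes the incoming rows carry fields named date and country. These field names and the output directory are hypothetical and do not come from this page. If the connector passes the fields to Spark's partitionBy (an assumption here), the output would be laid out in nested date=.../country=... directories under the path.

```hocon
file {
    # Hypothetical output directory
    path = "hdfs:///tmp/seatunnel/partitioned"
    serializer = "parquet"
    # Hypothetical field names; they must exist in the data being written
    partition_by = ["date", "country"]
}
```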
path [string]
The file path is required. An HDFS path starts with hdfs:// and a local path starts with file://. The variables ${now} and ${uuid} can be used in the path, for example hdfs:///test_${uuid}_${now}.txt. ${now} represents the current time, and its format can be set with the option path_time_format.
path_time_format [string]
When the path parameter contains ${now}, for example xxxx-${now}, path_time_format specifies the time format used in the path. The default value is yyyyMMddHHmmss. The commonly used time format symbols are listed below:
Symbol | Description |
---|---|
y | Year |
M | Month |
d | Day of month |
H | Hour in day (0-23) |
m | Minute in hour |
s | Second in minute |
See Java SimpleDateFormat for detailed time format syntax.
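To illustrate how ${now} and path_time_format work together, here is a sketch of a Spark sink block; the directory name is made up for illustration. With the pattern yyyy.MM.dd, a job that runs on 15 January 2024 would resolve ${now} to 2024.01.15.

```hocon
file {
    # ${now} is replaced with the current time, formatted by path_time_format
    path = "hdfs:///tmp/seatunnel/events_${now}"
    path_time_format = "yyyy.MM.dd"
    serializer = "json"
}
```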
save_mode [string]
Storage mode. Currently supports overwrite, append, ignore and error. For the specific meaning of each mode, see save-modes.
serializer [string]
Serialization method. Currently supports csv, json, parquet, orc and text.
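A sketch that combines save_mode and serializer in a Spark sink block; the local directory is a hypothetical placeholder.

```hocon
file {
    # Hypothetical local output directory
    path = "file:///tmp/seatunnel/report"
    # Replace existing data at the path instead of failing (the default is error)
    save_mode = "overwrite"
    serializer = "csv"
}
```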
common options [string]
Sink plugin common parameters. Please refer to Sink Plugin for details.
name | type | required | default value |
---|---|---|---|
format | string | yes | - |
path | string | yes | - |
path_time_format | string | no | yyyyMMddHHmmss |
write_mode | string | no | - |
common-options | string | no | - |
parallelism | int | no | - |
rollover_interval | long | no | 1 |
max_part_size | long | no | 1024 |
prefix | string | no | seatunnel |
suffix | string | no | .ext |
format [string]
Currently, csv, json, and text are supported. Streaming mode currently supports only text.
path [string]
The file path is required. An HDFS path starts with hdfs:// and a local path starts with file://. The variables ${now} and ${uuid} can be used in the path, for example hdfs:///test_${uuid}_${now}.txt. ${now} represents the current time, and its format can be set with the option path_time_format.
path_time_format [string]
When the path parameter contains ${now}, for example xxxx-${now}, path_time_format specifies the time format used in the path. The default value is yyyyMMddHHmmss. The commonly used time format symbols are listed below:
Symbol | Description |
---|---|
y | Year |
M | Month |
d | Day of month |
H | Hour in day (0-23) |
m | Minute in hour |
s | Second in minute |
See Java SimpleDateFormat for detailed time format syntax.
write_mode [string]
- NO_OVERWRITE: do not overwrite; an error is raised if the path already exists
- OVERWRITE: overwrite; if the path exists, it is deleted before writing
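For comparison with OVERWRITE, here is a sketch of a Flink sink block that refuses to write over existing output. The HDFS address and directory are placeholders.

```hocon
FileSink {
    format = "csv"
    # Placeholder address; ${now} is formatted according to path_time_format
    path = "hdfs://localhost:9000/flink/output_${now}/"
    path_time_format = "yyyyMMdd"
    # Fail instead of replacing data if the path already exists
    write_mode = "NO_OVERWRITE"
}
```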
common options [string]
Sink plugin common parameters. Please refer to Sink Plugin for details.
parallelism [int]
The parallelism of the FileSink operator.
rollover_interval [long]
The rollover interval for a new file part, in minutes.
max_part_size [long]
The maximum size of each file part, in MB.
prefix [string]
The prefix of each file part.
suffix [string]
The suffix of each file part.
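The file part options can be combined as in the following sketch of a Flink sink block. The values are illustrative only. How rollover_interval and max_part_size interact (for example, whether a part is rolled when either limit is reached first) is not stated here, so the comments only restate the units given above.

```hocon
FileSink {
    # Streaming mode currently supports only text
    format = "text"
    # Placeholder HDFS address and directory
    path = "hdfs://localhost:9000/flink/stream_output/"
    # Rollover interval for a new file part, in minutes
    rollover_interval = 5
    # Maximum size of each file part, in MB
    max_part_size = 256
    # Hypothetical prefix and suffix for the part file names
    prefix = "events"
    suffix = ".txt"
    parallelism = 2
}
```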
Example
- Spark

file {
    path = "file:///var/logs"
    serializer = "text"
}

- Flink

FileSink {
    format = "json"
    path = "hdfs://localhost:9000/flink/output/"
    write_mode = "OVERWRITE"
}