File
File sink connector
Description
Outputs data to local files or HDFS files.
Supported engines and plugin names
- Spark: File
- Flink: File
Options
The available options differ by engine. The Spark options are described first, followed by the Flink options.
| name | type | required | default value |
|---|---|---|---|
| options | object | no | - |
| partition_by | array | no | - |
| path | string | yes | - |
| path_time_format | string | no | yyyyMMddHHmmss |
| save_mode | string | no | error |
| serializer | string | no | json |
| common-options | string | no | - |
options [object]
Custom parameters
partition_by [array]
Partitions the output data by the selected fields.
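A minimal sketch of partitioning the Spark output. It assumes the incoming data has fields named date and hour. These field names and the output path are hypothetical and chosen only for illustration.

```hocon
file {
    # hypothetical output directory on HDFS
    path = "hdfs:///warehouse/events"
    serializer = "parquet"
    # hypothetical field names; the output is partitioned by these fields
    partition_by = ["date", "hour"]
}
```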
path [string]
The output file path (required). HDFS paths start with hdfs:// and local paths start with file://.
The path may contain the variables ${now} and ${uuid}, for example hdfs:///test_${uuid}_${now}.txt.
${now} is replaced with the current time, formatted according to the path_time_format option.
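For illustration, here is a path that combines both variables. The directory and file name are hypothetical. The value is quoted on purpose: in HOCON, ${...} inside a quoted string is kept literally and is not treated as a config substitution, so the connector receives the placeholders intact.

```hocon
file {
    # hypothetical directory; ${uuid} and ${now} are filled in by the connector, not by the config parser
    path = "hdfs:///output/test_${uuid}_${now}.txt"
    serializer = "json"
}
```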
path_time_format [string]
When the path contains ${now} (for example xxxx-${now}), path_time_format specifies the time format used to render it. The default value is yyyyMMddHHmmss. Commonly used time format symbols are listed below:
| Symbol | Description |
|---|---|
| y | Year |
| M | Month |
| d | Day of month |
| H | Hour in day (0-23) |
| m | Minute in hour |
| s | Second in minute |
See Java SimpleDateFormat for detailed time format syntax.
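As a worked example, the SimpleDateFormat pattern yyyy-MM-dd renders 1 January 2022 as 2022-01-01. With the default pattern yyyyMMddHHmmss, 2022-01-01 12:00:00 renders as 20220101120000. The sketch below uses a hypothetical path to show where the rendered value ends up.

```hocon
file {
    # hypothetical path; ${now} is rendered with path_time_format
    path = "file:///tmp/output/report-${now}"
    # with this pattern, a run on 2022-01-01 writes to .../report-2022-01-01
    path_time_format = "yyyy-MM-dd"
    serializer = "csv"
}
```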
save_mode [string]
Save mode. Supported values are overwrite, append, ignore and error. For the meaning of each mode, see save-modes.
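A minimal sketch that appends to an existing output directory. This assumes the values follow Spark's save modes, as the save-modes reference suggests. Under that assumption, the default error mode would fail when the path already contains data. The path is hypothetical.

```hocon
file {
    # hypothetical output directory
    path = "hdfs:///output/daily"
    serializer = "json"
    # append to existing data instead of failing, as the default "error" mode would (assumed Spark semantics)
    save_mode = "append"
}
```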
serializer [string]
Output serialization format. Supported values are csv, json, parquet, orc and text.
common options [string]
Common parameters shared by all sink plugins. See Sink Plugin for details.
Flink
| name | type | required | default value |
|---|---|---|---|
| format | string | yes | - |
| path | string | yes | - |
| path_time_format | string | no | yyyyMMddHHmmss |
| write_mode | string | no | - |
| common-options | string | no | - |
| parallelism | int | no | - |
| rollover_interval | long | no | 1 |
| max_part_size | long | no | 1024 |
| prefix | string | no | seatunnel |
| suffix | string | no | .ext |
format [string]
Output format. Supported values are csv, json and text. In streaming mode, only text is supported.
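A minimal sketch for a streaming Flink job. Per the note above, streaming mode must use the text format. The output path is hypothetical.

```hocon
FileSink {
    # streaming mode only supports the text format
    format = "text"
    # hypothetical output directory
    path = "hdfs://localhost:9000/flink/stream-output/"
}
```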
path [string]
The output file path (required). HDFS paths start with hdfs:// and local paths start with file://.
The path may contain the variables ${now} and ${uuid}, for example hdfs:///test_${uuid}_${now}.txt.
${now} is replaced with the current time, formatted according to the path_time_format option.
path_time_format [string]
When the path contains ${now} (for example xxxx-${now}), path_time_format specifies the time format used to render it. The default value is yyyyMMddHHmmss. Commonly used time format symbols are listed below:
| Symbol | Description |
|---|---|
| y | Year |
| M | Month |
| d | Day of month |
| H | Hour in day (0-23) |
| m | Minute in hour |
| s | Second in minute |
See Java SimpleDateFormat for detailed time format syntax.
write_mode [string]
Write mode for the output path. Supported values:
- NO_OVERWRITE: do not overwrite; an error is raised if the path already exists.
- OVERWRITE: if the path already exists, delete it and then write.
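A minimal sketch that fails instead of overwriting when the output path already exists. The path is hypothetical.

```hocon
FileSink {
    format = "csv"
    # hypothetical output directory
    path = "file:///tmp/flink/output/"
    # raise an error instead of overwriting if the path already exists
    write_mode = "NO_OVERWRITE"
}
```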
common options [string]
Common parameters shared by all sink plugins. See Sink Plugin for details.
parallelism [int]
The parallelism of the FileSink operator.
rollover_interval [long]
The interval after which a new file part is started, in minutes.
max_part_size [long]
The maximum size of each file part, in MB.
prefix [string]
The prefix of each file part.
suffix [string]
The suffix of each file part.
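A sketch that tunes the part-file options together. It assumes these options map onto Flink's default rolling policy and part-file naming; this is an assumption, not something stated above. Under that assumption, a new part is started when either the interval elapses or the size limit is reached, whichever comes first. The path, prefix and suffix values are hypothetical.

```hocon
FileSink {
    format = "text"
    # hypothetical output directory
    path = "hdfs://localhost:9000/flink/rolling/"
    parallelism = 2
    # start a new part file every 15 minutes (assumed: or earlier if max_part_size is hit)
    rollover_interval = 15
    # cap each part file at 256 MB
    max_part_size = 256
    # hypothetical naming: part files carry this prefix and suffix
    prefix = "events"
    suffix = ".log"
}
```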
Example
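Spark

```hocon
file {
    # write plain text to a local directory
    path = "file:///var/logs"
    serializer = "text"
}
```

Flink

```hocon
FileSink {
    # write JSON to HDFS, replacing any existing output at the path
    format = "json"
    path = "hdfs://localhost:9000/flink/output/"
    write_mode = "OVERWRITE"
}
```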