File
Description
Read data from a local or HDFS file.
Tip

Supported engines and plugin names:
- Spark: File
- Flink: File
Options
- Spark
| name | type | required | default value |
|---|---|---|---|
| format | string | no | json |
| path | string | yes | - |
| common-options | string | yes | - |
format [string]
The format for reading files; currently supports `text`, `parquet`, `json`, `orc` and `csv`.
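For illustration, a minimal sketch of a Spark `file` source that overrides the default `json` format; the path and table name are placeholders:

```
file {
  # read text files instead of the default json format
  format = "text"
  path = "hdfs:///var/logs"
  result_table_name = "access_log"
}
```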
- Flink

| name | type | required | default value |
|---|---|---|---|
| format.type | string | yes | - |
| path | string | yes | - |
| schema | string | yes | - |
| common-options | string | no | - |
| parallelism | int | no | - |
format.type [string]
The format for reading files from the file system; currently supports `csv`, `json`, `parquet`, `orc` and `text`.
schema [string]

- csv
  - The `schema` of `csv` is a JSON array string, such as `"[{\"type\":\"long\"},{\"type\":\"string\"}]"`. It can only specify field types, not field names, so the common configuration parameter `field_name` is generally also required.
- json
  - The `schema` parameter of `json` is a JSON string of the original data, from which the schema is generated automatically. The sample record provided must be as complete as possible, otherwise fields will be lost.
- parquet
  - The `schema` of `parquet` is an Avro schema string, such as `{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"string\"}]}` (see the config sketch after this list).
- orc
  - The `schema` of `orc` is an ORC schema string, such as `"struct<name:string,addresses:array<struct<street:string,zip:smallint>>>"`.
- text
  - The `schema` of `text` can simply be `string`.
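As a sketch of how the parquet Avro schema above is passed in a Flink config (the path and `result_table_name` are placeholders):

```
FileSource {
  path = "hdfs://localhost:9000/parquet-input/"
  format.type = "parquet"
  # Avro schema string, escaped exactly as in the example above
  schema = "{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"string\"}]}"
  result_table_name = "parquet_test"
}
```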
parallelism [Int]
The parallelism of the FileSource operator.
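For example, a sketch pinning the source to two parallel subtasks (path and table name are placeholders):

```
FileSource {
  path = "hdfs://localhost:9000/input/"
  format.type = "text"
  schema = "string"
  # run the file source with two parallel subtasks
  parallelism = 2
  result_table_name = "raw_lines"
}
```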
path [string]

- If reading data from HDFS, the file path should start with `hdfs://`.
- If reading data from the local file system, the file path should start with `file://`.
common options [string]
Source plugin common parameters; please refer to Source Plugin for details.
Examples
- Spark

```
file {
  path = "hdfs:///var/logs"
  result_table_name = "access_log"
}
```

```
file {
  path = "file:///var/logs"
  result_table_name = "access_log"
}
```

- Flink

```
FileSource {
  path = "hdfs://localhost:9000/input/"
  format.type = "json"
  schema = "{\"data\":[{\"a\":1,\"b\":2},{\"a\":3,\"b\":4}],\"db\":\"string\",\"q\":{\"s\":\"string\"}}"
  result_table_name = "test"
}
```
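For a csv source, a minimal sketch combining the csv `schema` with `field_name`; the path, table name, and the comma-separated `field_name` value are assumptions for illustration:

```
FileSource {
  path = "hdfs://localhost:9000/csv-input/"
  format.type = "csv"
  # csv schema fixes only the field types; names come from field_name
  schema = "[{\"type\":\"long\"},{\"type\":\"string\"}]"
  # assumed: comma-separated field names matching the schema order
  field_name = "id,name"
  result_table_name = "csv_test"
}
```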