Version: 2.1.2

ClickhouseFile

Description

Generates ClickHouse data files with the clickhouse-local program and then sends them to the ClickHouse server, also known as bulk load.
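Conceptually, the pipeline has two steps: generate a data part locally, then transfer it to the server. The sketch below only builds the two command lines; the flags, query, and paths are illustrative assumptions, not the plugin's exact internals.

```python
def build_commands(csv_path, structure, remote_host, remote_data_dir):
    """Return (generate_cmd, transfer_cmd) as argument lists.

    Hypothetical sketch of the bulk-load flow: clickhouse-local writes a
    local data part, scp ships it to the server. Not SeaTunnel source code.
    """
    generate_cmd = [
        "clickhouse-local",
        "--file", csv_path,            # input data file
        "--structure", structure,      # e.g. "date String, http_code Int32"
        "--query", "INSERT INTO ...",  # query text elided; produces the part
    ]
    transfer_cmd = ["scp", "-r", "local_parts/",
                    f"root@{remote_host}:{remote_data_dir}"]
    return generate_cmd, transfer_cmd

gen, xfer = build_commands("rows.csv", "date String, http_code Int32",
                           "host1", "/var/lib/clickhouse/data/")
```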

Tip

Engine Supported and plugin name

  • Spark: ClickhouseFile
  • Flink

Options

| name | type | required | default value |
| --- | --- | --- | --- |
| database | string | yes | - |
| fields | array | no | - |
| host | string | yes | - |
| password | string | no | - |
| table | string | yes | - |
| username | string | no | - |
| sharding_key | string | no | - |
| clickhouse_local_path | string | yes | - |
| tmp_batch_cache_line | int | no | 100000 |
| copy_method | string | no | scp |
| node_free_password | boolean | no | false |
| node_pass | list | no | - |
| node_pass.node_address | string | no | - |
| node_pass.password | string | no | - |
| common-options | string | no | - |

database [string]

database name

fields [array]

The data fields to be written to ClickHouse. If not configured, they are adapted automatically from the data schema.

host [string]

ClickHouse cluster address in `host:port` format. Multiple hosts may be specified, such as `"host1:8123,host2:8123"`.
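Since the option is a comma-separated list, splitting it is straightforward. A small helper (a sketch, not plugin code) that parses the value into host/port pairs:

```python
def parse_hosts(host_option: str):
    """Split a comma-separated "host:port" string into (host, port) pairs."""
    pairs = []
    for item in host_option.split(","):
        host, port = item.strip().rsplit(":", 1)  # rsplit tolerates IPv6-ish hosts
        pairs.append((host, int(port)))
    return pairs

print(parse_hosts("host1:8123,host2:8123"))
# → [('host1', 8123), ('host2', 8123)]
```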

password [string]

ClickHouse user password. This field is only required when authentication is enabled in ClickHouse.

table [string]

table name

username [string]

ClickHouse username. This field is only required when authentication is enabled in ClickHouse.

sharding_key [string]

When `split_mode` is enabled, a target node must be chosen for each row; by default it is selected at random. The `sharding_key` parameter instead specifies the field used by the sharding algorithm. This option only takes effect when `split_mode` is true.
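The idea behind the two selection modes can be sketched as follows; this shows the concept only, not SeaTunnel's actual hash function (the node list, row, and field names are made up for illustration):

```python
import random

def pick_node(nodes, row, sharding_key=None):
    """Choose the target ClickHouse node for a row.

    Without a sharding_key the node is random; with one, rows that share the
    same key value are always routed to the same node.
    """
    if sharding_key is None:
        return random.choice(nodes)
    return nodes[hash(row[sharding_key]) % len(nodes)]

nodes = ["host1:8123", "host2:8123"]
row = {"age": 30, "name": "a"}
# Deterministic: repeated calls with the same key land on the same node.
assert pick_node(nodes, row, "age") == pick_node(nodes, row, "age")
```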

clickhouse_local_path [string]

The path of the clickhouse-local program on the Spark node. Since every task invokes it, clickhouse-local must be located at the same path on every Spark node.

tmp_batch_cache_line [int]

SeaTunnel uses memory-mapped files to cache the data to be written to ClickHouse in a temporary file. This parameter configures the number of rows written to the file per batch. In most cases you do not need to modify it.
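The batching behavior this parameter controls can be sketched like this (SeaTunnel itself writes through a memory-mapped file; this illustration only shows the fixed-size batching, with a hypothetical `flush` callback standing in for the file write):

```python
def write_batched(rows, batch_size=100_000, flush=None):
    """Cache rows and flush them in fixed-size batches.

    Mirrors what tmp_batch_cache_line controls: every `batch_size` rows the
    buffer is written out; a final partial batch is flushed at the end.
    Returns the number of flushes performed.
    """
    flushes = 0
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) >= batch_size:
            if flush:
                flush(buf)
            buf.clear()
            flushes += 1
    if buf:  # remainder smaller than one batch
        if flush:
            flush(buf)
        flushes += 1
    return flushes

print(write_batched(range(250_000)))  # → 3 (two full batches + remainder)
```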

copy_method [string]

Specifies the method used to transfer files. The default is scp; the options are scp and rsync.

node_free_password [boolean]

Because SeaTunnel uses scp or rsync for file transfer, it needs access to the ClickHouse servers. If every Spark node and the ClickHouse servers are configured for password-free login, set this option to true; otherwise configure the corresponding node passwords in the node_pass option.

node_pass [list]

Used to store the addresses and corresponding passwords of all ClickHouse servers.

node_pass.node_address [string]

The address of the ClickHouse server.

node_pass.password [string]

The password for the ClickHouse server. Currently only the root user is supported.

common options [string]

Sink plugin common parameters; please refer to common options for details.

ClickHouse type comparison table

| ClickHouse field type | Convert plugin conversion goal type | SQL conversion expression | Description |
| --- | --- | --- | --- |
| Date | string | string() | yyyy-MM-dd format string |
| DateTime | string | string() | yyyy-MM-dd HH:mm:ss format string |
| String | string | string() | |
| Int8 | integer | int() | |
| UInt8 | integer | int() | |
| Int16 | integer | int() | |
| UInt16 | integer | int() | |
| Int32 | integer | int() | |
| UInt32 | long | bigint() | |
| Int64 | long | bigint() | |
| UInt64 | long | bigint() | |
| Float32 | float | float() | |
| Float64 | double | double() | |
| Decimal(P, S) | - | CAST(source AS DECIMAL(P, S)) | Decimal32(S), Decimal64(S) and Decimal128(S) can be used |
| Array(T) | - | - | |
| Nullable(T) | depends on T | depends on T | |
| LowCardinality(T) | depends on T | depends on T | |
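As a reading aid for the table above, the mapping can be expressed as a small lookup that unwraps `Nullable(...)` and `LowCardinality(...)`, since their result depends on the inner type T. This is an illustration of the table, not plugin source code:

```python
# Target types from the conversion table above.
TARGET = {
    "Date": "string", "DateTime": "string", "String": "string",
    "Int8": "integer", "UInt8": "integer", "Int16": "integer",
    "UInt16": "integer", "Int32": "integer",
    "UInt32": "long", "Int64": "long", "UInt64": "long",
    "Float32": "float", "Float64": "double",
}

def target_type(ch_type: str) -> str:
    """Map a ClickHouse column type to the plugin's conversion goal type."""
    for wrapper in ("Nullable(", "LowCardinality("):
        if ch_type.startswith(wrapper) and ch_type.endswith(")"):
            # Result depends on the wrapped type T, so recurse into it.
            return target_type(ch_type[len(wrapper):-1])
    return TARGET.get(ch_type, "-")  # "-" for unmapped types like Array(T)

print(target_type("Nullable(UInt64)"))        # → long
print(target_type("LowCardinality(String)"))  # → string
```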

Examples

```
ClickhouseFile {
    host = "localhost:8123"
    database = "nginx"
    table = "access_msg"
    fields = ["date", "datetime", "hostname", "http_code", "data_size", "ua", "request_time"]
    username = "username"
    password = "password"
    clickhouse_local_path = "/usr/bin/clickhouse-local"
    node_free_password = true
}
```
```
ClickhouseFile {
    host = "localhost:8123"
    database = "nginx"
    table = "access_msg"
    fields = ["date", "datetime", "hostname", "http_code", "data_size", "ua", "request_time"]
    username = "username"
    password = "password"
    sharding_key = "age"
    clickhouse_local_path = "/usr/bin/clickhouse-local"
    node_pass = [
        {
            node_address = "localhost1"
            password = "password"
        }
        {
            node_address = "localhost2"
            password = "password"
        }
    ]
}
```