Version: 1.x


Output plugin : ClickHouse

Description

Writes rows to ClickHouse through clickhouse-jdbc. You need to create the corresponding table in advance.
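Since the table must exist before the plugin writes to it, a minimal DDL sketch for the access_msg table used in the example below might look like the following. The column types and the engine/ordering choices here are illustrative assumptions, not part of this plugin's documentation; adjust them to your own schema.

```sql
-- Hypothetical table matching the example fields below.
CREATE TABLE nginx.access_msg
(
    date         Date,
    datetime     DateTime,
    hostname     String,
    http_code    Int32,
    data_size    Int64,
    ua           String,
    request_time Float64
)
ENGINE = MergeTree
ORDER BY (date, hostname);
```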

Options

| name | type | required | default value |
| --- | --- | --- | --- |
| bulk_size | number | no | 20000 |
| clickhouse.* | string | no | - |
| cluster | string | no | - |
| database | string | yes | - |
| fields | list | yes | - |
| host | string | yes | - |
| password | string | no | - |
| table | string | yes | - |
| username | string | no | - |

bulk_size [number]

The number of rows written to ClickHouse in each batch through clickhouse-jdbc. Default is 20000.

database [string]

ClickHouse database.

fields [list]

List of the fields to be written to ClickHouse.

host [string]

ClickHouse host, in hostname:port format.

cluster [string]

Name of the ClickHouse cluster the table belongs to; see Distributed.

password [string]

ClickHouse password, needed only when authentication is enabled on the server.

table [string]

ClickHouse table name.

username [string]

ClickHouse username, needed only when authentication is enabled on the server.

clickhouse.* [string]

In addition to the required parameters above, you can also pass through any of the parameters described in the clickhouse-jdbc settings.

To do so, prefix the parameter name with "clickhouse.". For example, socket_timeout is specified as clickhouse.socket_timeout = 50000. Any parameter you do not specify falls back to the clickhouse-jdbc default.
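As a sketch, several JDBC-level settings could be passed through the prefix like this. Only socket_timeout is taken from this page; connection_timeout is an assumption and must be checked against the clickhouse-jdbc settings list for the driver version you use.

```
clickhouse {
    host = "localhost:8123"
    database = "nginx"
    table = "access_msg"
    fields = ["date", "datetime"]
    # JDBC-level settings, passed through with the "clickhouse." prefix
    clickhouse.socket_timeout = 50000
    # Assumed parameter name; verify against clickhouse-jdbc settings
    clickhouse.connection_timeout = 50000
}
```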

ClickHouse Data Type Check List

| ClickHouse Data Type | Convert Plugin Target Type | SQL Expression | Description |
| --- | --- | --- | --- |
| Date | string | string() | Format of yyyy-MM-dd |
| DateTime | string | string() | Format of yyyy-MM-dd HH:mm:ss |
| String | string | string() | |
| Int8 | integer | int() | |
| UInt8 | integer | int() | |
| Int16 | integer | int() | |
| UInt16 | integer | int() | |
| Int32 | integer | int() | |
| UInt32 | long | bigint() | |
| Int64 | long | bigint() | |
| UInt64 | long | bigint() | |
| Float32 | float | float() | |
| Float64 | double | double() | |
| Array(T) | - | - | |
| Nullable(T) | depends on T | depends on T | |

Examples

clickhouse {
    host = "localhost:8123"
    clickhouse.socket_timeout = 50000
    database = "nginx"
    table = "access_msg"
    fields = ["date", "datetime", "hostname", "http_code", "data_size", "ua", "request_time"]
    username = "username"
    password = "password"
    bulk_size = 20000
}

Distributed table config

clickhouse {
    host = "localhost:8123"
    database = "nginx"
    table = "access_msg"
    cluster = "no_replica_cluster"
    fields = ["date", "datetime", "hostname", "http_code", "data_size", "ua", "request_time"]
}

The plugin queries the system.clusters table to find out which physical shard nodes store the table. Each Spark partition writes to only one ClickHouse node, chosen at random.
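You can inspect the shard layout yourself with a query like the following. This is a minimal sketch against the standard system.clusters table, using the cluster name from the example above:

```sql
-- List the physical nodes behind the cluster used in the example
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'no_replica_cluster';
```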