Version: 1.x

Clickhouse

Output plugin : Clickhouse

Description

Writes rows to ClickHouse via clickhouse-jdbc. The target table must be created in advance.

Options

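| name | type | required | default value |
| --- | --- | --- | --- |
| bulk_size | number | no | 20000 |
| clickhouse.* | string | no | - |
| cluster | string | no | - |
| database | string | yes | - |
| fields | list | yes | - |
| host | string | yes | - |
| password | string | no | - |
| table | string | yes | - |
| username | string | no | - |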

bulk_size [number]

The number of rows written to ClickHouse in each batch through clickhouse-jdbc. Default is 20000.

database [string]

ClickHouse database.

fields [list]

The list of fields to write to ClickHouse.

host [string]

ClickHouse hosts, in the format hostname:port.

cluster [string]

The name of the ClickHouse cluster that the table belongs to; see the ClickHouse documentation on the Distributed table engine.

password [string]

ClickHouse password. Only needed when authentication is enabled in ClickHouse.

table [string]

ClickHouse table name.

username [string]

ClickHouse username. Only needed when authentication is enabled in ClickHouse.

clickhouse [string]

In addition to the parameters above, you can specify any of the parameters described in the clickhouse-jdbc settings.

To specify a parameter, add the prefix "clickhouse." before its name. For example, socket_timeout is specified as clickhouse.socket_timeout = 50000. Parameters that are not specified take the clickhouse-jdbc default values.
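As a minimal sketch of the prefix convention, the fragment below passes two clickhouse-jdbc settings through the plugin. It assumes that connection_timeout is accepted by the clickhouse-jdbc version in use, and the values are illustrative rather than recommended. The remaining options are taken from the example configuration on this page.

```hocon
clickhouse {
    host = "localhost:8123"
    database = "nginx"
    table = "access_msg"
    fields = ["date", "datetime", "hostname"]
    # The part after the "clickhouse." prefix is the clickhouse-jdbc setting name.
    clickhouse.socket_timeout = 50000
    # Assumption: connection_timeout is supported by your clickhouse-jdbc version.
    clickhouse.connection_timeout = 10000
}
```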

ClickHouse Data Type Check List

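| ClickHouse Data Type | Convert Plugin Target Type | SQL Expression | Description |
| --- | --- | --- | --- |
| Date | string | string() | Format of yyyy-MM-dd |
| DateTime | string | string() | Format of yyyy-MM-dd HH:mm:ss |
| String | string | string() | |
| Int8 | integer | int() | |
| UInt8 | integer | int() | |
| Int16 | integer | int() | |
| UInt16 | integer | int() | |
| Int32 | integer | int() | |
| UInt32 | long | bigint() | |
| Int64 | long | bigint() | |
| UInt64 | long | bigint() | |
| Float32 | float | float() | |
| Float64 | double | double() | |
| Array(T) | - | - | |
| Nullable(T) | depends on T | depends on T | |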
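A field whose type does not match its ClickHouse column can be converted in a filter before it reaches this output. The following is only a sketch: it assumes the Convert filter plugin referred to in the table takes source_field and new_type options, and the column types chosen for http_code (Int32) and data_size (Int64) are hypothetical, since this page does not show the table definition.

```hocon
filter {
    # Assumption: http_code is stored in an Int32 column, so the target type is integer.
    convert {
        source_field = "http_code"
        new_type = "integer"
    }
    # Assumption: data_size is stored in an Int64 column, so the target type is long.
    convert {
        source_field = "data_size"
        new_type = "long"
    }
}
```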

Examples

clickhouse {
    host = "localhost:8123"
    clickhouse.socket_timeout = 50000
    database = "nginx"
    table = "access_msg"
    fields = ["date", "datetime", "hostname", "http_code", "data_size", "ua", "request_time"]
    username = "username"
    password = "password"
    bulk_size = 20000
}

Distributed table config

ClickHouse {
    host = "localhost:8123"
    database = "nginx"
    table = "access_msg"
    cluster = "no_replica_cluster"
    fields = ["date", "datetime", "hostname", "http_code", "data_size", "ua", "request_time"]
}

When cluster is set, the plugin queries the system.clusters table to find out which physical shard nodes store the table. Each Spark partition then writes to a single ClickHouse node, chosen at random.