Hbase

Hbase Source Connector

Description

Reads data from Apache HBase.

Key Features

Options

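| Name                | Type    | Required | Default |
|---------------------|---------|----------|---------|
| zookeeper_quorum    | string  | Yes      | -       |
| table               | string  | Yes      | -       |
| schema              | config  | Yes      | -       |
| hbase_extra_config  | string  | No       | -       |
| caching             | int     | No       | -1      |
| batch               | int     | No       | -1      |
| cache_blocks        | boolean | No       | false   |
| is_binary_rowkey    | boolean | No       | false   |
| start_rowkey        | string  | No       | -       |
| end_rowkey          | string  | No       | -       |
| start_row_inclusive | boolean | No       | true    |
| end_row_inclusive   | boolean | No       | false   |
| common-options      |         | No       | -       |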

zookeeper_quorum [string]

The ZooKeeper quorum of the HBase cluster, given as a comma-separated list of host:port pairs, e.g., "hadoop001:2181,hadoop002:2181,hadoop003:2181".

table [string]

The name of the table to read from, e.g., "seatunnel".

schema [config]

HBase stores data as byte arrays, so you must declare the data type of each column you read from the table. For more information, see: guide.
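As a sketch of what such a schema can look like, the fragment below follows the naming used in the Example section of this page: the row key appears as a column named rowkey, and every other column is written as columnFamily:qualifier. The family and qualifier names (info, age, score) are hypothetical.

```hocon
# Fragment of a source Hbase block; family and qualifier names are hypothetical.
schema = {
  columns = [
    { name = "rowkey",     type = string },  # the HBase row key
    { name = "info:age",   type = bigint },  # family "info", qualifier "age"
    { name = "info:score", type = double }   # family "info", qualifier "score"
  ]
}
```

Each declared type tells the connector how to decode the stored bytes of that column, so it should match the type used when the data was written.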

hbase_extra_config [config]

Additional configuration properties for HBase.

caching [int]

The number of rows fetched per round trip to the server during a scan. A larger value reduces the number of round trips between client and server and improves scan efficiency, at the cost of more client memory per fetch. Default: -1.

batch [int]

The maximum number of columns returned in each result during a scan. For rows with many columns, this avoids fetching an entire wide row at once, which saves memory and improves performance. Default: -1.

cache_blocks [boolean]

The cache_blocks parameter determines whether to cache data blocks during scans. By default, HBase caches data blocks during scans. Setting this to false reduces memory usage during scans. Default in SeaTunnel: false.
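As a rough illustration of how the three scan options above fit together, the fragment below tunes a scan over a table with wide rows. The values are illustrative assumptions, not recommendations; suitable settings depend on row size and available memory.

```hocon
# Fragment of a source Hbase block; all values are illustrative assumptions.
caching = 500         # fetch up to 500 rows per round trip to the server
batch = 100           # return at most 100 columns per result, so a wide row arrives in pieces
cache_blocks = false  # do not cache data blocks while scanning
```

Considering caching on its own, scanning 10,000 rows with caching = 500 takes on the order of 10,000 / 500 = 20 fetches, assuming no other limit (such as a result-size cap) cuts a fetch short.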

is_binary_rowkey [boolean]

The row key in HBase can be either a text string or binary data. In SeaTunnel, the row key is set to a text string by default (i.e., the default value of is_binary_rowkey is false).

start_rowkey [string]

The row key at which the scan starts.

end_rowkey [string]

The row key at which the scan stops.

start_row_inclusive [boolean]

Whether to include the start row in the scan range. When set to true, the start row is included in the scan results. Default: true (inclusive).

Note: In most cases, you should keep the default value (true). Only modify this parameter if you have specific requirements for excluding the start row from your scan results.

end_row_inclusive [boolean]

Whether to include the end row in the scan range. When set to false, the end row is excluded from the scan results, following the left-closed-right-open convention [start, end). Default: false (exclusive).

Note: In most cases, you should keep the default value (false) which follows HBase's standard left-closed-right-open convention. Only modify this parameter if you need to include the end row in your scan results.

Important: When using parallel reading with multiple splits, the combination of these two parameters is critical for data integrity:

  • Default (start_row_inclusive=true, end_row_inclusive=false): This is the recommended configuration that ensures no data loss or duplication across splits. Each split follows the [start, end) convention.
  • Both false (start_row_inclusive=false, end_row_inclusive=false): This may cause data loss at split boundaries, as the boundary rows will be excluded from all splits.
  • Both true (start_row_inclusive=true, end_row_inclusive=true): This may cause duplicate data at split boundaries, as the boundary rows will be included in multiple adjacent splits.
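To make the default range semantics concrete, the fragment below spells out the defaults explicitly for a scan over all row keys that begin with "B". The row keys are illustrative.

```hocon
# Fragment of a source Hbase block; row keys are illustrative.
start_rowkey = "B"
end_rowkey = "C"
start_row_inclusive = true   # default: a row with key "B" is read
end_row_inclusive = false    # default: a row with key "C" is not read
# Resulting range: "B" <= rowkey < "C", i.e. [B, C)
```

An adjacent range [C, D) would then read the row with key "C" exactly once, which is why the defaults avoid both gaps and duplicates at boundaries.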

common-options

Common parameters for source plugins. For details, refer to Common Source Options.

Example

source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    caching = 1000
    batch = 100
    cache_blocks = false
    is_binary_rowkey = false
    start_rowkey = "B"
    end_rowkey = "C"
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        },
        {
          name = "columnFamily1:column1"
          type = boolean
        },
        {
          name = "columnFamily1:column2"
          type = double
        },
        {
          name = "columnFamily2:column1"
          type = bigint
        }
      ]
    }
  }
}

Changelog

| Change | Commit | Version |
|--------|--------|---------|
| [Feature][Checkpoint] Add check script for source/sink state class serialVersionUID missing (#9118) | https://github.com/apache/seatunnel/commit/4f5adeb1c7 | 2.3.11 |
| [Improve] hbase options (#8923) | https://github.com/apache/seatunnel/commit/b6a702b58f | 2.3.10 |
| [Improve] restruct connector common options (#8634) | https://github.com/apache/seatunnel/commit/f3499a6eeb | 2.3.10 |
| [Improve][dist]add shade check rule (#8136) | https://github.com/apache/seatunnel/commit/51ef800016 | 2.3.9 |
| [Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786) | https://github.com/apache/seatunnel/commit/6b7c53d03c | 2.3.9 |
| [Fix][Connector-V2] Fix known directory create and delete ignore issues (#7700) | https://github.com/apache/seatunnel/commit/e2fb679577 | 2.3.8 |
| [Feature][Connector-V2][Hbase] implement hbase catalog (#7516) | https://github.com/apache/seatunnel/commit/b978792cb1 | 2.3.8 |
| [Feature][Connector-V2] Support multi-table sink feature for HBase (#7169) | https://github.com/apache/seatunnel/commit/025fa3bb88 | 2.3.8 |
| [hotfix][connector-v2-hbase]fix and optimize hbase source problem (#7148) | https://github.com/apache/seatunnel/commit/34a6b8e9f6 | 2.3.7 |
| [Improve][hbase] The specified column is written to the specified column family (#5234) | https://github.com/apache/seatunnel/commit/49d397c61d | 2.3.6 |
| [feature][connector-v2-hbase-sink] Support Connector v2 HBase sink TTL data writing (#7116) | https://github.com/apache/seatunnel/commit/adafd80255 | 2.3.6 |
| [E2E][HBase]Refactor hbase e2e (#6859) | https://github.com/apache/seatunnel/commit/1da9bd6ce4 | 2.3.6 |
| [Connector]Add hbase source connector (#6348) | https://github.com/apache/seatunnel/commit/f108a5e658 | 2.3.6 |
| [Feature][HbaseSink]support array data. (#6100) | https://github.com/apache/seatunnel/commit/b592014766 | 2.3.4 |
| [Improve][Common] Introduce new error define rule (#5793) | https://github.com/apache/seatunnel/commit/9d1b2582b2 | 2.3.4 |
| [Improve] Remove use SeaTunnelSink::getConsumedType method and mark it as deprecated (#5755) | https://github.com/apache/seatunnel/commit/8de7408100 | 2.3.4 |
| [Hotfix][Connector-v2][HbaseSink]Fix default timestamp (#4958) | https://github.com/apache/seatunnel/commit/3d8f3bf902 | 2.3.3 |
| [Improve][build] Give the maven module a human readable name (#4114) | https://github.com/apache/seatunnel/commit/d7cd601051 | 2.3.1 |
| [Improve][Project] Code format with spotless plugin. (#4101) | https://github.com/apache/seatunnel/commit/a2ab166561 | 2.3.1 |
| [Feature][Connector-V2][Hbase] Introduce hbase sink connector (#4049) | https://github.com/apache/seatunnel/commit/68bda94a4c | 2.3.1 |