Hive
Hive source connector
Description
Read data from Hive.
To use this connector, you must ensure that your Spark/Flink cluster is already integrated with Hive. The tested Hive version is 2.3.9.
If you use SeaTunnel Engine, you need to put seatunnel-hadoop3-3.1.4-uber.jar and hive-exec-2.3.9.jar in the $SEATUNNEL_HOME/lib/ directory.
Key features
All the data in a split is read in a single pollNext call. The splits that have been read are saved in the snapshot.
- schema projection
- parallelism
- support user-defined split
- file format
  - text
  - csv
  - parquet
  - orc
  - json
Options
name | type | required | default value |
---|---|---|---|
table_name | string | yes | - |
metastore_uri | string | yes | - |
kerberos_principal | string | no | - |
kerberos_keytab_path | string | no | - |
hdfs_site_path | string | no | - |
hive_site_path | string | no | - |
read_partitions | list | no | - |
read_columns | list | no | - |
abort_drop_partition_metadata | boolean | no | true |
common-options | | no | - |
table_name [string]
Target Hive table name, e.g. db1.table1
metastore_uri [string]
Hive metastore URI, e.g. thrift://namenode001:9083
hdfs_site_path [string]
The path of hdfs-site.xml, used to load the HA configuration of the NameNodes.
hive_site_path [string]
The path of hive-site.xml, used to authenticate with the Hive metastore.
read_partitions [list]
The target partitions that the user wants to read from the Hive table. If this parameter is not set, all the data in the Hive table is read.
Tips: Every partition in the partitions list should have the same directory depth. For example, if a Hive table has two partition keys, par1 and par2, then setting read_partitions = [par1=xxx, par1=yyy/par2=zzz] is illegal, because the two entries have different depths.
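As an illustration of the depth rule, the following sketch lists two partitions that both specify par1 and par2, so they have the same directory depth. The table name and partition values are hypothetical, and quoting each entry as a string is an assumption made here so that the `=` characters are not parsed as config syntax.

```hocon
Hive {
  table_name = "db1.table1"
  metastore_uri = "thrift://namenode001:9083"
  # Both entries go two levels deep (par1/par2), so their depths match.
  # The partition values are placeholders, not real data.
  read_partitions = ["par1=xxx/par2=aaa", "par1=yyy/par2=zzz"]
}
```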
kerberos_principal [string]
The principal used for Kerberos authentication
kerberos_keytab_path [string]
The path of the keytab file used for Kerberos authentication
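A minimal sketch of a source that authenticates with Kerberos, assuming the two Kerberos options are set together. The principal and keytab path below are hypothetical placeholders and should be replaced with the values for your cluster.

```hocon
Hive {
  table_name = "default.seatunnel_orc"
  metastore_uri = "thrift://namenode001:9083"
  # Hypothetical principal and keytab location; not taken from a real cluster.
  kerberos_principal = "hive/namenode001@EXAMPLE.COM"
  kerberos_keytab_path = "/path/to/hive.keytab"
}
```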
read_columns [list]
The list of columns to read from the data source. Users can use it to implement field projection.
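For instance, field projection might look like the sketch below. The column names `id` and `name` are hypothetical and stand in for columns of your own table.

```hocon
Hive {
  table_name = "default.seatunnel_orc"
  metastore_uri = "thrift://namenode001:9083"
  # Only the listed columns are read; "id" and "name" are assumed column names.
  read_columns = ["id", "name"]
}
```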
abort_drop_partition_metadata [boolean]
Flag that decides whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data in the partition (data generated during the synchronization process) will always be deleted.
common options
Source plugin common parameters. Please refer to Source Common Options for details.
Example
Hive {
  table_name = "default.seatunnel_orc"
  metastore_uri = "thrift://namenode001:9083"
}
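The block above is only the source definition. As a rough sketch of how it might sit inside a complete job file, assuming the usual env/source/sink layout of a SeaTunnel config and a Console sink that prints the rows it receives, consider the following. The parallelism and job mode values are illustrative assumptions, not recommendations.

```hocon
env {
  # Illustrative settings; adjust them for your job.
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Hive {
    table_name = "default.seatunnel_orc"
    metastore_uri = "thrift://namenode001:9083"
  }
}

sink {
  # Console sink assumed here only to inspect the rows that were read.
  Console {}
}
```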
Changelog
2.2.0-beta 2022-09-26
- Add Hive Source Connector