Skip to main content
Version: Next

HugeGraph Sink Connector

Sink: HugeGraph

Description

The HugeGraph sink connector allows you to write data from SeaTunnel to Apache HugeGraph, a fast and scalable graph database.

This connector supports writing data as vertices or edges, providing flexible mapping from relational data models to graph structures. It is designed for high-performance data loading.

Features

  • Batch Writing: Data is written in batches for high throughput.
  • Flexible Mapping: Supports flexible mapping of source fields to vertex/edge properties.
  • Vertex and Edge Writing: Can write data as either vertices or edges.
  • Automatic Schema Creation: Can automatically create graph schema elements (property keys, vertex labels, edge labels) if they do not exist.

Configuration Options

NameTypeRequiredDefault ValueDescription
hostStringYes-The host of the HugeGraph server.
portIntegerYes-The port of the HugeGraph server.
graph_nameStringYes-The name of the graph to write to.
graph_spaceStringYes-The graph space of the graph to be operated on.
usernameStringNo-The username for HugeGraph authentication.
passwordStringNo-The password for HugeGraph authentication.
batch_sizeIntegerNo500The number of records to buffer before writing to HugeGraph in a single batch.
batch_interval_msIntegerNo5000The maximum time in milliseconds to wait before flushing a batch.
max_retriesIntegerNo3The maximum number of times to retry a failed write operation.
retry_backoff_msIntegerNo5000The backoff time between retries in milliseconds.

Sink Options

NameTypeRequiredDefault ValueDescription
schema_configObjectYes-The configuration for mapping the input data to HugeGraph's schema (vertices or edges).
selected_fieldsListNo-A list of fields to be selected from the input data. If not specified, all fields will be used.
ignored_fieldsListNo-A list of fields to be ignored from the input data. Mutually exclusive with selected_fields.

Schema Configuration (schema_config)

Each object in the schema_config list defines a mapping from the source data to a specific vertex or edge label in HugeGraph.

NameTypeRequiredDefault ValueDescription
typeStringYes-The type of graph element to map to. Must be VERTEX or EDGE.
labelStringYes-The label of the vertex or edge in HugeGraph.
propertiesList<String>No-A list of source field names for the vertex or edge.
ttlLongNo-The time-to-live for the vertex or edge in seconds.
ttlStartTimeStringNo-The start time for the TTL.
enableLabelIndexBooleanNofalseWhether to enable label index for this label.
userdataMap<String, Object>No-User-defined data associated with the label.
idStrategyStringFor Vertex-The ID generation strategy for vertices. Supported values: PRIMARY_KEY, CUSTOMIZE_UUID, AUTOMATIC.
idFieldsList<string>For Vertex-A list of source field names used to generate the vertex ID.
sourceConfigObjectFor Edge-An object defining the mapping for the edge's source vertex. See Source/Target Config below.
targetConfigObjectFor Edge-An object defining the mapping for the edge's target vertex. See Source/Target Config below.
frequencyStringFor Edge-The frequency of the edge, e.g., SINGLE, MULTIPLE.
mappingObjectNo-An object defining advanced field and value mappings. See Mapping Config below.

Source/Target Config (sourceConfig and targetConfig)

This object is used within an EDGE schema to define how to identify the source and target vertices.

NameTypeRequiredDefault ValueDescription
labelStringYes-The label of the source or target vertex.
idFieldsList<String>Yes-A list of source field names from the input row used to construct the ID of the source/target vertex. The values will be concatenated to form the vertex ID.

Mapping Config (mapping)

This object provides advanced control over how fields and values are mapped to properties.

NameTypeRequiredDefault ValueDescription
fieldMappingMap<String, String>No-A map where the key is the source field name and the value is the target property name in HugeGraph. If not specified, the source field name is used as the target property name.
valueMappingMap<Object, Object>No-A map to transform specific field values. The key is the original value from the source, and the value is the new value to be written.
nullableKeysList<String>No-A list of property keys that can have null values.
nullValuesList<String>No-A list of string values that should be treated as null. Any field containing one of these values will not be written.
dateFormatStringNoyyyy-MM-ddThe date format for parsing date strings.
timeZoneStringNoGMT+8The time zone for date parsing.
sortKeysList<String>For Edge-A list of property keys to sort edges with the same source and target vertices.

Usage Examples

1. Writing Vertices

This example shows how to read from a FakeSource and write person vertices to HugeGraph. The vertex ID is based on the name field.

env {
job.mode = "BATCH"
}

source {
FakeSource {
plugin_input = "fake_source"
schema = {
fields = {
name = "string"
age = "int"
}
}
}
}

sink {
HugeGraph {
host = "localhost"
port = 8080
graph_name = "hugegraph"
graph_space = "default"
selected_fields = ["name", "age"]
schema_config = {
type = "VERTEX"
label = "person"
idStrategy = "PRIMARY_KEY"
idFields = ["name"]
properties = ["name", "age"]
}
}
}

2. Writing Edges

This example syncs a relationship table to knows edges in HugeGraph. The source table contains the names of the two people who know each other and the year they met.

env {
job.mode = "BATCH"
}

source {
FakeSource {
plugin_input = "fake_source"
schema = {
fields = {
person1_name = "string"
person2_name = "string"
since = "int"
}
}
}
}

sink {
HugeGraph {
host = "localhost"
port = 8080
graph_name = "hugegraph"
graph_space = "default"
schema_config = {
type = "EDGE"
label = "knows"
sourceConfig = {
label = "person"
idFields = ["person1_name"]
}
targetConfig = {
label = "person"
idFields = ["person2_name"]
}
properties = ["since"]
mapping = {
fieldMapping = {
person1_name = "name"
person2_name = "name"
}
}
}
}
}

Changelog

Change Log
ChangeCommitVersion
[Feature][Connector-V2] Support sink connector for Apache HugeGraphhttps://github.com/apache/seatunnel/pull/10002/commits/002a653d11f48c3f76b47db23f5f2a68bc9d690c2.3.12