Version: Next

Lance

Lance sink connector

Supported Engines

Spark (versions below Spark 3.4 are not supported; see https://lance.org/integrations/spark/install/#scala)
Flink (not supported; see https://github.com/lance-format/lance-flink)
SeaTunnel Zeta

Description

Sink connector for the Lance format. It supports creating datasets and writing to them, and uses a Lance namespace to manage schema and versions.

Key features

  • [ ] [exactly-once](../../concept/connector-v2-features.md)

Using Dependency

    <dependency>
        <groupId>com.lancedb</groupId>
        <artifactId>lance-core</artifactId>
        <version>0.33.0</version>
    </dependency>

    <dependency>
        <groupId>com.lancedb</groupId>
        <artifactId>lance-namespace-core</artifactId>
        <version>0.0.14</version>
    </dependency>

Sink Options

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| dataset_path | string | yes | /tmp | The dataset path for the Lance sink connection. |
| namespace_type | string | yes | dir | The namespace type of the Lance dataset. Only DirectoryNamespace is currently supported; the default is "dir". |
| table | string | yes | test | The name of the Lance dataset. If not set, the name defaults to "test". |
| namespace_id | string | no | - | The ID of the Lance namespace. See https://lance.org/format/namespace/ |
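As an illustration of the options above, here is a minimal sketch of a sink block that sets only the three options marked as required and leaves out the optional `namespace_id`. The path and dataset name are hypothetical placeholders, and whether a namespace ID is needed in a given deployment is an assumption to verify.

```hocon
sink {
  Lance {
    # Hypothetical location; replace with a path the job can write to
    dataset_path = "/tmp/lance_demo"
    # "dir" (DirectoryNamespace) is the only namespace type listed as supported
    namespace_type = "dir"
    # Hypothetical dataset name
    table = "demo_table"
    # namespace_id is optional and omitted here
  }
}
```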

Data Type Mapping

Lance data types are based on the Arrow type system.

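| SeaTunnel Data Type | Lance Data Type |
|---------------------|-----------------|
| BOOLEAN | bool/boolean |
| TINYINT | int8 |
| SMALLINT | int16 |
| INT | int32 |
| BIGINT | int64 |
| FLOAT | float16 |
| DOUBLE | float32 |
| BYTES | binary |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP | TIMESTAMP |
| STRING | string/utf8 |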
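To make the mapping concrete, the following sketch annotates a source schema with the Lance type each field would be written as, according to the table above. It covers only a subset of the types, and the field names are hypothetical.

```hocon
source {
  FakeSource {
    row.num = 10
    schema = {
      fields {
        # Hypothetical field names; comments give the Lance type from the mapping table
        id = bigint          # int64
        name = string        # string/utf8
        active = boolean     # bool/boolean
        retries = smallint   # int16
        payload = bytes      # binary
      }
    }
  }
}
```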

Task Example

Simple

env {
  parallelism = 1
  job.mode = "BATCH"

  # You can set spark configuration here
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}

source {
  FakeSource {
    row.num = 100
    schema = {
      fields {
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(30, 8)"
        c_bytes = bytes
        c_date = date
        c_timestamp = timestamp
      }
    }
    plugin_output = "fake"
  }
}

transform {
}

sink {
  Lance {
    dataset_path = "/tmp/seatunnel_mnt/lanceTest/lance_sink_table"
    namespace_type = "dir"
    namespace_id = "root"
    table = "lance_sink_table"
  }
}
