Version: Next

Pulsar

Pulsar sink connector

Supported Engines

Spark
Flink
SeaTunnel Zeta

Key features

exactly-once

Description

Sink connector for Apache Pulsar.

Supported DataSource Info

Datasource	Supported Versions
Pulsar	Universal

Sink Options

Name	Type	Required	Default	Description
topic	String	Yes	-	Pulsar topic to write to.
client.service-url	String	Yes	-	Service URL of the Pulsar service, using the Pulsar protocol (pulsar://).
admin.service-url	String	Yes	-	The Pulsar service HTTP URL for the admin endpoint.
auth.plugin-class	String	No	-	Name of the authentication plugin.
auth.params	String	No	-	Parameters for the authentication plugin.
format	String	No	json	Data format: json (default) or text.
field_delimiter	String	No	,	Field delimiter used by the text format.
semantics	Enum	No	AT_LEAST_ONCE	Consistency semantics for writing to Pulsar: EXACTLY_ONCE, AT_LEAST_ONCE or NON.
transaction_timeout	Int	No	600	Transaction timeout in seconds. The default, 600, is 10 minutes.
pulsar.config	Map	No	-	Additional, optional parameters for the Pulsar producer client.
message.routing.mode	Enum	No	RoundRobinPartition	Routing mode used to assign messages to partitions.
partition_key_fields	array	No	-	Fields whose values form the key of the Pulsar message.
common-options	config	No	-	Sink plugin common parameters, please refer to Sink Common Options for details.

Parameter Interpretation

client.service-url [String]

Service URL of the Pulsar service. To connect to Pulsar with the client libraries, you must specify a URL that uses the Pulsar protocol (pulsar://). You can assign Pulsar protocol URLs to specific clusters.

For example, for a local cluster: pulsar://localhost:6650,localhost:6651.

admin.service-url [String]

The Pulsar service HTTP URL for the admin endpoint.

For example, http://my-broker.example.com:8080, or https://my-broker.example.com:8443 for TLS.
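A minimal sketch of how the two connection URLs might look together in a sink block, assuming a broker running locally on the default ports. The host names and ports are placeholders, not values required by the connector.

```hocon
sink {
  Pulsar {
    topic = "example"
    # Binary protocol URL; several brokers can follow a single pulsar:// scheme
    client.service-url = "pulsar://localhost:6650,localhost:6651"
    # HTTP admin endpoint; switch to https:// (for example port 8443) for TLS
    admin.service-url = "http://localhost:8080"
  }
}
```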

auth.plugin-class [String]

Name of the authentication plugin.

auth.params [String]

Parameters for the authentication plugin.

For example, key1:val1,key2:val2
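As an illustration, the sketch below assumes token (JWT) authentication through Pulsar's built-in AuthenticationToken plugin, whose parameter string takes the form token:<value>. The token is a placeholder you would replace with your own; other plugins expect their own parameter formats.

```hocon
sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    # Pulsar's built-in token authentication plugin (assumed to be on the classpath)
    auth.plugin-class = "org.apache.pulsar.client.impl.auth.AuthenticationToken"
    # Placeholder: replace <jwt-token> with a real token
    auth.params = "token:<jwt-token>"
  }
}
```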

format [String]

Data format: json (default) or text. In the text format, fields are separated by "," by default; to use a different delimiter, set the field_delimiter option.

field_delimiter [String]

Customize the field delimiter for the text format. The default field_delimiter is ','.
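A minimal sketch of switching to the text format with a custom delimiter. The "|" character is an arbitrary choice for illustration; with the name/age rows from the task example below, each record should then be written as its field values joined by "|".

```hocon
sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    # Write rows as delimited text instead of JSON
    format = "text"
    # Separate fields with "|" instead of the default ","
    field_delimiter = "|"
  }
}
```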

semantics [Enum]

Consistency semantics for writing to Pulsar. Available options are EXACTLY_ONCE, NON and AT_LEAST_ONCE; the default is AT_LEAST_ONCE. If semantics is set to EXACTLY_ONCE, the sink uses two-phase commit (2PC) to guarantee that each message is sent to Pulsar exactly once. If semantics is set to NON, messages are sent to Pulsar directly, so data may be duplicated or lost if the job restarts or retries, or if a network error occurs.

transaction_timeout [Int]

The timeout, in seconds, of the transactions used for EXACTLY_ONCE writes. The default is 600 (10 minutes). If a transaction is not committed within this timeout, it is automatically aborted, so make sure the timeout is greater than the checkpoint interval.
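A sketch of an exactly-once configuration, assuming a streaming job and the env option checkpoint.interval (in milliseconds). The 60-second interval is an illustrative value chosen to stay well below the 600-second transaction timeout. Pulsar transactions must also be enabled on the broker (transactionCoordinatorEnabled=true in broker.conf). The source block is omitted.

```hocon
env {
  job.mode = "STREAMING"
  # Checkpoint every 60 s (value in milliseconds)
  checkpoint.interval = 60000
}

sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    # Commit messages with two-phase commit on each checkpoint
    semantics = "EXACTLY_ONCE"
    # 600 s = 10 min, comfortably above the 60 s checkpoint interval
    transaction_timeout = 600
  }
}
```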

pulsar.config [Map]

In addition to the required parameters above, you can specify optional parameters for the Pulsar producer client. All producer parameters listed in the official Pulsar documentation are supported.
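A sketch of passing extra producer settings. sendTimeoutMs appears in the task example below; batchingEnabled and batchingMaxPublishDelayMicros are assumed to be accepted because they are Pulsar producer configuration names, and the values are illustrative only.

```hocon
sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    pulsar.config = {
      # Fail a send that is not acknowledged within 30 s
      sendTimeoutMs = 30000
      # Group messages into batches, flushing at most every 1 ms
      batchingEnabled = true
      batchingMaxPublishDelayMicros = 1000
    }
  }
}
```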

message.routing.mode [Enum]

Default routing mode for assigning messages to partitions. Available options are SinglePartition and RoundRobinPartition.

If you choose SinglePartition and no key is provided, the partitioned producer randomly picks one partition and publishes all messages to it. If a key is provided on the message, the partitioned producer hashes the key and assigns the message to the corresponding partition.

If you choose RoundRobinPartition and no key is provided, the producer publishes messages across all partitions in round-robin fashion to achieve maximum throughput. Note that round-robin is not applied per individual message; the partition switches at the batching delay boundary, so that batching remains effective.
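A minimal sketch that overrides the default RoundRobinPartition mode. Without partition keys, this should send every message from the producer to one randomly chosen partition of the topic.

```hocon
sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    # Publish all unkeyed messages to a single randomly chosen partition
    message.routing.mode = "SinglePartition"
  }
}
```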

partition_key_fields [Array]

Configure which fields are used as the key of the Pulsar message.

For example, to use the values of upstream fields as the message key, assign those field names to this property.

Suppose the upstream data is:

name	age	data
Jack	16	data-example1
Mary	23	data-example2

If name is set as the key, the hash of the name column determines which partition each message is sent to.

If partition_key_fields is not set, messages are sent with a null key.

The message key is formatted as JSON. For example, if name is set as the key, the key of the first row is '{"name":"Jack"}'.

Each selected field must exist in the upstream data.
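A minimal sketch that keys messages by the name field of the sample data above. Following the description, the first row's key would be {"name":"Jack"}, and messages sharing a name value should be routed to the same partition.

```hocon
sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    # Build the message key from the upstream "name" field
    partition_key_fields = ["name"]
  }
}
```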

common options

Sink plugin common parameters, please refer to Sink Common Options for details.

Task Example

Simple

This example defines a SeaTunnel synchronization task that uses FakeSource to generate data and sends it to the Pulsar sink. FakeSource generates 16 rows in total (row.num = 16), each with two fields: name (string) and age (int). After the job finishes, the target topic example should contain the same 16 rows. If you have not yet installed and deployed SeaTunnel, follow the instructions in Install SeaTunnel, then follow the instructions in Quick Start With SeaTunnel Engine to run this job.

# Defining the runtime environment
env {
  # You can set engine configuration here
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    parallelism = 1
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Pulsar {
    topic = "example"
    client.service-url = "pulsar://localhost:6650"
    admin.service-url = "http://localhost:8080"
    # Read the rows produced by the FakeSource above
    plugin_input = "fake"
    pulsar.config = {
        sendTimeoutMs = 30000
    }
  }
}

Changelog

Change	Commit	Version
[Feature][Checkpoint] Add check script for source/sink state class serialVersionUID missing (#9118)	https://github.com/apache/seatunnel/commit/4f5adeb1c7	2.3.11
[Improve] restruct connector common options (#8634)	https://github.com/apache/seatunnel/commit/f3499a6eeb	2.3.10
[Improve][dist]add shade check rule (#8136)	https://github.com/apache/seatunnel/commit/51ef800016	2.3.9
[Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786)	https://github.com/apache/seatunnel/commit/6b7c53d03c	2.3.9
[Improve][API] Make sure the table name in TablePath not be null (#7252)	https://github.com/apache/seatunnel/commit/764d8b0bc8	2.3.7
[Feature][Kafka] Support multi-table source read (#5992)	https://github.com/apache/seatunnel/commit/60104602d1	2.3.6
[PulsarSource]Improve pulsar throughput performance. (#6234)	https://github.com/apache/seatunnel/commit/37461f4f3e	2.3.4
[Feature][Connector-v2][PulsarSink]Add Pulsar Sink Connector. (#4382)	https://github.com/apache/seatunnel/commit/543d2c5086	2.3.4
[Chore] Remove useless DeserializationFormatFactory and its implement (#5880)	https://github.com/apache/seatunnel/commit/f0511544ff	2.3.4
fix: update IDENTIFIER = Pulsar for pulsar-datasource on project:seatunnel-web (#5852)	https://github.com/apache/seatunnel/commit/3b6de3743e	2.3.4
[Improve][Common] Introduce new error define rule (#5793)	https://github.com/apache/seatunnel/commit/9d1b2582b2	2.3.4
Support config column/primaryKey/constraintKey in schema (#5564)	https://github.com/apache/seatunnel/commit/eac76b4e50	2.3.4
[Improve][CheckStyle] Remove useless 'SuppressWarnings' annotation of checkstyle. (#5260)	https://github.com/apache/seatunnel/commit/51c0d709ba	2.3.4
[Hotfix] Fix com.google.common.base.Preconditions to seatunnel shade one (#5284)	https://github.com/apache/seatunnel/commit/ed5eadcf73	2.3.3
[Feature][Json-format] support read format for pulsar (#4111)	https://github.com/apache/seatunnel/commit/7d61ae93e7	2.3.2
[hotfix][pulsar] Fix the bug that can't consume messages all the time. (#4125)	https://github.com/apache/seatunnel/commit/a6705cc5bf	2.3.2
[Feature] add cdc multiple table support & fix zeta bug	https://github.com/apache/seatunnel/commit/533ff2c2fa	2.3.1
[hotfix][pulsar] PulsarSource consumer ack exception. (#4237)	https://github.com/apache/seatunnel/commit/9725d675da	2.3.1
Merge branch 'dev' into merge/cdc	https://github.com/apache/seatunnel/commit/4324ee1912	2.3.1
[Improve][Project] Code format with spotless plugin.	https://github.com/apache/seatunnel/commit/423b583038	2.3.1
[Improve][Connector-v2][Pulsar] Set the name of the pulsar consumption thread. (#4182)	https://github.com/apache/seatunnel/commit/e567203f7d	2.3.1
[improve][api] Refactoring schema parse (#4157)	https://github.com/apache/seatunnel/commit/b2f573a13e	2.3.1
[Improve][build] Give the maven module a human readable name (#4114)	https://github.com/apache/seatunnel/commit/d7cd601051	2.3.1
[Improve][Project] Code format with spotless plugin. (#4101)	https://github.com/apache/seatunnel/commit/a2ab166561	2.3.1
[Bug][Connector-v2][PulsarSource]Fix pulsar option topic-pattern bug. (#3989)	https://github.com/apache/seatunnel/commit/aee2c580ea	2.3.1
[Feature][Connector] add get source method to all source connector (#3846)	https://github.com/apache/seatunnel/commit/417178fb84	2.3.1
[Feature][API & Connector & Doc] add parallelism and column projection interface (#3829)	https://github.com/apache/seatunnel/commit/b9164b8ba1	2.3.1
[Improve][Connector-V2][Pulsar] Unified exception for Pulsar source &… (#3590)	https://github.com/apache/seatunnel/commit/4fe9323419	2.3.0
[Hotfix][OptionRule] Fix option rule about all connectors (#3592)	https://github.com/apache/seatunnel/commit/226dc6a119	2.3.0
[Hotfix][Connector-V2][Pulsar] fix conditional options (#3504)	https://github.com/apache/seatunnel/commit/0066affacf	2.3.0
[Feature][Connector][pulsar] expose configurable options in Pulsar (#3341)	https://github.com/apache/seatunnel/commit/200faa7c29	2.3.0
[Connector][Dependency] Add Miss Dependency Cassandra And Change Kudu Plugin Name (#3432)	https://github.com/apache/seatunnel/commit/6ac6a0a0cd	2.3.0
[chore] fix pulsar consumer comment error (#3356)	https://github.com/apache/seatunnel/commit/91e632c526	2.3.0
[Connector-V2][ElasticSearch] Add ElasticSearch Source/Sink Factory (#3325)	https://github.com/apache/seatunnel/commit/38254e3f26	2.3.0
[hotfix][connector][pulsar] Fix not being able to mark #noMoreNewSplits when restoring (#2945)	https://github.com/apache/seatunnel/commit/5ad69076b3	2.3.0-beta
Move Handover to common module (#2877)	https://github.com/apache/seatunnel/commit/d94a874bcb	2.3.0-beta
[hotfix][connector-v2] fix pulsar source exceptions (#2820)	https://github.com/apache/seatunnel/commit/8ff0ba7015	2.2.0-beta
[#2606]Dependency management split (#2630)	https://github.com/apache/seatunnel/commit/fc047be69b	2.2.0-beta
[SeaTunnel]Simply seatunnel package pipeline. (#2563)	https://github.com/apache/seatunnel/commit/9d88b6221a	2.2.0-beta
[Improve][Connector-V2] Pulsar support user-defined schema (#2436)	https://github.com/apache/seatunnel/commit/16cabe6a35	2.2.0-beta
[improve][UT] Upgrade junit to 5.+ (#2305)	https://github.com/apache/seatunnel/commit/362319ff3e	2.2.0-beta
StateT of SeaTunnelSource should extend `Serializable` (#2214)	https://github.com/apache/seatunnel/commit/8c426ef850	2.2.0-beta
[doc][connector-v2] pulsar source options doc (#2128)	https://github.com/apache/seatunnel/commit/59ce8a2b32	2.2.0-beta
[api-draft][Optimize] Optimize module name (#2062)	https://github.com/apache/seatunnel/commit/f79e3112b1	2.2.0-beta
