Version: 2.3.8

Http

Http source connector

Supported Engines

Spark
Flink
SeaTunnel Zeta

Key Features​

Description​

Used to read data from Http.

Supported DataSource Info​

In order to use the Http connector, the following dependencies are required. They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
| --- | --- | --- |
| Http | universal | Download |

Source Options​

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | String | Yes | - | Http request url. |
| schema | Config | No | - | Mapping between the Http data and the SeaTunnel data structure. |
| schema.fields | Config | No | - | The schema fields of the upstream data. |
| json_field | Config | No | - | Helps you configure the schema, so it must be used together with schema. |
| pageing | Config | No | - | Used for paging queries. |
| pageing.page_field | String | No | - | Name of the page field in the request parameters. |
| pageing.total_page_size | Int | No | - | Total number of pages to read. |
| pageing.batch_size | Int | No | - | Batch size returned per request. Used to decide whether to continue when the total number of pages is unknown. |
| pageing.start_page_number | Int | No | 1 | Page number from which synchronization starts. |
| content_field | String | No | - | Extracts part of the JSON data. If you only need the data in the 'book' section, configure content_field = "$.store.book.*". |
| format | String | No | text | Format of the upstream data. Only json and text are supported. |
| method | String | No | get | Http request method. Only GET and POST are supported. |
| headers | Map | No | - | Http headers. |
| params | Map | No | - | Http params. The program automatically adds the Http header application/x-www-form-urlencoded. |
| body | String | No | - | Http body, given as a JSON string. The program automatically adds the Http header application/json. |
| poll_interval_millis | Int | No | - | Interval (in milliseconds) between Http API requests in stream mode. |
| retry | Int | No | - | Maximum number of retries when an Http request fails with an IOException. |
| retry_backoff_multiplier_ms | Int | No | 100 | Retry backoff multiplier (in milliseconds) when an Http request fails. |
| retry_backoff_max_ms | Int | No | 10000 | Maximum retry backoff (in milliseconds) when an Http request fails. |
| enable_multi_lines | Boolean | No | false | |
| connect_timeout_ms | Int | No | 12000 | Connection timeout in milliseconds (default 12 s). |
| socket_timeout_ms | Int | No | 60000 | Socket timeout in milliseconds (default 60 s). |
| common-options | | No | - | Source plugin common parameters. Please refer to Source Common Options for details. |
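
As a rough illustration of how several of these options can be combined, the sketch below sends a POST request with a custom header and a JSON body, and sets the retry and timeout options explicitly. The header value, the body contents, and the retry count are placeholders chosen for illustration, not recommended settings. The url and schema are borrowed from the other examples on this page.

```hocon
source {
  Http {
    url = "http://mockserver:1080/example/http"
    method = "POST"
    format = "json"

    # Placeholder header; replace with whatever your API expects
    headers = {
      Authorization = "Bearer <your-token>"
    }

    # body is a String option holding a JSON document (illustrative payload)
    body = "{\"id\": 1}"

    # Illustrative retry count; the backoff and timeout values are the documented defaults
    retry = 3
    retry_backoff_multiplier_ms = 100
    retry_backoff_max_ms = 10000
    connect_timeout_ms = 12000
    socket_timeout_ms = 60000

    schema = {
      fields {
        code = int
        data = string
        success = boolean
      }
    }
  }
}
```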

How to Create an Http Data Synchronization Job

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Http {
    result_table_name = "http"
    url = "http://mockserver:1080/example/http"
    method = "GET"
    format = "json"
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_bytes = bytes
        c_date = date
        c_decimal = "decimal(38, 18)"
        c_timestamp = timestamp
        c_row = {
          C_MAP = "map<string, string>"
          C_ARRAY = "array<int>"
          C_STRING = string
          C_BOOLEAN = boolean
          C_TINYINT = tinyint
          C_SMALLINT = smallint
          C_INT = int
          C_BIGINT = bigint
          C_FLOAT = float
          C_DOUBLE = double
          C_BYTES = bytes
          C_DATE = date
          C_DECIMAL = "decimal(38, 18)"
          C_TIMESTAMP = timestamp
        }
      }
    }
  }
}

# Console printing of the read Http data
sink {
  Console {
    parallelism = 1
  }
}
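
The job above runs once in batch mode. A minimal sketch of a streaming variant follows, assuming job.mode = "STREAMING" and an arbitrary five-second poll_interval_millis. The shortened schema is only illustrative and would need to match the actual response.

```hocon
env {
  parallelism = 1
  job.mode = "STREAMING"
}

source {
  Http {
    result_table_name = "http"
    url = "http://mockserver:1080/example/http"
    method = "GET"
    format = "json"
    # Interval between requests in stream mode, in milliseconds (illustrative value)
    poll_interval_millis = 5000
    schema = {
      fields {
        c_string = string
        c_int = int
      }
    }
  }
}

sink {
  Console {
    parallelism = 1
  }
}
```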

Parameter Interpretation​

format​

When format is set to json, you must also set the schema option. For example:

Upstream data:

{
  "code": 200,
  "data": "get success",
  "success": true
}

Set the schema as follows:

schema {
  fields {
    code = int
    data = string
    success = boolean
  }
}

The connector generates the following data:

| code | data | success |
| --- | --- | --- |
| 200 | get success | true |

When format is set to text, the connector passes the upstream data through unchanged. For example:

Upstream data:

{
  "code": 200,
  "data": "get success",
  "success": true
}

The connector generates the following data:

| content |
| --- |
| {"code": 200, "data": "get success", "success": true} |
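
A minimal sketch of a source block for the text case, assuming the same mock url used elsewhere on this page. No schema is configured here, on the assumption that each response is emitted as a single content value as in the output above.

```hocon
source {
  Http {
    url = "http://mockserver:1080/example/http"
    method = "GET"
    # text is the default format, written out here for clarity
    format = "text"
  }
}
```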

content_field

This parameter extracts part of the JSON data. If you only need the data in the 'book' section, configure content_field = "$.store.book.*".

Suppose the returned data looks like this:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

You can configure content_field = "$.store.book.*" and the result returned looks like this:

[
  {
    "category": "reference",
    "author": "Nigel Rees",
    "title": "Sayings of the Century",
    "price": 8.95
  },
  {
    "category": "fiction",
    "author": "Evelyn Waugh",
    "title": "Sword of Honour",
    "price": 12.99
  }
]

You can then get the desired result with a simpler schema, for example:

Http {
  url = "http://mockserver:1080/contentjson/mock"
  method = "GET"
  format = "json"
  content_field = "$.store.book.*"
  schema = {
    fields {
      category = string
      author = string
      title = string
      price = string
    }
  }
}

json_field​

This parameter helps you configure the schema, so it must be used together with schema.

If your data looks something like this:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

You can get the contents of 'book' by configuring the task as follows:

source {
  Http {
    url = "http://mockserver:1080/jsonpath/mock"
    method = "GET"
    format = "json"
    json_field = {
      category = "$.store.book[*].category"
      author = "$.store.book[*].author"
      title = "$.store.book[*].title"
      price = "$.store.book[*].price"
    }
    schema = {
      fields {
        category = string
        author = string
        title = string
        price = string
      }
    }
  }
}

pageing​

source {
  Http {
    url = "http://localhost:8080/mock/queryData"
    method = "GET"
    format = "json"
    params = {
      page: "${page}"
    }
    content_field = "$.data.*"
    pageing = {
      total_page_size = 20
      page_field = page
      # If total_page_size is unknown, use batch_size instead: reading finishes when
      # a request returns fewer than batch_size records, otherwise it continues.
      #batch_size = 10
    }
    schema = {
      fields {
        name = string
        age = string
      }
    }
  }
}
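
When the total number of pages is not known in advance, the same job can be sketched with batch_size in place of total_page_size. This is an illustrative variant of the configuration above; the batch size of 10 and the explicit start_page_number are assumptions for the example, and the url is the same mock endpoint.

```hocon
source {
  Http {
    url = "http://localhost:8080/mock/queryData"
    method = "GET"
    format = "json"
    params = {
      page: "${page}"
    }
    content_field = "$.data.*"
    pageing = {
      page_field = page
      # Start from the first page (this is also the documented default)
      start_page_number = 1
      # Stop once a request returns fewer than 10 records (illustrative value)
      batch_size = 10
    }
    schema = {
      fields {
        name = string
        age = string
      }
    }
  }
}
```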

Changelog​

2.2.0-beta 2022-09-26​

  • Add Http Source Connector

new version​

  • [Feature][Connector-V2][HTTP] Use json-path parsing (3510)