跳到主要内容
版本:Next

Github

Github 源连接器

描述

用于从 Github 读取数据。

关键特性

选项

名称类型必填默认值
urlString-
access_tokenString-
methodStringget
schema.fieldsConfig-
formatStringjson
paramsMap-
bodyString-
json_fieldConfig-
content_jsonString-
poll_interval_millisint-
retryint-
retry_backoff_multiplier_msint100
retry_backoff_max_msint10000
enable_multi_linesbooleanfalse
common-optionsconfig-

url [String]

HTTP 请求 URL。

access_token [String]

GitHub个人访问令牌,请参阅:创建个人访问令牌 - Github文档

method [String]

HTTP 请求方法。目前支持 GETPOST

params [Map]

http 参数

body [String]

HTTP 请求体

poll_interval_millis [int]

流模式下请求 API 的间隔时间(毫秒)。

retry [int]

请求失败(IOException)时最大重试次数。

retry_backoff_multiplier_ms [int]

请求失败时的退避时间(毫秒)乘数。

retry_backoff_max_ms [int]

请求失败时的最大退避时间(毫秒)。

format [String]

上游数据的格式,现在仅支持json text,默认是json

若你的数据格式为 json,需同时配置 schema 选项,例如:

上游数据如下:

{
"code": 200,
"data": "get success",
"success": true
}

您应该配置 schema 为以下内容:


schema {
fields {
code = int
data = string
success = boolean
}
}

连接器将生成如下数据:

codedatasuccess
200get successtrue

若你设置格式为 text,连接器不会对上游数据做出任何改变,示例:

上游数据如下:

{
"code": 200,
"data": "get success",
"success": true
}

连接器将生成如下数据:

content
{"code": 200, "data": "get success", "success": true}

schema [Config]

fields [Config]

上游数据的字段定义。

content_json [String]

该参数可用于提取一些 json 数据。如果你只需要 “book” 部分的数据,可以配置 content_field = "$.store.book.*".

如果你的返回数据如下所示:

{
"store": {
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
},
"expensive": 10
}

你可以配置 content_field = "$.store.book.*" 并且结果返回如下:

[
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
}
]

然后你可以通过更简单的 schema 配置获取所需的结果,例如:

Http {
url = "http://mockserver:1080/contentjson/mock"
method = "GET"
format = "json"
content_field = "$.store.book.*"
schema = {
fields {
category = string
author = string
title = string
price = string
}
}
}

这是一个例子:

json_field [Config]

该参数用于帮助你配置 schema,因此必须与 schema 一起使用。

如果你的数据如下所示:

{
"store": {
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
},
"expensive": 10
}

你可以通过如下任务配置获取 “book” 部分的内容:

source {
Http {
url = "http://mockserver:1080/jsonpath/mock"
method = "GET"
format = "json"
json_field = {
category = "$.store.book[*].category"
author = "$.store.book[*].author"
title = "$.store.book[*].title"
price = "$.store.book[*].price"
}
schema = {
fields {
category = string
author = string
title = string
price = string
}
}
}
}

common options

源插件通用参数,请参考 常用选项获取详细说明。

示例

Github {
url = "https://api.github.com/orgs/apache/repos"
access_token = "xxxx"
method = "GET"
format = "json"
schema = {
fields {
id = int
name = string
description = string
html_url = string
stargazers_count = int
forks = int
}
}
}

变更日志

Change Log
ChangeCommitVersion
[improve] http connector options (#8969)https://github.com/apache/seatunnel/commit/63ff9f910a2.3.10
[Feature][Connector-V2] Support TableSourceFactory/TableSinkFactory on http (#5816)https://github.com/apache/seatunnel/commit/6f49ec6ead2.3.4
[Feature][Connector-V2][Github] Adding Github Source Connector (#4155)https://github.com/apache/seatunnel/commit/49d9172b102.3.1