Version: 2.3.3

FakeSource

FakeSource connector

Description

FakeSource is a virtual data source that randomly generates a configurable number of rows matching a user-defined schema. It is intended for testing, for example type-conversion tests or tests of new connector features.

Key features​

Options

name                | type     | required | default value
schema              | config   | yes      | -
rows                | config   | no       | -
row.num             | int      | no       | 5
split.num           | int      | no       | 1
split.read-interval | long     | no       | 1
map.size            | int      | no       | 5
array.size          | int      | no       | 5
bytes.length        | int      | no       | 5
string.length       | int      | no       | 5
string.fake.mode    | string   | no       | range
string.template     | list     | no       | -
tinyint.fake.mode   | string   | no       | range
tinyint.min         | tinyint  | no       | 0
tinyint.max         | tinyint  | no       | 127
tinyint.template    | list     | no       | -
smallint.fake.mode  | string   | no       | range
smallint.min        | smallint | no       | 0
smallint.max        | smallint | no       | 32767
smallint.template   | list     | no       | -
int.fake.mode       | string   | no       | range
int.min             | int      | no       | 0
int.max             | int      | no       | 0x7fffffff
int.template        | list     | no       | -
bigint.fake.mode    | string   | no       | range
bigint.min          | bigint   | no       | 0
bigint.max          | bigint   | no       | 0x7fffffffffffffff
bigint.template     | list     | no       | -
float.fake.mode     | string   | no       | range
float.min           | float    | no       | 0
float.max           | float    | no       | 0x1.fffffeP+127
float.template      | list     | no       | -
double.fake.mode    | string   | no       | range
double.min          | double   | no       | 0
double.max          | double   | no       | 0x1.fffffffffffffP+1023
double.template     | list     | no       | -
common-options      |          | no       | -

schema [config]

fields [config]

The schema of the fake data that you want to generate.

Examples

schema = {
  fields {
    c_map = "map<string, array<int>>"
    c_array = "array<int>"
    c_string = string
    c_boolean = boolean
    c_tinyint = tinyint
    c_smallint = smallint
    c_int = int
    c_bigint = bigint
    c_float = float
    c_double = double
    c_decimal = "decimal(30, 8)"
    c_null = "null"
    c_bytes = bytes
    c_date = date
    c_timestamp = timestamp
    c_row = {
      c_map = "map<string, map<string, string>>"
      c_array = "array<int>"
      c_string = string
      c_boolean = boolean
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
      c_decimal = "decimal(30, 8)"
      c_null = "null"
      c_bytes = bytes
      c_date = date
      c_timestamp = timestamp
    }
  }
}

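rows

An explicit list of rows to output per degree of parallelism. Each entry has a kind (INSERT, UPDATE_BEFORE, UPDATE_AFTER, or DELETE) and a fields list whose values follow the order of the fields in the schema.

Example

rows = [
  {
    kind = INSERT
    fields = [1, "A", 100]
  },
  {
    kind = UPDATE_BEFORE
    fields = [1, "A", 100]
  },
  {
    kind = UPDATE_AFTER
    fields = [1, "A_1", 100]
  },
  {
    kind = DELETE
    fields = [1, "A_1", 100]
  }
]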

row.num

The total number of rows generated per degree of parallelism.
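
Because row.num applies per degree of parallelism, the total output should scale with the parallelism of the source. The following is a minimal sketch, assuming the parallelism option from Source Common Options is set on the source itself (that option is not described on this page, and the field names are illustrative):

```hocon
FakeSource {
  # Assumption: parallelism is taken from Source Common Options
  parallelism = 2
  # 10 rows per degree of parallelism
  row.num = 10
  schema = {
    fields {
      id = bigint
      name = string
    }
  }
}
```

Under these assumptions the source would be expected to emit 2 x 10 = 20 rows in total.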

split.num

The number of splits generated by the enumerator for each degree of parallelism.

split.read-interval

The interval, in milliseconds, between two split reads in a reader.
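
The two split options can be combined so that a reader consumes its splits with a pause in between, which may help when testing how downstream components handle data arriving over time. This is a hedged sketch only: the field names are illustrative, and how rows are divided among the splits is not specified on this page.

```hocon
FakeSource {
  row.num = 10
  # Ask the enumerator for 5 splits per degree of parallelism
  split.num = 5
  # Wait 1000 ms between two split reads in a reader
  split.read-interval = 1000
  schema = {
    fields {
      id = bigint
      name = string
    }
  }
}
```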

map.size

The number of entries in each map value that the connector generates.

array.size

The number of elements in each array value that the connector generates.

bytes.length

The length of each bytes value that the connector generates.

string.length

The length of each string value that the connector generates.

string.fake.mode

The mode used to generate string data. Supported values are range and template; the default is range. If you set it to template, you must also configure the string.template option.

string.template

The template list for string data. If configured, the connector randomly selects an item from the list for each generated value.

tinyint.fake.mode

The mode used to generate tinyint data. Supported values are range and template; the default is range. If you set it to template, you must also configure the tinyint.template option.

tinyint.min

The minimum value of the tinyint data that the connector generates.

tinyint.max

The maximum value of the tinyint data that the connector generates.

tinyint.template

The template list for tinyint data. If configured, the connector randomly selects an item from the list for each generated value.

smallint.fake.mode

The mode used to generate smallint data. Supported values are range and template; the default is range. If you set it to template, you must also configure the smallint.template option.

smallint.min

The minimum value of the smallint data that the connector generates.

smallint.max

The maximum value of the smallint data that the connector generates.

smallint.template

The template list for smallint data. If configured, the connector randomly selects an item from the list for each generated value.

int.fake.mode

The mode used to generate int data. Supported values are range and template; the default is range. If you set it to template, you must also configure the int.template option.

int.min

The minimum value of the int data that the connector generates.

int.max

The maximum value of the int data that the connector generates.

int.template

The template list for int data. If configured, the connector randomly selects an item from the list for each generated value.

bigint.fake.mode

The mode used to generate bigint data. Supported values are range and template; the default is range. If you set it to template, you must also configure the bigint.template option.

bigint.min

The minimum value of the bigint data that the connector generates.

bigint.max

The maximum value of the bigint data that the connector generates.

bigint.template

The template list for bigint data. If configured, the connector randomly selects an item from the list for each generated value.

float.fake.mode

The mode used to generate float data. Supported values are range and template; the default is range. If you set it to template, you must also configure the float.template option.

float.min

The minimum value of the float data that the connector generates.

float.max

The maximum value of the float data that the connector generates.

float.template

The template list for float data. If configured, the connector randomly selects an item from the list for each generated value.

double.fake.mode

The mode used to generate double data. Supported values are range and template; the default is range. If you set it to template, you must also configure the double.template option.

double.min

The minimum value of the double data that the connector generates.

double.max

The maximum value of the double data that the connector generates.

double.template

The template list for double data. If configured, the connector randomly selects an item from the list for each generated value.
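
Since each type has its own fake.mode option, it seems reasonable to expect that template mode for one type can be combined with range mode for another in the same source. A minimal sketch under that assumption, with illustrative field names and values:

```hocon
FakeSource {
  row.num = 5
  # Strings are picked at random from a fixed list
  string.fake.mode = "template"
  string.template = ["red", "green", "blue"]
  # Ints keep the default range mode, bounded by min and max
  int.min = 1
  int.max = 100
  schema = {
    fields {
      color = string
      quantity = int
    }
  }
}
```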

common options

Common parameters for source plugins. See Source Common Options for details.

Example

Auto generate data rows

FakeSource {
  row.num = 10
  map.size = 10
  array.size = 10
  bytes.length = 10
  string.length = 10
  schema = {
    fields {
      c_map = "map<string, array<int>>"
      c_array = "array<int>"
      c_string = string
      c_boolean = boolean
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
      c_decimal = "decimal(30, 8)"
      c_null = "null"
      c_bytes = bytes
      c_date = date
      c_timestamp = timestamp
      c_row = {
        c_map = "map<string, map<string, string>>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(30, 8)"
        c_null = "null"
        c_bytes = bytes
        c_date = date
        c_timestamp = timestamp
      }
    }
  }
}

Using fake data rows

FakeSource {
  schema = {
    fields {
      pk_id = bigint
      name = string
      score = int
    }
  }
  rows = [
    {
      kind = INSERT
      fields = [1, "A", 100]
    },
    {
      kind = INSERT
      fields = [2, "B", 100]
    },
    {
      kind = INSERT
      fields = [3, "C", 100]
    },
    {
      kind = UPDATE_BEFORE
      fields = [1, "A", 100]
    },
    {
      kind = UPDATE_AFTER
      fields = [1, "A_1", 100]
    },
    {
      kind = DELETE
      fields = [2, "B", 100]
    }
  ]
}

Using template

FakeSource {
  row.num = 5
  string.fake.mode = "template"
  string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"]
  tinyint.fake.mode = "template"
  tinyint.template = [1, 2, 3, 4, 5, 6, 7, 8, 9]
  smallint.fake.mode = "template"
  smallint.template = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
  int.fake.mode = "template"
  int.template = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
  bigint.fake.mode = "template"
  bigint.template = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
  float.fake.mode = "template"
  float.template = [40.0, 41.0, 42.0, 43.0]
  double.fake.mode = "template"
  double.template = [44.0, 45.0, 46.0, 47.0]
  schema {
    fields {
      c_string = string
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
    }
  }
}

Using range

FakeSource {
  row.num = 5
  string.template = ["tyrantlucifer", "hailin", "kris", "fanjia", "zongwen", "gaojun"]
  tinyint.min = 1
  tinyint.max = 9
  smallint.min = 10
  smallint.max = 19
  int.min = 20
  int.max = 29
  bigint.min = 30
  bigint.max = 39
  float.min = 40.0
  float.max = 43.0
  double.min = 44.0
  double.max = 47.0
  schema {
    fields {
      c_string = string
      c_tinyint = tinyint
      c_smallint = smallint
      c_int = int
      c_bigint = bigint
      c_float = float
      c_double = double
    }
  }
}
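
The FakeSource blocks above are fragments of a job configuration. The sketch below shows how one might be placed in a complete job, assuming the usual SeaTunnel env / source / sink layout, the job.mode setting, and the Console sink connector; none of these are described on this page, so treat them as assumptions and check them against your SeaTunnel version.

```hocon
env {
  # Assumption: run as a bounded batch job
  job.mode = "BATCH"
}

source {
  FakeSource {
    row.num = 5
    schema = {
      fields {
        c_string = string
        c_int = int
      }
    }
  }
}

sink {
  # Assumption: the Console sink prints each row, which is handy for inspection
  Console {}
}
```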

Changelog

2.2.0-beta 2022-09-26

  • Add FakeSource Source Connector

2.3.0-beta 2022-10-20

  • [Improve] Support direct definition of data values (rows) (2839)
  • [Improve] Improve fake source connector: (2944)
    • Support user-defined map size
    • Support user-defined array size
    • Support user-defined string length
    • Support user-defined bytes length
  • [Improve] Support multiple splits for fake source connector (2974)
  • [Improve] Support setting the number of splits per parallelism and the reading interval between two splits (3098)

next version

  • [Feature] Support configuring fake data rows (3865)
  • [Feature] Support configuring template or range mode for fake data (3932)