Version: 2.1.3

Fake

Description

Fake generates user-specified data for use as input in functional verification, testing, and performance testing of SeaTunnel.

note

Supported engines and plugin names

  • Spark: Fake, FakeStream
  • Flink: FakeSource, FakeSourceStream
    • Flink Fake Source automatically generates data with only two columns. The first column is of String type, and its content is chosen at random from ["Gary", "Ricky Huo", "Kid Xiong"]. The second column is of Int type and holds the current 13-digit timestamp. The data is used as input for functional verification and testing of SeaTunnel.

Options

note

These options are for Spark:FakeStream. Spark:Fake does not have any options.

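| name           | type   | required | default value |
|----------------|--------|----------|---------------|
| content        | array  | no       | -             |
| rate           | number | yes      | -             |
| common-options | string | yes      | -             |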

content [array]

List of test data strings

rate [number]

Number of test cases generated per second

common options [string]

Common parameters of source plugins. See Source Plugin for details.

mock_data_schema Option [list[column_config]]

Configures the schema of the mock data. Each element is a column_config option.

When mock_data_schema is not defined, data is generated with a schema like this:

mock_data_schema = [
  {
    name = "name",
    type = "string",
    mock_config = {
      string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
      size_range = [1,1]
    }
  }
  {
    name = "age",
    type = "int",
    mock_config = {
      int_range = [1, 100]
    }
  }
]

Fields of the column_config option:

| name        | type        | required | default value | support values |
|-------------|-------------|----------|---------------|----------------|
| name        | string      | yes      | string        | - |
| type        | string      | yes      | string        | int, integer, byte, boolean, char, character, short, long, float, double, date, timestamp, decimal, bigdecimal, bigint, int[], byte[], boolean[], char[], character[], short[], long[], float[], double[], string[], binary, varchar |
| mock_config | mock_config | no       | -             | - |

mock_config Option

| name          | type                 | required | default value | sample |
|---------------|----------------------|----------|---------------|--------|
| byte_range    | list[byte][size=2]   | no       | -             | [0,127] |
| boolean_seed  | list[boolean]        | no       | -             | [true, true, false] |
| char_seed     | list[char][size=2]   | no       | -             | ['a','b','c'] |
| date_range    | list[string][size=2] | no       | -             | ["1970-01-01", "2100-12-31"] |
| decimal_scale | int                  | no       | -             | 2 |
| double_range  | list[double][size=2] | no       | -             | [0.0, 10000.0] |
| float_range   | list[float][size=2]  | no       | -             | [0.0, 10000.0] |
| int_range     | list[int][size=2]    | no       | -             | [0, 100] |
| long_range    | list[long][size=2]   | no       | -             | [0, 100000] |
| number_regex  | string               | no       | -             | "[1-9]{1}\d?" |
| time_range    | list[int][size=6]    | no       | -             | [0,24,0,60,0,60] |
| size_range    | list[int][size=2]    | no       | -             | [6,10] |
| string_regex  | string               | no       | -             | "[a-z0-9]{5}\@\w{3}\.[a-z]{3}" |
| string_seed   | list[string]         | no       | -             | ["Gary", "Ricky Huo", "Kid Xiong"] |
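
As an illustration of how column_config and mock_config combine, the following sketch defines a custom schema with four columns. The column names (name, score, birthday, active) are hypothetical and chosen only for this example. The types and mock_config keys come from the tables above, and the values reuse the documented samples where possible.

```hocon
mock_data_schema = [
  {
    # String column drawn from a fixed list of seed values
    name = "name"
    type = "string"
    mock_config = {
      string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
    }
  }
  {
    # Double column bounded by a two-element range
    name = "score"
    type = "double"
    mock_config = {
      double_range = [0.0, 100.0]
    }
  }
  {
    # Date column bounded by two date strings
    name = "birthday"
    type = "date"
    mock_config = {
      date_range = ["1970-01-01", "2100-12-31"]
    }
  }
  {
    # Boolean column drawn from a list of seed values
    name = "active"
    type = "boolean"
    mock_config = {
      boolean_seed = [true, true, false]
    }
  }
]
```

Since mock_config is optional, a column can presumably also be declared with only name and type.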

mock_data_size Option [int]

Configures the size of the mock data.

mock_data_interval Option [int]

Configures the interval at which mock data is generated. The unit is seconds.
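
A minimal sketch of how these two options might appear in a Flink source block. It assumes that FakeSourceStream accepts both mock_data_size and mock_data_interval and that the block sits inside the usual source section of a job config. The table name "fake" and the values 10 and 5 are placeholders, not recommended settings.

```hocon
source {
  FakeSourceStream {
    # Assumed common source option: name under which the data is registered
    result_table_name = "fake"
    # Size of the mock data (see mock_data_size)
    mock_data_size = 10
    # Interval between mock data generations, in seconds (see mock_data_interval)
    mock_data_interval = 5
  }
}
```

Check which of these options your plugin variant (FakeSource or FakeSourceStream) accepts before relying on this layout.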

Examples

Fake

Fake {
  result_table_name = "my_dataset"
}

FakeStream

fakeStream {
  content = ['name=ricky&age=23', 'name=gary&age=28']
  rate = 5
}

The generated data looks like the following. Each row is a string selected at random from the content list.

+-----------------+
|raw_message |
+-----------------+
|name=gary&age=28 |
|name=ricky&age=23|
+-----------------+
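
To print rows like these, the FakeStream block can be placed in a complete Spark streaming job. The following is a sketch under assumptions: the env keys, the application name, and the Console sink are not described on this page and should be checked against the configuration and sink documentation for your version.

```hocon
env {
  # Assumed Spark settings; adjust for your environment
  spark.app.name = "fake-stream-demo"
  spark.streaming.batchDuration = 5
}

source {
  fakeStream {
    # Each generated row is one of these strings, picked at random
    content = ["name=ricky&age=23", "name=gary&age=28"]
    # Number of test cases generated per second
    rate = 5
  }
}

transform {
  # No transforms: rows pass through unchanged
}

sink {
  # Assumed sink plugin that prints rows to standard output
  Console {}
}
```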