Version: 2.1.3

Fake

Description

Fake is mainly used to conveniently generate user-specified data, which is used as input for functional verification, testing, and performance testing of seatunnel.

note

Engine Supported and plugin name

Spark: Fake, FakeStream
Flink: FakeSource, FakeSourceStream
- Flink Fake Source is mainly used to automatically generate data. The data has only two columns. The first column is of String type and the content is a random one from ["Gary", "Ricky Huo", "Kid Xiong"] . The second column is of Int type , which is the current 13-digit timestamp is used as input for functional verification and testing of seatunnel .

Options

Spark
Flink

note

These options is for Spark:FakeStream, and Spark:Fake do not have any options

name	type	required	default value
content	array	no	-
rate	number	yes	-
common-options	string	yes	-

content [array]

List of test data strings

rate [number]

Number of test cases generated per second

name	type	required	default value
parallelism	`Int`	no	-
common-options	`string`	no	-
mock_data_schema	list [column_config]	no	see details.
mock_data_size	int	no	300
mock_data_interval	int (second)	no	1

parallelism [`Int`]

The parallelism of an individual operator, for Fake Source Stream

common options [string]

Source plugin common parameters, please refer to Source Plugin for details

mock_data_schema Option [list[column_config]]

Config mock data's schema. Each is column_config option.

When mock_data_schema is not defined. Data will generate with schema like this:

mock_data_schema = [
  {
    name = "name",
    type = "string",
    mock_config = {
      string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
      size_range = [1,1]
    }
  }
  {
    name = "age",
    type = "int",
    mock_config = {
      int_range = [1, 100]
    }
  }
]

column_config option type.

name	type	required	default value	support values
name	string	yes	string	-
type	string	yes	string	int,integer,byte,boolean,char, character,short,long,float,double, date,timestamp,decimal,bigdecimal, bigint,int[],byte[], boolean[],char[],character[],short[], long[],float[],double[],string[], binary,varchar
mock_config	mock_config	no	-	-

mock_config Option

name	type	required	default value	sample
byte_range	list[byte][size=2]	no	-	[0,127]
boolean_seed	list[boolean]	no	-	[true, true, false]
char_seed	list[char][size=2]	no	-	['a','b','c']
date_range	list[string][size=2]	no	-	["1970-01-01", "2100-12-31"]
decimal_scale	int	no	-	2
double_range	list[double][size=2]	no	-	[0.0, 10000.0]
float_range	list[flout][size=2]	no	-	[0.0, 10000.0]
int_range	list[int][size=2]	no	-	[0, 100]
long_range	list[long][size=2]	no	-	[0, 100000]
number_regex	string	no	-	"[1-9]{1}\d?"
time_range	list[int][size=6]	no	-	[0,24,0,60,0,60]
size_range	list[int][size=2]	no	-	[6,10]
string_regex	string	no	-	"[a-z0-9]{5}\@\w{3}\.[a-z]{3}"
string_seed	list[string]	no	-	["Gary", "Ricky Huo", "Kid Xiong"]

mock_data_size Option [int]

Config mock data size.

mock_data_interval Option [int]

Config the data can mock with interval, The unit is SECOND.

Examples

Spark
Flink

Fake

Fake {
    result_table_name = "my_dataset"
}

FakeStream

fakeStream {
    content = ['name=ricky&age=23', 'name=gary&age=28']
    rate = 5
}

The generated data is as follows, randomly extract the string from the content list

+-----------------+
|raw_message      |
+-----------------+
|name=gary&age=28 |
|name=ricky&age=23|
+-----------------+

FakeSourceStream

source {
    FakeSourceStream {
        result_table_name = "fake"
        field_name = "name,age"
    }
}

FakeSource

source {
    FakeSource {
        result_table_name = "fake"
        field_name = "name,age"
        mock_data_size = 100 // will generate 100 rows mock data.
    }
}

Fake

Description​

Options​

content [array]​

rate [number]​

parallelism [Int]​

common options [string]​

mock_data_schema Option [list[column_config]]​

mock_data_size Option [int]​

mock_data_interval Option [int]​

Examples​

Fake​

FakeStream​

FakeSourceStream​

FakeSource​

Description

Options

content [array]

rate [number]

parallelism [`Int`]

common options [string]

mock_data_schema Option [list[column_config]]

mock_data_size Option [int]

mock_data_interval Option [int]

Examples

Fake

FakeStream

FakeSourceStream

FakeSource