Fake
Description
Fake
is mainly used to conveniently generate user-specified data, which is used as input for functional verification, testing, and performance testing of seatunnel.
Engine Supported and plugin name
- Spark: Fake, FakeStream
- Flink: FakeSource, FakeSourceStream
- Flink
Fake Source
is mainly used to automatically generate data. The data has only two columns. The first column is ofString type
and the content is a random one from["Gary", "Ricky Huo", "Kid Xiong"]
. The second column is ofInt type
, which is the current 13-digit timestamp is used as input for functional verification and testing ofseatunnel
.
- Flink
Options
- Spark
- Flink
These options is for Spark:FakeStream
, and Spark:Fake
do not have any options
name | type | required | default value |
---|---|---|---|
content | array | no | - |
rate | number | yes | - |
common-options | string | yes | - |
content [array]
List of test data strings
rate [number]
Number of test cases generated per second
name | type | required | default value |
---|---|---|---|
parallelism | Int | no | - |
common-options | string | no | - |
mock_data_schema | list [column_config] | no | see details. |
mock_data_size | int | no | 300 |
mock_data_interval | int (second) | no | 1 |
parallelism [Int
]
The parallelism of an individual operator, for Fake Source Stream
common options [string]
Source plugin common parameters, please refer to Source Plugin for details
mock_data_schema Option [list[column_config]]
Config mock data's schema. Each is column_config option.
When mock_data_schema is not defined. Data will generate with schema like this:
mock_data_schema = [
{
name = "name",
type = "string",
mock_config = {
string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
size_range = [1,1]
}
}
{
name = "age",
type = "int",
mock_config = {
int_range = [1, 100]
}
}
]
column_config option type.
name | type | required | default value | support values |
---|---|---|---|---|
name | string | yes | string | - |
type | string | yes | string | int,integer,byte,boolean,char, character,short,long,float,double, date,timestamp,decimal,bigdecimal, bigint,int[],byte[], boolean[],char[],character[],short[], long[],float[],double[],string[], binary,varchar |
mock_config | mock_config | no | - | - |
mock_config Option
name | type | required | default value | sample |
---|---|---|---|---|
byte_range | list[byte][size=2] | no | - | [0,127] |
boolean_seed | list[boolean] | no | - | [true, true, false] |
char_seed | list[char][size=2] | no | - | ['a','b','c'] |
date_range | list[string][size=2] | no | - | ["1970-01-01", "2100-12-31"] |
decimal_scale | int | no | - | 2 |
double_range | list[double][size=2] | no | - | [0.0, 10000.0] |
float_range | list[flout][size=2] | no | - | [0.0, 10000.0] |
int_range | list[int][size=2] | no | - | [0, 100] |
long_range | list[long][size=2] | no | - | [0, 100000] |
number_regex | string | no | - | "[1-9]{1}\d?" |
time_range | list[int][size=6] | no | - | [0,24,0,60,0,60] |
size_range | list[int][size=2] | no | - | [6,10] |
string_regex | string | no | - | "[a-z0-9]{5}\@\w{3}\.[a-z]{3}" |
string_seed | list[string] | no | - | ["Gary", "Ricky Huo", "Kid Xiong"] |
mock_data_size Option [int]
Config mock data size.
mock_data_interval Option [int]
Config the data can mock with interval, The unit is SECOND.
Examples
- Spark
- Flink
Fake
Fake {
result_table_name = "my_dataset"
}
FakeStream
fakeStream {
content = ['name=ricky&age=23', 'name=gary&age=28']
rate = 5
}
The generated data is as follows, randomly extract the string from the content
list
+-----------------+
|raw_message |
+-----------------+
|name=gary&age=28 |
|name=ricky&age=23|
+-----------------+