Fake
Description
Fake generates user-specified data that serves as input for functional verification, testing, and performance testing of SeaTunnel.
Supported engines and plugin names
- Spark: Fake, FakeStream
- Flink: FakeSource, FakeSourceStream
- Flink
FakeSource automatically generates data with only two columns. The first column is of String type, and its content is picked at random from ["Gary", "Ricky Huo", "Kid Xiong"]. The second column is of Int type and holds the current 13-digit timestamp. The data is used as input for functional verification and testing of SeaTunnel.
Options
- Spark
These options apply to Spark: FakeStream. Spark: Fake does not have any options.
| name | type | required | default value |
|---|---|---|---|
| content | array | no | - |
| rate | number | yes | - |
| common-options | string | yes | - |
content [array]
List of test data strings
rate [number]
Number of test cases generated per second
- Flink
| name | type | required | default value |
|---|---|---|---|
| parallelism | Int | no | - |
| common-options | string | no | - |
| mock_data_schema | list [column_config] | no | see details. |
| mock_data_size | int | no | 300 |
| mock_data_interval | int (second) | no | 1 |
parallelism [Int]
The parallelism of an individual operator, for FakeSourceStream.
common-options [string]
Common parameters of source plugins. Refer to Source Plugin for details.
mock_data_schema Option [list[column_config]]
Configures the schema of the mock data. Each entry is a column_config option.
When mock_data_schema is not defined, data is generated with a schema like this:
mock_data_schema = [
  {
    name = "name",
    type = "string",
    mock_config = {
      string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
      size_range = [1,1]
    }
  }
  {
    name = "age",
    type = "int",
    mock_config = {
      int_range = [1, 100]
    }
  }
]
column_config option type.
| name | type | required | default value | supported values |
|---|---|---|---|---|
| name | string | yes | string | - |
| type | string | yes | string | int, integer, byte, boolean, char, character, short, long, float, double, date, timestamp, decimal, bigdecimal, bigint, int[], byte[], boolean[], char[], character[], short[], long[], float[], double[], string[], binary, varchar |
| mock_config | mock_config | no | - | - |
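As an illustration of the column_config fields above, the following sketch defines two columns using only name and type, leaving out the optional mock_config. The column names are hypothetical; the type values are taken from the supported values list.

```hocon
# Hypothetical two-column schema; "user_id" and "signup_date" are
# illustrative names, not names required by the plugin.
mock_data_schema = [
  {
    name = "user_id",
    type = "long"
  }
  {
    name = "signup_date",
    type = "date"
  }
]
```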
mock_config Option
| name | type | required | default value | sample |
|---|---|---|---|---|
| byte_range | list[byte][size=2] | no | - | [0,127] |
| boolean_seed | list[boolean] | no | - | [true, true, false] |
| char_seed | list[char] | no | - | ['a','b','c'] |
| date_range | list[string][size=2] | no | - | ["1970-01-01", "2100-12-31"] |
| decimal_scale | int | no | - | 2 |
| double_range | list[double][size=2] | no | - | [0.0, 10000.0] |
| float_range | list[float][size=2] | no | - | [0.0, 10000.0] |
| int_range | list[int][size=2] | no | - | [0, 100] |
| long_range | list[long][size=2] | no | - | [0, 100000] |
| number_regex | string | no | - | "[1-9]{1}\d?" |
| time_range | list[int][size=6] | no | - | [0,24,0,60,0,60] |
| size_range | list[int][size=2] | no | - | [6,10] |
| string_regex | string | no | - | "[a-z0-9]{5}\@\w{3}\.[a-z]{3}" |
| string_seed | list[string] | no | - | ["Gary", "Ricky Huo", "Kid Xiong"] |
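To show how mock_config entries attach to columns, the sketch below pairs a few of the options above with column types, reusing the sample values from the table. The column names are hypothetical, and the pairing of each option with a type (for example double_range with double) is inferred from the option names rather than confirmed by this page.

```hocon
# Hypothetical schema; column names are illustrative.
# Assumption: each *_range / *_seed option applies to the column type
# that its name suggests.
mock_data_schema = [
  {
    name = "nickname",
    type = "string",
    mock_config = {
      string_seed = ["Gary", "Ricky Huo", "Kid Xiong"]
    }
  }
  {
    name = "score",
    type = "double",
    mock_config = {
      double_range = [0.0, 10000.0]
    }
  }
  {
    name = "birthday",
    type = "date",
    mock_config = {
      date_range = ["1970-01-01", "2100-12-31"]
    }
  }
]
```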
mock_data_size Option [int]
Configures the size of the mock data.
mock_data_interval Option [int]
Configures the interval at which mock data is generated. The unit is seconds.
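A minimal sketch of the two options used together, assuming they are set directly inside the Flink plugin block at the same level as the other options in the table. The values are arbitrary.

```hocon
FakeSourceStream {
  # Assumed placement: same level as the other Flink options.
  mock_data_size = 100     # illustrative value; the documented default is 300
  mock_data_interval = 2   # in seconds; the documented default is 1
}
```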
Examples
- Spark
Fake
Fake {
  result_table_name = "my_dataset"
}
FakeStream
fakeStream {
  content = ['name=ricky&age=23', 'name=gary&age=28']
  rate = 5
}
The generated data looks as follows. Each row is a string picked at random from the content list:
+-----------------+
|raw_message      |
+-----------------+
|name=gary&age=28 |
|name=ricky&age=23|
+-----------------+
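The examples above cover the Spark plugins only. For Flink, a comparable block might look like the sketch below. It uses only options from the Flink table; result_table_name is borrowed from the Spark example as a common option and is assumed to apply here as well, and the table name itself is hypothetical.

```hocon
FakeSourceStream {
  # Assumption: result_table_name is accepted as a common option,
  # as in the Spark Fake example. "fake_users" is an illustrative name.
  result_table_name = "fake_users"
  parallelism = 1
  # mock_data_schema is omitted, so the default schema described
  # in the Options section is expected to be used.
}
```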