Assert
Assert sink connector
Description
A sink plugin that asserts whether data is valid according to user-defined rules.
Key Features
Options
| Name | Type | Required | Default |
|---|---|---|---|
| rules | ConfigMap | yes | - |
| rules.field_rules | ConfigList | yes | - |
| rules.field_rules.field_name | string | yes | - |
| rules.field_rules.field_type | string\|ConfigMap | no | - |
| rules.field_rules.field_value | ConfigList | no | - |
| rules.field_rules.field_value.rule_type | string | no | - |
| rules.field_rules.field_value.rule_value | numeric | no | - |
| rules.field_rules.field_value.equals_to | boolean\|numeric\|string\|ConfigList\|ConfigMap | no | - |
| rules.row_rules | ConfigList | yes | - |
| rules.row_rules.rule_type | string | no | - |
| rules.row_rules.rule_value | numeric | no | - |
| rules.catalog_table_rule | ConfigMap | no | - |
| rules.catalog_table_rule.primary_key_rule | ConfigMap | no | - |
| rules.catalog_table_rule.primary_key_rule.primary_key_name | string | no | - |
| rules.catalog_table_rule.primary_key_rule.primary_key_columns | ConfigList | no | - |
| rules.catalog_table_rule.constraint_key_rule | ConfigList | no | - |
| rules.catalog_table_rule.constraint_key_rule.constraint_key_name | string | no | - |
| rules.catalog_table_rule.constraint_key_rule.constraint_key_type | string | no | - |
| rules.catalog_table_rule.constraint_key_rule.constraint_key_columns | ConfigList | no | - |
| rules.catalog_table_rule.constraint_key_rule.constraint_key_columns.constraint_key_column_name | string | no | - |
| rules.catalog_table_rule.constraint_key_rule.constraint_key_columns.constraint_key_sort_type | string | no | - |
| rules.catalog_table_rule.column_rule | ConfigList | no | - |
| rules.catalog_table_rule.column_rule.name | string | no | - |
| rules.catalog_table_rule.column_rule.type | string | no | - |
| rules.catalog_table_rule.column_rule.column_length | int | no | - |
| rules.catalog_table_rule.column_rule.nullable | boolean | no | - |
| rules.catalog_table_rule.column_rule.default_value | string | no | - |
| rules.catalog_table_rule.column_rule.comment | string | no | - |
| rules.table-names | ConfigList | no | - |
| rules.tables_configs | ConfigList | no | - |
| rules.tables_configs.table_path | string | no | - |
| common-options | | no | - |
rules [ConfigMap]
Rules that define what valid data looks like. Each rule represents either a field validation or a row-count validation.
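A minimal sketch of how the two kinds of rules sit side by side inside rules. The field name id and the bound value are hypothetical, chosen only to show the layout.

```hocon
Assert {
  rules {
    # Row-count validation: at least one row must arrive
    row_rules = [
      { rule_type = MIN_ROW, rule_value = 1 }
    ]
    # Field validation: the hypothetical field "id" must be a non-null bigint
    field_rules = [
      {
        field_name = id
        field_type = bigint
        field_value = [
          { rule_type = NOT_NULL }
        ]
      }
    ]
  }
}
```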
field_rules [ConfigList]
Rules for validating individual fields.
field_name [string]
The name of the field to validate.
field_type [string | ConfigMap]
The expected type of the field. Field type declarations should adhere to this guide.
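As a sketch, field_type is a quoted string for scalar and parameterized types, and a ConfigMap for row types, mirroring the Complex example on this page. The trimmed two-field c_row declaration here is hypothetical.

```hocon
field_rules = [
  {
    # Parameterized type declared as a string; quoted because it contains a comma
    field_name = c_decimal
    field_type = "decimal(30, 8)"
  },
  {
    # Row type declared as a ConfigMap of nested field names to types
    field_name = c_row
    field_type = {
      c_int = int
      c_string = string
    }
  }
]
```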
field_value [ConfigList]
A list of rules that validate the field's value.
rule_type [string]
The following rule types are currently supported:
- NOT_NULL: the value must not be null
- NULL: the value can be null
- MIN: defines the minimum value of the data
- MAX: defines the maximum value of the data
- MIN_LENGTH: defines the minimum length of a string value
- MAX_LENGTH: defines the maximum length of a string value
- MIN_ROW: defines the minimum number of rows
- MAX_ROW: defines the maximum number of rows
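A sketch of where each rule type goes, following the examples on this page: MIN_ROW and MAX_ROW bound the row count and are placed under row_rules, while the other rule types are placed under a field's field_value. The field names and bounds are hypothetical.

```hocon
rules {
  # Row-count rules
  row_rules = [
    { rule_type = MIN_ROW, rule_value = 5 },
    { rule_type = MAX_ROW, rule_value = 10 }
  ]
  field_rules = [
    {
      # Length rules apply to string values
      field_name = name
      field_type = string
      field_value = [
        { rule_type = NOT_NULL },
        { rule_type = MIN_LENGTH, rule_value = 5 },
        { rule_type = MAX_LENGTH, rule_value = 10 }
      ]
    },
    {
      # MIN and MAX bound numeric values
      field_name = age
      field_type = int
      field_value = [
        { rule_type = MIN, rule_value = 0 },
        { rule_type = MAX, rule_value = 150 }
      ]
    }
  ]
}
```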
rule_value [numeric]
The value associated with the rule type. When rule_type is MIN, MAX, MIN_LENGTH, MAX_LENGTH, MIN_ROW, or MAX_ROW, users must assign a value to rule_value.
equals_to [boolean | numeric | string | ConfigList | ConfigMap]
equals_to checks whether the field value is equal to the configured expected value. You can assign values of any type to equals_to; these types are detailed here. For instance, if a field is a row with three fields declared as {a = array<string>, b = map<string, decimal(30, 2)>, c={c_0 = int, b = string}}, users can assign the value [["a", "b"], { k0 = 9999.99, k1 = 111.11 }, [123, "abcd"]] to equals_to.
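The row example from the paragraph above, written out as a complete field rule. The field name c_row is hypothetical; the type declaration and expected value are taken from the text.

```hocon
field_rules = [
  {
    field_name = c_row
    field_type = {
      a = "array<string>"
      b = "map<string, decimal(30, 2)>"
      c = {c_0 = int, b = string}
    }
    field_value = [
      {
        rule_type = NOT_NULL
        # One element per sub-field, in declaration order:
        # the array for a, the map for b, and the nested row for c
        equals_to = [["a", "b"], { k0 = 9999.99, k1 = 111.11 }, [123, "abcd"]]
      }
    ]
  }
]
```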
The way of defining field values is consistent with FakeSource.
equals_to cannot be applied to null type fields. However, users can use the rule type NULL for verification, such as {rule_type = NULL}.
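A sketch of a null-typed field rule, matching the c_null rule in the Complex example. Note that the type name is quoted: an unquoted null is the HOCON null literal, not a type name.

```hocon
{
  field_name = c_null
  # Quoted so HOCON reads it as a type name rather than the null literal
  field_type = "null"
  field_value = [
    # equals_to is not allowed here; check nullness with the NULL rule instead
    { rule_type = NULL }
  ]
}
```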
catalog_table_rule [ConfigMap]
Asserts that the catalog table matches the user-defined table.
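The optional column attributes column_length, nullable, default_value, and comment appear in the options table but not in the examples. A sketch using them, where every attribute value is hypothetical:

```hocon
catalog_table_rule {
  primary_key_rule = {
    primary_key_name = "primary key"
    primary_key_columns = ["id"]
  }
  column_rule = [
    {
      name = "id"
      type = bigint
      nullable = false
    },
    {
      # Hypothetical values for the optional column attributes
      name = "name"
      type = string
      column_length = 64
      nullable = true
      default_value = "unknown"
      comment = "user name"
    }
  ]
}
```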
table-names [ConfigList]
Asserts that the listed tables are present in the data.
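A minimal sketch, assuming the table identifiers test.table1 and test.table2 (hypothetical here) are produced by the upstream source:

```hocon
sink {
  Assert {
    rules {
      # Each listed table must appear in the incoming data
      table-names = ["test.table1", "test.table2"]
    }
  }
}
```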
tables_configs [ConfigList]
Asserts that multiple tables are present in the data, with a separate set of rules for each table.
table_path [string]
The path of the table.
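A reduced sketch of a single tables_configs entry: table_path selects the table, and the row_rules and field_rules nested beside it apply only to that table. The path and rule values are hypothetical.

```hocon
rules {
  tables_configs = [
    {
      # Rules below apply only to rows of this table
      table_path = "test.table1"
      row_rules = [
        { rule_type = MIN_ROW, rule_value = 1 }
      ]
      field_rules = [
        {
          field_name = c_int
          field_type = int
          field_value = [
            { rule_type = NOT_NULL }
          ]
        }
      ]
    }
  ]
}
```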
common options
Sink plugin common parameters; please refer to Sink Common Options for details.
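For instance, the common option plugin_input binds Assert to a dataset registered by an upstream plugin_output, as in the Complex example. A minimal sketch with a hypothetical one-column FakeSource:

```hocon
source {
  FakeSource {
    row.num = 1
    schema = {
      fields {
        c_int = int
      }
    }
    # Register this source's output under the name "fake"
    plugin_output = "fake"
  }
}
sink {
  Assert {
    # Common sink option: consume the dataset named "fake"
    plugin_input = "fake"
    rules {
      row_rules = [
        { rule_type = MIN_ROW, rule_value = 1 }
      ]
    }
  }
}
```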
Example
Simple
The whole configuration follows HOCON syntax.
Assert {
    rules =
      {
        row_rules = [
          {
            rule_type = MAX_ROW
            rule_value = 10
          },
          {
            rule_type = MIN_ROW
            rule_value = 5
          }
        ],
        field_rules = [{
          field_name = name
          field_type = string
          field_value = [
            {
              rule_type = NOT_NULL
            },
            {
              rule_type = MIN_LENGTH
              rule_value = 5
            },
            {
              rule_type = MAX_LENGTH
              rule_value = 10
            }
          ]
        }, {
          field_name = age
          field_type = int
          field_value = [
            {
              rule_type = NOT_NULL
              equals_to = 23
            },
            {
              rule_type = MIN
              rule_value = 32767
            },
            {
              rule_type = MAX
              rule_value = 2147483647
            }
          ]
        }
        ]
        catalog_table_rule {
            primary_key_rule = {
                primary_key_name = "primary key"
                primary_key_columns = ["id"]
            }
            constraint_key_rule = [
              {
                constraint_key_name = "unique_name"
                constraint_key_type = UNIQUE_KEY
                constraint_key_columns = [
                  {
                    constraint_key_column_name = "id"
                    constraint_key_sort_type = ASC
                  }
                ]
              }
            ]
            column_rule = [
              {
                name = "id"
                type = bigint
              },
              {
                name = "name"
                type = string
              },
              {
                name = "age"
                type = int
              }
            ]
        }
      }
  }
Complex
Here is a more complex example of equals_to. It uses FakeSource; to learn more about FakeSource, please read this document.
source {
  FakeSource {
    row.num = 1
    schema = {
      fields {
        c_null = "null"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_decimal = "decimal(30, 8)"
        c_date = date
        c_timestamp = timestamp
        c_time = time
        c_bytes = bytes
        c_array = "array<int>"
        c_map = "map<time, string>"
        c_map_nest = "map<string, {c_int = int, c_string = string}>"
        c_row = {
          c_null = "null"
          c_string = string
          c_boolean = boolean
          c_tinyint = tinyint
          c_smallint = smallint
          c_int = int
          c_bigint = bigint
          c_float = float
          c_double = double
          c_decimal = "decimal(30, 8)"
          c_date = date
          c_timestamp = timestamp
          c_time = time
          c_bytes = bytes
          c_array = "array<int>"
          c_map = "map<string, string>"
        }
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [
          null, "AAA", false, 1, 1, 333, 323232, 3.1, 9.33333, 99999.99999999, "2012-12-21", "2012-12-21T12:34:56", "12:34:56",
          "bWlJWmo=",
          [0, 1, 2],
          "{ 12:01:26 = v0 }",
          { k1 = [123, "BBB-BB"]},
          [
            null, "AAA", false, 1, 1, 333, 323232, 3.1, 9.33333, 99999.99999999, "2012-12-21", "2012-12-21T12:34:56", "12:34:56",
            "bWlJWmo=",
            [0, 1, 2],
            { k0 = v0 }
          ]
        ]
      }
    ]
    plugin_output = "fake"
  }
}
sink{
  Assert {
    plugin_input = "fake"
    rules =
      {
        row_rules = [
          {
            rule_type = MAX_ROW
            rule_value = 1
          },
          {
            rule_type = MIN_ROW
            rule_value = 1
          }
        ],
        field_rules = [
            {
                field_name = c_null
                field_type = "null"
                field_value = [
                    {
                        rule_type = NULL
                    }
                ]
            },
            {
                field_name = c_string
                field_type = string
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = "AAA"
                    }
                ]
            },
            {
                field_name = c_boolean
                field_type = boolean
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = false
                    }
                ]
            },
            {
                field_name = c_tinyint
                field_type = tinyint
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 1
                    }
                ]
            },
            {
                field_name = c_smallint
                field_type = smallint
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 1
                    }
                ]
            },
            {
                field_name = c_int
                field_type = int
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 333
                    }
                ]
            },
            {
                field_name = c_bigint
                field_type = bigint
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 323232
                    }
                ]
            },
            {
                field_name = c_float
                field_type = float
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 3.1
                    }
                ]
            },
            {
                field_name = c_double
                field_type = double
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 9.33333
                    }
                ]
            },
            {
                field_name = c_decimal
                field_type = "decimal(30, 8)"
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = 99999.99999999
                    }
                ]
            },
            {
                field_name = c_date
                field_type = date
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = "2012-12-21"
                    }
                ]
            },
            {
                field_name = c_timestamp
                field_type = timestamp
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = "2012-12-21T12:34:56"
                    }
                ]
            },
            {
                field_name = c_time
                field_type = time
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = "12:34:56"
                    }
                ]
            },
            {
                field_name = c_bytes
                field_type = bytes
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = "bWlJWmo="
                    }
                ]
            },
            {
                field_name = c_array
                field_type = "array<int>"
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = [0, 1, 2]
                    }
                ]
            },
            {
                field_name = c_map
                field_type = "map<time, string>"
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = "{ 12:01:26 = v0 }"
                    }
                ]
            },
            {
                field_name = c_map_nest
                field_type = "map<string, {c_int = int, c_string = string}>"
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = { k1 = [123, "BBB-BB"] }
                    }
                ]
            },
            {
                field_name = c_row
                field_type = {
                    c_null = "null"
                    c_string = string
                    c_boolean = boolean
                    c_tinyint = tinyint
                    c_smallint = smallint
                    c_int = int
                    c_bigint = bigint
                    c_float = float
                    c_double = double
                    c_decimal = "decimal(30, 8)"
                    c_date = date
                    c_timestamp = timestamp
                    c_time = time
                    c_bytes = bytes
                    c_array = "array<int>"
                    c_map = "map<string, string>"
                }
                field_value = [
                    {
                        rule_type = NOT_NULL
                        equals_to = [
                           null, "AAA", false, 1, 1, 333, 323232, 3.1, 9.33333, 99999.99999999, "2012-12-21", "2012-12-21T12:34:56", "12:34:56",
                           "bWlJWmo=",
                           [0, 1, 2],
                           { k0 = v0 }
                        ]
                    }
                ]
            }
        ]
    }
  }
}
Assert Multiple Tables
This example checks the rows and fields of multiple tables.
env {
  parallelism = 1
  job.mode = BATCH
}
source {
  FakeSource {
    tables_configs = [
      {
        row.num = 16
        schema {
          table = "test.table1"
          fields {
            c_int = int
            c_bigint = bigint
          }
        }
      },
      {
        row.num = 17
        schema {
          table = "test.table2"
          fields {
            c_string = string
            c_tinyint = tinyint
          }
        }
      }
    ]
  }
}
transform {
}
sink {
  Assert {
    rules =
      {
        tables_configs = [
          {
            table_path = "test.table1"
            row_rules = [
              {
                rule_type = MAX_ROW
                rule_value = 16
              },
              {
                rule_type = MIN_ROW
                rule_value = 16
              }
            ],
            field_rules = [{
              field_name = c_int
              field_type = int
              field_value = [
                {
                  rule_type = NOT_NULL
                }
              ]
            }, {
              field_name = c_bigint
              field_type = bigint
              field_value = [
                {
                  rule_type = NOT_NULL
                }
              ]
            }]
          },
          {
            table_path = "test.table2"
            row_rules = [
              {
                rule_type = MAX_ROW
                rule_value = 17
              },
              {
                rule_type = MIN_ROW
                rule_value = 17
              }
            ],
            field_rules = [{
              field_name = c_string
              field_type = string
              field_value = [
                {
                  rule_type = NOT_NULL
                }
              ]
            }, {
              field_name = c_tinyint
              field_type = tinyint
              field_value = [
                {
                  rule_type = NOT_NULL
                }
              ]
            }]
          }
        ]
      }
  }
}
Changelog
| Change | Commit | Version | 
|---|---|---|
| [Improve][API] Add metadata schema into catalog table (#9586) | https://github.com/apache/seatunnel/commit/385814e7f1 | 2.3.12 | 
| [Improve] restruct connector common options (#8634) | https://github.com/apache/seatunnel/commit/f3499a6eeb | 2.3.10 | 
| [improve] add assert options (#8620) | https://github.com/apache/seatunnel/commit/b159cc0c75 | 2.3.10 | 
| [Feature][API] Support timestamp with timezone offset (#8367) | https://github.com/apache/seatunnel/commit/e18bfeabd2 | 2.3.9 | 
| [fix][connector-v2][connector-assert] Optimize Assert Sink verification method (#8356) | https://github.com/apache/seatunnel/commit/5c9159d7cd | 2.3.9 | 
| [Improve][dist]add shade check rule (#8136) | https://github.com/apache/seatunnel/commit/51ef800016 | 2.3.9 | 
| [Feature][File] Support config null format for text file read (#8109) | https://github.com/apache/seatunnel/commit/2dbf02df47 | 2.3.9 | 
| [Feature][Transform-V2] Support transform with multi-table (#7628) | https://github.com/apache/seatunnel/commit/72c9c4576d | 2.3.9 | 
| [Improve][API] Unified tables_configs and table_list (#8100) | https://github.com/apache/seatunnel/commit/84c0b8d660 | 2.3.9 | 
| [Fix][API] Fix column length can not be long (#8039) | https://github.com/apache/seatunnel/commit/16cf632d3e | 2.3.9 | 
| [Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786) | https://github.com/apache/seatunnel/commit/6b7c53d03c | 2.3.9 | 
| [Feature][Connector-V2] Assert support multi-table check (#7687) | https://github.com/apache/seatunnel/commit/c4778a2497 | 2.3.8 | 
| [Feature][Transform] Add embedding transform (#7534) | https://github.com/apache/seatunnel/commit/3310cfcd34 | 2.3.8 | 
| [Improve][Connector] Add multi-table sink option check (#7360) | https://github.com/apache/seatunnel/commit/2489f6446b | 2.3.7 | 
| [Feature][Core] Support using upstream table placeholders in sink options and auto replacement (#7131) | https://github.com/apache/seatunnel/commit/c4ca74122c | 2.3.6 | 
| [Hotfix] fix http source can not read yyyy-MM-dd HH:mm:ss format bug & Improve DateTime Utils (#6601) | https://github.com/apache/seatunnel/commit/19888e7969 | 2.3.5 | 
| [Feature][Connector-V2][Assert] Support field type assert and field value equality assert for full data types (#6275) | https://github.com/apache/seatunnel/commit/576919bfab | 2.3.4 | 
| [Feature][Connector-V2][Assert] Support check the precision and scale of Decimal type. (#6110) | https://github.com/apache/seatunnel/commit/dd64ed52d4 | 2.3.4 | 
| [Hotfix][SQL Transform] Fix cast to timestamp, date, time bug (#5812) | https://github.com/apache/seatunnel/commit/de181de02a | 2.3.4 | 
| [Improve][Common] Introduce new error define rule (#5793) | https://github.com/apache/seatunnel/commit/9d1b2582b2 | 2.3.4 | 
| [Improve] Remove use SeaTunnelSink::getConsumedType method and mark it as deprecated (#5755) | https://github.com/apache/seatunnel/commit/8de7408100 | 2.3.4 | 
| [Improve] Add default implement for SeaTunnelSink::setTypeInfo(#5682) | https://github.com/apache/seatunnel/commit/86cba87450 | 2.3.4 | 
| [Fix] Fix log error when multi-table sink close (#5683) | https://github.com/apache/seatunnel/commit/fea4b6f268 | 2.3.4 | 
| Support config tableIdentifier for schema (#5628) | https://github.com/apache/seatunnel/commit/652921fb75 | 2.3.4 | 
| [Feature] Add table-names from FakeSource/Assert to produce/assert multi-table (#5604) | https://github.com/apache/seatunnel/commit/2c67cd8f3e | 2.3.4 | 
| [Improve] Remove useless ReadonlyConfig flatten feature (#5612) | https://github.com/apache/seatunnel/commit/243edfef3d | 2.3.4 | 
| Support config column/primaryKey/constraintKey in schema (#5564) | https://github.com/apache/seatunnel/commit/eac76b4e50 | 2.3.4 | 
| [Improve][connector-assert]support 'DECIMAL' type and fix 'Number' type precision issue (#5479) | https://github.com/apache/seatunnel/commit/d308e27733 | 2.3.4 | 
| [Improve][CheckStyle] Remove useless 'SuppressWarnings' annotation of checkstyle. (#5260) | https://github.com/apache/seatunnel/commit/51c0d709ba | 2.3.4 | 
| [Feature][Transform] Add SimpleSQL transform plugin (#4148) | https://github.com/apache/seatunnel/commit/b914d49abf | 2.3.1 | 
| [Improve][build] Give the maven module a human readable name (#4114) | https://github.com/apache/seatunnel/commit/d7cd601051 | 2.3.1 | 
| [Improve][Project] Code format with spotless plugin. (#4101) | https://github.com/apache/seatunnel/commit/a2ab166561 | 2.3.1 | 
| [Hotfix][OptionRule] Fix option rule about all connectors (#3592) | https://github.com/apache/seatunnel/commit/226dc6a119 | 2.3.0 | 
| [Improve][Connector-V2][Assert] Unified exception for assert connector (#3331) | https://github.com/apache/seatunnel/commit/e74c9bc6fd | 2.3.0 | 
| [improve][connector] The Factory#factoryIdentifier must be consistent with PluginIdentifierInterface#getPluginName (#3328) | https://github.com/apache/seatunnel/commit/d9519d696a | 2.3.0 | 
| [Improve][Connector-V2] Add Clickhouse and Assert Source/Sink Factory (#3306) | https://github.com/apache/seatunnel/commit/9e4a128381 | 2.3.0 | 
| [Feature][Connector-v2] improve assert sink connector (#2844) | https://github.com/apache/seatunnel/commit/967fec0e93 | 2.3.0-beta | 
| [DEV][Api] Replace SeaTunnelContext with JobContext and remove singleton pattern (#2706) | https://github.com/apache/seatunnel/commit/cbf82f755c | 2.2.0-beta | 
| [improve][UT] Upgrade junit to 5.+ (#2305) | https://github.com/apache/seatunnel/commit/362319ff3e | 2.2.0-beta | 
| [checkstyle] Improved validation scope of MagicNumber (#2194) | https://github.com/apache/seatunnel/commit/6d08b5f369 | 2.2.0-beta | 
| [API-DRAFT][MERGE] update license and pom.xml | https://github.com/apache/seatunnel/commit/5ae8865b7c | 2.2.0-beta | 
| add assert sink to Api draft (#2071) | https://github.com/apache/seatunnel/commit/fc640b52bd | 2.2.0-beta |