Model Configuration
Define a policy to discover and label sensitive data, including personally identifiable information, credentials, and matches for custom regular expressions, inside text, logs, and other structured data.
The classify API policy structure has two notable sections: the models array and an optional label_predictors object. The models array has one item, keyed by classify.
Within the classify object:
    A data_source is required
      This parameter can be overridden via the command line interface (CLI)
      At this time, CSV and plain-text data formats are supported.
    A labels array is required to specify named entities to search for, including:
      Supported entities. See the full list.
      Namespaces for custom regular expressions (optional)
schema_version: "1.0"
name: "my-awesome-model"
models:
  - classify:
      data_source: "_"
      labels:
        - person_name
        - credit_card_number
        - phone_number
        - us_social_security_number
        - email_address
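The structural requirements above (one models item keyed by classify, a required data_source, a required labels array) can be sketched as a small check. This is a minimal illustration only: check_classify_policy is a hypothetical helper, not part of any official client, and the policy is written as a Python dict that mirrors the YAML rather than being parsed from a file.

```python
# Hypothetical helper for illustration; not an official API.
def check_classify_policy(policy):
    """Return a list of problems found in a classify policy dict."""
    models = policy.get("models")
    if not isinstance(models, list) or len(models) != 1:
        return ["models must be an array with exactly one item"]

    item = models[0]
    classify = item.get("classify") if isinstance(item, dict) else None
    if not isinstance(classify, dict):
        return ["the single models item must be keyed by classify"]

    problems = []
    if not classify.get("data_source"):
        problems.append("classify.data_source is required")
    labels = classify.get("labels")
    if not isinstance(labels, list) or not labels:
        problems.append("classify.labels must be a non-empty array")
    return problems


# Dict equivalent of the YAML policy shown above (labels shortened).
policy = {
    "schema_version": "1.0",
    "name": "my-awesome-model",
    "models": [
        {"classify": {"data_source": "_", "labels": ["person_name", "email_address"]}}
    ],
}

print(check_classify_policy(policy))  # []
```

An empty list means the sketch found no structural problems; it does not verify that each label is a supported entity.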

Custom Predictors and Data Labeling

Within the config, you may optionally specify a label_predictors object where you can define custom predictors that create custom entity labels.
This example defines a regular expression for a custom user ID format:
schema_version: "1.0"
name: "classify-my-data"

# ... classify model defined here ...

label_predictors:
  namespace: acme
  regex:
    user_id:
      patterns:
        - score: high
          regex: "user_[\\d]{5}"
If you wish to create custom predictors, you must provide a namespace, which is used when constructing the labels.
    regex: Create your own regular expressions to match and yield custom labels. The value of this property should be an object keyed by the labels you wish to create. For each label, provide an array of patterns. Patterns are objects consisting of:
      score: One of high, med, or low. These map to floating point values of 0.8, 0.5, and 0.2 respectively. If omitted, the default is high.
      regex: The regex used to match. When crafting and testing your regex, ensure that it is compatible with Python 3.
In the example above, the namespace and the keys of the regex object are combined to create your custom labels. Here, the label acme/user_id is created when a match occurs.
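Because patterns must be compatible with Python 3, one way to try a pattern before adding it to a policy is with the standard re module. The sketch below is an assumption-laden illustration, not the service's actual matcher: find_custom_labels and the SCORES table are hypothetical names, and the dict mirrors the label_predictors object from the YAML. It shows how the namespace and key combine into a label and how the score names map to the values listed above.

```python
import re

# Score names and the floating point values described in the text.
SCORES = {"high": 0.8, "med": 0.5, "low": 0.2}

# Dict equivalent of the label_predictors object in the YAML example.
# The YAML string "user_[\\d]{5}" unescapes to the regex user_[\d]{5}.
label_predictors = {
    "namespace": "acme",
    "regex": {
        "user_id": {
            "patterns": [{"score": "high", "regex": r"user_[\d]{5}"}],
        }
    },
}


def find_custom_labels(text, predictors):
    """Hypothetical matcher: return (label, matched_text, score) tuples."""
    namespace = predictors["namespace"]
    results = []
    for key, spec in predictors["regex"].items():
        label = f"{namespace}/{key}"  # e.g. acme/user_id
        for pattern in spec["patterns"]:
            # An omitted score falls back to high.
            score = SCORES[pattern.get("score", "high")]
            for match in re.finditer(pattern["regex"], text):
                results.append((label, match.group(0), score))
    return results


print(find_custom_labels("login by user_12345 from 10.0.0.1", label_predictors))
# [('acme/user_id', 'user_12345', 0.8)]
```

Note that the pattern is unanchored, so a string such as user_123456 also matches on its first five digits. If that is not intended, a word boundary (\b) at the end of the pattern is one option.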
You can now combine the label_predictors with your classify policy. For example:
schema_version: "1.0"
name: "my-awesome-model"
models:
  - classify:
      data_source: "_"
      labels:
        - acme/*

label_predictors:
  namespace: acme
  regex:
    user_id:
      patterns:
        - score: high
          regex: "user_[\\d]{5}"
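The acme/* entry in labels reads as a wildcard over every label in the acme namespace. The source does not spell out the matching rules, so the following is only a sketch under the assumption that the wildcard behaves like a shell-style glob; select_labels is a hypothetical function, and the available list is made up for illustration.

```python
from fnmatch import fnmatchcase


def select_labels(requested, available):
    """Hypothetical resolver: keep available labels matching any requested pattern."""
    return [
        label
        for label in available
        if any(fnmatchcase(label, pattern) for pattern in requested)
    ]


# Illustrative set of labels: two built-in entities plus the custom one.
available = ["person_name", "email_address", "acme/user_id"]

print(select_labels(["acme/*"], available))  # ['acme/user_id']
```

Under this assumption, adding more keys to the regex object would make them selectable through the same acme/* entry without listing each label by name.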