In this tutorial, we will create a transform policy that detects PII and either redacts it or replaces it with fake values. We will then use the CLI to transform a dataset and examine the results.
Save your configuration to a local file named redact_pii.yaml. See the documentation for the full list of supported info types. The policy below searches for sensitive PII values as defined by Experian, plus user IDs matched by a custom regular expression. It replaces detected values with fake values where possible and otherwise redacts them with a user-defined character.
The transformed results are downloaded to the local directory as gzip-compressed CSV in the file data.gz. Our policy replaces names, addresses, and emails with fake entities, and redacts values matching the user ID regular expression with a replacement character.
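As a rough illustration of the two behaviors this policy combines, here is a minimal Python sketch using only the standard library's `re` module: detected emails are swapped for consistent fake addresses, and user IDs are masked character by character. This is not how the transform tool detects PII. The email pattern, the `user_` ID format, the `#` mask character, and the `example.com` fake addresses are all assumptions made for the example, and names and addresses (which need entity recognition rather than a regex) are left out.

```python
import re
from itertools import count

# Hypothetical detectors for illustration only; not the transform tool's own.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # assumed email shape
USER_ID_RE = re.compile(r"\buser_\d{4,}\b")         # assumed user ID format


def make_fake_email_replacer():
    """Return a re.sub callback that maps each distinct email to a stable fake one."""
    mapping = {}
    counter = count(1)

    def replace(match):
        original = match.group(0)
        if original not in mapping:
            mapping[original] = f"person{next(counter)}@example.com"
        return mapping[original]

    return replace


def redact_chars(match, char="#"):
    """Mask a matched value with one replacement character per original character."""
    return char * len(match.group(0))


def transform(text, fake_email):
    """Replace emails with fake values, then redact user IDs."""
    text = EMAIL_RE.sub(fake_email, text)
    return USER_ID_RE.sub(redact_chars, text)


if __name__ == "__main__":
    fake_email = make_fake_email_replacer()
    sample = "Ticket from alice@corp.com (user_123456): refund issued; cc alice@corp.com"
    print(transform(sample, fake_email))
    # prints: Ticket from person1@example.com (###########): refund issued; cc person1@example.com
```

Mapping each distinct email to the same fake value keeps references to the same person consistent across records, which is part of what lets fake values preserve the data's semantics.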
Let's examine the transformed results from the command line.
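If you prefer to inspect the output programmatically, a small Python sketch like the following prints the header and the first few rows. It assumes data.gz is a UTF-8, gzip-compressed CSV file with a header row; the file name comes from the step above, while the encoding and header row are assumptions.

```python
import csv
import gzip
import os
import sys
from itertools import islice


def preview(path, n=5):
    """Return the header row and up to n data rows from a gzip-compressed CSV."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, n))
    return header, rows


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "data.gz"
    if os.path.exists(path):
        header, rows = preview(path)
        writer = csv.writer(sys.stdout)  # re-quotes fields that contain commas
        writer.writerow(header)
        writer.writerows(rows)
    else:
        print(f"{path} not found", file=sys.stderr)
```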
For use cases such as training machine learning models on customer support logs, it is often useful to replace PII with fake values so that the data keeps its original semantics. That is not always what you want, however. Try updating the transformation policy to simply redact all sensitive values with an "*" character.
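The redact-everything variant can be sketched the same way. Again, this is a hypothetical, regex-only stand-in for the real policy: the patterns below are assumptions, and the point is only the behavior of masking every detected value with "*".

```python
import re

# Hypothetical detectors; a real policy would use the tool's own info types.
PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email addresses (assumed shape)
    re.compile(r"\buser_\d{4,}\b"),          # user IDs (assumed format)
]


def redact_all(text, char="*"):
    """Replace every character of every detected value with `char`."""
    for pattern in PATTERNS:
        text = pattern.sub(lambda m: char * len(m.group(0)), text)
    return text


if __name__ == "__main__":
    print(redact_all("Contact alice@corp.com about user_123456"))
    # prints: Contact ************** about ***********
```

Masking character for character preserves the length of each value, which keeps field widths intact but can leak length information; replacing each match with a fixed string such as "***" avoids that.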