Relational Transform
Transform multi-table databases to redact PII while maintaining referential integrity.
Gretel Relational Transforms leverages our Transform capabilities to detect and transform sensitive entities throughout your database. You can effortlessly extract sensitive columns and apply a range of transformations at scale, such as masking, hashing, tokenization, or even replacement. By transforming key values, Gretel Relational Transforms goes the extra mile to ensure your database is private and secure, while maintaining referential integrity and statistical accuracy.
In addition to transforming your database, Gretel Relational also makes it easy to transform then synthesize a database for maximum privacy assurances (think GDPR compliance). We'll discuss how to Transform and Synthesize a Database below.
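To make the referential-integrity point concrete, here is a minimal, stdlib-only sketch of why key columns must be transformed deterministically. This is not Gretel's implementation; the `tokenize` helper, the secret, and the sample tables are all hypothetical.

```python
import hashlib

def tokenize(value: str, secret: str = "demo-secret") -> str:
    # Deterministic hash: the same input always maps to the same token,
    # so foreign keys in child tables still line up after transformation.
    return hashlib.sha256((secret + value).encode()).hexdigest()[:12]

customers = [{"customer_id": "C1", "name": "Alice"}]
invoices = [{"invoice_id": "I9", "customer_id": "C1"}]

transformed_customers = [
    {**row, "customer_id": tokenize(row["customer_id"]), "name": "REDACTED"}
    for row in customers
]
transformed_invoices = [
    {**row, "customer_id": tokenize(row["customer_id"])}
    for row in invoices
]

# Referential integrity holds: the invoice still points at its customer,
# even though the raw key value has been replaced.
assert transformed_invoices[0]["customer_id"] == transformed_customers[0]["customer_id"]
```

If keys were instead replaced with independent random values per table, the join between `customers` and `invoices` would break, which is exactly what transforming key values consistently avoids.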
In the Relational page, we covered the process for installing Gretel Relational, defining your source database, and creating a Relational model. A brief recap of the code can be found below, again using our telecommunications Demo Database as an example. This example shows defining our source data using a SQLite connector. For more information on using other connectors or defining data manually, refer to Define Source Data.
from gretel_trainer.relational import *
connector = sqlite_conn("telecom.db")
relational_data = connector.extract()
multitable = MultiTable(
    relational_data,
    # project_display_name="multi-table",
    # gretel_model="amplify",
    # refresh_interval=60,
)

The first step in relational transforms is choosing or defining a transform model config. The snippet below demonstrates a few different ways you can provide a config, including a local path, a URL, or a Gretel blueprint config.
local_config = "/path/to/transforms_config.yaml"
remote_config = "https://gretel-blueprints-pub.s3.amazonaws.com/rdb/users_policy.yaml"
blueprint_config = "transform/default"
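For orientation, a transform config is a YAML policy file that maps detected entities to transformations. The abbreviated sketch below shows the general shape only; the field names are illustrative, so consult the `transform/default` blueprint or Gretel's Transforms documentation for the authoritative schema.

```yaml
# Illustrative shape only -- not a complete or verified schema.
schema_version: "1.0"
models:
  - transforms:
      data_source: "_"
      policies:
        - name: redact_pii
          rules:
            - name: fake_emails
              conditions:
                value_label:
                  - email_address
              transforms:
                - type: fake
```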
Pass the transform config to train_transforms to begin training. By default, transforms will run on all tables in the RelationalData instance, but this can be scoped to a subset of tables using one of the optional only or ignore parameters.
multitable.train_transforms(
    blueprint_config,
    # only={"table_a", "table_b"},
    # ignore={"table_x", "table_y"},
)
Once train_transforms has started, logs showing the status of each table's model are updated periodically according to the refresh_interval set in the MultiTable instance. When training begins, a model for each table will appear in your project under the name {table}-transforms.
Once training is complete, you can generate transformed data. Relational Transforms can be used alone or in combination with Relational Synthetics. If you intend to train synthetic models on the transformed output instead of the source data, add the argument in_place=True.
multitable.run_transforms()
You can also run other data through the trained transform model. For example:
multitable.run_transforms(data={"events": some_other_events_dataframe})
To transform data you plan to then synthesize, add the argument in_place=True to run_transforms. Note: this will modify the data in the RelationalData instance. Below is a code snippet for transforming and synthesizing the telecom database.
from gretel_trainer.relational import *
from gretel_client.projects.models import read_model_config
# Input data from database
db_path = "telecom.db"
sqlite = sqlite_conn(path=db_path)
relational_data = sqlite.extract()
# Create relational model
multitable = MultiTable(
    relational_data,
    # project_display_name="multi-table",
    # gretel_model="amplify",
    # refresh_interval=60,
)
# Transform
multitable.train_transforms("transform/default")
multitable.run_transforms(in_place=True)
# Synthesize
multitable.train()
multitable.generate()
# Write output back to database
out_db_path = "output.db"
out_conn = sqlite_conn(path=out_db_path)
out_conn.save(multitable.synthetic_output_tables)
The transformed data is automatically written to the working directory as transformed_{table}.csv. These files are also uploaded to the Gretel Cloud in an archive file called transform_outputs.tar.gz, which you can find and download under the "Data Sources" tab in your project. You can optionally write the transformed data to a database using a Connector. The process for using output Connectors is detailed here.
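For intuition, writing output tables to a database amounts to creating one table per entry in a {table_name: rows} mapping. The sketch below is not the Gretel Connector API; `save_tables` and the sample rows are hypothetical, and it uses only the standard-library sqlite3 module to illustrate the idea.

```python
import sqlite3

def save_tables(db_path, tables):
    # tables: {table_name: list of row dicts}; assumes each list is non-empty
    # and all rows in a table share the same keys.
    con = sqlite3.connect(db_path)
    for name, rows in tables.items():
        cols = list(rows[0])
        con.execute(f"CREATE TABLE {name} ({', '.join(cols)})")
        con.executemany(
            f"INSERT INTO {name} VALUES ({', '.join('?' * len(cols))})",
            [tuple(r[c] for c in cols) for r in rows],
        )
    con.commit()
    return con

con = save_tables(":memory:", {"users": [{"id": 1, "email": "tok_ab12"}]})
print(con.execute("SELECT email FROM users").fetchone()[0])  # tok_ab12
```

A real output connector additionally handles column types, batching, and existing-table conflicts, which this sketch omits.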