Relational
Synthesize and Transform multi-table databases with Gretel Relational.
With Gretel Relational, you can synthesize and transform multiple tables or entire SQL databases, while ensuring referential integrity, accuracy, and privacy.
Depending on your use case, you can leverage Gretel's generative AI models to:
- Transform and Synthesize a Database to ensure compliance with GDPR, CCPA, and other privacy regulations
Gretel Relational is currently delivered through Gretel's Trainer SDK, which provides client-side multi-model management for complex use cases. These features are released as part of this SDK as public-previews before being developed into Gretel's platform APIs.
To install the Trainer SDK, you should be running Python 3.9 or higher and can install via PyPi:
pip install -U gretel-trainer
If you wish to utilize database connectors, we recommend you install with connector support:
pip install -U gretel-trainer[connectors]
Detailed Walkthrough
Relational Synthetics Demo
Relational Transform Demo
- 1.Define Source Data automatically using one of our Database Connectors or manually via individual table files.
- 4.View results
- 5.Optionally, write output to database via Connector
For examples throughout the Relational documentation, we'll refer to our demo telecommunications database, shown below:

ERD for mock telecommunications database
The source relational data for a Gretel Relational model is captured in a
RelationalData
object. The RelationalData
instance stores information including the table data, primary and foreign keys, and the relationships between tables.There are two ways to create the
RelationalData
instance:- 1.
- 2.Manually, by providing table data in individual files and defining keys and relationships.
When you use a database
Connector
, the keys and relationships between tables are automatically extracted from the database to a RelationalData
instance. Gretel Relational supports 30+ connectors, and provides first-class helper functions for popular database warehouses such as postgresql, sqlite, mysql, mariadb, and snowflake.
from gretel_trainer.relational import sqlite_conn
# Download demo SQLite database:
# !curl -o "telecom.db" "https://gretel-blueprints-pub.s3.us-west-2.amazonaws.com/rdb/ecom_xf.db"
db_path = "telecom.db"
sqlite = sqlite_conn(path=db_path)
relational_data = sqlite.extract()
Alternatively, you can create the
RelationalData
instance from individual files - one per table. When using this approach, you will need to manually define the keys and relationships between tables, as shown below.Here, an empty
RelationalData
instance is created. Then, the tables with their primary keys and data are added using add_table
, and relationships between those tables are added using add_foreign_key_constraint
.The files are loaded into Pandas DataFrames, one per table. The example below shows manually defining
RelationalData
from CSVs. However any tabular data file type that is supported by Pandas can be used by adjusting the pd.read_csv
function to be pd.read_parquet
, pd.read_json
, etc.from gretel_trainer.relational import RelationalData
import pandas as pd
file_dir = "/path/to/table_files"
tables = [
#("table_name", "primary_key")
("account", "account_id"),
("client", "client_id"),
("invoice", "invoice_id"),
("location", "location_id"),
("payment", "payment_id"),
]
foreign_keys = [
{
"table": "account",
"constrained_columns": ["client_id"],
"referred_table": "client",
"referred_columns": ["client_id"],
},
{
"table": "location",
"constrained_columns": ["client_id"],
"referred_table": "client",
"referred_columns": ["client_id"],
},
{
"table": "invoice",
"constrained_columns": ["account_id"],
"referred_table": "account",
"referred_columns": ["account_id"],
},
{
"table": "payment",
"constrained_columns": ["invoice_id"],
"referred_table": "invoice",
"referred_columns": ["invoice_id"],
},
]
relational_data = RelationalData()
for table, pk in tables:
relational_data.add_table(name=table, primary_key=pk, data=pd.read_csv(f"{csv_dir}/{table}.csv"))
for foreign_key in foreign_keys:
relational_data.add_foreign_key(**foreign_key)
After the source data has been defined, the relational model and associated project are created using
MultiTable
. The MultiTable
class is the interface to working with relational data. It is used to control model training and data generation, and to view and evaluate results. A RelationalData
instance is required to initialize the MultiTable
object. Several other options can be configured; the defaults are shown below as comments. multitable = MultiTable(
relational_data,
#project_display_name = "multi-table",
#gretel_model = "amplify",
#refresh_interval = 60,
)
project_display_name
- the display name for the project created- A new project is created for each new
MultiTable
instance, essentially one per Relational model
refresh_interval
- the frequency in seconds with which to check models' statuses and update logsgretel_model
- the synthetic model to use."amplify"
(default),"actgan"
, or"lstm"
.
When a
MultiTable
instance is created, a log appears with the message:Created project `{project_display_name}` with unique name `{project_name}`.
A local working directory is created and named after the unique project name. This directory is where temporary files and permanent outputs from the
MultiTable
model will be stored. Additionally, you can view the project in Console using the link https://console.gretel.ai/{project_name}
and see outputs under the Data Sources tab at https://console.gretel.ai/{project_name}/data_sources
.Now that your training database and relational model are set up, you can synthesize, transform, or transform then synthesize your database.
- Synthesize database
- Transform database
- Transform and Synthesize database
Once data generation for Relational Synthetics, Relational Transform, or both has completed, you can view your output database and evaluate the results.
The output data is automatically written to the working directory. For Relational Transform, there is a
transformed_{table}.csv
file per table. For Relational Synthetics, there is a synth_{table}.csv
. file per table, one relational_report.html
file, and a cross-table and individual synthetic report (in JSON and HTML formats) per table. Output files are also uploaded to the Gretel Cloud in archives file called
transform_outputs.tar.gz
for transforms and synthetic_outputs.tar.gz
for synthetics. Created with every Relational Synthetics generation, the Gretel Relational Report provides unique accuracy and privacy scores to help you verify the quality of your synthetic database. Read more about it at the link above.
To view the Relational Report in the notebook:
import IPython
from smart_open import open
report_path = str(multitable._working_dir / multitable._synthetics_run.identifier / "relational_report.html")
IPython.display.HTML(data=open(report_path).read())
Lastly, you have the option of writing the generated data to a database using a connector. The examples below show writing the synthesized and the transformed telecommunications databases to new connectors using SQLite. For information about connecting to your own database and more code snippets, check out Write to Output to Database via Connector.
output_db_path = "synthetic_telecom.db"
output_conn = sqlite_conn(output_db_path)
output_conn.save(multitable.synthetic_output_tables)
output_db_path = "transformed_telecom.db"
output_conn = sqlite_conn(output_db_path)
output_conn.save(multitable.transform_output_tables)
Last modified 1d ago