PostgreSQL
Connect to your PostgreSQL databases.
Getting Started
Prerequisites for creating a PostgreSQL-based workflow. You will need:
A source PostgreSQL connection.
(optional) A list of tables OR SQL queries.
(optional) A destination PostgreSQL connection.
For the source database connection, we recommend using a backup or clone with read-only permissions, instead of connecting directly to your production database.
Do not use your input database connection as an output connector. This action can result in the unintended overwriting of existing data.
Create a Connection
A postgres connection is created using the following parameters:
Connection Creation Parameters
Creating Connections
First, create a file on your local computer containing the connection credentials. This file should also include type, name, config, and credentials. The config and credentials fields should contain fields that are specific to the connection being created.
Below is an example PostgreSQL connection:
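As a rough illustration of that layout, the Python sketch below builds a connection definition with the four top-level fields named above and writes it to disk. Only type, name, config, and credentials come from the text. The nested keys (host, port, database, username, password), the JSON file format, and the helper names build_connection and write_connection_file are assumptions made for this example, so check the connection parameters for the exact field names before using it.

```python
import json
from pathlib import Path


def build_connection(name, host, port, database, username, password):
    """Return a dict with the four top-level fields the text names."""
    return {
        "type": "postgres",
        "name": name,
        # Assumed field names: non-secret, connection-specific settings.
        "config": {
            "host": host,
            "port": port,
            "database": database,
            "username": username,
        },
        # Assumed field name: secrets are kept apart from the config.
        "credentials": {"password": password},
    }


def write_connection_file(connection, path):
    """Serialize the connection as JSON (the file format is an assumption)."""
    path = Path(path)
    path.write_text(json.dumps(connection, indent=2))
    return path


# Placeholder values for illustration only.
connection = build_connection(
    name="my-postgres-connection",
    host="db.example.com",
    port=5432,
    database="mydb",
    username="readonly_user",
    password="change-me",
)
```

Calling write_connection_file(connection, "postgres_connection.json") would then produce the local file that is passed to the CLI in the next step.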
Now that you've created the credentials file, use the CLI to create the connection.
PostgreSQL Source Action
The postgres_source action reads data from your PostgreSQL database. It can be used to extract:
an entire database, OR
selected tables from a database, OR
the results of one or more SQL queries against a database.
Each time the workflow is run, the source action extracts the most recent data from the source database.
When combined in a workflow, the data extracted by the postgres_source action is used to train models and generate synthetic data with the gretel_tabular action, and can be written to an output database with the postgres_destination action.
For the source database connection, we recommend using a backup or clone with read-only permissions, instead of connecting directly to your production database.
Inputs
The postgres_source action takes slightly different inputs depending on the type of data you wish to extract. Each type of extraction has its own input config parameters and example action YAML.
Entire Database
Example Source Action YAML
Outputs
Whether you are extracting an entire database, selected tables, or querying against a database, the postgres_source action always provides a single output, dataset.
The output of a postgres_source action can be used as the input to a gretel_tabular action in order to transform and/or synthesize a database.
PostgreSQL Destination Action
The postgres_destination action can be used to write gretel_tabular action outputs to PostgreSQL destination databases.
Inputs
Whether you are writing an entire database, selected tables, or table(s) created via SQL query, the postgres_destination action always takes the same input, dataset.
Example Destination Action YAML
Sync Modes
There are multiple strategies for writing records into the destination database. These strategies are configured with the sync.mode field on a destination config.
sync.mode may be one of truncate, replace, or append.
Sync Mode: Truncate
When sync.mode is configured with truncate, existing records are first removed from the destination table using the TRUNCATE TABLE DML command, and the new records are then inserted.
When sync mode is configured with truncate, the destination table must already exist in the database.
Sync Mode: Replace
When sync.mode is configured with replace, the destination table is first dropped and then recreated using a schema inferred from the input dataset.
When the schema is inferred from the input dataset, certain column types or constraints may not be maintained from the source table. If you want to maintain the same schema from your source database, please use sync mode truncate.
When sync mode is configured with replace, the destination table does not need to exist in the destination database.
To respect foreign key constraints and referential integrity, tables without foreign keys are inserted first, and tables with foreign key references are inserted last.
When applying table DML for truncate or replace, operations are applied in reverse insertion order. This ensures that records are not deleted while other tables still hold foreign key references to them.
It's also important to note: all table data is first dropped from the database before inserting new records back in. These operations are not atomic, so there may be periods of time when the destination database is in an incomplete state.
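One common way to derive such an ordering is a topological sort over the foreign key graph. The sketch below uses Python's standard graphlib module on a hypothetical four-table schema; the table names and the helper functions insertion_order and clear_order are made up for illustration, and the text does not say how the destination action computes its order internally.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys reference.
FOREIGN_KEYS = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}


def insertion_order(foreign_keys):
    """Referenced (parent) tables come before the tables that reference them."""
    return list(TopologicalSorter(foreign_keys).static_order())


def clear_order(foreign_keys):
    """Truncate or drop in reverse insertion order, so child tables go first."""
    return list(reversed(insertion_order(foreign_keys)))


inserts = insertion_order(FOREIGN_KEYS)
clears = clear_order(FOREIGN_KEYS)
```

Here order_items is cleared before orders, and orders before users, so no row is removed while another table still references it. A schema with circular foreign keys has no valid order, and graphlib raises CycleError in that case.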
Sync Mode: Append
When sync.mode is configured with append, the destination action simply inserts records into the table, leaving any existing records in place.
When using the append sync mode, referential integrity is difficult to maintain. We recommend using append mode only when syncing ad hoc queries to a destination table.
If append mode is configured with a source that syncs an entire database, the destination will likely be unable to insert records while maintaining foreign key constraints and referential integrity, causing the action to fail.
Example Workflow Configs
Create a synthetic version of your PostgreSQL database.
The following config will extract the entire database, train and run a synthetic model, then write the outputs of the model back to a destination PostgreSQL database while maintaining referential integrity.