Snowflake
Connect to your Snowflake Data Warehouse.
Last updated
Connect to your Snowflake Data Warehouse.
Last updated
Prerequisites to create a Snowflake based workflow. You will need
A source Snowflake connection.
(optional) A list of tables OR SQL queries.
(optional) A destination Snowflake connection.
Do not use your input data warehouse connection as an output connector. This action can result in the unintended overwriting of existing data.
There are two ways to authenticate a Gretel Snowflake connection, each methoed requires different fields for a connection creation:
A snowflake
connection authenticated via username/password is created using the following parameters:
First, create a file on your local computer containing the connection credentials. This file should also include type
, name
, config
, and credentials
. The config
and credentials
fields should contain fields that are specific to the connection being created.
Below is an example Snowflake connection:
Now that you've created the credentials file, use the CLI to create the connection
A snowflake
connection authenticated via External OAuth is created using the following parameters:
External OAuth is currently only supported via CLI/SDK.
First, create a file on your local computer containing the connection credentials. This file should also include type
, name
, config
, and credentials
. The config
and credentials
fields should contain fields that are specific to the connection being created.
Below is an example Snowflake External OAuth connection:
Now that you've created the credentials file, use the CLI to create the connection
The Snowflake source action requires enough access to read from tables and access schema metadata. The following SQL script will create a Snowflake user suitable for a Gretel Snowflake source.
The snowflake destination action requires enough permissions to write to the destination schema.
If your destination database and schema do not already exist, create those first.
Next configure a user for the Snowflake destination. This user must have OWNERSHIP
permissions in order to write data to the destination schema.
The following SQL script will create a Snowflake user suitable for a Gretel Snowflake destination.
The snowflake_source
action reads data from your Snowflake database. It can be used to extract:
an entire database, OR
selected tables from a database, OR
the results of SQL query/queries against a database.
Each time the workflow is run the source action will extract the most recent data from the source database.
When combined in a workflow, the data extracted from the snowflake_source
action is used to train models and generate synthetic data with the gretel_tabular
action, and can be written to an output database with the snowflake_destination
action.
For the source database connection, we recommend using a backup or clone with read-only permissions, instead of connecting directly to your production database.
The snowflake_source
action takes slightly different inputs depending on the type of data you wish to extract. Flip through the tabs below to see the input config parameters and example action YAMLs for each type of extraction.
Entire Database
Example Source Action YAML
Whether you are extracting an entire database, selected tables, or querying against a database, the snowflake_source
action always provides a single output, dataset
.
The output of a snowflake_source
action can be used as the input to a gretel_tabular
action in order to transform and/or synthesize a database.
The snowflake_destination
action can be used to write gretel_tabular
action outputs to Snowflake destination databases.
Whether you are writing an entire database, selected tables, or table(s) created via SQL query, the snowflake_destination
action always takes the same input, dataset
.
Example Destination Action YAML
There are multiple strategies for writing records into the destination database. These strategies are configured from the sync.mode
field on a destination config.
sync.mode
may be one of truncate
, replace
, or append
.
When sync.mode
is configured with truncate
, records are first truncated from the destination table using the TRUNCATE TABLE
DML command.
When sync mode is configured with truncate
the destination table must already exist in the database.
When sync.mode
is configured with replace
, the destination table is first dropped and then recreated using the schema from the source table.
If the source table is from Snowflake, the DDL is extracted using the GET_DDL
metadata function. If the source table is from a non Snowflake source, the destination table schema is inferred based on the column types of the database.
When sync mode is configured with replace
the destination table does not need to exist in the destination.
To respect foreign key constraints and referential integrity, tables without foreign keys are inserted first, and tables with foreign key references are inserted last.
When applying table DML for truncate
or replace
, operations are applied in reverse insertion order. This is to ensure records aren't deleted with incoming foreign key references.
It's also important to note: all table data is first dropped from the database before inserting new records back in. These operations are not atomic, so there may be periods of time when the destination database is in an incomplete state.
When sync.mode
is configured with append
, the destination action will simply insert records into the table, leaving any existing records in place.
When using the append
sync mode, referential integrity is difficult to maintain. It's only recommended to use append
mode when syncing adhoc queries to a destination table.
If append
mode is configured with a source that syncs an entire database, it's likely the destination will be unable to insert records while maintaining foreign key constraints or referential integrity.
Create a synthetic version of your Snowflake database.
The following config will extract the entire Snowflake database, train and run a synthetic model, then write the outputs of the model back to a destination Snowflake database while maintaining referential integrity.
name
Display name of your choosing used to identify your connection within Gretel.
my-snowflake-connection
host
Fully qualified domain name (FQDN) used to establish connection to database server.
account_identifier.snowflakecomputing.com
username
Unique identifier associated with specific account authorized to access database.
john
password
Security credential to authenticate username.
...
database
Name of database to connect to.
MY_DATABASE
warehouse
Name of warehouse.
MY_WAREHOUSE
(optional) schema
Optional Name of schema.
MY_SCHEMA
(optional) params
Optional JDBC URL parameters that can be used for advanced configuration.
role=MY_ROLE
name
Display name of your choosing used to identify your connection within Gretel.
my-snowflake-connection
host
Fully qualified domain name (FQDN) used to establish connection to database server.
account_identifier.snowflakecomputing.com
username
Unique identifier associated with specific account authorized to access database.
john
password
Security credential to authenticate username.
...
database
Name of database to connect to.
MY_DATABASE
warehouse
Name of warehouse.
MY_WAREHOUSE
oauth_client_id
Unique identifier associated with the authentication application.
oauth_grant_type
method through which oauth token will be acquired
ex. "password"
oauth_scope
scope given to request token
oauth_url
endpoint to fetch access token from
(optional) schema
Optional Name of schema.
MY_SCHEMA
(optional) params
Optional JDBC URL parameters that can be used for advanced configuration.
role=MY_ROLE
Type
snowflake_source
Connection
snowflake
sync.mode
full
- extracts all records from tables in database
(coming soon) subset
- extract percentage of records from tables in database
sync.mode
full
- extracts all records from selected tables in database
(coming soon) subset
- extract percentage of records from selected tables in database
Sequence of mappings that lists the table(s) in the database to extract. name
- table name
name
- name of query; will be treated as name of resulting table
query
- SQL statement used to query connected database
Additional name
and query
mappings can be provided to include multiple SQL queries
dataset
A reference to the data extracted from the database, including tables and relationships/schema.
Type
snowflake_destination
Connection
snowflake
dataset
A reference to the table(s) generated by Gretel and (if applicable) the relationship schema extracted from the source database.
sync.mode
replace
- overwrites any existing data in table(s) at destination
append
- add generated data to existing table(s); only supported for query-created tables without primary keys