Snowflake

Connect to your Snowflake Data Warehouse.

Getting Started

To create a Snowflake-based workflow, you will need:

A source Snowflake connection.
(optional) A list of tables OR SQL queries.
(optional) A destination Snowflake connection.

Do not use your input data warehouse connection as an output connector. This action can result in the unintended overwriting of existing data.

Configuring a Snowflake Connection

There are two ways to authenticate a Gretel Snowflake connection. Each method requires different fields when creating the connection:

Standard Authentication

A Snowflake connection authenticated via username and password is created using the following parameters:

name - Display name of your choosing, used to identify your connection within Gretel. Example: my-snowflake-connection

host - Fully qualified domain name (FQDN) used to establish a connection to the database server. Example: account_identifier.snowflakecomputing.com

username - Unique identifier associated with the specific account authorized to access the database. Example: john

password - Security credential used to authenticate the username. Example: ...

database - Name of the database to connect to. Example: MY_DATABASE

warehouse - Name of the warehouse. Example: MY_WAREHOUSE

schema (optional) - Name of the schema. Example: MY_SCHEMA

params (optional) - JDBC URL parameters that can be used for advanced configuration. Example: role=MY_ROLE
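The parameter list above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the Gretel client: the function name and the idea of rejecting unknown fields are assumptions made for illustration, while the field names themselves come from the list above.

```python
# Field names taken from the parameter list above; the check itself is illustrative.
REQUIRED_CONFIG_FIELDS = ("host", "username", "database", "warehouse")
OPTIONAL_CONFIG_FIELDS = ("schema", "params")


def check_standard_connection(config, credentials):
    """Return a list of problems; an empty list means the fields look complete."""
    problems = []
    for field in REQUIRED_CONFIG_FIELDS:
        if not config.get(field):
            problems.append(f"missing required config field: {field}")
    allowed = set(REQUIRED_CONFIG_FIELDS) | set(OPTIONAL_CONFIG_FIELDS)
    for field in config:
        if field not in allowed:
            problems.append(f"unexpected config field: {field}")
    if not credentials.get("password"):
        problems.append("missing required credential: password")
    return problems


example_config = {
    "host": "account_identifier.snowflakecomputing.com",
    "username": "john",
    "database": "MY_DATABASE",
    "warehouse": "MY_WAREHOUSE",
}
print(check_standard_connection(example_config, {"password": "..."}))
```

Running a check like this before creating the connection surfaces a misspelled key (for example `wharehouse`) locally rather than at connection time.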

Creating Connections

First, create a file on your local computer containing the connection credentials. This file should include type, name, config, and credentials. The config and credentials fields should contain the fields specific to the connection being created.

Below is an example Snowflake connection:

{
    "type": "snowflake",
    "name": "my-snowflake-connection",
    "config": {
        "host": "account_identifier.snowflakecomputing.com",
        "username": "john",
        "database": "MY_DATABASE",
        "warehouse": "MY_WAREHOUSE",
        "schema": "MY_SCHEMA",
        "params": "role=MY_ROLE"
    },
    "credentials": {
        "password": "..."
    }
}
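A file like the one above can also be generated by a short script, so the password is supplied at run time rather than typed into the file by hand. This is a minimal sketch using only the Python standard library; the function name and the `SNOWFLAKE_PASSWORD` environment variable mentioned in the comment are hypothetical, and the placeholder values are the ones from the example above.

```python
import json
from pathlib import Path


def write_connection_file(path, password):
    """Write a Snowflake connection file and return the dict that was written."""
    connection = {
        "type": "snowflake",
        "name": "my-snowflake-connection",
        "config": {
            "host": "account_identifier.snowflakecomputing.com",
            "username": "john",
            "database": "MY_DATABASE",
            "warehouse": "MY_WAREHOUSE",
            "schema": "MY_SCHEMA",
            "params": "role=MY_ROLE",
        },
        "credentials": {"password": password},
    }
    path = Path(path)
    path.write_text(json.dumps(connection, indent=4))
    # Restrict the file to its owner, since it contains a secret.
    path.chmod(0o600)
    return connection


# Hypothetical usage, assuming the password is exported as SNOWFLAKE_PASSWORD:
#   import os
#   write_connection_file("credential_file.json", os.environ["SNOWFLAKE_PASSWORD"])
```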

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]

To create the connection in the Gretel Console instead:

Click the New Connection button.
Step 1, choose the Type for the Connection - Snowflake.
Step 2, choose the Project for your Connection.
Step 3, fill in the credentials and select Add Connection.

Alternatively, create the connection with the Python SDK:

from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import CreateConnectionRequest

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="snowflake-workflow")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-snowflake-connection",
        project_id=project.project_guid,
        type="snowflake",
        config={
            "host": "account_identifier.snowflakecomputing.com",
            "username": "john",
            "database": "MY_DATABASE",
            "warehouse": "MY_WAREHOUSE",
            #"schema": "MY_SCHEMA",
            #"params": "role=MY_ROLE",
        },
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "password": "...",
        },
    )
)

External OAuth

A Snowflake connection authenticated via External OAuth is created using the following parameters:

name - Display name of your choosing, used to identify your connection within Gretel. Example: my-snowflake-connection

host - Fully qualified domain name (FQDN) used to establish a connection to the database server. Example: account_identifier.snowflakecomputing.com

username - Unique identifier associated with the specific account authorized to access the database. Example: john

password - Security credential used to authenticate the username. Example: ...

database - Name of the database to connect to. Example: MY_DATABASE

warehouse - Name of the warehouse. Example: MY_WAREHOUSE

oauth_client_id - Unique identifier associated with the authentication application.

oauth_grant_type - Method through which the OAuth token will be acquired. Example: "password"

oauth_scope - Scope to request for the access token.

oauth_url - Endpoint to fetch the access token from.

schema (optional) - Name of the schema. Example: MY_SCHEMA

params (optional) - JDBC URL parameters that can be used for advanced configuration. Example: role=MY_ROLE

Creating Connections

External OAuth is currently only supported via CLI/SDK.

Below is an example Snowflake External OAuth connection:

{
    "type": "snowflake",
    "name": "my-snowflake-connection-oauth",
    "config": {
        "host": "account_identifier.snowflakecomputing.com",
        "username": "john",
        "database": "MY_DATABASE",
        "warehouse": "MY_WAREHOUSE",
        "schema": "MY_SCHEMA",
        "oauth_client_id": "MY_OAUTH_CLIENT_ID",
        "oauth_grant_type": "GRANT_TYPE",
        "oauth_scope": "OAUTH_SCOPE",
        "oauth_url": "OAUTH_URL",
        "params": "role=MY_ROLE"
    },
    "credentials": {
        "password": "...",
        "oauth_client_secret": "..."
    },
    "auth_strategy": "oauth"
}

Now that you've created the credentials file, use the CLI to create the connection:

gretel connections create --project [project id] --from-file [credential_file.json]
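Before passing a credentials file to the CLI, it can be worth confirming that it parses as strict JSON, since strict parsers reject details such as trailing commas. The sketch below is an illustration only: it uses the standard `json` module, the function name is hypothetical, and how lenient the Gretel CLI itself is when reading the file is not stated here.

```python
import json

# Top-level keys named in the connection file description.
REQUIRED_TOP_LEVEL_KEYS = {"type", "name", "config", "credentials"}


def load_connection_file(text):
    """Parse file contents as strict JSON and check the top-level keys."""
    try:
        connection = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"not valid JSON (line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err
    if not isinstance(connection, dict):
        raise ValueError("the file must contain a single JSON object")
    missing = REQUIRED_TOP_LEVEL_KEYS - connection.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    return connection
```

A typical use would be `load_connection_file(open("credential_file.json").read())` right before running the CLI command.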

Alternatively, create the connection with the Python SDK:

from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import CreateConnectionRequest

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)

project = create_or_get_unique_project(name="snowflake-workflow")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-snowflake-connection-oauth",
        project_id=project.project_guid,
        type="snowflake",
        config={
            "host": "account_identifier.snowflakecomputing.com",
            "username": "john",
            "database": "MY_DATABASE",
            "warehouse": "MY_WAREHOUSE",
            "oauth_client_id": "MY_OAUTH_CLIENT_ID",
            "oauth_grant_type": "GRANT_TYPE",
            "oauth_scope": "OAUTH_SCOPE",
            "oauth_url": "OAUTH_URL",
        },
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "password": "...",
            "oauth_client_secret": "...",
        },
        auth_strategy="oauth",
    )
)
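The comment in the example above recommends reading credentials from a file or secret rather than embedding them. One way to do that, sketched here under the assumption that the secrets are exported as environment variables, is shown below. The variable names `SNOWFLAKE_PASSWORD` and `SNOWFLAKE_OAUTH_CLIENT_SECRET` and the helper function are hypothetical.

```python
import os

# Maps each credentials key to a hypothetical environment variable name.
CREDENTIAL_VARIABLES = {
    "password": "SNOWFLAKE_PASSWORD",
    "oauth_client_secret": "SNOWFLAKE_OAUTH_CLIENT_SECRET",
}


def oauth_credentials_from_env(environ=None):
    """Build the credentials dict from environment variables, failing loudly if any is unset."""
    environ = os.environ if environ is None else environ
    credentials = {}
    missing = []
    for key, variable in CREDENTIAL_VARIABLES.items():
        value = environ.get(variable)
        if value:
            credentials[key] = value
        else:
            missing.append(variable)
    if missing:
        raise RuntimeError("unset environment variables: " + ", ".join(missing))
    return credentials
```

The returned dict could then be passed as `credentials=oauth_credentials_from_env()` in the request above.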

Permissions

Source Connection Permissions

The Snowflake source action requires enough access to read from tables and access schema metadata. The following SQL script will create a Snowflake user suitable for a Gretel Snowflake source.

-- Create the Gretel source role
CREATE ROLE IF NOT EXISTS GRETEL_SOURCE_ROLE;

-- Snowflake best practice. This ensures other roles can modify or drop objects created by the custom role
GRANT ROLE GRETEL_SOURCE_ROLE TO ROLE SYSADMIN;

-- Create the Gretel source user
CREATE USER IF NOT EXISTS GRETEL_SOURCE_USER
    PASSWORD = 'my secure password' -- be sure to change this to a more secure password
    DEFAULT_ROLE = GRETEL_SOURCE_ROLE
    DEFAULT_WAREHOUSE = "<your warehouse>";

-- DEFAULT_ROLE alone does not grant the role, so grant it to the user explicitly
GRANT ROLE GRETEL_SOURCE_ROLE TO USER GRETEL_SOURCE_USER;

-- Allow the role to run queries on the warehouse
GRANT USAGE ON WAREHOUSE "<your warehouse>" TO ROLE GRETEL_SOURCE_ROLE;

-- Grant schema access to the Gretel source role
GRANT USAGE ON DATABASE "<your database>" TO ROLE GRETEL_SOURCE_ROLE;
GRANT USAGE ON SCHEMA "<your database>"."<your schema>" TO ROLE GRETEL_SOURCE_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA "<your database>"."<your schema>" TO ROLE GRETEL_SOURCE_ROLE;
GRANT SELECT ON FUTURE TABLES IN SCHEMA "<your database>"."<your schema>" TO ROLE GRETEL_SOURCE_ROLE;
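When the same grants are needed for several schemas, the statements can be generated rather than edited by hand. The following is a minimal sketch, not a Gretel utility: the helper names are hypothetical, and it quotes the database and schema names separately, which is how Snowflake resolves a two-part name. Note that double-quoted identifiers are case-sensitive in Snowflake.

```python
def quote_identifier(name):
    """Wrap a Snowflake identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def source_grants(database, schema, role="GRETEL_SOURCE_ROLE"):
    """Return the read-only grants for one schema as a list of SQL statements."""
    db = quote_identifier(database)
    qualified_schema = f"{db}.{quote_identifier(schema)}"
    return [
        f"GRANT USAGE ON DATABASE {db} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {qualified_schema} TO ROLE {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {qualified_schema} TO ROLE {role};",
        f"GRANT SELECT ON FUTURE TABLES IN SCHEMA {qualified_schema} TO ROLE {role};",
    ]


for statement in source_grants("MY_DATABASE", "MY_SCHEMA"):
    print(statement)
```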

Destination Connection Permissions

The Snowflake destination action requires enough permissions to write to the destination schema.

If your destination database and schema do not already exist, create those first.

CREATE DATABASE "<your destination db>";
CREATE SCHEMA "<your destination schema>";

Next, configure a user for the Snowflake destination. This user's role must have OWNERSHIP of the destination schema in order to write data to it.

The following SQL script will create a Snowflake user suitable for a Gretel Snowflake destination.

-- Create the Gretel destination role
CREATE ROLE IF NOT EXISTS GRETEL_DESTINATION_ROLE;

-- Snowflake best practice. This ensures other roles can modify or drop objects created by the custom role
GRANT ROLE GRETEL_DESTINATION_ROLE TO ROLE SYSADMIN;

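-- Create the Gretel destination user
CREATE USER IF NOT EXISTS GRETEL_DESTINATION_USER
    PASSWORD = 'my secure password' -- be sure to change this to a more secure password
    DEFAULT_ROLE = GRETEL_DESTINATION_ROLE
    DEFAULT_WAREHOUSE = "<your warehouse>";

-- DEFAULT_ROLE alone does not grant the role, so grant it to the user explicitly
GRANT ROLE GRETEL_DESTINATION_ROLE TO USER GRETEL_DESTINATION_USER;

-- Allow the role to use the warehouse and the destination database
GRANT USAGE ON WAREHOUSE "<your warehouse>" TO ROLE GRETEL_DESTINATION_ROLE;
GRANT USAGE ON DATABASE "<your database>" TO ROLE GRETEL_DESTINATION_ROLE;

-- Grant ownership of the destination schema to the Gretel destination role
GRANT OWNERSHIP ON SCHEMA "<your database>"."<your schema>" TO ROLE GRETEL_DESTINATION_ROLE;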

Snowflake Source Action

Type

snowflake_source

Connection

snowflake

The snowflake_source action reads data from your Snowflake database. It can be used to extract:

an entire database, OR
selected tables from a database, OR
the results of SQL query/queries against a database.

Each time the workflow is run, the source action extracts the most recent data from the source database.

When combined in a workflow, the data extracted from the snowflake_source action is used to train models and generate synthetic data with the gretel_tabular action, and can be written to an output database with the snowflake_destination action.

For the source database connection, we recommend using a backup or clone with read-only permissions, instead of connecting directly to your production database.

Inputs

The snowflake_source action takes slightly different inputs depending on the type of data you wish to extract. The sections below list the input config parameters and an example action YAML for each type of extraction.

Entire Database

sync.mode

full - extracts all records from tables in database

(coming soon) subset - extract percentage of records from tables in database

Example Source Action YAML

actions:
  - name: extract-database
    type: snowflake_source
    connection: conn_1
    config:
      sync:
        mode: full

Selected Tables

sync.mode

full - extracts all records from selected tables in database

(coming soon) subset - extract percentage of records from selected tables in database

tables - sequence of mappings that lists the table(s) in the database to extract. Each mapping has one field, name, which is the table name.

Example Source Action YAML

actions:
  - name: extract-selected-tables
    type: snowflake_source
    connection: conn_1
    config:
      tables:
        - name: client
        - name: location
      sync:
        mode: full

SQL Query/Queries

queries - sequence of mappings, one per SQL query. Each mapping has two fields:

name - name of the query; it will be treated as the name of the resulting table

query - SQL statement used to query the connected database

Additional name and query mappings can be provided to include multiple SQL queries.

Example Source Action YAML

actions:
  - name: extract-sql-queries
    type: snowflake_source
    connection: conn_1
    config:
      queries:
        - name: people
          query: select first_name, last_name from client
        - name: cities
          query: select city, state, country from location
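The three input shapes above differ only in the config block, which a small helper can make explicit. This is a sketch for illustration, not part of the Gretel SDK: the function name is hypothetical, and treating tables and queries as mutually exclusive is an assumption based on the prerequisites ("a list of tables OR SQL queries").

```python
def snowflake_source_action(name, connection, tables=None, queries=None):
    """Build a snowflake_source action dict for one of the three extraction types.

    tables: optional list of table names.
    queries: optional dict mapping a result-table name to a SQL statement.
    """
    if tables and queries:
        raise ValueError("provide tables or queries, not both")
    if queries:
        config = {
            "queries": [{"name": n, "query": q} for n, q in queries.items()]
        }
    elif tables:
        config = {
            "tables": [{"name": t} for t in tables],
            "sync": {"mode": "full"},
        }
    else:
        # No tables or queries: extract the entire database.
        config = {"sync": {"mode": "full"}}
    return {
        "name": name,
        "type": "snowflake_source",
        "connection": connection,
        "config": config,
    }


print(snowflake_source_action("extract-selected-tables", "conn_1", tables=["client", "location"]))
```

The resulting dict mirrors the YAML examples above and could be serialized with any YAML library before being submitted as part of a workflow.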

Outputs

Whether you are extracting an entire database, selected tables, or querying against a database, the snowflake_source action always provides a single output, dataset.

dataset - A reference to the data extracted from the database, including tables and relationships/schema.

The output of a snowflake_source action can be used as the input to a gretel_tabular action in order to transform and/or synthesize a database.

Snowflake Destination Action

Type

snowflake_destination

Connection

snowflake

The snowflake_destination action can be used to write gretel_tabular action outputs to Snowflake destination databases.

Inputs

Whether you are writing an entire database, selected tables, or table(s) created via SQL query, the snowflake_destination action always takes the same input, dataset.

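dataset - A reference to the table(s) generated by Gretel and (if applicable) the relationship schema extracted from the source database.

sync.mode

truncate - removes existing records from the destination table(s), which must already exist, before inserting generated data

replace - overwrites any existing data in table(s) at destination

append - adds generated data to existing table(s); only supported for query-created tables without primary keys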

Example Destination Action YAML

actions:
...
  - name: snowflake-write
    type: snowflake_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: {outputs.synthesize.dataset}

Sync Modes

There are multiple strategies for writing records into the destination database. These strategies are configured from the sync.mode field on a destination config.

sync.mode may be one of truncate, replace, or append.
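Because sync.mode only accepts these three values, a typo is easy to catch before a workflow is submitted. The helper below is a hypothetical sketch, not a Gretel API; it builds a destination action dict and assumes the dataset reference is written as the string `{outputs.<input action>.dataset}`, following the YAML example above.

```python
# The three values listed above.
VALID_SYNC_MODES = ("truncate", "replace", "append")


def snowflake_destination_action(name, connection, input_action, mode="replace"):
    """Build a snowflake_destination action dict, rejecting unknown sync modes."""
    if mode not in VALID_SYNC_MODES:
        raise ValueError(
            f"sync.mode must be one of {', '.join(VALID_SYNC_MODES)}; got {mode!r}"
        )
    return {
        "name": name,
        "type": "snowflake_destination",
        "connection": connection,
        "input": input_action,
        "config": {
            "sync": {"mode": mode},
            "dataset": "{outputs." + input_action + ".dataset}",
        },
    }


print(snowflake_destination_action("snowflake-write", "conn_2", "synthesize"))
```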

Sync Mode: Truncate

When sync.mode is set to truncate, existing records are first removed from the destination table using the TRUNCATE TABLE DML command, and the new records are then inserted.

With truncate, the destination table must already exist in the database.

Sync Mode: Replace

When sync.mode is configured with replace, the destination table is first dropped and then recreated using the schema from the source table.

If the source table is from Snowflake, the DDL is extracted using the GET_DDL metadata function. If the source table is from a non-Snowflake source, the destination table schema is inferred from the column types of the source database.

With replace, the destination table does not need to exist in the destination database beforehand.

To respect foreign key constraints and referential integrity, tables without foreign keys are inserted first, and tables with foreign key references are inserted last.

When truncating or dropping tables for truncate or replace, operations are applied in reverse insertion order, so that no table is emptied or dropped while another table still holds foreign key references to its records.

Note that all table data is removed from the destination before new records are inserted. These operations are not atomic, so there may be periods of time when the destination database is in an incomplete state.
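The ordering rule described above (referenced tables are inserted first, and removal runs in the reverse order) is a topological sort of the foreign key graph. The sketch below illustrates the idea with the standard library's `graphlib`; it is not Gretel's implementation, and the foreign keys shown for the client, account, and location tables are hypothetical.

```python
from graphlib import TopologicalSorter


def insertion_order(foreign_keys):
    """Order tables so that every table comes after the tables it references.

    foreign_keys maps each table name to the set of tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


def removal_order(foreign_keys):
    """Truncate or drop in reverse insertion order: referencing tables go first."""
    return list(reversed(insertion_order(foreign_keys)))


# Hypothetical schema: account and location each reference client.
schema = {
    "client": set(),
    "account": {"client"},
    "location": {"client"},
}

print("insert:", insertion_order(schema))
print("remove:", removal_order(schema))
```

`TopologicalSorter` raises `graphlib.CycleError` if the foreign keys form a cycle, which is the case where no safe ordering exists.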

Sync Mode: Append

When sync.mode is configured with append, the destination action will simply insert records into the table, leaving any existing records in place.

When using the append sync mode, referential integrity is difficult to maintain. It's only recommended to use append mode when syncing ad hoc queries to a destination table.

If append mode is configured with a source that syncs an entire database, it's likely the destination will be unable to insert records while maintaining foreign key constraints or referential integrity.

Example Workflow Configs

Create a synthetic version of your Snowflake database.

The following config will extract the entire Snowflake database, train and run a synthetic model, then write the outputs of the model back to a destination Snowflake database while maintaining referential integrity.

name: sample-snowflake-workflow-full-db

actions:
  - name: snowflake-read
    type: snowflake_source
    connection: conn_1
    config:
      sync:
        mode: full

  - name: synthesize
    type: gretel_tabular
    input: snowflake-read
    config:
      project_id: proj_1
      train:
        model: "synthetics/tabular-actgan"
        dataset: {outputs.snowflake-read.dataset}
      run:
        num_records_multiplier: 1.0

  - name: snowflake-write
    type: snowflake_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: {outputs.synthesize.dataset}

Create a synthetic version of selected tables from your Snowflake database

The following config will extract two tables from your database, train and run a synthetic model, then write the outputs of the model back to a destination Snowflake database while maintaining any key relationships between the tables.

name: sample-snowflake-workflow-selected-tables

actions:
  - name: snowflake-read
    type: snowflake_source
    connection: conn_1
    config:
      tables:
        - name: client
        - name: location
      sync:
        mode: full

  - name: synthesize
    type: gretel_tabular
    input: snowflake-read
    config:
      project_id: proj_1
      train:
        model: "synthetics/tabular-actgan"
        dataset: {outputs.snowflake-read.dataset}
      run:
        num_records_multiplier: 1.0

  - name: snowflake-write
    type: snowflake_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: {outputs.synthesize.dataset}

Create a synthetic version of a dataset formed by querying your Snowflake database

The following config will execute a SQL query against your Snowflake database to create a table containing data from across the database. Then, it will train and run a synthetic model to generate a synthetic table.

name: sample-snowflake-workflow-sql-query

actions:
  - name: snowflake-read
    type: snowflake_source
    connection: conn_1
    config:
      queries:
        - name: status_by_location
          query: SELECT location.city, location.state, location.zip, account.vip_status
            FROM client JOIN account ON client.client_id = account.client_id
            JOIN location ON client.client_id = location.client_id

  - name: synthesize
    type: gretel_tabular
    input: snowflake-read
    config:
      project_id: proj_1
      train:
        model: "synthetics/tabular-actgan"
        dataset: {outputs.snowflake-read.dataset}
      run:
        num_records_multiplier: 1.0
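The configs above chain actions together through the input field and {outputs.<action>.dataset} references, so a misspelled action name breaks the chain. The following is a rough sketch of a local check, assuming the workflow has already been loaded into Python dicts with the dataset references as strings. The function names are hypothetical and this is not how Gretel validates workflows.

```python
import re

# Matches references such as {outputs.snowflake-read.dataset}
OUTPUT_REFERENCE = re.compile(r"\{outputs\.([^.{}]+)\.dataset\}")


def iter_strings(value):
    """Yield every string found anywhere inside nested dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def check_action_chain(actions):
    """Return a list of problems; an empty list means every reference resolves."""
    problems = []
    seen = set()
    for action in actions:
        name = action["name"]
        upstream = action.get("input")
        if upstream is not None and upstream not in seen:
            problems.append(f"{name}: input {upstream!r} is not an earlier action")
        for text in iter_strings(action.get("config", {})):
            for referenced in OUTPUT_REFERENCE.findall(text):
                if referenced not in seen:
                    problems.append(f"{name}: {referenced!r} is referenced before it is defined")
        seen.add(name)
    return problems


workflow_actions = [
    {"name": "snowflake-read", "type": "snowflake_source", "connection": "conn_1",
     "config": {"sync": {"mode": "full"}}},
    {"name": "synthesize", "type": "gretel_tabular", "input": "snowflake-read",
     "config": {"project_id": "proj_1",
                "train": {"model": "synthetics/tabular-actgan",
                          "dataset": "{outputs.snowflake-read.dataset}"},
                "run": {"num_records_multiplier": 1.0}}},
    {"name": "snowflake-write", "type": "snowflake_destination", "connection": "conn_2",
     "input": "synthesize",
     "config": {"sync": {"mode": "replace"},
                "dataset": "{outputs.synthesize.dataset}"}},
]

print(check_action_chain(workflow_actions))
```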

