Prerequisites

To create an Oracle Database based workflow, you will need:

- A source Oracle Database connection.
- (optional) A list of tables OR SQL queries.
- (optional) A destination Oracle Database connection OR object storage connection.
For the source database connection, we recommend using a backup or clone with read-only permissions, instead of connecting directly to your production database.
Do not use your input database connection as an output connector. This action can result in the unintended overwriting of existing data.
Create a Connection
An Oracle connection is created using the following parameters:

Connection Creation Parameters

| Parameter | Description |
| --- | --- |
| username | The username to log in as. |
| password | The password for the user (supplied via credentials). |
| host | Hostname of the Oracle server, e.g. myserver.example.com. |
| service_name | The Oracle service name of the database. |
| port | (optional) Port to connect to; defaults to 1521. |
| instance_name | (optional) The Oracle instance name. |
| params | (optional) Additional connection parameters, e.g. key1=value1;key2=value2. |
Creating Connections
First, create a file on your local computer containing the connection credentials. This file should include type, name, config, and credentials. The config and credentials fields should contain fields that are specific to the connection being created.
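For example, a minimal connection file might look like the sketch below. The field names follow the description above; the values and the exact file format expected by your Gretel version are illustrative assumptions:

```yaml
# Illustrative Oracle connection file; all values are placeholders.
type: oracle
name: my-oracle-connection
config:
  username: john
  host: myserver.example.com
  service_name: my_database
credentials:
  password: "..."
```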
Navigate to the Connections page using the menu item in the left sidebar.
Click the New Connection button.
Step 1, choose the Type for the Connection - Oracle Database.
Step 2, choose the Project for your Connection.
Step 3, fill in the credentials and select Add Connection.
```python
from gretel_client import create_or_get_unique_project
from gretel_client.config import get_session_config
from gretel_client.rest_v1.api.connections_api import ConnectionsApi
from gretel_client.rest_v1.models import (
    CreateConnectionRequest,
    UpdateConnectionRequest,
)

session = get_session_config()
connection_api = session.get_v1_api(ConnectionsApi)
project = create_or_get_unique_project(name="oracle-workflow")

connection = connection_api.create_connection(
    CreateConnectionRequest(
        name="my-oracle-connection",
        project_id=project.project_guid,
        type="oracle",
        config={
            "username": "john",
            "host": "myserver.example.com",
            "service_name": "my_database",
            # "port": 1521,
            # "instance_name": "myinstance",
            # "params": "key1=value1;key2=value2",
        },
        # note: best practice is to read in credentials from a file
        # or secret instead of directly embedding sensitive values
        # in python code.
        credentials={
            "password": "...",
        },
    )
)
```
Permissions
In Oracle, the CREATE SCHEMA command does not create a new, standalone schema. Instead, one creates a user. When the user is created, a schema is also automatically created for that user. When the user logs in, that schema is used by default for the session. In order to prevent name clashes or data accidents, we encourage you to create separate Oracle users for the Source and Destination connections.
Source Connection Permissions
The Oracle source action requires enough access to read from tables and access schema metadata. The following SQL script will create an Oracle user suitable for a Gretel Oracle source.
```sql
-- Create the user
CREATE USER user IDENTIFIED BY password -- change to something more secure
DEFAULT TABLESPACE SYSTEM;

-- Required to log in
GRANT CREATE SESSION TO user;
```
Destination Connection Permissions
The following SQL script will create an Oracle user suitable for a Gretel Oracle destination. It will write to its own schema.
```sql
-- Create the user
CREATE USER user IDENTIFIED BY password -- change to something more secure
DEFAULT TABLESPACE SYSTEM -- change if you have a separate tablespace
QUOTA UNLIMITED ON SYSTEM; -- change to limit amount of space allocated

-- Required to log in
GRANT CREATE SESSION TO user;

-- Required for writes
GRANT CREATE TABLE TO user;
```
The oracle_source action reads data from your Oracle database. It can be used to extract:

- an entire database, OR
- selected tables from a database, OR
- the results of one or more SQL queries against a database.
Each time the workflow is run, the source action extracts the most recent data from the source database.
When combined in a workflow, the data extracted by the oracle_source action is used to train models and generate data with the gretel_tabular action, and can be written to an output database with the oracle_destination action. Your generated data can also be written to object storage connections; for more information, see Writing to Object Storage.
For the source database connection, we recommend using a backup or clone with read-only permissions, instead of connecting directly to your production database.
Inputs
The oracle_source action takes slightly different inputs depending on the type of data you wish to extract. Flip through the tabs below to see the input config parameters and example action YAMLs for each type of extraction.
```yaml
actions:
  - name: extract-sql-queries
    type: oracle_source
    connection: conn_1
    config:
      queries:
        - name: people
          query: select first_name, last_name from client
        - name: cities
          query: select city, state, country from location
```
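The query-based extraction is shown above. For the other extraction modes, a config might look along these lines; note that the sync.mode values and the tables field here are assumptions patterned on the rest of this page, not confirmed field names:

```yaml
actions:
  # Extract the entire database (field names are assumptions)
  - name: extract-entire-db
    type: oracle_source
    connection: conn_1
    config:
      sync:
        mode: full

  # Extract selected tables (the tables field is an assumption)
  - name: extract-selected-tables
    type: oracle_source
    connection: conn_1
    config:
      sync:
        mode: subset
        tables:
          - client
          - location
```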
Outputs
Whether you are extracting an entire database, selected tables, or querying against a database, the oracle_source action always provides a single output, dataset.
The output of an oracle_source action can be used as the input to a gretel_tabular action in order to transform and/or synthesize a database.
Oracle Database Destination Action
The oracle_destination action can be used to write gretel_tabular action outputs to Oracle destination databases.
Inputs
Whether you are writing an entire database, selected tables, or table(s) created via SQL query, the oracle_destination action always takes the same input, dataset.
Sync Modes
There are multiple strategies for writing records into the destination database. These strategies are configured from the sync.mode field on a destination config.
sync.mode may be one of truncate, replace, or append.
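For instance, a destination action configured to replace tables might look like this sketch (the action name and connection ID are placeholders):

```yaml
actions:
  - name: oracle-write
    type: oracle_destination
    connection: conn_2
    config:
      sync:
        mode: replace   # or: truncate, append
```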
Sync Mode: Truncate
When sync.mode is configured with truncate, records are first removed from the destination table using the TRUNCATE TABLE command.
When sync mode is configured with truncate, the destination table must already exist in the database.
Sync Mode: Replace
When sync.mode is configured with replace, the destination table is first dropped and then recreated using the schema from the source table.
If the source table is from Oracle, the DDL is extracted using the GET_DDL interface from the DBMS_METADATA package. If the source table is from a non-Oracle source, the destination table schema is inferred from the column types of the source schema (if present) or from the data.
When sync mode is configured with replace, the destination table does not need to exist in the destination.
To respect foreign key constraints and referential integrity, tables without foreign keys are inserted first, and tables with foreign key references are inserted last.
When truncating or dropping tables for truncate or replace, operations are applied in reverse insertion order. This ensures records are not deleted while other tables still hold foreign key references to them.
It's also important to note: all table data is first dropped from the database before inserting new records back in. These operations are not atomic, so there may be periods of time when the destination database is in an incomplete state.
Sync Mode: Append
When sync.mode is configured with append, the destination action will simply insert records into the table, leaving any existing records in place.
When using the append sync mode, referential integrity is difficult to maintain. We recommend using append mode only when syncing ad hoc query results to a destination table.
If append mode is configured with a source that syncs an entire database, it's likely the destination will be unable to insert records while maintaining foreign key constraints or referential integrity, causing the action to fail.
You can also write your output dataset to an object storage connection such as Amazon S3 or Google Cloud Storage. Whether you are writing an entire database, selected tables, or table(s) created via SQL query, the {object_storage}_destination action always takes the same inputs: input, filename, and path. Additionally, S3 and GCS take bucket, and Azure Blob takes container.
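As a sketch, an S3 destination action using these inputs might look like the following; the input reference, connection ID, and values are placeholders:

```yaml
actions:
  - name: write-to-s3
    type: s3_destination
    connection: conn_s3
    input: synthesize              # name of the upstream gretel_tabular action (assumed)
    config:
      bucket: my-bucket
      path: oracle-workflow-outputs/
      filename: synthetic-data.csv
```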
Create a synthetic version of your Oracle database
The following config will extract the entire Oracle database, train and run a synthetic model, then write the outputs of the model back to a destination Oracle database while maintaining referential integrity.
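A sketch of such a workflow config, patterned on the action examples above; the sync.mode value, model ID, project ID, and output-reference syntax are illustrative assumptions:

```yaml
name: oracle-to-oracle-synthetic
actions:
  - name: oracle-read
    type: oracle_source
    connection: conn_1          # source connection ID (placeholder)
    config:
      sync:
        mode: full              # assumption: extract the entire database

  - name: synthesize
    type: gretel_tabular
    input: oracle-read
    config:
      project_id: proj_1                     # placeholder
      train:
        model: synthetics/tabular-actgan     # illustrative model ID
        dataset: "{outputs.oracle-read.dataset}"

  - name: oracle-write
    type: oracle_destination
    connection: conn_2          # destination connection ID (placeholder)
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: "{outputs.synthesize.dataset}"
```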
Create a synthetic version of selected tables from your Oracle database
The following config will extract two tables from your database, train and run a synthetic model, then write the outputs of the model back to a destination Oracle database while maintaining any key relationships between the tables.
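A sketch of such a config; it differs from the full-database case mainly in the source action, and the tables field name is an assumption:

```yaml
name: oracle-tables-synthetic
actions:
  - name: oracle-read
    type: oracle_source
    connection: conn_1
    config:
      sync:
        mode: subset      # assumption
        tables:           # assumption: explicit list of tables to extract
          - client
          - location

  - name: synthesize
    type: gretel_tabular
    input: oracle-read
    config:
      project_id: proj_1                     # placeholder
      train:
        model: synthetics/tabular-actgan     # illustrative model ID
        dataset: "{outputs.oracle-read.dataset}"

  - name: oracle-write
    type: oracle_destination
    connection: conn_2
    input: synthesize
    config:
      sync:
        mode: replace
      dataset: "{outputs.synthesize.dataset}"
```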
Create a synthetic version of a dataset formed by querying your Oracle database and write to S3
The following config will execute a SQL query against your Oracle database to create a table containing data from across the database. Then, it will train and run a synthetic model to generate a synthetic table. Finally, the generated data will be written to an Amazon S3 bucket.
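A sketch of such a config, combining the query-based source shown earlier with an S3 destination; the model ID, project ID, bucket, and output-reference syntax are illustrative assumptions:

```yaml
name: oracle-query-to-s3
actions:
  - name: extract-query
    type: oracle_source
    connection: conn_1
    config:
      queries:
        - name: people
          query: select first_name, last_name from client

  - name: synthesize
    type: gretel_tabular
    input: extract-query
    config:
      project_id: proj_1                     # placeholder
      train:
        model: synthetics/tabular-actgan     # illustrative model ID
        dataset: "{outputs.extract-query.dataset}"

  - name: write-to-s3
    type: s3_destination
    connection: conn_s3
    input: synthesize
    config:
      bucket: my-bucket                      # placeholder
      path: synthetic-outputs/
      filename: people.csv
```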
Create a synthetic version of your Oracle database and write the results to GCS
The following config will extract the entire Oracle database, train and run a synthetic model, then write the output tables to an output Google Cloud Storage bucket while maintaining referential integrity.
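A sketch of such a config; the sync.mode value, model ID, project ID, bucket, and output-reference syntax are illustrative assumptions:

```yaml
name: oracle-to-gcs-synthetic
actions:
  - name: oracle-read
    type: oracle_source
    connection: conn_1
    config:
      sync:
        mode: full        # assumption: extract the entire database

  - name: synthesize
    type: gretel_tabular
    input: oracle-read
    config:
      project_id: proj_1                     # placeholder
      train:
        model: synthetics/tabular-actgan     # illustrative model ID
        dataset: "{outputs.oracle-read.dataset}"

  - name: write-to-gcs
    type: gcs_destination
    connection: conn_gcs
    input: synthesize
    config:
      bucket: my-bucket                      # placeholder
      path: synthetic-outputs/
      filename: database-tables              # placeholder
```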