Ask or search…
K
Comment on page

Database Connectors

Read from and write to databases using connectors with Gretel Relational.

Overview

Database connectors can be used to automatically define source relational data and/or to write the generated synthesized/transformed to a database or data warehouse.
With over 30 available connectors, Gretel Relational supports connectors for any database supported by SQLAlchemy, and has first-class helper functions for PostgreSQL, SQLite, MySQL, MariaDB, Snowflake, and more.

Setting up a database connector

The general format for connecting to a database, whether to read from, write to, or both, is:
import sqlalchemy
from gretel_trainer.relational import Connector
conn_string = # General format: "{connection}://{user}:{password}@{host}:{port}/{database}"
engine = sqlalchemy.create_engine(conn_string)
connector = Connector(engine)
The format of conn_string - the connection string, also known as database URL - may be different depending on the database. For example, a user named "Scott" connecting to a MySQL database called foo would have URL: conn_string = "mysql://scott:password@localhost/foo". But, if foo was a SQLite database instead the URL would be: conn_string="sqlite:///foo.db. SQLite connects to file-based databases, so foo.db is the relative path of the database.
To make this process easier, we have First-Class Helper Functions for a number popular databases to simply the create_engine step. For other databases, refer to the SQLAlchemy docs to check the connection string format for your specific database.

Database Driver Installation

To create a SQLAlchemy engine for a given database, you often need to have a driver for that database installed. Sometimes this only requires installing a Python package (e.g., for Snowflake you just need to pip install snowflake-sqlalchemy). Other times (e.g. MySQL) you need to install a Python package (pip install mysqlclient) and need the database itself installed on your machine (brew install mysql), see below for more details on MySQL.
You can install optional extras packages with the trainer for specific databases. Examples:
pip install gretel-trainer[postgres]
pip install gretel-trainer[snowflake]
pip install gretel-trainer[mysql]
A full list of optional Connector extras packages can be found here.
Gretel Relational is designed for SQL databases that contain structured tables with defined relationships. You can make SQLAlchemy engines for all sorts of databases, some of which may not be supported by SQL or may not contain the type of database supported by Gretel Relational.

MySQL Installation

Before installing Gretel Trainer with MySQL support, you will need to install operating system level drivers first.
For Mac:
brew install mysql
For Linux (Debian based systems)
apt install python3-dev default-libmysqlclient-dev build-essential -y

Read from a database

After creating a connector, the extract method is used to read from the database and define RelationalData.
relational_data = connector.extract()
If you are using a database connection, but do not want to include all the tables in the database, you can use .extract(only=["table1", "table2", ...]) to define which tables to include or .extract(ignore=["table3"]) to specify which tables to exclude.

Write to a database

The same process for creating an output connector is used as for the source data connector. You can create a new connector or write the synthetic data to the same database connector used for source input. The save method is used to write to a database.
If you're writing back to the same database as your source, pass a prefix: str argument to the save method to avoid overwriting your source tables!

Write to an existing connector

# Synthetics
connector.save(
multitable.synthetic_output_tables,
#prefix="synth_"
)
# Transforms
connector.save(
multitable.transform_output_tables,
#prefix="trans_"
)

Write to a new connector

import sqlalchemy
from gretel_trainer.relational import Connector
# Note: Connector string (or URL) format may be different depending on database. Refer to SQLAlchemy docs for format for a specific database
out_conn_string = # General format: {connection}://{user}:{password}@{host}:{port}/{database}"
engine = sqlalchemy.create_engine(out_conn_string)
out_connector = Connector(engine)
# Synthetics
out_conn.save(
multitable.synthetic_output_tables,
#prefix="synth_"
)
# Transforms
out_conn.save(
multitable.transform_output_tables,
#prefix="trans_"
)

First-Class Helper Functions

Helper functions are available for a number of popular databases to simplify the process of writing a connector string to create an engine.

SQLite

from gretel_trainer.relational import sqlite_conn
db_path = "/path/to/database/example.db"
sqlite = sqlite_conn(path=db_path)
relational_data = sqlite.extract()

PostgreSQL

from gretel_trainer.relational import postgres_conn
postgres = postgres_conn(user="", password="", host="", port=1234, database="")
relational_data=postgres.extract()

MySQL

from gretel_trainer.relational import mysql_conn
mysql = mysql_conn(user="my_user", password="", host="", port=1234, database="")
relational_data=mysql.extract()

MariaDB

from gretel_trainer.relational import mariadb_conn
mariadb = mariadb_conn(user="", password="", host="", port=1234, database="")
relational_data=mariadb.extract()

Snowflake

from gretel_trainer.relational import snowflake_conn
snowflake = snowflake_conn(
user="",
password="",
account_identifier="",
database="",
schema="",
warehouse="",
role=""
)
relational_data=snowflake.extract()