Comment on page
Database Connectors
Read from and write to databases using connectors with Gretel Relational.
Database connectors can be used to automatically define source relational data and/or to write the generated synthesized/transformed to a database or data warehouse.
With over 30 available connectors, Gretel Relational supports connectors for any database supported by SQLAlchemy, and has first-class helper functions for PostgreSQL, SQLite, MySQL, MariaDB, Snowflake, and more.
The general format for connecting to a database, whether to read from, write to, or both, is:
import sqlalchemy
from gretel_trainer.relational import Connector
conn_string = # General format: "{connection}://{user}:{password}@{host}:{port}/{database}"
engine = sqlalchemy.create_engine(conn_string)
connector = Connector(engine)
The format of
conn_string
- the connection string, also known as database URL - may be different depending on the database. For example, a user named "Scott" connecting to a MySQL database called foo
would have URL: conn_string = "mysql://scott:password@localhost/foo"
. But, if foo
was a SQLite database instead the URL would be: conn_string="sqlite:///foo.db
. SQLite connects to file-based databases, so foo.db
is the relative path of the database.To make this process easier, we have First-Class Helper Functions for a number popular databases to simply the
create_engine
step. For other databases, refer to the SQLAlchemy docs to check the connection string format for your specific database. To create a SQLAlchemy engine for a given database, you often need to have a driver for that database installed. Sometimes this only requires installing a Python package (e.g., for Snowflake you just need to
pip install snowflake-sqlalchemy
). Other times (e.g. MySQL) you need to install a Python package (pip install mysqlclient
) and need the database itself installed on your machine (brew install mysql
), see below for more details on MySQL.You can install optional extras packages with the trainer for specific databases. Examples:
pip install gretel-trainer[postgres]
pip install gretel-trainer[snowflake]
pip install gretel-trainer[mysql]
Gretel Relational is designed for SQL databases that contain structured tables with defined relationships. You can make SQLAlchemy engines for all sorts of databases, some of which may not be supported by SQL or may not contain the type of database supported by Gretel Relational.
Before installing Gretel Trainer with MySQL support, you will need to install operating system level drivers first.
For Mac:
brew install mysql
For Linux (Debian based systems)
apt install python3-dev default-libmysqlclient-dev build-essential -y
After creating a connector, the
extract
method is used to read from the database and define RelationalData
.relational_data = connector.extract()
If you are using a database connection, but do not want to include all the tables in the database, you can use
.extract(only=["table1", "table2", ...])
to define which tables to include or .extract(ignore=["table3"])
to specify which tables to exclude.The same process for creating an output connector is used as for the source data connector. You can create a new connector or write the synthetic data to the same database connector used for source input. The
save
method is used to write to a database.If you're writing back to the same database as your source, pass a
prefix: str
argument to the save
method to avoid overwriting your source tables!# Synthetics
connector.save(
multitable.synthetic_output_tables,
#prefix="synth_"
)
# Transforms
connector.save(
multitable.transform_output_tables,
#prefix="trans_"
)
import sqlalchemy
from gretel_trainer.relational import Connector
# Note: Connector string (or URL) format may be different depending on database. Refer to SQLAlchemy docs for format for a specific database
out_conn_string = # General format: {connection}://{user}:{password}@{host}:{port}/{database}"
engine = sqlalchemy.create_engine(out_conn_string)
out_connector = Connector(engine)
# Synthetics
out_conn.save(
multitable.synthetic_output_tables,
#prefix="synth_"
)
# Transforms
out_conn.save(
multitable.transform_output_tables,
#prefix="trans_"
)
Helper functions are available for a number of popular databases to simplify the process of writing a connector string to create an engine.
from gretel_trainer.relational import sqlite_conn
db_path = "/path/to/database/example.db"
sqlite = sqlite_conn(path=db_path)
relational_data = sqlite.extract()
from gretel_trainer.relational import postgres_conn
postgres = postgres_conn(user="", password="", host="", port=1234, database="")
relational_data=postgres.extract()
from gretel_trainer.relational import mysql_conn
mysql = mysql_conn(user="my_user", password="", host="", port=1234, database="")
relational_data=mysql.extract()
from gretel_trainer.relational import mariadb_conn
mariadb = mariadb_conn(user="", password="", host="", port=1234, database="")
relational_data=mariadb.extract()
from gretel_trainer.relational import snowflake_conn
snowflake = snowflake_conn(
user="",
password="",
account_identifier="",
database="",
schema="",
warehouse="",
role=""
)
relational_data=snowflake.extract()
Last modified 7mo ago