June 2024

Release notes for the Gretel Platform, June 2024

2024.6.9

Task: Add support for setting crawl limits when configuring Gretel Workflow object storage connectors. To set a limit, configure limit on the object storage source connector.
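As a rough illustration of what a crawl limit does, the sketch below caps how many objects are taken from a listing. It assumes the limit counts objects; `crawl` and the in-memory listing are hypothetical stand-ins, not Gretel's connector code or config schema.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def crawl(object_keys: Iterable[str], limit: Optional[int] = None) -> Iterator[str]:
    """Yield object keys, stopping after `limit` keys when a limit is set.

    Hypothetical helper: with limit=None, every key is yielded.
    """
    # islice treats a stop value of None as "no cap".
    return islice(object_keys, limit)


if __name__ == "__main__":
    listing = ["a.csv", "b.csv", "c.csv"]
    print(list(crawl(listing, limit=2)))
```

Stopping the enumeration early, rather than filtering afterwards, keeps the crawl from touching more objects than the limit allows.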

Task: Improve Workflow config validation. Workflow action names are now validated to ensure uniqueness within a Workflow config.
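A uniqueness check of this kind could look roughly like the sketch below. It assumes a config shaped as a dict with an `actions` list whose entries carry a `name` key; the function name and the sample config are hypothetical, and this is not Gretel's validator.

```python
from collections import Counter
from typing import Any, Dict, List


def duplicate_action_names(config: Dict[str, Any]) -> List[str]:
    """Return the action names that appear more than once, sorted."""
    names = [action["name"] for action in config.get("actions", [])]
    return sorted(name for name, count in Counter(names).items() if count > 1)


if __name__ == "__main__":
    # Hypothetical config: "read" is used twice and should be reported.
    config = {"actions": [{"name": "read"}, {"name": "train"}, {"name": "read"}]}
    duplicates = duplicate_action_names(config)
    if duplicates:
        print("duplicate action names:", duplicates)
```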
Task: Gretel BigQuery connections can now be created without specifying a dataset. You can instead configure the BigQuery dataset by passing bq_dataset when configuring a bigquery_source action.
Fix: Fix to database subsetting. When collecting batches of data, those batches previously needed to contain the same set of columns. This constraint would sometimes break subsetting if columns were sparsely populated.
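One way to combine batches whose column sets differ is to take the union of the columns and fill the gaps with nulls. The sketch below shows that idea on plain dict rows; `unify_batches` is a hypothetical name, and this is not necessarily how the fix is implemented.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def unify_batches(batches: List[List[Row]]) -> Tuple[List[str], List[Row]]:
    """Merge batches with differing columns into rows sharing one column set."""
    columns: List[str] = []
    seen = set()
    # Collect the union of column names in first-seen order.
    for batch in batches:
        for row in batch:
            for column in row:
                if column not in seen:
                    seen.add(column)
                    columns.append(column)
    # Re-emit every row with all columns; absent values become None.
    rows = [
        {column: row.get(column) for column in columns}
        for batch in batches
        for row in batch
    ]
    return columns, rows


if __name__ == "__main__":
    sparse = [[{"id": 1, "email": "a@example.com"}], [{"id": 2}]]
    print(unify_batches(sparse))
```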
Task: Hybrid Model docker images have now been consolidated into a single Model image.
Task: Hybrid Workflow docker images have now been consolidated into a single Workflow image.
Task: Intermediate Workflow artifacts are now immediately cleaned up when a Workflow completes. When a Workflow is configured with a sink, any intermediate model artifacts produced by the Workflow are cleaned up and removed when the Workflow completes.
Task: GPT-x, update config validation to limit epsilon to be between 0.1 and 100.
Task: GPT-x, ensure sampling probability is never larger than 1.0.
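The two GPT-x checks above can be sketched as follows. This assumes the epsilon bounds are inclusive and that the sampling probability is derived as batch size divided by record count; neither detail is stated in the notes, and the function names are hypothetical.

```python
EPSILON_MIN = 0.1
EPSILON_MAX = 100.0


def validate_epsilon(epsilon: float) -> float:
    """Reject epsilon values outside [0.1, 100] (inclusive bounds assumed)."""
    if not (EPSILON_MIN <= epsilon <= EPSILON_MAX):
        raise ValueError(
            f"epsilon must be between {EPSILON_MIN} and {EPSILON_MAX}, got {epsilon}"
        )
    return epsilon


def sampling_probability(batch_size: int, num_records: int) -> float:
    """Return batch_size / num_records, capped at 1.0."""
    if num_records <= 0:
        raise ValueError("num_records must be positive")
    # A batch larger than the dataset would otherwise give a value above 1.0.
    return min(1.0, batch_size / num_records)
```

Without the cap, a batch size larger than the training set would produce a sampling probability above 1.0, which is not a valid probability.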
Fix: When writing objects to Azure Blob Storage, data was written in blocks that were too small, leading to errors when writing larger objects. Objects are now written in larger 25 MB blocks.
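For context on why block size matters: Azure Blob Storage caps a block blob at 50,000 committed blocks, so the block size bounds the largest object that can be written. The note does not say whether this specific cap caused the errors. The arithmetic below is only a sketch, treating 25 MB as 25 MiB and using a hypothetical 1 MiB block size for comparison.

```python
MAX_BLOCKS_PER_BLOB = 50_000  # Azure cap on committed blocks in one block blob
MIB = 1024 * 1024
GIB = 1024 * MIB


def blocks_needed(object_size: int, block_size: int) -> int:
    """Number of blocks required to hold object_size bytes (ceiling division)."""
    return (object_size + block_size - 1) // block_size


def fits_in_one_blob(object_size: int, block_size: int) -> bool:
    """True if the object can be written without exceeding the block cap."""
    return blocks_needed(object_size, block_size) <= MAX_BLOCKS_PER_BLOB


if __name__ == "__main__":
    size = 100 * GIB
    for block_size in (1 * MIB, 25 * MIB):  # 1 MiB is a hypothetical old size
        print(block_size // MIB, blocks_needed(size, block_size),
              fits_in_one_blob(size, block_size))
```

Under these assumptions, a 100 GiB object needs 102,400 blocks at 1 MiB, which is over the cap, but only 4,096 blocks at 25 MiB.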

Task: Standardize Tv2 column properties. The column object can be used to access specific properties of a column that is being evaluated in Tv2. See the Tv2 reference for more details.
Task: Update Tv2 to maintain referential integrity. By default, the gretel_tabular action, when using Tv2, ensures that PK/FK columns are not transformed. By setting run.encode_keys: true within the action, keys will be transformed to integers or UUIDs.
Fix: Fix in gretel_tabular where null foreign-keys can be included when using subsetting.
Fix: Fix for Synthetic Quality Score for field correlation stability when missing values are in the data.
Fix: Fix for enforcing Teams runtime limits (max objects crawled, max bytes processed) on Workflows. These limits were previously loaded from the individual user; they are now loaded from the Team if the user is a member of one.
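The lookup order described in the Teams limits fix can be sketched as below. The `RuntimeLimits` and `User` types and their field names are hypothetical and only mirror the two limits named in the note; this is not Gretel's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeLimits:
    max_objects_crawled: int
    max_bytes_processed: int


@dataclass(frozen=True)
class User:
    limits: RuntimeLimits
    # Set only when the user belongs to a Team (hypothetical field).
    team_limits: Optional[RuntimeLimits] = None


def effective_limits(user: User) -> RuntimeLimits:
    """Prefer the Team's limits; fall back to the user's own limits."""
    if user.team_limits is not None:
        return user.team_limits
    return user.limits
```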

Feature: 🚀 Hello Navigator Fine-Tuning! Our newest multi-modal model is live!
- Check out the blog for even more details!
- This model is available via the models-navigator_ft container for Hybrid customers.

Task: For GPT-x, the delta hyperparam will only be automatically updated if dp: true. Previously, it was updated regardless of whether DP was enabled, which was unnecessary.
Task: Improvements to the SQS Text Statistical Score for measuring quality of synthetic natural language data.

Task: Improve prompt validation for Gretel Navigator.
Task: When using Tv2 with gretel_tabular, columns are no longer forced back into their original order. Restoring the original order caused issues when Tv2 configs added or removed columns.

Task: Tv2 NER will utilize GPUs when available.
Task: Databricks destination connector optimizations.
Task: Better handling for foreign key columns with null values in gretel_tabular.

