Re: Dataform write different project ID

bastian_moraga- · 12-22-2023 05:06 AM

Hi everyone

I am currently new at working with dataform in bigquery

I would like to know how I can make my sqlx files write to a dataset outside the project defined by default.

This is because in my company the resulting table MUST exist in a specific project and dataset outside of the ones I am using for the staging process

Project1.Dataset1(only staging files)
Project2.Dataset2 (only final tables)

ms4446

To write SQLX files to a dataset outside the default project in Google Cloud Dataform:

1. Override Project and Dataset in SQLX Table Definition:

Open the SQLX table definition file.

Within the config block, specify:

config { ... database: "Project2", 
schema: "Dataset2" }

2. Consider Using the Open-Source Dataform CLI:

Install: npm i -g @dataform/cli@^2.3.2
Initialize: dataform init
Compile and run scripts locally, targeting different projects and datasets.

Additional Considerations:

Default Location: Set defaultLocation in dataform.jsonfor consistency.
Multiple Locations: Consider separate Dataform projects for each location.
Data Transfer: Consolidate datasets in a single location if possible.
Permissions: Ensure the service account has necessary permissions.
Testing and Deployment: Test configurations thoroughly and monitor for errors.

davidregalado25

This part:

config { ... database: "Project2", 
schema: "Dataset2" }

Is not working:

This is my dataform.json file:

--
Best regards
David Regalado
Web | Linkedin | Cloudskillsboost

ms4446

There are a couple of things to address in your example to make your SQLX file correctly write to the intended project and dataset:

1. Typos and Syntax:

Dataset Name: Be meticulous about the table name (test_david_ILS_categories) – ensure it's spelled correctly and used consistently throughout your files.
Database and Schema Placement: The database (for project ID) and schema (for dataset ID) should reside within the config block and be correctly indented.

2. Project and Dataset Targeting:

Configuration Alignment: Your dataform.json looks correct, setting the default database to "mvp". Make sure your SQLX file's configuration aligns with this.

Key Points:

Indentation: YAML syntax (used in the config block) relies on proper indentation. Check that your configuration is correctly structured.
Consistency: Mismatched table names lead to errors. Maintain consistency throughout your code and configuration.
Database and Schema: The database targets the project, and schema targets the dataset. Ensure you've specified these correctly.

Important Note on Data Loading:

Dataform's Focus: Dataform is primarily designed for transforming data that is already within BigQuery. It doesn't directly support commands like LOAD DATA OVERWRITE.
Loading Tools: To get data from CSV files into BigQuery, use tools like:
- BigQuery UI (for simple uploads)
- bq command-line tool (for automation and scripting)
- BigQuery API (for integration with other applications)

Example using bq:

 
bq load --source_format=CSV --autodetect \ 
mvp.ILS.test_david_ILS_categories \ 
gs://data_model_generator/categories_2024_02*.csv 

Use SQLX files to define transformations and table structures within BigQuery. For the initial data loading step, use dedicated BigQuery loading tools. Accuracy and consistency in your configurations and references are key to avoiding errors and maintaining data integrity.