Dataform write different project ID

Hi everyone

I am currently new at working with dataform in bigquery

I would like to know how I can make my sqlx files write to a dataset outside the project defined by default.

This is because in my company the resulting table MUST exist in a specific project and dataset outside of the ones I am using for the staging process

Project1.Dataset1(only staging files)
Project2.Dataset2 (only final tables) 

1 3 302
3 REPLIES 3

To write SQLX files to a dataset outside the default project in Google Cloud Dataform:

1. Override Project and Dataset in SQLX Table Definition:

  • Open the SQLX table definition file.
  • Within the config block, specify:
    SQL
    config { ... database: "Project2", 
    schema: "Dataset2" }

2. Consider Using the Open-Source Dataform CLI:

  • Install: npm i -g @dataform/cli@^2.3.2
  • Initialize: dataform init
  • Compile and run scripts locally, targeting different projects and datasets.

Additional Considerations:

  • Default Location: Set defaultLocation in dataform.jsonfor consistency.
  • Multiple Locations: Consider separate Dataform projects for each location.
  • Data Transfer: Consolidate datasets in a single location if possible.
  • Permissions: Ensure the service account has necessary permissions.
  • Testing and Deployment: Test configurations thoroughly and monitor for errors.

 

This part:

config { ... database: "Project2", 
schema: "Dataset2" }

Is not working:

Screenshot 2024-04-09 at 13.12.11.png

This is my dataform.json file:

Screenshot 2024-04-09 at 13.15.41.png

--
Best regards
David Regalado
Web | Linkedin | Cloudskillsboost

There are a couple of things to address in your example to make your SQLX file correctly write to the intended project and dataset:

1. Typos and Syntax:

  • Dataset Name: Be meticulous about the table name (test_david_ILS_categories) – ensure it's spelled correctly and used consistently throughout your files.
  • Database and Schema Placement: The database (for project ID) and schema (for dataset ID) should reside within the config block and be correctly indented.

2. Project and Dataset Targeting:

  • Configuration Alignment: Your dataform.json looks correct, setting the default database to "mvp". Make sure your SQLX file's configuration aligns with this.

Key Points:

  • Indentation: YAML syntax (used in the config block) relies on proper indentation. Check that your configuration is correctly structured.
  • Consistency: Mismatched table names lead to errors. Maintain consistency throughout your code and configuration.
  • Database and Schema: The database targets the project, and schema targets the dataset. Ensure you've specified these correctly.

Important Note on Data Loading:

  • Dataform's Focus: Dataform is primarily designed for transforming data that is already within BigQuery. It doesn't directly support commands like LOAD DATA OVERWRITE.

  • Loading Tools: To get data from CSV files into BigQuery, use tools like:

    • BigQuery UI (for simple uploads)
    • bq command-line tool (for automation and scripting)
    • BigQuery API (for integration with other applications)

Example using bq:

 
bq load --source_format=CSV --autodetect \ 
mvp.ILS.test_david_ILS_categories \ 
gs://data_model_generator/categories_2024_02*.csv