Incremental Table w/ Changing columns

Hi there - 

I use dataform to build an incremental model ("requests_snapshot") that stores snapshots on a model that I output to end-users ("stg_requests"). This is so that we can audit how values w/in this model have changed over time. I am frequently adding fields to the output model which causes the total execution to fail because the snapshot model does not contain the field that was added to the output model. I usually fix this by manually adding the new field to the snapshot's schema. All previous snapshots will just show null for the new field and the execution will populate it going forward. Gross, but it works. 

Today is the first time I actually need to remove a field from the output. But I would like to keep the field in the snapshot model because it was captured at that moment and I don't want to just lose that data.  So I've finally hit the limit of the manual workaround.

I'm looking for a way to make my incremental model handle upstream field changes a bit more gracefully. I want it to:

  • Fill fields that no longer exist upstream w/ nulls
  • Add new fields to the snapshot model when they appear upstream for the first time

My current script is really basic, so I'm hoping this is a relatively easy fix: 

 

config {
    schema: "analytics_snapshots",
    type: "incremental",
    bigquery: {
        partitionBy: "day_of", 
        requirePartitionFilter: true
    }, 
}

with new_requests as (
    select 
        *
    from ${ref("stg_requests")}
)

SELECT 
    current_date as day_of,
    *
FROM new_requests
${when(incremental(), 
`WHERE current_date > (select MAX(day_of) FROM ${self()} where day_of is not null)`) }

 

 Any thoughts? 

Solved Solved
1 1 76
1 ACCEPTED SOLUTION

I don't know if there is an elegant way and would be very interested in it myself.

Whenever upstream structural changes occur on an incremental table I have been manually rebuilding it each time. I can't recall 100% but I believe I asked the same thing or read it in the documentation that this was the only course of action at the time. 

 

Edit: Found it and there are some workarounds. 

View solution in original post

1 REPLY 1

I don't know if there is an elegant way and would be very interested in it myself.

Whenever upstream structural changes occur on an incremental table I have been manually rebuilding it each time. I can't recall 100% but I believe I asked the same thing or read it in the documentation that this was the only course of action at the time. 

 

Edit: Found it and there are some workarounds.