New incremental feature

moebe · 04-12-2021 03:51 AM

Hi all

I wanted to share my thoughts on the new "incremental refresh" feature.
It's actually a great thing.

Unfortunately, I think the current solution is not sufficiently well thought out and not very useful for some use cases.

Throughout this post I am referring to the combination of aggregate awareness and incremental refresh.

How does Aggregate Awareness basically work?
I have my Explore, which joins master data and transaction data (dimensions and facts) with each other.
Then I select my dimensional fields and my key figures and create a denormalised table.
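As a rough illustration of that flattening step (this is not how Looker implements it internally), the join can be sketched in plain Python. The table contents and the names `sales_people`, `sales` and `build_flat_table` are made up for this example.

```python
# Hypothetical master data (dimension): one row per sales person.
sales_people = {1: {"name": "James Miller"}, 2: {"name": "Anna Smith"}}

# Hypothetical transaction data (facts): one row per sale.
sales = [
    {"person_id": 1, "valuation_date": "2021-01-04", "amount": 100},
    {"person_id": 2, "valuation_date": "2021-01-04", "amount": 80},
    {"person_id": 1, "valuation_date": "2021-01-05", "amount": 120},
]


def build_flat_table(facts, dimension):
    """Join each fact row with its dimension row and return flat rows."""
    return [
        {
            "valuation_date": fact["valuation_date"],
            "name": dimension[fact["person_id"]]["name"],
            "amount": fact["amount"],
        }
        for fact in facts
    ]


flat_table = build_flat_table(sales, sales_people)
for row in flat_table:
    print(row)
```

Note that the name is copied into every flat row. A later change to the master data only reaches a row when that row is rebuilt.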

Previously, this table was always rebuilt completely whenever, for example, the datagroup was triggered.

Now, with the incremental feature, it is possible to reload only the last X days, for example.

I will try to show this with an example.
(For simplification: imagine today is January 31st, 2021.)

For simplicity we do not change the granularity (so there is no real aggregation) but just join both tables together, as is done in aggregate awareness.

The result would look like:

Now I can build a simple year-to-date line chart to show the success of the sales person.

Great!

Incremental refresh now allows us to rebuild not the complete table every day but only, let's say, the last 7 days.
Of course, this results in a considerable performance gain and thus a cost saving!
Very cool.
Unfortunately, there is a huge catch.
The refresh periods are not dynamic.
I have to define the key (in our example the "valuation_date") and the increment (let's say 7 days).

What happens now:

The last 7 days are recalculated, all older entries remain unchanged.

Now, however, James got married on 29 January and took his wife's name. His name is no longer Miller but Ubels.

Let's assume that we are not working with an SCD (slowly changing dimension) on the dimensions. Look at the table as it stands since 30 January (he reported the change the day after the marriage).

So what happened to the Aggregated Table on 31 January 2021 when incremental refresh is turned on?

And what does the visualisation look like now?

The result is correct neither from a technical nor from a business point of view.
Either all entries since 01 January 2021 would have to be changed retroactively to Ubels (the view based on the currently valid truth).
Or James Miller should appear as James Ubels only from 30 January onwards (the historically accurate view).
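To make the effect concrete, here is a minimal simulation in Python. It is a simplified model of an incremental rebuild, not Looker's actual mechanism. The figures (one sale of 10 per day), the function names and the assumption that the 7-day window includes today are all invented for illustration.

```python
from datetime import date, timedelta


def build_rows(facts, names, since=None):
    """Join facts with the current names, optionally only from `since` on."""
    return [
        {"valuation_date": day, "name": names[person_id], "amount": amount}
        for (day, person_id, amount) in facts
        if since is None or day >= since
    ]


def incremental_refresh(table, facts, names, today, increment_days):
    """Keep rows older than the window and rebuild the rows inside it."""
    # Assumption: the window covers `increment_days` days including today.
    window_start = today - timedelta(days=increment_days - 1)
    kept = [row for row in table if row["valuation_date"] < window_start]
    return kept + build_rows(facts, names, since=window_start)


# One hypothetical sale of 10 per day in January 2021 for sales person 1.
facts = [(date(2021, 1, day), 1, 10) for day in range(1, 32)]

# Full build on 29 January, before the name change reaches the master data.
names = {1: "James Miller"}
table = build_rows([f for f in facts if f[0] <= date(2021, 1, 29)], names)

# The name change is entered on 30 January (no SCD, the old name is overwritten).
names[1] = "James Ubels"

# Incremental refresh on 31 January with a 7-day window.
table = incremental_refresh(table, facts, names, date(2021, 1, 31), 7)

totals = {}
for row in table:
    totals[row["name"]] = totals.get(row["name"], 0) + row["amount"]
print(totals)  # {'James Miller': 240, 'James Ubels': 70}
```

In this model the same person is split into two series, and the switch to Ubels happens on 25 January, which matches neither the wedding date nor the date the change was reported.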

There is no perfect solution to this problem because aggregate awareness is a highly complex issue and there can/may/should/must be different ways of looking at the same data.

But what could Looker do to get around the problem?

Two possibilities:

Manual refresh

Allow the client to manually recreate the aggregated table, either:

using incremental logic (as can be done today), or
as a full reload.

Then I, as a customer, would have the possibility to update the table on demand (e.g. in case of important master data changes).

Automatically, with a further parameter

full_refresh_period

This could be defined like the datagroup.
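A minimal sketch of how such a parameter might behave, written in Python. `full_refresh_period` is the proposed parameter and does not exist in Looker today. The function `choose_refresh` and the 30-day value are assumptions for illustration, and a time-based period is only one of the ways a datagroup-like trigger could be defined.

```python
from datetime import datetime, timedelta


def choose_refresh(last_full_refresh, now, full_refresh_period):
    """Return "full" when the last full rebuild is too old, else "incremental"."""
    if last_full_refresh is None:
        return "full"  # the table has never been built completely
    if now - last_full_refresh >= full_refresh_period:
        return "full"
    return "incremental"


period = timedelta(days=30)  # hypothetical value for full_refresh_period
last_full = datetime(2021, 1, 1)

print(choose_refresh(last_full, datetime(2021, 1, 20), period))  # incremental
print(choose_refresh(last_full, datetime(2021, 1, 31), period))  # full
```

With a rule like this, stale master data in old rows would survive at most one full refresh period.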

In summary:

Since the aggregate awareness approach writes master data and key figures flat into one table, any change to the underlying master data leads to an incorrect or unattractive representation in the final Explore.

What do you think?
How do you handle such issues in your company?
Do you have these issues at all?