[Retired] Using Native Derived Tables

lloydtabb · 07-14-2017 01:02 PM

This article has been retired as the information is now in documentation on this page.

If you reached here from the product, our apologies – we’re working on getting that link redirected to this LookML reference page.

drewgillson

This is awesome! This will save your users so much time. I’ve done a lot of painstaking copying and pasting to generate derived tables from explore results (and then replacing generated table names with ${table.SQL_TABLE_NAME} syntax… this is a WAY better solution

devwiredau

This is honestly one of the best features Looker has ever launched! Big time saver. One weird thing I found so far is when using date as a dimension. The timezone conversion doesn’t work correctly so I had to create a new time dimension in my original view and use CONVERT_TIMEZONE before referencing it in the NDT. This seemed to work well.

Does anyone know if you can join one of these NDTs back to another table?

rufus

Hi Devin,

We’re excited about this feature too. Glad to hear it’s useful for you. Do you mind sharing the syntax that didn’t work and the solution you used either here or in an email to help.looker.com? We’d be interested in taking a look so we can document the behaviour and work on a fix if needed.

Thanks!

devwiredau

Yeah no problem. I’ll send an email to support.

mplooker

Does anyone know if you can join one of these NDTs back to another table?

@devwiredau this is exactly the question i had. And as an extension, can we join the NDT back to the original explore_source explore from which it is defined?

steven5

You can definitely join one of these NDTs back to another table. Once defined, they really just behave exactly like the current sql-defined derived tables.

@mplooker as for joining it back to the original explore_source (great question!) you should indeed be able to do so, so long as you do not require fields from the explore in the list of columns that would cause an infinite dependency loop (which Looker should detect in validation and complain about). For example, this dependency loop would occur if two different views using explore_source were joined back into their original explore and each used a column which depended on a column of the other.

mplooker

thanks @steven5!! very helpful.

powellandy

Will there be (or is there already) support for joins, unions, and templated filters in the Native Derived Table definition?

lloydtabb

@powellandy native derived tables produce a view that can be joined, even into the original explore. Unions aren’t supported at the moment (but you can reference them using the ${table.SQL_TABLE_NAME} syntax, so you can use Native Derived tables in SQL derived tables that have a UNION). Templated filters are probably the most experimental part of this, I know a pretty critical bug in them was fixed yesterday (as of 4.18.6). Here is some example code for templated filters.

github.com

looker/bq_thelook/blob/master/filtered_lookml_dt.view.lkml

include: "users.explore.lkml"

explore: filtered_lookml_dt {}

view: filtered_lookml_dt {
  derived_table: {
    explore_source: users {
      column: age {field: users.age}
      column: people {field: users.count}
      bind_filters: {
        to_field: users.created_date
        from_field: filtered_lookml_dt.filter_date
      }
    }
  }

  filter: filter_date {
    type: date
  }
  dimension: age {type: number}

This file has been truncated. show original

mplooker

@lloydtabb is there any chance this relates to templated filters being applied at certain points of the SQL generation but not others?

i was recently experiencing a weird situation when I tried to define a PDT (PDT1) that contained a templated_filter (TF1), defined another PDT (PDT2pdt1) that referenced the initial PDT, defined an NDT that performed some agg on the second PDT (NDT1pdt2), and then finally joined the NDT (NDT1pdt2) back to the second PDT (PDT2). the resulting SQL would properly evaluate the templated_filter (TF1) within the initial PDT (PDT1) but not within the NDT (NDT1pdt2) … and would look something like this:

with PDT1 as 
(
    select  accountname
    from SOMESQLTABLE1
    where [****PROPERLY EVALUATED TF1****]
    group by accountname
),
PDT2 as
(
    select 
    date, 
    accountname, 
    sum(value) as sumvalue
    from SOMESQLTABLE2
    inner join PDT1
    on SOMESQLTABLE2.accountname = PDT1.accountname
    group by date, accountname
)

select 
PDT2.date, 
PDT2.accountname, 
PDT2.sumvalue,
NTD1pdt2.sumvalue_avg 
from PDT2
left outer join 
( 
    select 
	date, 
	avg(sumvalue) over (partition by date) as sumvalue_avg 
	from SOMESQLTABLE2
    inner join 
    (
    	select  accountname
    	from SOMESQLTABLE1
    	where [****IMPROPERLY EVALUATED TF1: 1=1 ****] 
    	group by accountname
    ) as PDT1hashcode
	on SOMESQLTABLE2.accountname = PDT1hashcode.accountname
) as NDT1pdt2
on PDT2.date = NDT1pdt2.date

lloydtabb

@mplooker can you craft a simple LookML example, we’d be glad to take a look. What version of Looker are you running? There were definitely some bugs bugs before 4.18.6

mplooker

@lloydtabb great, thanks! here’s a quick and dirty illustration of what i’m talking about… not 1:1 the exact situation i tried to describe above but the same issue seems to arise (in the resulting SQL, i’ve annotated the templated_filter clauses that are generated with asterisks) :

LOOKML:

explore: PDT2 {
  join: PDT1 {
    sql_on:  ${PDT1.accountname}=${PDT2.accountname} ;;
    type:  inner
    relationship: one_to_one
  }
  join: NDT1pdt2 {
    sql_on: ${PDT2.date} = ${NDT1pdt2.date} ;;
    type:  left_outer
    relationship: one_to_one
  }
}


view: PDT1 {
  derived_table: {
    sql:
          select uid, accountname from
          (
            select 1 uid, 'x' accountname UNION ALL
            select 2 uid, 'y' accountname
          ) a1
          where a1.accountname = {% parameter accountname %}
              ;;
  }

  dimension: PDT1uid {
    hidden: yes
    type: number
    value_format_name: id
    sql: ${TABLE}.uid ;;
    primary_key: yes
  }

  dimension: accountname {
    hidden: no
    type: string
    sql: ${TABLE}.accountname ;;
  }
}


view: PDT2 {
  derived_table: {
    sql:
        select a2.uid,  a2.date, a2.accountname,  a2.value from
        (
          select 1 uid, 100 value, 'x' accountname, '20161230' date UNION ALL
          select 2 uid, 200 value, 'x' accountname, '20161230' date UNION ALL
          select 3 uid, 500 value, 'x' accountname, '20161230' date UNION ALL
          select 4 uid, 800 value, 'y' accountname, '20161230' date UNION ALL
          select 5 uid, 250 value, 'y' accountname, '20161230' date UNION ALL
          select 6 uid, 100 value, 'x' accountname, '20161231' date UNION ALL
          select 7 uid, 400 value, 'x' accountname, '20161231' date UNION ALL
          select 8 uid, 600 value, 'x' accountname, '20161231' date UNION ALL
          select 9 uid, 100 value, 'y' accountname, '20161231' date
        ) a2
        inner join ${PDT1.SQL_TABLE_NAME} pdt1
        on pdt1.accountname = a2.accountname;;
  }

  dimension: PDT2uid {
    hidden: yes
    type: number
    value_format_name: id
    sql: ${TABLE}.uid ;;
    primary_key: yes
  }


  dimension: date {
    type: date
    sql: ${TABLE}.date ;;
  }

  dimension: accountname {
    hidden: yes
    type: string
    sql: ${TABLE}.accountname ;;
  }
  
  measure: sumvalue {
    type: sum
    sql: ${TABLE}.value ;;
    value_format_name: decimal_2
  }
}


view: NDT1pdt2 {
  derived_table: {
    explore_source: PDT2 {
      column: date {}
      column: sumvalue {}
      derived_column: sumvalue_avg {
        sql: AVG(sumvalue) OVER (PARTITION BY date) ;;
      }
    }
  }

  dimension: date {
    hidden: yes
    type: date
  }
  
  measure: sumvalue_avg {
    type: average
    value_format_name: decimal_2
  }
}

RESULTING SQL:

WITH PDT1 AS (select uid, accountname from
          (
            select 1 uid, 'x' accountname UNION ALL
            select 2 uid, 'y' accountname
          ) a1
        *********  where a1.accountname = 'x' *********
              )
  ,  PDT2 AS (select a2.uid,  a2.date, a2.accountname,  a2.value from
        (
          select 1 uid, 100 value, 'x' accountname, '20161230' date UNION ALL
          select 2 uid, 200 value, 'x' accountname, '20161230' date UNION ALL
          select 3 uid, 500 value, 'x' accountname, '20161230' date UNION ALL
          select 4 uid, 800 value, 'y' accountname, '20161230' date UNION ALL
          select 5 uid, 250 value, 'y' accountname, '20161230' date UNION ALL
          select 6 uid, 100 value, 'x' accountname, '20161231' date UNION ALL
          select 7 uid, 400 value, 'x' accountname, '20161231' date UNION ALL
          select 8 uid, 600 value, 'x' accountname, '20161231' date UNION ALL
          select 9 uid, 100 value, 'y' accountname, '20161231' date
        ) a2
        inner join PDT1 pdt1
        on pdt1.accountname = a2.accountname)
  ,  NDT1pdt2 AS (
SELECT as53f08f27e18.*,
       AVG(sumvalue) OVER (PARTITION BY date)  AS sumvalue_avg
FROM (SELECT 
	CONVERT(VARCHAR(10),PDT2.date ,120) AS date,
	COALESCE(SUM(PDT2.value ), 0) AS sumvalue
FROM (select a2.uid,  a2.date, a2.accountname,  a2.value from
        (
          select 1 uid, 100 value, 'x' accountname, '20161230' date UNION ALL
          select 2 uid, 200 value, 'x' accountname, '20161230' date UNION ALL
          select 3 uid, 500 value, 'x' accountname, '20161230' date UNION ALL
          select 4 uid, 800 value, 'y' accountname, '20161230' date UNION ALL
          select 5 uid, 250 value, 'y' accountname, '20161230' date UNION ALL
          select 6 uid, 100 value, 'x' accountname, '20161231' date UNION ALL
          select 7 uid, 400 value, 'x' accountname, '20161231' date UNION ALL
          select 8 uid, 600 value, 'x' accountname, '20161231' date UNION ALL
          select 9 uid, 100 value, 'y' accountname, '20161231' date
        ) a2
        inner join (select uid, accountname from
          (
            select 1 uid, 'x' accountname UNION ALL
            select 2 uid, 'y' accountname
          ) a1
          ****** where a1.accountname = '' ******
              ) pdt1
        on pdt1.accountname = a2.accountname) AS PDT2

GROUP BY CONVERT(VARCHAR(10),PDT2.date ,120)) as53f08f27e18
)
SELECT 
TOP 500 
	CONVERT(VARCHAR(10),PDT2.date ,120) AS "pdt2.date_1",
	PDT1.accountname  AS "pdt1.accountname_1",
	AVG(NDT1pdt2.sumvalue_avg) AS "ndt1pdt2.sumvalue_avg_1",
	COALESCE(SUM(PDT2.value ), 0) AS "pdt2.sumvalue_1"
FROM PDT2
INNER JOIN PDT1 ON PDT1.accountname=PDT2.accountname 
LEFT JOIN NDT1pdt2 ON (CONVERT(VARCHAR(10),PDT2.date ,120)) = (CONVERT(VARCHAR(10),NDT1pdt2.date,120)) 

WHERE 
	(PDT1.accountname = 'x')
GROUP BY CONVERT(VARCHAR(10),PDT2.date ,120),PDT1.accountname 
ORDER BY 3 DESC

lloydtabb

It seems to be working as expected for me in 4.18.6. What version are you running? I know we fixed some bugs around this recently.

SELECT 
	PDT1.accountname  AS pdt1_accountname_1,
	DATE(PDT2.date ) AS pdt2_date_1,
	AVG(NDT1pdt2.sumvalue_avg) AS ndt1pdt2_sumvalue_avg_1,
	COALESCE(CAST(SUM(PDT2.value ) AS FLOAT), 0) AS pdt2_sumvalue_1
FROM (select a2.uid,  a2.date, a2.accountname,  a2.value from
        (
          select 1 uid, 100 value, 'x' accountname, '20161230' date UNION ALL
          select 2 uid, 200 value, 'x' accountname, '20161230' date UNION ALL
          select 3 uid, 500 value, 'x' accountname, '20161230' date UNION ALL
          select 4 uid, 800 value, 'y' accountname, '20161230' date UNION ALL
          select 5 uid, 250 value, 'y' accountname, '20161230' date UNION ALL
          select 6 uid, 100 value, 'x' accountname, '20161231' date UNION ALL
          select 7 uid, 400 value, 'x' accountname, '20161231' date UNION ALL
          select 8 uid, 600 value, 'x' accountname, '20161231' date UNION ALL
          select 9 uid, 100 value, 'y' accountname, '20161231' date
        ) a2
        inner join (select uid, accountname from
          (
            select 1 uid, 'x' accountname UNION ALL
            select 2 uid, 'y' accountname
          ) a1
          where a1.accountname = 'test'
              ) pdt1
        on pdt1.accountname = a2.accountname) AS PDT2

...

powellandy

Hi Lloyd, sorry, maybe I wasn’t clear enough. I meant if I currently have a sql-defined PDT which joins two or more tables together, can I write a corresponding LookML-defined PDT which does this join, or is it currently not supported?

lloydtabb

Yes, for sure. in The example Model for Native Derived Tables The user/order fact table below joins users, products, inventory items to product the SQL

The normal ‘explore’ object for order_items, defines the joins.

The Native Derived Table:

include: "order_items.explore"

view: user_order_facts {
  derived_table: {
    explore_source: order_items {
      column: user_id {field:order_items.user_id}
      column: lifetime_revenue {field:order_items.total_revenue}
      column: lifetime_number_of_orders {field:order_items.order_count}
      column: lifetime_product_categories {field:products.category_list}
      column: lifetime_brands {field:products.brand_list}
    }
  }
  dimension: user_id {hidden:yes}
  dimension: lifetime_revenue {type:number}
  dimension: lifetime_number_of_orders {type:number}
  dimension: lifetime_product_categories {}
  dimension: lifetime_brands {}
}

Produces the following SQL

SELECT 
	order_items.user_id AS user_id,
	COALESCE(SUM(order_items.sale_price ), 0) AS lifetime_revenue,
	COUNT(DISTINCT order_items.order_id ) AS lifetime_number_of_orders,
	STRING_AGG(DISTINCT CAST(products.category  AS STRING), '|RECORD|') AS lifetime_product_categories,
	STRING_AGG(DISTINCT CAST(products.brand  AS STRING), '|RECORD|') AS lifetime_brands
FROM thelook_web_analytics.order_items  AS order_items
LEFT JOIN thelook_web_analytics.inventory_items  AS inventory_items ON order_items.inventory_item_id = inventory_items.id 
LEFT JOIN thelook_web_analytics.products  AS products ON inventory_items.product_id = products.id 

GROUP BY 1

murilonigris

This a great feature!! We’re trying it out already!

Just one question, is it necessary to declare all columns inside the ‘explore_source’, or is there a way to retrieve all columns available by the explore?

For example, if we create a NDT that is retrieving columns from an explore that uses the view ‘orders’, when we create a new measure inside ‘orders’ view, do we also have to declare this column inside the ‘explore_source’ of the NDT?

Vincent1

Hi @murilonigris,

That is correct, if you want to access a field to aggregate on using that new measure you want to create, that field must be declared in the explore_source:.

Cheers,
Vincent

segahm

this is pretty awesome. what an interesting way to preserve lookml definitions.

ajhong91

I noticed that the include is for a “.explore.lkml” file. How do I create one of those and save it like that? I am only aware of .view and .model files. Alternative, how can i include only the explore file from within the model to prevent validation errors?

rufus

@ajhong91, as per Lloyd’s comment above we can create an explore file in the same way we would create a new view/model file we just need to add explore.lkml as an extension in the filename.

To add an explore file, we recommend choosing + and Create View, then specifying the explore’s name using an “explore.lkml” extension. Looker displays the file in the Other section of the LookML IDE.

clrcrl

@devwiredau - can you share your workaround here? I’m experiencing the same problem

graham2

Has anyone gotten this to work by including the model file rather than by creating a separate explore file?

I get a duplicate explore definition error, which I assume is created by circular dependencies, i.e. model M includes all views, including view V, which includes model M in order to use one of its explores as an explore source.

Is there a way to fix this without moving the explore into its own file and being careful about the view includes in that one?

Adding onto this question because it’s related. I just noticed that this whole NDT approach forces every model (defined in the default way) to include every explore file. That’s because the models include all the views, and the NDT view includes the explore it’s based on.

Should we stop importing *.view and start listing them explicitly? I’d still like to be able to use an NDT-defined view in a model without exposing its underlying explore, but that seems like a minor point.

I also want to add on that I am loving this new feature overall. It’s easily the most exciting advancement in LookML in the three years that we’ve been working with you guys. Making every query potentially reusable as a view is big deal. It’s a step toward LookML becoming a full replacement for SQL.

graham2

Probably a more important question than my other one. Is it possible to use a custom filter in an NDT? It doesn’t appear that the syntax supports it, yet.

graham2

@rufus, I’m not sure if @devwiredau ever wrote into support, but I’m running into the exact same issue with timezones. Here’s an overview of the syntax:

In view v1:

dimension_group: event_time {
  type: time
  convert_tz: yes
  timeframes: [
    time,
    date,
    week,
    month,
    raw
  ]
  sql: ${TABLE}.event_time ;;
}

Then explore based on event_time_date, resulting SQL for the dimension is:

DATE(CONVERT_TIMEZONE('UTC', 'America/New_York', v1.event_time))

All good so far.

But if I create view v2 based on this explore with column event_time_date (even using the Looker provided native derived table syntax based on the explore), the resulting SQL for view v2 is missing the CONVERT_TIMEZONE piece. @devwiredau’s workaround is the same one I used, and it works great, but it should be unnecessary, and I also don’t love hardcoding the reporting time zone in the LOOKML.

lloydtabb

Yes, look at the bind_fitlers: syntax. Here is an example:

https://github.com/looker/bq_thelook/blob/master/filtered_lookml_dt.view.lkml

lloydtabb

We’re working on this. There are a couple of problems here. When you convert a time to a date, there is some implied to be some timezone. The problem is that on a ephemeral derived table, you likely want the reporting timezone and on a persistent one, you actually need to make a decision. This is also compounded that Looker treats dates as string. We’re likely to make some change here, but we don’t want to do it twice so we are going a little slowly and carefully here.

graham2

Thanks, @lloydtabb. I definitely get the potential gotchas here, so it’s good to hear you guys are being careful.

Going by the principle of least surprise, I’d say I’d expect an explore of a view based on an NDT to match exactly the results in the explore it was based on, especially if created that view by copying generated LookML.

If it’s a timestamp field, then that works just fine if the timezone conversion is done in the final query (but it certainly can’t be done at both the derived table building and the query steps). But if it’s a date field, then it has to be done at the derived table building step for the aggregates to be done with the right time boundaries.

graham2

I was thinking about the “Custom Filter” query syntax (Looker Expression) in an explore, which is necessary when, e.g. comparing one dimension to another. Is that possible?

It’s not a big deal to define a derived dimension in a view file somewhere, but it’s sometimes nice to keep cross-view calculations out of view files completely.

rufus

Custom filter expression syntax is specific to custom filters. Native Derived Tables perform the same function as writing a SQL query, but are expressed natively in the LookML language, so you should be able to compare one field to another. Do you have an example of what you’re trying to do here?

graham2

Hi @rufus. I generated the NDT LookML and noted that all the custom filter syntax disappeared.

I think in this case it was confirming a lead capture date was prior to a purchase date. This is possible via custom filter syntax, but I couldn’t figure out how I to do it in NDT syntax.

rufus

Hey @graham2, could you send an example to help.looker.com? We’d be happy to take a look and figure out how to make it work.

quinn_morrison

Hi @graham2,

I previously saw these duplicate explore errors when trying to include a model file in a NDT, and this issue has been reported. However, the errors that I experienced have since disappeared, and I’ve been able to include model files in my NDTs (with the *.view import in the model). Have you tried including a model in an NDT recently, and are you still experiencing these errors?

Morguitaine

rufus, can you help me too? I have some questions and I wanted to ask you if I could get some answers as well? i’m currently doing nothing at home but searching for drug reviews because of my recent health issues. in the meantime wanted to learn and this site seems an amazing place. thanks in advance. and by the way thanks everyone as i found some help through the posts here.

rufus

@Morguitaine of course!

Discourse is a great place to ask general questions and to check whether other users have experience that can help with what you’re working on (or to just share something cool you’ve done in Looker). However, sometimes it can be beneficial to send questions directly to help.looker.com, especially if they are related to your particular data model.

Send over an email and we’ll be happy to help.

Mark_Goodwin

Hi @quinn_morrison

I recently tried including the model file just as @graham2, and I also got the duplicate view error. My work around was to create a new “explore.lkml” file that includes the explore I need, but with a different name. I had to use the “from:” definition to allow it to happen, like this:

explore: actions_ndt {
from: actions
persist_for: “2 hours”
group_label: “Actions”
label: “Analyze Communications Hub Activity”

Is this the right way to approach this? Or at least what’s the best practice? I already have the actions explore in another model file and I don’t want to take it out.

Mark

quinn_morrison

Hi @Mark_Goodwin,

Thanks for sharing this! Best practices for utilizing includes in NDTs varies from project to project. Often, when there are multiple models in the project, you will see these duplicate explore errors when including a model file in the NDT view. In these scenarios, its best to take your approach, involving creating another file in which you define the explore that will be used in the NDT, and including this in the NDT view.

Another option is to create your NDT view file with no includes defined. Without specifying any additional includes, you will still be able to create and explore your NDT normally. The only draw back is that the validator will throw and error for not including a file with the explore_source definition in the NDT view file. However, if you do wish to go this route, you have the option to turn off requiring LookML validation to commit in your instance admin panel, and simply live with this Lookml error.

Hope this helps!

Quinn

Ayrton_Barros

Hello guys, I’m trying to use the bind_filter feature with a string dimension, and I’m not having any luck 😦
Anyone around here had success with it? Tks!

Morgan1

Hi @Ayrton_Barros,
If you’d like to come on chat or send us an email at help.looker.com, we’d be happy to take a deeper look!

devwiredau

Hey @graham2 did your convert_timezone hack break recently with one of the recent releases? Mine did and I’m trying to figure out how to fix.

@lloydtabb Did you guys push a solution for this? Do you have any explanation on how to deal with this now because the convert_timezone doesn’t work anymore.

This was resolved. No big issue with the new timezone functionality.