We’re announcing a new project that we’ve just open sourced: LookML-Gen for Python
##Overview
If you have dynamic data definitions or if aspects of your system are driven by changing metadata, reporting on it is often cumbersome and not at all user friendly. Entity-Attribute-Value (EAV) models are a classic example of this. (See this Looker discource post for background.)
As adopters of Looker know, LookML provides a powerful way to describe data. The Looker platform processes LookML and allows end users to explore data according to the descriptions and relationships found in LookML files.
LookML-Gen allows Python developers to examine dynamic data definitions then programmatically generate LookML to describe that data. By outputting LookML, end users get the benefit of reporting in a way that makes sense to them, picking dimensions and measures with meaningful names.
Simple Example Code
Install it:
pip install lookml-gen
Use it:
from lookmlgen.view import View
from lookmlgen.field import Dimension, DimensionGroup, Measure
from lookmlgen.base_generator import GeneratorFormatOptions
view_name = 'my_view'
view = View(view_name, sql_table_name='my_table')
view.add_field(Dimension('id', type='number', primary_key=True))
view.add_field(DimensionGroup('created'))
view.add_field(Dimension('name'))
view.add_field(Dimension('quantity', type='number'))
view.add_field(Measure('total_quantity', sql='${TABLE}.quantity', type='sum'))
with open('%s.view.lkml' % view_name, 'w') as file:
view.generate_lookml(file, GeneratorFormatOptions(view_fields_alphabetical=False))
After execution, the file ‘my_view.view.lkml’ will contain:
# STOP! This file was generated by an automated process.
# Any edits you make will be lost the next time it is
# re-generated.
view: my_view {
sql_table_name: my_table ;;
dimension: id {
type: number
primary_key: yes
sql: ${TABLE}.id ;;
}
dimension_group: created {
type: time
timeframes: ["time", "date", "week", "month"]
datatype: datetime
sql: ${TABLE}.created ;;
}
dimension: name {
sql: ${TABLE}.name ;;
}
dimension: quantity {
type: number
sql: ${TABLE}.quantity ;;
}
measure: total_quantity {
type: sum
sql: ${TABLE}.quantity ;;
}
}
##Real-World Use
Generating LookML for actual dynamic data would examine definitions found in the database and create fields in a view, roughly like this:
from lib.lookml_gen.view import View, DerivedTable
from lib.lookml_gen.field import Dimension, DimensionGroup, Measure
# imports for database access not shown...
view_name = 'my_dynamic_view'
view = View(name=view_name, label='My Dynamic View')
view.add_field(Dimension(name='entity_id', type='number', primary_key=True))
# Query for data definitions that are relevant to this view
query_results = db_connection.query('SELECT id, name FROM attribute_definition') #' Comment to fix markdown formatting
pdt_sql_lines = ['SELECT attribute_value.entity_id,']
for row in query_results:
pdt_sql_lines.append('\n MAX(IF(attribute_definition.id = {0}, attribute_value.value, NULL)) AS `{1}`'.format(row['id'], get_column_name(row['name']))
view.add_field(Dimension(row['name']))
pdt_sql_lines.append(
'\nFROM attribute_definition'
'\nLEFT JOIN attribute_value ON attribute_definition.id = attribute_value.attribute_definition_id'
'\nGROUP BY attribute_value.entity_id') #' Comment to fix markdown formatting
pdt_sql = ''.join([line for line in pdt_sql_lines])
dt = DerivedTable(pdt_sql, sql_trigger_value="SELECT DATE(CONVERT_TZ(NOW(), 'UTC', 'US/Pacific'))",
indexes=['entity_id'])
view.set_derived_table(dt)
with open('%s.view.lkml' % view_name, 'w') as file:
view.generate_lookml(file, GeneratorFormatOptions(newline_between_items=False, omit_default_field_type=False))
This would generate a LookML View for a Persistent Derived Table (PDT) that lays out EAV data in a one-entity per row fashion, with column names that match the attribute names, like this:
# STOP! This file was generated by an automated process. Any edits you make
# will be lost the next time it is re-generated.
view: my_dynamic_view {
label: "My Dynamic View"
derived_table: {
sql:
SELECT attribute_value.entity_id,
MAX(IF(attribute_definition.id = 1000000075, attribute_value.value, NULL)) AS `business_region`,
MAX(IF(attribute_definition.id = 1000000076, attribute_value.value, NULL)) AS `follow_up_requested`
FROM attribute_definition
LEFT JOIN attribute_value ON attribute_value ON attribute_definition.id = attribute_value.attribute_definition_id
GROUP BY attribute_value.entity_id ;;
sql_trigger_value: SELECT DATE(CONVERT_TZ(NOW(), 'UTC', 'US/Pacific')) ;;
indexes: ["entity_id"]
}
dimension: entity_id {
primary_key: yes
hidden: yes
type: number
sql: ${TABLE}.entity_id ;;
}
dimension: business_region {
label: "Business Region"
sql: ${TABLE}.business_region ;;
}
dimension: follow_up_requested {
label: "Follow-up Requested"
sql: ${TABLE}.follow_up_requested ;;
}
}
This example code assumes all values are string types, but it could easily be enhanced to examine a type column from the data definitions and generate LookML ‘type: number’ or dimension_group with ‘type: time’. (The implementation for the get_column_name() function isn’t shown, but it would replace spaces with underscores in attribute names and convert/remove any characters that shouldn’t appear in a column name.)
On top of this, you could write code to automate updating the repo where your models are stored. At SymphonyRM, we have periodic jobs that:
- Clone a repo containing LookML models
- Generate LookML views for our dynamic data (using LookML-Gen)
- Check if the views have changed
- If so, create a new branch, commit and push the changes, then create a Github pull request
Our team gets an email notification for the pull request, we do a quick review before merging, and then we activate the change in our production Looker instance. At some point, we’ll fully automate this process.
##Status
The early 0.1 version of LookML-Gen is available now. Full LookML support is far from complete: at the moment only basic aspects of Views and Fields are supported and there is no support for Explores yet. However, it does cover much of the common functionality, including Persistent Derived Tables, and we’ll be continuing to enhance it. The code can easily be extended and we’d love to get pull requests to fill out additional functionality.
The open source project is available at: https://github.com/symphonyrm/lookml-gen
Documentation is here: http://lookml-gen.readthedocs.io/
##Coming Soon
We’ll be adding support for generating sections of explores, i.e. joins. (Joins for dynamic data often comprise only part of an explore so the automated generation should process an existing model file and replace only the relevant joins, perhaps indicated by start/end marker comments.)
##Feedback
We’d love to get thoughts and feedback from the Looker community. You can reply to this thread, add issues on the Github project, or email jschmid at symphonyrm dot com