Jupyter Notebooks and Looker

  • 18 October 2019
  • 6 replies

Userlevel 2

Hi all! We’re getting a little more sophisticated with our data science work here at Zearn, and I thought I’d see if you all have some advice. Most of this work happens in Python or R using Jupyter notebooks based on data pulled from Looker, and I’m wondering how best to integrate Looker. In particular, I have two questions:

  1. Does anyone have best practices for pulling data into notebooks from Looker? One challenge we run into using the API is that some of the queries we want can take a very long time to run. If the data won’t change (e.g. we’re pulling a fixed time period), we usually just download the results from Looker into a CSV and use that, but I don’t love the way that decouples the data from the source.
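One pattern that keeps the data coupled to the source while avoiding re-running slow queries is a thin on-disk cache around the API pull. Here's a minimal sketch: the fetch step is injected as a callable so it works with whatever client you use (the comment shows how it might wrap the official `looker_sdk` package's `run_look`, but that wiring is an assumption — adapt it to your setup):

```python
from io import StringIO
from pathlib import Path
from typing import Callable

import pandas as pd


def cached_look(look_id: str, fetch_csv: Callable[[str], str],
                cache_dir: str = ".looker_cache") -> pd.DataFrame:
    """Pull a Look's results as a DataFrame, caching the CSV on disk.

    fetch_csv is any callable that takes a Look id and returns its
    results as a CSV string -- e.g. a thin wrapper around the SDK:

        sdk = looker_sdk.init40()
        fetch = lambda lid: sdk.run_look(look_id=lid, result_format="csv")
    """
    path = Path(cache_dir) / f"look_{look_id}.csv"
    if path.exists():
        # Cache hit: skip the API entirely.
        return pd.read_csv(path)
    csv_text = fetch_csv(look_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(csv_text)
    return pd.read_csv(StringIO(csv_text))
```

For fixed-time-period pulls you'd call this once and every later notebook run reads the cached file; deleting the file (or keying the filename on the query parameters) forces a refresh.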

  2. Hosting and sharing notebooks: It’s easy to link people to looks or explores in Looker when we want to share something there, but obviously we can’t do this with notebooks. Are there tools folks like for sharing notebooks in a similar way?

Thanks in advance!

6 replies

Userlevel 6

For queries that take a while but whose data isn’t changing, just set up caching so you don’t have to wait for them.

Userlevel 2

Thanks Ian! Unfortunately we can’t set a long cache duration, because other queries in the explores we use might change daily.

Though this got me thinking… maybe I will make a second explore that inherits everything from our base explore but has a different cache setting, and we can know to use that for historical queries. I wonder if there is a better way?

Userlevel 6

Extend the explore, remove the stuff which changes regularly, and set a long cache time. It’s an option anyway — only a bit of a pain if the main explore changes regularly. If you pay per query this won’t be any good for you either, but you could pre-heat the cache every morning in case someone needs it?
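In LookML terms, that could look something like the sketch below — a second explore that extends the base one and sets a long `persist_for`. All the names here are hypothetical placeholders; adjust to your model:

```lookml
# Hypothetical model -- sketch of a long-cache explore for historical pulls.
explore: orders {
  # ... your existing base explore ...
}

explore: orders_historical {
  extends: [orders]
  persist_for: "24 hours"   # long cache window for fixed-period queries
}
```

Analysts would then know to point slow, historical notebook queries at `orders_historical`, while day-to-day queries keep hitting the base explore with its normal caching.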

Userlevel 3

I’ve found pandas/notebooks more useful than Looker’s derived tables, which makes me wonder… why/how are you going from Looker to notebooks?

Also, if you make a derived table in Looker persist, you can read that table from Jupyter.
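Reading a Looker-persisted table from a notebook can be as simple as a SQLAlchemy connection plus `pandas.read_sql`. A minimal sketch, assuming your warehouse is reachable from the notebook and you know the schema/table the PDT lands in (the connection URL and table name in the comment are made up):

```python
import pandas as pd
import sqlalchemy as sa


def read_persisted_table(conn_url: str, table_name: str) -> pd.DataFrame:
    """Read a table persisted by Looker (e.g. a PDT in the scratch
    schema) straight into a DataFrame."""
    engine = sa.create_engine(conn_url)
    with engine.connect() as conn:
        return pd.read_sql(sa.text(f"SELECT * FROM {table_name}"), conn)


# Hypothetical usage -- substitute your own warehouse URL and PDT name:
# df = read_persisted_table("postgresql://user:pw@host/warehouse",
#                           "looker_scratch.my_orders_pdt")
```

One caveat: Looker's generated PDT names can change when the table definition changes, so it's usually worth giving important tables a stable name (or a view over them) on the database side.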

Userlevel 2

Good question @mprogano. Mainly we have a bunch of metrics (and joins between tables) defined in Looker and don’t want to have to reinvent the wheel in pandas. Two reasons for that:

  1. Pure effort/analyst time writing the code to do it

  2. Correctness — we’ve QA’d the logic in our LookML extensively, and for complicated queries it’s much safer to use that output than to write custom code

Maybe one option is taking more advantage of Looker’s NDTs to persist those queries in the db.
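For anyone following along, an NDT is defined in LookML with `explore_source`, so the QA’d explore logic itself produces the persisted table. A rough sketch with hypothetical field and datagroup names:

```lookml
# Hypothetical names -- sketch of persisting a QA'd query as an NDT.
view: weekly_metrics {
  derived_table: {
    explore_source: orders {
      column: week    { field: orders.created_week }
      column: revenue { field: orders.total_revenue }
    }
    datagroup_trigger: daily_rebuild   # persist the result in the db
  }
}
```

Because the NDT is built from the explore, the joins and metric definitions stay in one place, and the persisted table is then readable from Jupyter like any other warehouse table.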

Userlevel 3

yeah, that’s fair. I’d say reading from persisted tables written by Looker seems to be the most sensible approach (as long as you can balance changes/updates on both sides)