We have a certain use case in mind and I’d like to get feedback from the community.
Historically, reports had been generated by executing queries and pasting the results into Excel, which was error-prone, time-consuming, etc.
We’ve now modeled much of the query business logic and can replicate the same datasets and visualizations in Looker, but we are trying to understand our options for scheduling / packaging the results for delivery. Looker seems to excel as an interactive tool, but its scheduling / exporting options seem limited, and it’s not always feasible to bring every data consumer into the tool as a user (board members, executives, external clients, etc.).
For a given deliverable that we are prototyping, we’ve organized all of the content into a single space, with saved Looks (~10 now, but scaling to as many as 50) and a single saved (user-defined) dashboard containing all the relevant visualizations from those Looks. I was hoping that we could bulk-schedule all of the contents of the space to export (CSVs for each of the Looks, a PDF for the dashboard) into a single S3 folder. If I’m being greedy, my ultimate desire is a monthly export process such that on May 1, all files (PDF and CSVs) from the associated content (dashboard and Looks) in a given space end up together in a dated zip file, so we can organize and archive the historic files.
Two options that we’ve come up with - conceptually - both involve some scripting:
Approach 1: Dump exports to staging area, then re-organize files
- (Manually) schedule exports of the looks and the dashboard to an S3 bucket [Deliverable123 / Recent].
- Schedule a task (cron, leveraging the AWS CLI, etc.) to run after Step 1, which moves all files from [Deliverable123 / Recent] to [Deliverable123 / 2017-05], zips the contents of [Deliverable123 / 2017-05] into a single file, and clears the contents of [Deliverable123 / Recent].
Besides having to write and maintain a script for Step 2, the biggest downside of this approach is that individual schedules are still managed manually in the UI, so setting up anything new takes effort, schedules can be configured inconsistently, and changes like renaming a bucket will be fairly tedious.
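For what it’s worth, the re-organize-and-zip logic in Step 2 is small enough to sketch. Below is a local-filesystem version using only the standard library; against S3 the same steps would map onto boto3 `copy_object` / `delete_object` calls instead (folder names like `Recent` are just the example paths from above).

```python
import shutil
import zipfile
from datetime import date
from pathlib import Path

def archive_recent(base, stamp=None):
    """Move everything from <base>/Recent into <base>/<YYYY-MM>,
    zip that dated folder into a single file, and clear Recent.
    Returns the path to the zip. Local-disk sketch of Step 2."""
    stamp = stamp or date.today().strftime("%Y-%m")
    recent = base / "Recent"
    dated = base / stamp
    dated.mkdir(parents=True, exist_ok=True)

    # Move all exported files out of the staging area (this also
    # clears Recent, since every file is relocated).
    for f in sorted(recent.iterdir()):
        shutil.move(str(f), str(dated / f.name))

    # Zip the dated folder into a single dated deliverable.
    zip_path = base / f"{stamp}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(dated.iterdir()):
            zf.write(f, arcname=f.name)
    return zip_path
```

Run monthly from cron after the scheduled exports land, this gives you the dated archive folders plus a single zip per month.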
Approach 2: Manage schedules and destinations with Looker API
- Leverage the Looker API to manage schedules on a per-space basis (in pseudo-code, something like: for each Look in this space, set the S3 destination to [Deliverable123 / 2017-05], set format = CSV, set require changed results = “no”, etc.). We could run the script each time we want to reconfigure the destination (once a month or on demand).
- Have a manual or scripted process to organize all of the individual files from a given S3 bucket. S3 unfortunately doesn’t offer a “Download All” option in its UI, so this may still require some scripting.
Approach 2 likely scales better, and once we’ve become familiar with the Looker API, it may let us re-generate a set of files after data replication issues, etc., without touching individual schedules. But at this point we don’t have familiarity with the API, so there will certainly be a learning curve, and potentially two scripts to manage instead of one.
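To make the pseudo-code above a little more concrete, here’s a hedged sketch of building the per-Look schedule payload. The field names follow the general shape of Looker’s scheduled-plan API as I understand it, but they are assumptions — verify them against your instance’s API docs before relying on this. Only the payload construction is shown; the actual HTTP call is left as a comment.

```python
def build_s3_schedule(look_id, bucket, folder):
    """Build a scheduled-plan payload that sends a Look's CSV to
    s3://<bucket>/<folder>/ on a monthly cron.

    Field names are assumptions based on the scheduled_plans API
    shape -- check your Looker version's API reference.
    """
    return {
        "name": f"monthly-export-look-{look_id}",
        "look_id": look_id,
        "crontab": "0 6 1 * *",   # 06:00 on the 1st of each month
        "require_change": False,  # "require changed results" = no
        "scheduled_plan_destination": [{
            "type": "s3",
            "format": "csv",
            "address": f"s3://{bucket}/{folder}",
        }],
    }

# Then, for each Look in the space, something like (untested,
# endpoint path is an assumption):
#   requests.post(f"{base_url}/api/3.0/scheduled_plans",
#                 headers=auth_headers,
#                 json=build_s3_schedule(look.id, "deliverable123", "2017-05"))
```

Re-running this script with a new folder name each month is what would retarget the whole space’s destination in one pass.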
Before we go too far down either of these paths, I wanted to hear if anybody had attempted or accomplished something similar, either through one of the methods above, or via some other method I haven’t considered.
Something else worth mentioning: while we anticipate updates to Looker that will continue to bring LookML dashboards and user-defined dashboards closer together, we have a similar desire to schedule exports of some LookML dashboards.
I’ve created a working script (happy to share the base code) to schedule Looks / dashboards, but the destination emails *apparently* have to belong to people who have a Looker user account.
Thanks very much for your helpful reply! It’s great to hear that you built a working solution!
I agree. I would think there are more folks needing to solve this same scenario.
We are mostly relying on “Approach 1” as I described it above: a bunch of individual Looks are scheduled to run once a month, delivering results to a specific staging S3 bucket. A team member then manually runs a Python script that re-organizes the files out of the staging folder into dated archive folders and zips the most recent files into a single deliverable, which can be emailed to a consumer.
If we ever need to re-generate all of the files (because there is an upstream data loading issue, etc.), we have a separate script which leverages the Looker API to execute all of the saved Looks.
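That re-generation script essentially just hits each saved Look’s run endpoint. A minimal sketch of the URL construction is below; the `/api/3.0/looks/{id}/run/{format}` path shape is an assumption based on the 3.x API and should be confirmed against your instance, and the actual authenticated requests are left as a comment.

```python
def look_run_urls(base_url, look_ids, fmt="csv"):
    """Build the run endpoints for a set of saved Looks.

    Endpoint path is an assumption (Looker API 3.x style) --
    confirm against your instance's API reference.
    """
    return [f"{base_url}/api/3.0/looks/{look_id}/run/{fmt}"
            for look_id in look_ids]

# With a session token from /api/3.0/login, something like:
#   for url in look_run_urls("https://yourco.looker.com", space_look_ids):
#       requests.get(url, headers={"Authorization": f"token {token}"})
```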
It’s not a completely “hands off” automated solution, which I don’t love, and I don’t think it scales very well, but luckily our requirements to generate these kinds of deliverables have not changed that much.
I am still surprised that there aren’t more Looker users in a similar scenario, or I’m curious if/how others have approached this.
Hi Timothy! I would really appreciate hearing how you made out with this. We have a similar need we are exploring.
You might look into Zapier’s webhooks as an intermediary tool to accomplish what you are trying to do without a script. Since you can schedule a delivery to a webhook with Looker, you can then leverage Zapier to move the data wherever you need. Their “Zaps” offer a number of processing steps that might accomplish this task; check out the formatting step, for example. There are a few FTP options for posting the webhook output, including S3 and BrickFTP.