Securely get data to excel (automatically - not download)

  • 3 January 2020
  • 10 replies

Userlevel 7
  • Good Looker
  • 412 replies

Currently we sit securely behind a firewall. Many people use public looks in order to suck the data into an XL workbook using the webquery functionality. A lot of people have a lot of processes using this functionality (eg workbooks with loads of sheets in that pull lots of different looks data and then generate a powerpoint).

We are looking at moving to the cloud and so the public looks have to go.

It is important people own their own workbooks and not have one generated for them since they do a lot of bits and pieces in them and they are a living document.

We have thought about creating a macro or plugin which uses the API however this means saving users credentials in the workbook which brings its own complications.

Would be great to get some thoughts from people who have come up against this or have some ideas.



Best answer by Ryan_Tuck 7 January 2020, 22:42

View original

10 replies

Userlevel 1

We could definitely use something like this at Allbirds!

Userlevel 4

How about an ETL job that either puts the data directly into Excel, or into something Exel can consume?

Userlevel 3

We built an internal tool to pipe looks to google sheets on a daily / hourly cadence because our security team wasn’t fond of public look URLs. Users love it. I imagine a very similar thing could be done to write to Excel spreadsheets stored in Google Drive or something similar as long as they were accessible programmatically.

I think I’m somewhere between Quinn and Ryan’s ideas. Massive downloads of data to Excel doesn’t have to be realtime, but in batch, perhaps once a day. My team discourages such acts as users should be able to filter data using the Explore itself. Still, Rome wasn’t built in a day. I’d like to see the row limit extended to 100k and if it helps the application, simply allow this in batch or on a limited schedule.

Userlevel 7

The use case here is very much to generate a standardised powerpoint presentation, I wouldnt say there is filtering that occurs in the worksheets, its a place to collate many looks worth of data and then generate a presentation. If we have it as a scheduled thing then I suspect we would be getting many adhoc requests to run it at times of the day when it hasn’t just run - due to people changing looks and wanting their data into a sheet.

The solution of making some kind of system that the user can put in a path to their XL spreadsheet, say which sheet and the LookID and then press a run button (even if this is saved for further use) is just a million miles away from the ease of a webquery (not to mention the effort to build an maintain that process).

Perhaps I am just being lazy?

Ryan_Tuck, I’d love to get some description of how your team approached piping Looks to Google Sheets, or even a code sample if that’s not too high a hill to climb. (I can imagine a straightforward path, but it’d be great to learn if there were gotchas or things you figured out along the way.)

Userlevel 3

Broadly speaking, we use the Looker API to run a specific look, then use the google sheets API to upload those look results to a target sheet.

We use a single configuration spreadsheet where any user can add a new “pipe” that defines their Look URL and the target spreadsheet title / sheet title.

We haven’t open-sourced the code so it’s not shareable unfortunately – sorry!

Some gotchas to be aware of:

  • The google API leverages a service account that is configured in GCP, and needs to be granted edit access to target spreadsheets

  • I’d recommend using bulk actions wherever possible to load data to google sheets. Some python APIs, like gspread (at least, last time I checked), focus on individual cell updates, which IIRC was an inefficient approach for our use-case

  • Delete cell contents, but don’t delete rows in the target sheet! Users will likely have formulae that are referencing cells in the data that gets loaded in.

  • Ensure that monitoring is in-place, so that users can be aware of when their pipes fail, and when data was last loaded into their sheet.

Hope that all points you in the right direction @Aaron_Harnly

IanT, can you share more around the webquery you were using? Looking to build something similar, would be gr8 to leverage any existing wheels before running out and inventing my own.

Userlevel 7

It’s standard functionality:

We have a in house tool which you fill in the lookid, your googlesheetsid and which sheet and it runs it at specific intervals (haven’t worked it to include xl because it’s not worth it) but it’s an overhead which these public looks don’t have.

Regarding our move to cloud, we will actually retain public looks and implement ip whitelisting (which comes with its own sacrifices) to ensure security.

Userlevel 4

Is there any progress in this Topic.

Currently we face the issue, that some powerusers aksed us to create snowflake views, they can use via ODBC and Excel to rebuild some Reports which  

  • needs Extra Input Data (Parameter)
  • Needs to be commented by the business
  • Needs to be in a specific format 
  • and so on

So we are still looking for a proper solution, to keep the logic in looker and still open the “world” to excel users.