Solved

Update(write-back) data on BigQuery from Looks?

  • 24 October 2018
  • 28 replies
  • 2060 views

Userlevel 3

Is there a way to let users update data on DB right from looks?

I read about actions here but I found it very abstract.

Can someone please explain the process for this?


Thanks,

Krish.

icon

Best answer by jesse.carah 25 October 2018, 23:07

Hey Krishna,



I’ve been able to write data from Looker to BigQuery using both Data Actions as well as the Looker Action Hub. In either case, you’ll need to push data from Looker to some middleware that will interpret the webhook from Looker and perform the necessary operations to then stream the data to BigQuery.



Luckily, Google has a great service called Google Cloud Functions that makes this really easy. Like AWS’s Lambda, Cloud Functions let you deploy code that gets executed based off of some event. With a data action, you can push JSON containing data from Looker as well as user-defined form parameters to a Cloud Function endpoint. The Cloud Function then parses the JSON, extracts the relevant values, and calls on the BigQuery SDK to stream the results to BigQuery.



Here’s a quick overview of how to use Cloud Functions to stream data from Looker to BigQuery. In this example, we’ll create a data action and cloud function that lets an end user persist an annotation to BigQuery:



Create the Data Action



In this example, we’re going to attach a data action to field, and allow end-users to mark whether or not a name is a cool name.



 dimension: name {

type: string

sql: ${TABLE}.name ;;

action: {

label: "Cool Name?"

url: ""



param: {

name: "name"

value: "{{ value }}"

}

form_param: {

name: "annotation"

type: select

label: "Cool name?"

default: "No"

description: "Do you think that this name is a cool name?"

option: {

name: "No"

}

option: {

name: "Yes"

}

}

}

}



Note: We’re going to leave the url blank for now. Once we’ve spun up the cloud function we’ll paste the endpoint in.



Configure the Cloud Function





  1. Follow the first three steps here to select your GCP project and enable API access for Cloud Functions and BigQuery.


  2. Navigate to https://console.cloud.google.com/functions and ensure that you’ve selected the same project in which BigQuery resides.


  3. Click Create Function, and give it a name. Select a memory allocation (in my experience, you can select the minimum for this type of operation).


  4. Select HTTP as your trigger


  5. Select your preferred runtime (for this example, I will use Python 3.7, but versions of Node.js are also supported).




Creating your Cloud Function



We’re now going to write a simple Python function that writes the user selected annotation to BigQuery, and place it in main.py



import google.cloud.bigquery as bigquery

import datetime

import time



def annotation(request):

r = request.get_json() # Fetch the data action JSON



client = bigquery.Client()

dataset_id = '' # Replace with name of the BQ dataset

table_id = '' # replace with your table ID

table_ref = client.dataset(dataset_id).table(table_id)

table = client.get_table(table_ref) # API request



# request variables

name = r['data']['name']

annotation = r['form_params']['annotation']



# system variables

sys_time = int(time.time())



row_to_insert = [

(

name,

annotation,

datetime.datetime.fromtimestamp(sys_time).strftime('%Y-%m-%d %H:%M:%S')

)

]

row = client.insert_rows(table, row_to_insert) # API request to insert row

return '{"looker": {"success": true,"refresh_query": true}}' # return success response to Looker



Additional things to configure:





  • The ‘Function to Execute’ is annotation


  • Make sure to include a line for google-cloud-bigquery==1.5.0 in requirements.txt



  • Click the ‘Trigger’ tab and copy the URL. Paste this in the action that you setup in the first step.




The End Result:





Caveats:





  • Right now, this function is open to the internet, so anyone with your Cloud Function URL can write data to your BigQuery instance. Consider adding in some checks/validation that the request is coming from Looker.


  • For a more comprehensive and secure approach, consider adapting this using the Action Hub framework. I can provide more detail on this if you’re interested.




Cheers!


Jesse

View original

28 replies

Userlevel 3

Hey Krishna,


I’ve been able to write data from Looker to BigQuery using both Data Actions as well as the Looker Action Hub. In either case, you’ll need to push data from Looker to some middleware that will interpret the webhook from Looker and perform the necessary operations to then stream the data to BigQuery.


Luckily, Google has a great service called Google Cloud Functions that makes this really easy. Like AWS’s Lambda, Cloud Functions let you deploy code that gets executed based off of some event. With a data action, you can push JSON containing data from Looker as well as user-defined form parameters to a Cloud Function endpoint. The Cloud Function then parses the JSON, extracts the relevant values, and calls on the BigQuery SDK to stream the results to BigQuery.


Here’s a quick overview of how to use Cloud Functions to stream data from Looker to BigQuery. In this example, we’ll create a data action and cloud function that lets an end user persist an annotation to BigQuery:


Create the Data Action


In this example, we’re going to attach a data action to field, and allow end-users to mark whether or not a name is a cool name.


 dimension: name {
type: string
sql: ${TABLE}.name ;;
action: {
label: "Cool Name?"
url: ""

param: {
name: "name"
value: "{{ value }}"
}
form_param: {
name: "annotation"
type: select
label: "Cool name?"
default: "No"
description: "Do you think that this name is a cool name?"
option: {
name: "No"
}
option: {
name: "Yes"
}
}
}
}

Note: We’re going to leave the url blank for now. Once we’ve spun up the cloud function we’ll paste the endpoint in.


Configure the Cloud Function



  1. Follow the first three steps here to select your GCP project and enable API access for Cloud Functions and BigQuery.

  2. Navigate to https://console.cloud.google.com/functions and ensure that you’ve selected the same project in which BigQuery resides.

  3. Click Create Function, and give it a name. Select a memory allocation (in my experience, you can select the minimum for this type of operation).

  4. Select HTTP as your trigger

  5. Select your preferred runtime (for this example, I will use Python 3.7, but versions of Node.js are also supported).


Creating your Cloud Function


We’re now going to write a simple Python function that writes the user selected annotation to BigQuery, and place it in main.py


import google.cloud.bigquery as bigquery
import datetime
import time

def annotation(request):
r = request.get_json() # Fetch the data action JSON

client = bigquery.Client()
dataset_id = '' # Replace with name of the BQ dataset
table_id = '' # replace with your table ID
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref) # API request

# request variables
name = r['data']['name']
annotation = r['form_params']['annotation']

# system variables
sys_time = int(time.time())

row_to_insert = [
(
name,
annotation,
datetime.datetime.fromtimestamp(sys_time).strftime('%Y-%m-%d %H:%M:%S')
)
]
row = client.insert_rows(table, row_to_insert) # API request to insert row
return '{"looker": {"success": true,"refresh_query": true}}' # return success response to Looker

Additional things to configure:



  • The ‘Function to Execute’ is annotation

  • Make sure to include a line for google-cloud-bigquery==1.5.0 in requirements.txt

  • Click the ‘Trigger’ tab and copy the URL. Paste this in the action that you setup in the first step.


The End Result:



Caveats:



  • Right now, this function is open to the internet, so anyone with your Cloud Function URL can write data to your BigQuery instance. Consider adding in some checks/validation that the request is coming from Looker.

  • For a more comprehensive and secure approach, consider adapting this using the Action Hub framework. I can provide more detail on this if you’re interested.


Cheers!

Jesse

Userlevel 3

Thank you so much Jesse!

Going to try this now!

Userlevel 3

Yeah, let me know how it goes! Happy to discuss further.

Userlevel 1

Is it possible to push all the data this way into BigQuery? This would be super useful to create custom segments of users for example.

Userlevel 3

Hello Jesse!

I followed the exact process, but I am seeing an error on Cloud Functions:



NameError: name ‘bigquery’ is not defined



I included google-cloud-bigquery==1.5.0 in requirements.txt but still am seeing the same error.


EDIT:

Fixed this by adding “from google.cloud import bigquery”

Userlevel 3

Hey Krish, good catch. It looks like I didn’t copy over the first line of the cloud function when I pasted it in. Did you get the action to work?

Userlevel 3

Dimitri – yes, that is totally possible, but we’ll need to leverage the Action Hub framework to push all results to BigQuery. I’ll take that as a challenge and try to get you a working example sometime next week.

Userlevel 3

Hey Jesse,

Yes I finally made it work. But there is one issue (which I am trying to fix).

Data is being duplicated whenever I try to update. Any idea why that’s happening?


Capture

I tried updating rows with ids 8 and 10, and you can see data being duplicated

Userlevel 1

Amazing, thank you @jesse.carah! Looking forward to it.

Userlevel 3

Hey Krish,


Check out this discussion about the append-only nature of BigQuery.


I think the best move here is to have your Cloud Function insert a timestamp when each record is created, and create a view that selects only the most recent record. One approach to doing that is explained here.


Cheers!

Jesse

Userlevel 3

Thank you, Jesse!

I now have a much better picture of this!

Userlevel 3

Hey Dimitri,


I just wanted to update you that I’m making progress, but my goal to getting a working POC up and running this week was perhaps too bold.


In the meantime, could you elaborate on the specific use-case? What sort of data are you trying to push back to BigQuery?


Cheers,

Jesse

Userlevel 1

@jesse.carah Whenever people construct a view which represents a segment, e.g. “all user_ids who did xyz and live in London”. I would like to enable them to save this list of user_ids as a segments so that it can be used in other dashboards/analysis. The only missing part is to be able to save the output of an explore into an existing BigQuery table. This way everyone would be able to create custom (arbitrary complicated) segments. Hope that makes sense!?


I think this a very common analytical pattern in general!

Userlevel 3

@Dimitri_Masin – I got this working 🙂 . I’m going to make a post this afternoon describing how to get this up and running.

Userlevel 3

@Dimitri_Masin check out the new post here. Let me know if you have any questions.

Userlevel 3

Hey @hwimoon let’s take your question here:



Hi Jesse,


I have been trying to follow your instructions on using Data Action literally step by step, and Looker keeps telling me that “the form was not accepted”. What does the error mean and how can I get around this?


Please let me know! Solving this would be phenomenal for me…!


Thank you,

Hwi Moon



I’m curious if you can see anything in the Cloud Function logs indicating whether the action completed correctly. My code snippet doesn’t do error handling all that well 😛. My guess is that the function is failing and returning some exit condition that Looker doesn’t understand.


Take a look at the logs and let me know what you see – you can access them directly in the cloud function console.

Userlevel 1

Thanks for your reply! I have just try to re-run and see if I get anything in the log, but there are no new entries. I think I messed up somewhere in LookML(URL) or configuration/permissions. But I think I followed everything 100% in your instructions… and very confused!


The error message is as shown…

Userlevel 3

@hwimoon please share your LookML. You can redact anything sensitive if you’d like.

Userlevel 1

@jesse.carah Sorry for the late post - missed the notification for some reason.


I think there is an error in the cloud function (Python 3.7). It seems to me that my “r=request.get_json()” does not have any keys or values in it, causing errors when I do anything like r[‘data’] or r[‘form_params’].


I could not figure out why - please help!


Screen Shot 2020-10-14 at 11.57.38 AM

Userlevel 3

@hwimoon Very weird! If you add in print(r) below the r = ... line, you’ll be able to observe what Looker’s sending over in the logs. For example, my function is parsing a dictionary that looks like this:


{'type': 'cell', 'scheduled_plan': None, 'attachment': None, 'data': {'value': 'Emma', 'rendered': 'Emma', 'name': 'Emma'}, 'form_params': {'annotation': 'Yes'}}


Give that a spin and let me know what you see. Also, go ahead and share the LookML for your action with the endpoint redacted.

Userlevel 1

@jesse.carah


It seems that the variable “r” is empty (i.e., {}). Very weird.


Here is my LookML code:


dimension: managers {
label: "Manager or BM"
type: string
sql: ${TABLE}.ManagerName ;;

action: {
label: "Test"
url: "https://us-west2-XXXXX.cloudfunctions.net/manager_mtd_input_test"
param: {
name: "manager"
value: "{{ value }}"
}
form_param: {
name: "performance"
type: string
label: "Enter Performance"
}
}
}
Userlevel 3

@hwimoon I don’t know what to say – I tested your LookML out, and am seeing data flow through. You can use this tool to inspect the contents of your payload, by swapping out your endpoint with one generated by the service.

Userlevel 1

@jesse.carah


I just checked out the link and it does seem like the data is flowing through. At this point, I would think the request is being rejected by the Cloud Function (GCP), which is causing the request to be empty. Do you think it could be some permission related problem?

Userlevel 6
Badge

Here’s one possibility - Between the time the original article was written and now, GCP has changed some defaults for newly deployed Cloud Functions. You’ll want to make sure that you have opted-in to a publicly accessible function.


See here for more details: https://cloud.google.com/functions/docs/securing/managing-access-iam#allowing_unauthenticated_function_invocation

Userlevel 1

@jesse.carah

@fabio


Hey Fabio, thank you so much for your reply! I should have continued to share my progress. I was able to get my Cloud Function working by checking the checkbox “Allow unauthenticated invocations” as shown below when first creating the function.


Very excited that it is working now - however, the word “unauthenticated” concerns me a little bit. It seems that anybody with the trigger URL is able to write to our data, is that correct? If so, how do people safely manage and protect their database while using this “write-back” workflow? Please let me know. Thank you very much!!


Reply