Question

Bulk content validation through API

  • 4 August 2018
  • 6 replies
  • 405 views

The content validator, slow at the best of times, is cripplingly slow during large-scale refactors. Is it possible to bulk-update content references to models/explores/views through the API? If not, I’m hoping this is at least possible by contacting Support? Otherwise, our content validation is going to take hours.


6 replies


Hello @hlburak,


The content of Looks can be updated in bulk via the API by taking the following steps:



  1. Get all Looks with the /api/3.0/looks endpoint

  2. Fetch each Look's query with /api/3.0/queries/{query_id}, using the Look's query_id

  3. Update the fields in the query objects

  4. Create new queries with /api/3.0/queries based on the updated query objects

  5. Update each Look with /api/3.0/looks/{look_id} to point at its new query


Here is a basic sample script in Ruby showing how this could look:


require 'looker-sdk'

sdk = LookerSDK::Client.new(
  :client_id     => ENV['CLIENT_ID'],
  :client_secret => ENV['CLIENT_SECRET'],
  :api_endpoint  => "https://<instance name>:19999/api/3.0"
)

looks = sdk.all_looks(:fields => 'id, query_id')

looks.each do |look|
  # get the query behind the Look
  query = sdk.query(look.query_id).to_attrs

  # replace all instances of the field users.gender with users.gender_new
  query[:fields].map! { |f| f == "users.gender" ? "users.gender_new" : f }

  # clear the client ID so Looker assigns a fresh one to the new query
  query[:client_id] = nil

  # create the new query in Looker
  new_query = sdk.create_query(query)

  # point the Look at the new query
  updated_look = sdk.update_look(look.id, :query_id => new_query[:id])
  if new_query[:id] == updated_look[:query_id]
    puts "Success"
  else
    puts "Error"
  end
end

There are similar endpoints for dashboard_elements in the 3.1 API, but that API is currently in beta and can only be used from within the interactive API docs.


Also, if you would like to first determine which Looks are broken, you could run the queries and collect the ones that return an error (although you may want to first update the queries with a low row limit to reduce the strain on the database). I hope this helps!
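
That error-scan idea can be sketched as a small helper. This is only a sketch, assuming the same looker-sdk client as in the script above; `find_broken_looks` is a name I made up, and the method calls (`all_looks`, `query`, `create_query`, `run_query`) mirror the gem's generated endpoints, so verify them against your SDK version:

```ruby
# Sketch: re-run each Look's query with a low row limit and collect the ones
# that raise. `sdk` is a LookerSDK::Client configured as in the script above.
def find_broken_looks(sdk, limit: "1")
  broken = []
  sdk.all_looks(:fields => 'id, title, query_id').each do |look|
    q = sdk.query(look.query_id).to_attrs
    q[:client_id] = nil   # let Looker assign a fresh client_id
    q[:limit]     = limit # low row limit to reduce database strain
    begin
      sdk.run_query(sdk.create_query(q)[:id], 'json')
    rescue StandardError => e
      broken << { :id => look.id, :title => look.title, :error => e.message }
    end
  end
  broken
end
```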


Best,

Ryan

Hi, I am trying to implement a bulk find/replace of a field (aka dimension) via the API, using the above code as a template. It is quite difficult, as there are a huge number of places that could be affected. For anyone else who tries this, the places I have found so far are:




  • query.fields (as above)


  • query.filters is a dictionary mapping each dimension to the filter applied to it, for example "filters": { "application.submit_date": "7 days,NULL" }. You have to update the keys of this dictionary to apply the rename, and you also need to null out query.filter_config when making this change.


  • query.sorts is an array of sort fields in the format "[fieldname] [desc?]", so some regex is involved here


  • query.dynamic_fields is a string-encoded JSON array of computed columns. More regex needed.

  • All of the above needs to be applied to all Looks as well as dashboard elements


  • dashboard.dashboard_filters.dimension is the name of the field a dashboard filter gets its values from, if it is a field-type filter. If it is not a field type, it can be ignored.
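
Pulling the query-level bullet points above together, here is a rough pure-Ruby sketch. The name `rename_query_field` is my own, it assumes a query hash as returned by `sdk.query(...).to_attrs`, and real payloads may use symbol keys in :filters, so treat it as a starting point rather than a complete solution:

```ruby
# Rename old_name to new_name across the query attributes listed above.
def rename_query_field(query, old_name, new_name)
  pattern = /\b#{Regexp.escape(old_name)}\b/
  q = query.dup

  # fields: plain array of field names
  q[:fields] = q[:fields].map { |f| f == old_name ? new_name : f } if q[:fields]

  # filters: dictionary of field name => filter expression; rebuild the keys
  if q[:filters]
    q[:filters] = q[:filters].map { |k, v| [k.to_s == old_name ? new_name : k.to_s, v] }.to_h
    q[:filter_config] = nil # must be nulled out when filters change
  end

  # sorts: entries look like "field_name desc", so substitute inside the string
  q[:sorts] = q[:sorts].map { |s| s.sub(pattern, new_name) } if q[:sorts]

  # dynamic_fields: string-encoded JSON; a blunt regex substitution
  q[:dynamic_fields] = q[:dynamic_fields].gsub(pattern, new_name) if q[:dynamic_fields]

  q
end
```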


About halfway through this post I gave up. I still haven’t figured out how to handle:



  • Dashboard filters mapped to fields in some tiles but not others (and on different fields)

  • Merge queries, and filters across them


I’ve written enough code to know that doing this through the API is possible but very hard and error-prone. I do not recommend it. It would be great if Looker exposed the content validation functionality as its own API endpoint.


@ryan.dunlavy, any additional information on this issue? We want to give a giant +1 to @hlburak’s question.


Here’s our situation: our Looker product is repackaged in our primary product. One of the features we offer is the creation of “Custom Dimensions.” Not to be confused with dimensions made inside Looker, these dimensions are created from the primary product, which generates LookML and pushes the changes to a customer’s Looker instance. Unfortunately, at the time of implementation, there wasn’t an API endpoint that would let us handle deletion of the LookML. Now, two years later, our customers have many Looks and dashboards with dimensions they no longer need and have deleted. As part of the deletion process, new LookML is generated removing the deleted field, but we are trying to find a way to remove the missing field from content programmatically.


The Content Validator is not a great option. For us, it takes 20 minutes to complete, and every time we “Remove” a dimension it restarts validation.


And for others who may come along, the comment linked below explains how to update a Look.



@Ben_Edwards, huge thanks to you, good buddy. Your notes may end up saving us hours of development.


It would certainly be great to be able to run content validation via the API for a specific project. We are using Gazer for content deployments between different instances. If we could add a content validation step before the content is migrated to the production environment, that would minimize production impact. We don’t want to validate all the content, just what is related to a specific project.

Big +1 on this. The Content Validator’s GUI is so slow it is unusable.

Being able to replace content is key to doing incremental LookML refactors, which in turn is key to keeping a big Looker project healthy.


The content validator should let you preset filters so that it doesn't crash or take ages to return results.

A nasty workaround I have thought about but not yet tried is creating a new user who only has access to the project and folder(s) affected by the change (if you know these, of course!), then sudoing as that user. I would assume this works faster, as it scans through a lot less content and LookML.
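
For what it's worth, the sudo step could itself be scripted. This is an untested sketch: `validate_as` is my own name, it assumes the API's POST /login/{user_id} endpoint is exposed as `login_user` in the Ruby SDK and that `LookerSDK::Client.new` accepts an :access_token option, both of which you should verify against your gem version:

```ruby
# Sketch: sudo as a restricted user via login_user and yield a client scoped
# to that user's (much smaller) set of visible content.
def validate_as(sdk, restricted_user_id)
  auth = sdk.login_user(restricted_user_id) # admin-only sudo endpoint
  scoped = LookerSDK::Client.new(
    :access_token => auth[:access_token],
    :api_endpoint => sdk.api_endpoint
  )
  yield scoped # run the scan with the restricted user's permissions
ensure
  scoped.logout if scoped # invalidate the sudo token when done
end
```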
