There are many reasons to want a document that explains which fields, explores, and models are available or exposed in our Looker application. This script parses the model definition and prints out a CSV of fields, using the Looker API to get the model metadata. I use Python, and while you could use the Looker SDK, I prefer the Python requests library.
With some other changes, you can easily figure out how fields are set up and audit the model.
We've already turned this into an interactive web page using the Ruby SDK. You can see that here:
https://discourse.looker.com/t/creating-a-data-dictionary-using-lookers-api/3589
To get this started, I need the authentication, get_model, and get_explore endpoints. I wrap them in an API class:
import requests

class LookerApi(object):
    def __init__(self, token, secret, host):
        self.token = token
        self.secret = secret
        self.host = host
        self.session = requests.Session()
        self.session.verify = False  # note: disables SSL certificate verification
        self.auth()

    def auth(self):
        url = '{}{}'.format(self.host, 'login')
        params = {'client_id': self.token,
                  'client_secret': self.secret}
        r = self.session.post(url, params=params)
        access_token = r.json().get('access_token')
        # print access_token
        self.session.headers.update({'Authorization': 'token {}'.format(access_token)})

    # GET /lookml_models/{{NAME}}
    def get_model(self, model_name=None, fields={}):
        url = '{}{}/{}'.format(self.host, 'lookml_models', model_name)
        # print url
        params = fields
        r = self.session.get(url, params=params)
        if r.status_code == requests.codes.ok:
            return r.json()

    # GET /lookml_models/{{NAME}}/explores/{{NAME}}
    def get_explore(self, model_name=None, explore_name=None, fields={}):
        url = '{}{}/{}/{}/{}'.format(self.host, 'lookml_models', model_name, 'explores', explore_name)
        # print url
        params = fields
        r = self.session.get(url, params=params)
        if r.status_code == requests.codes.ok:
            return r.json()
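For reference, the two GET helpers above just concatenate path segments onto the host string, so the host must be the full API base URL. A small sketch of the same string formatting (the instance name below is a made-up example):

```python
def model_url(host, model_name):
    # GET /lookml_models/{name} -- mirrors the URL built in get_model
    return '{}{}/{}'.format(host, 'lookml_models', model_name)

def explore_url(host, model_name, explore_name):
    # GET /lookml_models/{name}/explores/{name} -- mirrors get_explore
    return '{}{}/{}/{}/{}'.format(host, 'lookml_models', model_name,
                                  'explores', explore_name)

# Hypothetical instance; in the real script the host comes from config.yml
host = 'https://mycompany.looker.com:19999/api/3.0/'
print(explore_url(host, 'thelook', 'orders'))
# https://mycompany.looker.com:19999/api/3.0/lookml_models/thelook/explores/orders
```

Note the trailing slash on the host value: without it, 'lookml_models' would be fused directly onto the port number.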
Once we can call those endpoints, the script should request all models and parse through each explore, fetching the field information in a loop. We then write each field and its metadata to a new row. For each row, I created a function to call:
def write_fields(explore, fields):
    ### First, compile the fields you need for your row
    explore_fields = explore['fields']
    explore_name = str(explore['name'])
    try:
        connection_name = str(explore['connection_name'])
    except KeyError:
        connection_name = ''
    for dimension in explore_fields[fields]:
        # print dimension
        field_type = fields
        project = str(dimension['project_name'])
        view = str(dimension['view'])
        view_label = str(dimension['view_label'])
        name = str(dimension['name'])
        hidden = str(dimension['hidden'])
        label = str(dimension['label'])
        label_short = str(dimension['label_short'])
        description = str(dimension['description'])
        sql = str(dimension['sql'])
        ftype = str(dimension['type'])
        value_format = str(dimension['value_format'])
        source = str(dimension['source_file'])
        ### Compile the line - this could be combined with the lookups
        ### above, but is kept separate to keep things simple
        rowout = []
        rowout.append(connection_name)
        rowout.append(field_type)
        rowout.append(project)
        rowout.append(explore_name)
        rowout.append(view)
        rowout.append(view_label)
        rowout.append(name)
        rowout.append(hidden)
        rowout.append(label)
        rowout.append(label_short)
        rowout.append(description)
        rowout.append(sql)
        rowout.append(ftype)
        rowout.append(value_format)
        rowout.append(source)
        w.writerow(rowout)
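Not every field exposes every attribute, so the str(dimension['...']) lookups can raise KeyError on some models (which is why the main loop later wraps write_fields in try/except). If you would rather skip just the missing attribute than the whole explore, dict.get with a default works; a minimal sketch (the helper name is mine, not part of the original script):

```python
# Attribute keys pulled from each field, in output-column order
COLUMNS = ['project_name', 'view', 'view_label', 'name', 'hidden',
           'label', 'label_short', 'description', 'sql', 'type',
           'value_format', 'source_file']

def field_row(dimension):
    # Missing attributes become empty strings instead of raising KeyError
    return [str(dimension.get(col, '')) for col in COLUMNS]

print(field_row({'name': 'orders.id', 'hidden': False}))
# ['', '', '', 'orders.id', 'False', '', '', '', '', '', '', '']
```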
Then all I need to do is instantiate the API, open a CSV, write the header, and iterate through my models. Using the csv library, we can start the CSV with this code:
import csv

csvfile = open('dictionary.csv', 'wb')
w = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
header = ['connection_name',
          'field_type',
          'project',
          'explore',
          'view',
          'view_label',
          'name',
          'hidden',
          'label',
          'label_short',
          'description',
          'sql',
          'ftype',
          'value_format',
          'source']
w.writerow(header)
The rest of the script looks like this:
Get Looker API 3.0 credentials
Call for the model
Parse through the model and write each field as a row into our CSV
Close the file
import yaml

f = open('config.yml')
params = yaml.safe_load(f)
f.close()

hostname = 'localhost'
my_host = params['hosts'][hostname]['host']
my_secret = params['hosts'][hostname]['secret']
my_token = params['hosts'][hostname]['token']

looker = LookerApi(host=my_host,
                   token=my_token,
                   secret=my_secret)

## -- Get all models --
models = looker.get_model("")

for model in models:
    model_name = model['name']

    ## -- Get single model --
    model_def = looker.get_model(model_name)
    # pp(model_def)

    ## -- Get single explore --
    for explore_def in model_def['explores']:
        explore = looker.get_explore(model_name, explore_def['name'])
        # pp(explore)

        ## -- Parse explore --
        try:
            write_fields(explore, 'measures')
        except:
            print 'Problem with measure fields in', explore_def['name']
        try:
            write_fields(explore, 'dimensions')
        except:
            print 'Problem with dimension fields in', explore_def['name']

csvfile.close()
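The config.yml read at the top of this script has to match the params['hosts'][hostname][...] lookups. Working backwards from those keys, a layout like the following should work (the values are placeholders; check the repo's README for the exact format):

```yaml
hosts:
  localhost:
    host: https://mycompany.looker.com:19999/api/3.0/
    token: your_client_id
    secret: your_client_secret
```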
The end result of executing this file is a CSV file called 'dictionary.csv'.
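Once dictionary.csv exists, you can audit it with plain csv tools, for example to list the hidden fields. A sketch using csv.DictReader (Python 3 syntax, with an inlined header-plus-one-row sample standing in for the real file):

```python
import csv
import io

# A stand-in for open('dictionary.csv'): the real header plus one sample row
sample = io.StringIO(
    'connection_name,field_type,project,explore,view,view_label,name,hidden,'
    'label,label_short,description,sql,ftype,value_format,source\n'
    'db,dimensions,proj,orders,orders,Orders,orders.id,True,ID,ID,,'
    '${TABLE}.id,number,,orders.view\n'
)
hidden_fields = [row['name'] for row in csv.DictReader(sample)
                 if row['hidden'] == 'True']
print(hidden_fields)
# ['orders.id']
```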
Check out the full script, called get_data_dictionary.py, here: https://github.com/llooker/python_api_samples
Note: the repo uses a LookerAPI.py file to hold the class, and a configuration file for keys. Check the README for setting this up.
I'm trying to get this working, but have used Python only a couple of times before. I have 2.7 installed, and am getting stuck trying to import yaml. I'm guessing it's a basic config/operator error; can you offer any suggestions?
C:\Python27>python get_data_dictionary.py
Traceback (most recent call last):
File "get_data_dictionary.py", line 2, in <module>
import yaml
ImportError: No module named yaml
===============================================
It never fails: after I feel stuck enough to post, I find the answer in a comment in another sample module:
import yaml ### YOU NEED THE pyyaml PACKAGE : [sudo] pip install pyyaml
That makes sense, and it works.
@la5rocks, the yaml module is provided by the PyYAML package, so pip install PyYAML should get the package installed for you.
Please let us know if that sorts out this issue for you.
Hey! Awesome post. Trying to run the same get_data_dictionary.py, I'm getting this:
https://looker-xxx.com:19999/api/3.1/lookml_models/
None
Traceback (most recent call last):
File "get_data_dictionary.py", line 121, in <module>
for model in models:
That means no data is being extracted or written to the CSV.
When accessing the link in the output (https://looker-xxx.com:19999/api/3.1/lookml_models/) directly, I got a "requires authentication" error. Does anyone know what I'm doing wrong?
thanks,
Mihai
The "requires authentication" error makes me think it's likely related to the API keys, secret, or host URL specified on lines 11-26. Are you absolutely sure they're correct? Or could you try running one of the simpler scripts, like get_look, to see whether it's a problem with your configuration or a problem with that script?
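One more thing worth checking when you hit a "requires authentication" error: the class concatenates paths directly onto the host string, so the host in your config must be the exact API base URL with a trailing slash. A hypothetical helper to build it (port and API version taken from the URL in the traceback above):

```python
def api_base(hostname, port=19999, version='3.1'):
    # The class appends 'login' or 'lookml_models/...' directly to this,
    # so the trailing slash is required
    return 'https://{}:{}/api/{}/'.format(hostname, port, version)

print(api_base('looker-xxx.com'))
# https://looker-xxx.com:19999/api/3.1/
```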
Thanks!
Managed to fix it ultimately. I had entered the ID and secret key the wrong way around :)
Cheers
This is also very useful for getting a list of strings for localization.
Looker now offers a native Data Dictionary in beta!
As of version 7.8, you can download the Data Dictionary from your Marketplace. You can find documentation here. It only takes a few clicks to install, and comes with a host of functionality.
Please feel free to post any feedback here once youโre up and running!
The GitHub link in the original post is now dead. Does anybody know where it's ended up?
The CSV export functionality would be incredibly useful for our use case.