Data Dictionary Best Practices

  • 13 September 2016
  • 6 replies

I’m not sure if this is the right category for my question, but the description has ‘best practices’ in it, so here goes.

Where do you store your data dictionary, and has it worked? Looker pitched their recently released LookML Model API endpoint as a tool for building a data dictionary, so I infer that at least some users store the dictionary outside of Looker. As of last year, Warby Parker used GitBook. But Looker has come a long way with annotations and the markdown homepage, and in theory the dictionary could live within Looker itself. Here at Managed by Q, we briefly tried a Google spreadsheet, but no matter how many times and places we linked to it, no one ever used it.

I’d love to hear input from the Looker community on how they’ve approached this problem.

At UpCounsel, almost everyone uses Looker to answer their own questions about the data, so it became glaringly obvious that we needed a centralized data dictionary. We had a Google Doc, but we ran into the same problem you did: no one knew where it was, no one visited it, and it wasn't kept up to date.

When we bought Looker a year ago, I made sure to create a data glossary in markdown within the documentation files; it also serves as our Looker homepage. The glossary has a markdown table of contents that links to more documentation, and I point every new coworker to this homepage during onboarding.

I also put a description on every dimension and measure in our Looker instance. Whenever I receive a question, I first ask, “Did you read the description?” People know they can rely on the descriptions.
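A "description on every field" policy is easy to check automatically. Here is a minimal Python sketch, assuming you have already fetched an explore's metadata as a dict shaped like `{"fields": {"dimensions": [...], "measures": [...]}}`, where each field has a `name` and an optional `description`. The shape and the example field names are assumptions for illustration, not taken from the post above.

```python
from typing import Any


def fields_missing_descriptions(explore: dict[str, Any]) -> list[str]:
    """Return names of dimensions and measures with no (or a blank) description.

    Assumes `explore` looks like {"fields": {"dimensions": [...], "measures": [...]}},
    with each field a dict holding "name" and optionally "description".
    """
    fields = explore.get("fields", {})
    missing = []
    for kind in ("dimensions", "measures"):
        for field in fields.get(kind, []):
            # Treat None, "", and whitespace-only descriptions as missing.
            if not (field.get("description") or "").strip():
                missing.append(field["name"])
    return missing


# Hypothetical example metadata.
explore = {
    "fields": {
        "dimensions": [
            {"name": "users.id", "description": "Primary key of the users table."},
            {"name": "users.signup_date", "description": ""},
        ],
        "measures": [
            {"name": "users.count", "description": None},
        ],
    }
}
print(fields_missing_descriptions(explore))  # ['users.signup_date', 'users.count']
```

Running a check like this on a schedule (or before deploying LookML changes) keeps the "did you read the description?" answer trustworthy.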

@weitzenfeld, we have this Discourse post, Generating a Data Dictionary in Google Sheet, showing how to export a data dictionary from your instance into Google Drive. It uses your instance's API to populate the Google Sheet.

And @Erin_Breen, since this script pulls the information automatically from your API, rerunning it keeps the data dictionary up to date, provided that newly added fields have a description parameter.
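The linked script isn't reproduced here, but its core step (turning field metadata into spreadsheet rows) can be sketched in Python. This assumes the explore metadata has already been fetched from the API as a dict with `fields.dimensions` and `fields.measures` lists, each field carrying `name`, `label`, `type`, and `description`. That shape, the column names, and the example fields are assumptions. The sketch writes CSV for import instead of calling the Google Sheets API.

```python
import csv
import io
from typing import Any

COLUMNS = ["explore", "kind", "name", "label", "type", "description"]


def dictionary_rows(explore_name: str, explore: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten one explore's dimensions and measures into spreadsheet rows."""
    rows = []
    fields = explore.get("fields", {})
    for kind in ("dimensions", "measures"):
        for field in fields.get(kind, []):
            rows.append({
                "explore": explore_name,
                "kind": kind[:-1],  # "dimension" or "measure"
                "name": field.get("name", ""),
                "label": field.get("label") or "",
                "type": field.get("type") or "",
                "description": field.get("description") or "",
            })
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Render rows as CSV text that can be imported into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example metadata for an "orders" explore.
explore = {
    "fields": {
        "dimensions": [
            {"name": "orders.status", "label": "Orders Status",
             "type": "string", "description": "Current order status."},
        ],
        "measures": [
            {"name": "orders.count", "label": "Orders Count", "type": "count"},
        ],
    }
}
print(to_csv(dictionary_rows("orders", explore)))
```

Because the rows are rebuilt from the API on every run, the sheet stays current. A blank `description` column also flags the fields that still need one.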

If you have any feedback on the script please let us know!

Would anyone here be interested in a product that does this? It would be a searchable data dictionary that regenerates automatically every time new code is pushed, hosted on a subdomain where users can log in.

Additional features could allow admins to hide certain definitions, override definitions, or add more content.
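One way such a product could support admin edits without losing them on every regeneration is to store overrides and hidden fields separately and layer them on top of the generated definitions at read time. Below is a hypothetical Python sketch of that design with a simple search. The function names and example data are illustrative only and do not describe any existing product.

```python
def apply_admin_edits(
    generated: dict[str, str],
    overrides: dict[str, str],
    hidden: set[str],
) -> dict[str, str]:
    """Layer admin overrides on top of auto-generated definitions.

    `generated` is rebuilt on every push; `overrides` and `hidden` are stored
    separately, so admin edits survive regeneration.
    """
    merged = {name: overrides.get(name, text) for name, text in generated.items()}
    return {name: text for name, text in merged.items() if name not in hidden}


def search(definitions: dict[str, str], query: str) -> list[str]:
    """Case-insensitive substring search over field names and definitions."""
    q = query.lower()
    return sorted(
        name for name, text in definitions.items()
        if q in name.lower() or q in text.lower()
    )


# Hypothetical example data.
generated = {
    "orders.count": "Count of orders.",
    "orders.margin": "Revenue minus cost.",
    "users.internal_score": "Internal scoring field.",
}
overrides = {"orders.count": "Count of completed orders (excludes cancellations)."}
hidden = {"users.internal_score"}

definitions = apply_admin_edits(generated, overrides, hidden)
print(search(definitions, "orders"))  # ['orders.count', 'orders.margin']
print(search(definitions, "cost"))    # ['orders.margin']
```

Keeping the generated layer read-only means a push can rebuild it freely. A definition only reverts to the generated text when an admin deletes the override.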

Yes Sam, such a product feature within Looker would be great!

It feels like we’re almost there with the Field Usage Explore that I stumbled upon yesterday. If it also included labels, descriptions, and field types, it would meet my immediate needs.

Looker now offers a native Data Dictionary in beta!

This functionality is available as of Looker 7.8. You can download the Data Dictionary from your Marketplace, and you can find documentation here. It takes only a few clicks to install and comes with a host of features, such as:

  • Dedicated UX for searching through field descriptions and metadata

  • Quick filters to identify and audit fields (e.g. find all fields without a description)

  • Field value previews showing the top 10 values for any given field

  • Simple embedding for consumption in external applications
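To make the "top 10 values" preview concrete, here is a small in-memory Python sketch of the idea: count each distinct value and keep the most frequent ones. This is only an illustration. How the Data Dictionary extension actually computes its preview isn't stated here, and against a database you would more likely run a grouped count query.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def top_values(values: Iterable[Hashable], n: int = 10) -> list[tuple[Hashable, int]]:
    """Return the n most common values with their counts, most frequent first."""
    return Counter(values).most_common(n)


# Hypothetical sample of an order-status field.
statuses = ["shipped", "pending", "shipped", "cancelled", "shipped", "pending"]
print(top_values(statuses, n=2))  # [('shipped', 3), ('pending', 2)]
```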

Please feel free to post any feedback here once you’re up and running!