Even very simple data sets can be complex to analyze. Starting with a really simple data set of names, we’re going to see if we can learn a few things. The data includes names given in the United States since 1910, listed by state and gender. We’ll go through the LookML model building and then build some data discovery widgets to learn about the popularity of names.
The data set is very simple, but with these 5 columns we can figure out a lot about names.
name | gender | state | year | number
Each row in the table is a roll up of the number of people with a given first name and gender born in a US state in a given year.
name - the first name, capitalized
gender - either ‘M’ or ‘F’
state - two letter abbreviation of a US state
year - number, 1910 to 2013
number - the number of people born that year with that particular name.
The Looker View
In the Looker view, Looker generates a dimension for each column in the database table. Looker uses the column definitions to generate the proper dimension types. For example, Looker generates the default ‘string’ dimension type for the name, gender, and state columns. Looker generates number dimension types for the table’s two columns with numeric values: year and number. We then renamed the dimension number to population, which corresponds to the number of people born that year with that particular name. Finally, we modified the format of year so that it does not add a comma in the number.
- view: names_step_0
- dimension: name
- dimension: gender
- dimension: state
- dimension: year
value_format: "0000" # format without commas
- dimension: population # rename from 'number' to 'population'
- measure: count
drill_fields: [name, gender, state, year, number]
Note that Looker also generated a count measure automatically.
All of the queries below can be created by Exploring From Here. Select dimensions and measures, add filters and bonk the ‘Run Button’.
By selecting count, we can see how many rows there are in the table.
[Explore From Here](http://looker.com/publicdata/looks/138?show=fields,data)
By selecting all of the dimensions, we can see the contents of the table.
[Explore From Here](http://looker.com/publicdata/looks/137?show=fields,data)
Over the following steps, we will gradually add more functionality to our model so we can ask increasingly complex questions. These steps are:
Measures in Looker’s world are summary calculations. In this step we’ll explore the name data by adding only two measures, total population and name count. These calculations will let us compare the use of names over time.
Sometimes data, as it is stored in tables, is not in its most useful form. We’ll generalize year into decade and roll up states into regions.
By building filtered measures, we can answer questions about particular aspects of the data. In this step, we’ll look at the gender nature of some names, find regional names, and find particularly old and particularly new names.
Population is not popularity. Older names may have been popular in their time, but as the total population grows, is the percentage with a specific name going up or down?
Ranking is a useful tool for lots of things. We’re going to use it to figure out if people are more or less likely to name their kids something common and how that trend has changed over time.