Name Game
Even very simple data sets can be complex to analyze. Starting with a really simple data set of names, we’re going to see if we can learn a few things. The data includes names given in the United States since 1910, listed by state and gender. We’ll go through the LookML model building and then build some data discovery widgets to learn about the popularity of names.
The Data
The data set is very simple, but with these 5 columns we can figure out a lot about names.
name | gender | state | year | number
Each row in the table is a roll up of the number of people with a given first name and gender born in a US state in a given year.
name - the first name, capitalized
gender - either ‘M’ or ‘F’
state - two letter abbreviation of a US state
year - number, 1910 to 2013
number - the number of people born that year with that particular name.
The Looker View
In the Looker view, Looker generates a dimension for each column in the database table. Looker uses the column definitions to generate the proper dimension types. For example, Looker generates the default ‘string’ dimension type for the name, gender, and state columns. Looker generates number dimension types for the table’s two columns with numeric values: year and number. We then renamed the dimension number to population, which corresponds to the number of people born that year with that particular name. Finally, we modified the format of year so that it does not add a comma in the number.
- view: names_step_0
sql_table_name: names
fields:
- dimension: name
sql: ${TABLE}.name
- dimension: gender
sql: ${TABLE}.gender
- dimension: state
sql: ${TABLE}.state
- dimension: year
type: number
sql: ${TABLE}.year
value_format: "0000" # format without commas
- dimension: population # rename from 'number' to 'population'
type: number
sql: ${TABLE}.number
- measure: count
type: count
drill_fields: [name, gender, state, year, number]
Note that Looker also generated a count measure automatically.
Explore!
All of the queries below can be created by Exploring From Here. Select dimensions and measures, add filters and bonk the ‘Run Button’.
How Big is the Table?
By selecting count, we can see how many rows there are in the table.
[Explore From Here](http://looker.com/publicdata/looks/138?show=fields,data)
Let’s look at the Raw Data
By selecting all of the dimensions, we can see the contents of the table.
[Explore From Here](http://looker.com/publicdata/looks/137?show=fields,data)
Adding Functionality
Over the following steps, we will gradually add more functionality to our model so we can ask increasingly complex questions. These steps are:
Step 1: Simple Measures
Measures in Looker’s world are summary calculations. In this step we’ll explore the name data by adding only two measures, total population and name count. These calculations will let us compare the use of names over time.
Step 2: Computed Dimensions
Sometimes data, as it is stored in tables, is not in its most useful form. We’ll generalize year into decade and roll up states into regions.
Step 3: Filtered Measures
By building filtered measures, we can answer questions about particular aspects of the data. In this step, we’ll look at the gender nature of some names, find regional names, and find particularly old and particularly new names.
Step 4: Normalizing Population with Cohorts
Population is not popularity. Older names may have been popular in their time, but as the total population grows, is the percentage with a specific name going up or down?
Step 5: Using Rank to Figure out Name Density and Popularity
Ranking is a useful tool for lots of things. We’re going to use it to figure out if people are more or less likely to name their kids something common and how that trend has changed over time.
Go to the next Name Game page, Step 1: Adding simple measures