Name Game: step 2 - computed dimensions

  • 28 March 2022
  • 0 replies
  • 2 views

Userlevel 5

This content, written by Lloyd Tabb, was initially posted in Looker Blog on Jan 15, 2016. The content is subject to limited support.

In this step, we are going to create computed dimensions. Most dimensions map directly to columns in the underlying table. In this step we are going to build three new dimensions that are computed. Each of these measures lets us combine the data in interesting ways.

Generalize year into decade

We've been looking at names by year. Now we are going to build a decade dimension that combines all years into a single decade.

Generalize state into region

We can look at data by state, but states near each other tend to have similar characteristics. We are going to take the 50 states and combine them into 5 regions.

Generalize name into first letter

We are going to bucket names by the first letter of the name so we can ask questions such as whether certain starting letters are used more often by a particular gender.

Combine name and gender into a more specific field

It becomes difficult to compare the female use of 'Riley' with the male use of 'Kerry' unless we have a single dimension that combines the name and gender concepts. We are going to combine them into a single computed dimension.

The code

This model inherits all the behavior of the model we used in and adds the 4 dimensions described above.

  #
  # bucketing the data by
  #   decade
  #   region
  #   first letter of the name
  #   name concatenated with gender

  view: names_step_2 {
    extends: [names_step_1]
    # computed dimensions
    dimension: decade {
      type: number
      sql: FLOOR(${year}/10) * 10 ;;
      value_format: "0000"
    }

    dimension: region {
      case: {
        when: {
          sql: ${state} in ('WA','OR','CA','NV','UT','WY','ID','MT',
                'CO','AK','HI')
             ;;
          label: "West"
        }

        when: {
          sql: ${state} in ('AZ','NM','TX','OK') ;;
          label: "Southwest"
        }

        when: {
          sql: ${state} in ('ND','SD','MN','IA','WI','MN','OH','IN','MO','NE',
                'KS','MI','IL')
             ;;
          label: "Midwest"
        }

        when: {
          sql: ${state} in ('MD','DE','NJ','CT','RI','MA','NH','PA',
                'NY','VT','ME','DC')
             ;;
          label: "Northeast"
        }

        when: {
          sql: ${state} in ('AR','LA','MS','AL','GA','FL','SC','NC','VA','TN',
                'KY','WV')
             ;;
          label: "Southeast"
        }

        when: {
          sql: true ;;
          label: "Unset"
        }
      }
    }

    dimension: first_letter {
      sql: SUBSTRING(${name},1,1) ;;
    }
  }

Explore!

All of the queries below can be created by . Select dimensions and measures, add filters and bonk the 'Run Button'.

Change the sort order. Notice that the most popular name varies by region.

Here we've restricted the decade to 2000. Change the sort order and notice how the most popular name varies by region in that decade.

Change the sort order to various measures to get an idea of the differences by first letter and gender.

Here's a list of all of the steps:


0 replies

Be the first to reply!

Reply