Tutorial: How to do a 2 X 2 Contingency Chi-Square...

mguindon · 04-15-2020 08:00 PM

This is a quick tutorial on how to do a 2x2 Contingency Chi-Square using table calcs in Looker. This comes from a question that was asked by Adrian Palacios on the Locally Optimistic community. We couldn’t find relevant examples anywhere and since it was not as straightforward as we anticipated, I thought I’d post this here for the benefit of everyone.

What’s a 2x2 Contingency Chi-Square?

The 2x2 Contingency Chi-Square is used for the comparison of two groups with a dichotomous dependent variable. For those who need a refresher on the theory, you can checkout this PDF from the Portland State University. I’ve used the example from in it as a base for this tutorial.

We know for this example data set that Chisq = 13.649, df = 1, p-value = 0.000, so let’s work to get these values from Looker!

The test data

I created a simple derived table to generate the test data:

view: chi_square_example_dt {
  derived_table: {
    sql:
        select 'party affiliated' as category, 'Trump' as candidate, 383 as support
        union all
        select 'party affiliated' as category, 'Warren' as candidate, 385 as support
        union all
        select 'independent' as category, 'Trump' as candidate, 225 as support
        union all
        select 'independent' as category, 'Warren' as candidate, 342 as support
      ;;
  }

  dimension: category {
    type: string
    sql: ${TABLE}.category ;;
  }

  dimension: candidate {
    type: string
    sql: ${TABLE}.candidate ;;
  }

  dimension: support {
    type: number
    sql: ${TABLE}.support ;;
  }

  measure: total_support {
    type: sum
    sql: ${support} ;;
  }

}

Building the query

Step 1: Get our data in a 2x2 table

I’ve selected my first dimension (Category), pivoted my second dimension (Candidate) and added my Total Support measure. I’ve also checked the Totals and Row Totals checkboxes. The result is this:

Step 2: Estimate the expected frequency

Now that we have our actuals, we need to calculate their expected frequency which, for any cell, is estimated as (row total * column total) / grand total.

We’ll create a first table calc named Expected Frequency with a Decimals (2) format and the following expression:

(${chi_square_example_dt.total_support:row_total} * ${chi_square_example_dt.total_support:total})
/ 
sum(pivot_row(${chi_square_example_dt.total_support:total}))

We now get this output:

Step 3: Calculate the probability (or p-value):

Now that we have our actuals and expected values, we can now calculate the p-value.

We’ll create a second table calc named Chi-Square P-Value which will use the chisq_test function. The description of the function is:

“Returns the probability for the chi-squared test for independence between actual and expected data. actual can be a column or a column of lists, and expected must be the same type.”.

Since we’re dealing with a 2x2 Contingency, we’ll nest the pivot_row function within the chisq_test function a first time to return a list of the values of each candidate’s actual values, and a second time to return a list of the values of each candidate’s expected values:

chisq_test(
  pivot_row(${chi_square_example_dt.total_support}),
  pivot_row(${expected_frequency})
)

We now get this output:

Step 4: Calculate the Chi-Squared (or χ2 )

Now that we have our p-value, we can calculate the Chi-Squared by using the chisq_inv function. The description states:

“Returns the position of probability on the inverse cumulative gamma distribution with dof degrees of freedom” .

For this function, we need to know the degrees of freedom. The formula for this is pretty simple as
dof = (number of rows - 1) * (number of columns - 1)

So (2-1) * (2-1) = 1. We’ll create a third table calc named Chi-Squared with the following expression:

chisq_inv(1-${chi_square_p_value}, 1)

Note: We’re substracting the p-value from 1 since the p-value measures the “tail” and the inverse function measures the “head”.

We now get this as an output:

So there you go! The same principle would apply here for a bigger contingency table.

Step 5: Phi Coefficient

Let’s add the Phi Coefficient now. The formula is once again pretty simple:
√ (Chi-Squared / grand total).

Let’s create a table calc named Phi Coefficient with the following expression:

sqrt(${chi_squared}/sum(pivot_row(${chi_square_example_dt.total_support:total})))

Our output is now:

Step 6: Percentage of Voters

Let’s add a final, easy calculation to help us write our conclusion: let’s add the percentage of voters for each category. The expression for this additional table calc formatted as Percent (1) would be:

${chi_square_example_dt.total_support} / sum(${chi_square_example_dt.total_support})

Which gives us our final output:

There you go! We now have all the numbers we need to write the conclusion:

A chi-square test was used to determine whether there was a significant
difference between the proportion of Warren and Trump’s supporters who are independent. Results indicated that 47.0% of Warren’s supporters were independents, whereas 37.0% of Trumps supporters were independents. This difference was statistically significant, χ2(1) = 13.65, p <.001. The phi coefficient, φ = .10, suggested a small effect.

Credits

I want to thank @fabio1 from the Customer Success team who helped figuring this out. You rock! 💖
Jason T. Newsom PhD., Professor at Department of Psychology, Portland State University, who wrote the document from which the example is taken from.