# Simplified A/B Test Analysis: Redshift Python UDF and P-value Measure

• 8 replies
• 631 views

• 0 replies

At Buffer, we are continually working on ways to streamline access to data within the company. We are big on transparency, and that goes for data too! After noticing a bit of a friction point with interpreting the data of quantitative user experiments (ie, A/B tests) for non-analysts, we built a LookML pattern to serve up the analysis dynamically at any time to any user. Many of the experiments we run are simple two group A/B tests , with one control group and one enabled group. Our solution utilizes a Python UDF in Redshift, along with a case statement and conditional formatting in LookML. We thought it might be helpful to others in the Looker community to share 🙂 . Let’s take a look using a simple marketing landing page A/B test as a example.

The Actual Stats Part (yeah math!)

Leveraging the awesomeness of Redshift’s python UDF’s, which already ships with a bunch of libraries that are often used in data science and analytics work (like numpy, scipy and pandas), we started with creating our own UDF to calculate a p-value. Given the nature of most experiments we run, we decided on the SciPy function `chi2_contingency`, which performs a chi-square test of independence of variables, which we named `experiment_result_p_value()` . This function excepts four parameters (the control group population, the control group count of successes, the enabled group population, and the enabled group count of successes) and returns a p_value. If you are up for exploring more, feel free to read in more detail how we created this UDF.

Using the UDF in Looker

With the UDF created in Redshift, we just needed to create four measures and pass them through as parameters in `experiment_result_p_value()` . Our experiment data is stored as events in Redshift, which tracks: user_id, experiment name, experiment group name (ie, control or enabled), and the timestamp the user entered into the experiment. Since we run a fair number of experiments on the main marketing landing page, we felt it worthwhile to create a specific view in Looker to handle experiments where the determination of success was if a visitor created an account and became a user. We extended our main `visitor_experiment_events` view as a new view `visitor_experiment_conversions` in order to add in the scenario specific dimensions and measures. Here are the four measures we needed for the parameters in the UDF:

Old LookML
```
- measure: visitor_count_control
type: count_distinct
sql: \${TABLE}.visitor_id
filters:
test_group: '%control%'

• measure: visitor_count_enabled

type: count_distinct

sql: \${TABLE}.visitor_id

filters:

test_group: ‘%enabled%’

• measure: user_count_control

type: count_distinct

sql: \${TABLE}.user_id

filters:

test_group: ‘%control%’

• measure: user_count_enabled

type: count_distinct

sql: \${TABLE}.user_id

filters:

test_group: ‘%enabled%’

``````</details>

<details>
<summary>New LookML</summary>
``````

measure: visitor_count_control {

type: count_distinct

sql: \${TABLE}.visitor_id ;;

filters: {

field: test_group

value: “%control%”

}

}

measure: visitor_count_enabled {

type: count_distinct

sql: \${TABLE}.visitor_id ;;

filters: {

field: test_group

value: “%enabled%”

}

}

measure: user_count_control {

type: count_distinct

sql: \${TABLE}.user_id ;;

filters: {

field: test_group

value: “%control%”

}

}

measure: user_count_enabled {

type: count_distinct

sql: \${TABLE}.user_id ;;

filters: {

field: test_group

value: “%enabled%”

}

}

``````</details>

So with our four measures created, we could then pass those through as parameters in the `experiment_result_p_value()`  function in the `sql:`  line of the `p_value`  dimension.  We then used the `p_value` to create an `is_significant` yesno dimension.

<details open>
<summary>Old LookML</summary>
``````

• dimension: p_value

type: number

sql: round(experiment_result_p_value(\${visitor_count_control}, \${user_count_control}, \${visitor_count_enabled}, \${user_count_enabled}), 3)

• dimension: is_significant

type: yesno

sql: \${p_value} < 0.05

``````</details>

<details>
<summary>New LookML</summary>
``````

dimension: p_value {

type: number

sql: round(experiment_result_p_value(\${visitor_count_control}, \${user_count_control}, \${visitor_count_enabled}, \${user_count_enabled}), 3) ;;

}

dimension: is_significant {

type: yesno

sql: \${p_value} < 0.05 ;;

}

``````</details>

Additionally, we created two measures for calculating the conversion ratio from visitor to user for each of the experiment groups:

<details open>
<summary>Old LookML</summary>
``````

• measure: control_conversion_ratio

type: number

sql: 100.00 * \${user_count_control} / NULLIF(\${visitor_count_control}, 0)

• measure: enabled_conversion_ratio

type: number

sql: 100.00 * \${user_count_enabled} / NULLIF(\${visitor_count_enabled}, 0)

``````</details>

<details>
<summary>New LookML</summary>
``````

measure: control_conversion_ratio {

type: number

sql: 100.00 * \${user_count_control} / NULLIF(\${visitor_count_control}, 0) ;;

}

measure: enabled_conversion_ratio {

type: number

sql: 100.00 * \${user_count_enabled} / NULLIF(\${visitor_count_enabled}, 0) ;;

}

``````</details>

**Packaging it all up in a single `experiment_result` measure**

We wanted the analysis to be really convenient to digest, so combined the above dimensions and measures with a case statement and some html conditional formatting and returned a single measure that reflects the current result.

The experiment result text and background color changes based on the four scenarios:

- enabled group has a higher rate of success and is significant (green background color)
- enabled group has a higher rate of success but is not significant (yellow background color)
- control group has a higher rate of success and is significant (orange background color)
- control group has a higher rate of success but is not significant (red background color)

Here is what the LookML looks like:

<details open>
<summary>Old LookML</summary>
``````

• measure: experiment_result

sql: |

CASE WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN ‘control_sig’

WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN ‘control_notsig’

WHEN \${control_conversion_ratio} = \${enabled_conversion_ratio} THEN ‘tied’

WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN ‘enabled_sig’

WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN ‘enabled_notsig’

ELSE NULL END

html: |

{% if value == “enabled_sig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the enabled test group has the highest conversion rate, and the result is statistically significant

{% elsif value == “enabled_notsig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the enabled test group has the highest conversion rate, BUT the result is NOT statistically significant

{% elsif value == “control_notsig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the control test group has the highest conversion rate, BUT the result is NOT statistically significant

{% elsif value == “control_sig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the control test group has the highest conversion rate, and the result is statistically significant

{% elsif value == “tied” %}

The conversion rate is the same for both the control and enabled test groups

{% else %}

{{ rendered_value }}

{% endif %}

``````</details>

<details>
<summary>New LookML</summary>
``````

measure: experiment_result {

sql: CASE WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN ‘control_sig’

WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN ‘control_notsig’

WHEN \${control_conversion_ratio} = \${enabled_conversion_ratio} THEN ‘tied’

WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN ‘enabled_sig’

WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN ‘enabled_notsig’

ELSE NULL END

;;

html: {% if value == “enabled_sig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the enabled test group has the highest conversion rate, and the result is statistically significant

{% elsif value == “enabled_notsig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the enabled test group has the highest conversion rate, BUT the result is NOT statistically significant

{% elsif value == “control_notsig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the control test group has the highest conversion rate, BUT the result is NOT statistically significant

{% elsif value == “control_sig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the control test group has the highest conversion rate, and the result is statistically significant

{% elsif value == “tied” %}

The conversion rate is the same for both the control and enabled test groups

{% else %}

{{ rendered_value }}

{% endif %}

;;

}
``````</details>

When used in a Look, all that needs to be set are filters for the experiment name and the starting date of the experiment.

Our hope is that by surfacing the experiment results in this way, non-analyst members of the Buffer team are able to realize even faster feedback loops in build-measure-learn workflows.

**The Full LookML**

<details open>
<summary>Old LookML</summary>
``````

``````- view: visitor_experiment_conversions
extends: visitor_experiment_events

fields:

- dimension: p_value
type: number
sql: round(experiment_result_p_value(\${visitor_count_control}, \${user_count_control}, \${visitor_count_enabled}, \${user_count_enabled}), 3)

- dimension: is_significant
type: yesno
sql: \${p_value} < 0.05

- measure: experiment_result
sql: |
CASE WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN 'control_sig'
WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN 'control_notsig'
WHEN \${control_conversion_ratio} = \${enabled_conversion_ratio} THEN 'tied'
WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN 'enabled_sig'
WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN 'enabled_notsig'
ELSE NULL END
html: |
{% if value == "enabled_sig" %}
<b><p style="color: black; background-color: #AECE63; margin: 0; border-radius: 5px; text-align:center">For experiment '{{ visitor_experiment_conversions.experiment_name._value }}', the enabled test group has the highest conversion rate, and the result is statistically significant</p></b>
{% elsif value == "enabled_notsig"  %}
<b><p style="color: black; background-color: #E8E47F; margin: 0; border-radius: 5px; text-align:center">For experiment '{{ visitor_experiment_conversions.experiment_name._value }}', the enabled test group has the highest conversion rate, BUT the result is NOT statistically significant</p></b>
{% elsif value == "control_notsig"  %}
<b><p style="color: black; background-color: #F7B251; margin: 0; border-radius: 5px; text-align:center">For experiment '{{ visitor_experiment_conversions.experiment_name._value }}', the control test group has the highest conversion rate, BUT the result is NOT statistically significant</p></b>
{% elsif value == "control_sig" %}
<b><p style="color: black; background-color: #FF865E; margin: 0; border-radius: 5px; text-align:center">For experiment '{{ visitor_experiment_conversions.experiment_name._value }}', the control test group has the highest conversion rate, and the result is statistically significant</p></b>
{% elsif value == "tied" %}
<b><p style="color: black; margin: 0; border-radius: 5px; text-align:center">The conversion rate is the same for both the control and enabled test groups</p></b>
{% else %}
<font color="darkred">{{ rendered_value }}</font>
{% endif %}

- measure: visitor_count_control
type: count_distinct
sql: \${TABLE}.visitor_id
filters:
test_group: '%control%'

- measure: visitor_count_enabled
type: count_distinct
sql: \${TABLE}.visitor_id
filters:
test_group: '%enabled%'

- measure: user_count_control
type: count_distinct
sql: \${TABLE}.user_id
filters:
test_group: '%control%'

- measure: user_count_enabled
type: count_distinct
sql: \${TABLE}.user_id
filters:
test_group: '%enabled%'

- measure: control_conversion_ratio
type: number
sql: 100.00 * \${user_count_control} / NULLIF(\${visitor_count_control}, 0)

- measure: enabled_conversion_ratio
type: number
sql: 100.00 * \${user_count_enabled} / NULLIF(\${visitor_count_enabled}, 0)
``````

``````</details>

<details>
<summary>New LookML</summary>
``````

view: visitor_experiment_conversions {

extends: [visitor_experiment_events]

dimension: p_value {

type: number

sql: round(experiment_result_p_value(\${visitor_count_control}, \${user_count_control}, \${visitor_count_enabled}, \${user_count_enabled}), 3) ;;

}

dimension: is_significant {

type: yesno

sql: \${p_value} < 0.05 ;;

}

measure: experiment_result {

sql: CASE WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN ‘control_sig’

WHEN (\${control_conversion_ratio} > \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN ‘control_notsig’

WHEN \${control_conversion_ratio} = \${enabled_conversion_ratio} THEN ‘tied’

WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} < 0.05) THEN ‘enabled_sig’

WHEN (\${control_conversion_ratio} < \${enabled_conversion_ratio}) AND (\${p_value} >= 0.05) THEN ‘enabled_notsig’

ELSE NULL END

;;

html: {% if value == “enabled_sig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the enabled test group has the highest conversion rate, and the result is statistically significant

{% elsif value == “enabled_notsig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the enabled test group has the highest conversion rate, BUT the result is NOT statistically significant

{% elsif value == “control_notsig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the control test group has the highest conversion rate, BUT the result is NOT statistically significant

{% elsif value == “control_sig” %}

For experiment ‘{{ visitor_experiment_conversions.experiment_name._value }}’, the control test group has the highest conversion rate, and the result is statistically significant

{% elsif value == “tied” %}

The conversion rate is the same for both the control and enabled test groups

{% else %}

{{ rendered_value }}

{% endif %}

;;

}

measure: visitor_count_control {

type: count_distinct

sql: \${TABLE}.visitor_id ;;

``````filters: {
field: test_group
value: "%control%"
}
``````

}

measure: visitor_count_enabled {

type: count_distinct

sql: \${TABLE}.visitor_id ;;

``````filters: {
field: test_group
value: "%enabled%"
}
``````

}

measure: user_count_control {

type: count_distinct

sql: \${TABLE}.user_id ;;

``````filters: {
field: test_group
value: "%control%"
}
``````

}

measure: user_count_enabled {

type: count_distinct

sql: \${TABLE}.user_id ;;

``````filters: {
field: test_group
value: "%enabled%"
}
``````

}

measure: control_conversion_ratio {

type: number

sql: 100.00 * \${user_count_control} / NULLIF(\${visitor_count_control}, 0) ;;

}

measure: enabled_conversion_ratio {

type: number

sql: 100.00 * \${user_count_enabled} / NULLIF(\${visitor_count_enabled}, 0) ;;

}

}

``</details>``

### 8 replies

Userlevel 2

Doesn’t this give you a divide by zero error if visit_count_control is NULL?

@tuple

Hi,

It won’t -> if visit_count_control is NULL the result is NULL as well, and if it would be 0 it would be converted to NULL.

Userlevel 2

Thanks, Frank. You are correct.

I was thinking of the Oracle NVL function, which does more-or-less the opposite – if an input is NULL then it returns a substitute value for NULL.

Redshift NULLIF documentation: http://docs.aws.amazon.com/redshift/latest/dg/r_NULLIF_function.html?tag=duckduckgo-ffsb-20

@tuple

On Postgres/Redshift/MSSQL Coalesce/IfNull can be used as replacement for NVL 🙂