Building XmR charts (Shewhart's control charts)

Tomas2 · 09-28-2017 05:43 AM

Hi all,

I’d like to build XmR charts using Looker, but have not yet succeeded in doing this. I’ve some time-series data from the first of January up to today, with new data being added on a daily basis. The dataset includes one value for my performance measure, per day. For the XmR chart I need to calculate the mean and standard deviation for a subset of the dataset, the control data (14 fixed, consecutive values of my performance measure in the dataset).

However, I’m not able to engineer table calculations such that it creates a measure which, for every date, computes the mean across the same time window. For example (see mock-up data), I’d like to calculate the mean and standard deviation for the period 2017-08-06 until 2017-08-19 and subsequently include this as a measure, where this value should remain constant (and the time window should stay fixed), also if new dates are added to the result set.

Anyone that has ideas how to solve this problem?

Thanks!

Aleksandrs_Vede

Hey @Tomas2,

I was thinking over your issue here and I think I was able to achieve something similar. Please note that the solution might vary based on the actual data.

My calculations were based on created date dimension & simple count measure.

These are the 4 calculations I have defined to calculate mean, -stddev, +stddev & the outlirers in my case I defined outliers which are outside 1 standard deviation of the mean values.

The last pease that has to be added if I understood your requirement correct is set a period of dates you want to calculate this over. I think you can achieve this with match advanced filters something like 14 days ago for 14 days should work. I did not have constantly added dates so i did not had this filter added. If you would like to get more information on Looker filter expressions you can find it here

Should you have more questions feel free to pop on chat. It would be a lot easier to help with this type of issues.

Best,

Sasha

Tomas2

Hi Sasha,

Thanks for your response. The first part indeed works as I want it to (which I kinda figured out already). The true problem lies in the filtering.

I do not want to filter on my full result set, I only want to filter in my table calculations. In other words, I want the result set to show all the value since the first of January, but I want the mean/standard deviation, which are computed using the table calculations to only look at a fixed time window (e.g., 2017-08-06 until 2017-08-19). I want the table calculations to always look at this fixed time window, also if new dates are included in the result set. That’s why something like 14 days ago for 14 days will not work, as this is relative to the maximum date contained in the result set (in this maximum date will change on daily basis).

Does this make sense? If not, how can I access chat via Discourse?

Thanks,
Tomas

Aleksandrs_Vede

Hey @Tomas2,

Chat can be only accessed through Looker application if you have admin/developer privileges. It is a lot clearer now, So, This can be done by creating a table calculation that returns Yes or No and then selecting Hide "No"s from Visualisation in the column’s gear dropdown. We have great discourse written by one of our colleagues here.
In brief you create an IF(“condition”, "value if Yes, “value if NO”) in your case condition would be date range, “value if Yes” would be the actual calculation mean(${cars_data_large.count}) for mean example, and value if NO, can be simple “NO”. This is quick sample I wrote in my sandbox for mean:

if(${cars_data_large.date_created_date} > date(2016,1,1),${mean} - stddev_pop(${cars_data_large.count}),0)

You would have to repeat this process for all your calculations.

Best,

Sasha

Tomas2

Hi Sasha,

We’re getting somewhere but there’s still a small misunderstanding. I don’t want to hide certain results from my visualization (which is what you’d achieve with the 'Hide no’s from visualization). I want ALL values to be displayed in my visualization, however, I want my table calculations to be calculated using only a fixed set of values (in a pre-defined time range). For example, I’ve a result set which ranges from 2017/09/01 until 2017/10/01, but I want the mean and standard deviation (which I clarified in my initial post) to be calculated using only the values from 2017/09/20 and 2017/09/21. I was trying something like below table calculation, but this doesn’t work because it only selects the value of the performance measure corresponding to that date only (so I do not obtain a constant mean across all dates):

mean(list(if(${datatable.date_date}=date(2017,09,21) OR ${datatable.date_date}=date(2017,09,20),${datatable.performance_measure},null)))

I have Developer-rights so I could get onto chat, but I haven’t found the ‘chat’-button yet.

Thanks,
Tomas

Aleksandrs_Vede

Hey again @Tomas2

Chat can be found in the help dropdown menu:

This is something that should work then as well:

	mean(${datatable.performance_measure}),${datatable.performance_measure})```

If the condition is satisfied then you doing your mean calculation if not, then you just keeping the measure value. In this scenario you are keeping all the values. Should you have more questions please hop on chat and our team would try to help you online.

Best,

Sasha

Tomas2

This chat function is not enabled for my account (though, I’m in Developer mode). See below graphic:

Aside from this, you’re answer has still not resolved my issue. What your table calculation does is give the mean of the total set of values for those two days, and for the other days it gives the value of the performance measure for that day. Though, what I want is that for EACH day, the table calculation is constant = the same value (!). Though, the table calculation (e.g, mean, standard deviation) is computed using the values of the performance measures for a limited time frame ONLY. I’ll give some mock-up data below from Excel, which I hope makes it (more) clear what I intend to get to. In this mock-up data, I want to show ALL values, for all dates, but I want the table calculation (e.g., mean) to only rely on 2017/09/04 and 2017/09/05.

Does this make sense?

Thanks,
Tomas

Aleksandrs_Vede

Hey @Tomas2

I took dates from your own examples so you can specify the date frame you want

mean(${datatable.performance_measure}),${datatable.performance_measure})```

So you would get your mean calculated for that period of time & rest would be your performance measure. However we can't keep both values as per your screenshot from excel. Unless you want to create second table calculation.

Best,

Sasha

Tomas2

Hi Sasha,

Creating a second table calculation would be totally fine. I just need to have a table calculation (doesn’t matter to me if I need 10.000 “helper” table calculations ;)) which satisfies my needs, as outlined in the Excel file.

Let me know if you’ve any ideas.

Thanks!

Aleksandrs_Vede

Hey @Tomas2,

Does the above calculation not satisfying your needs still? As it calculates the mean for the date window provided and brings in performance measure where condition is not satisfied? Is there any chance you can ask your Looker admin to enable you chat access on your instance as I feel like this would be beneficial for both parties.

Best,

Sasha

Tomas2

What finally worked for me (maybe you already mentioned it somewhere but I might have misconfigured it at the time), when we consider 2017/09/01 until 2017/09/07 to be the control data:

Central line/Mean control data:
mean(if(${datatable.date_date}>=date(2017,9,1) AND ${datatable.date_date}<=date(2017,9,7), ${datatable.performance_measure},null))

Standard deviation control data:
stddev_samp(if(${datatable.date_date}>=date(2017,9,1) AND ${datatable.date_date}<=date(2017,9,7), ${datatable.performance_measure},null))

Upper control limit:
${central_line}+3.27*max(${standard_deviation_control_data})

Lower control limit:
${central_line}-3.27*max(${standard_deviation_control_data})

High outliers:
if(${performance_measure}>${upper_control_limit},${datatable.performance_measure},null)

Low outliers:
if(${performance_measure}<${lower_control_limit},${datatable.performance_measure},null)

Aleksandrs_Vede

HI @Tomas2,

Nice, I see. Slight tweak to my solution, but similar logic. Awesome, as I said it’s hard to work on those without the actual data. However good to hear that it’s been resolved.

Best,

Sasha