Return the searched substring from the string

I have a dimension with the naming convention: market_region_operating system_date

I want to extract the operating system into its own dimension but cannot use the substring function as the number of character prior to the operating system varies and thus there is not one solid start_position to work with.

However, there is a consistent number of underscores before operating system.

Any advice?

0 4 3,045
4 REPLIES 4

Hello @Connor_Slevin and welcome!

You need to use a regex expression here.
What is your datawarehouse?

menashe
Participant V

In MySQL you can use two applications of the substring_index function; in Snowflake there’s the analogous split_part function. The following works for both those dialects (and probably more):

substr(
    substr(
        substr(column_name, 1+position('_' in column_name)),
        1+position('_' in substr(column_name, 1+position('_' in column_name)))
    ),
    1,
    position('_' in substr(
        substr(column_name, 1+position('_' in column_name)),
        1+position('_' in substr(column_name, 1+position('_' in column_name)))
    ))-1
)

Regex would be easier to use but I’m guessing (you can check) more costly.

Thanks for the help so far guys. I just want to say that I am not a programmer and don’t use SQL often and don’t have access to this. Our data warehouse is Snowplow.

I guess I am trying to do this inside Looker itself, without needing to a new field to be created by our data/analytics team. So I have taken the expression that you gave me @menashe and used it in the Edit Custom Dimension option in Looker but I am getting expression incomplete messages…can I use what you’ve given me for this purpose?

menashe
Participant V

I haven’t tried this but based on your screenshot I’d guess you need to replace each instance of position('_' in ${utms.campaign}) with position(${utms.campaign}, "_"). Good luck.

Top Labels in this Space
Top Solution Authors