Looking to analyze text fields? Want to understand top or trending keywords?
Some common use cases might be:
What are customers saying about my brand on social (Facebook, Twitter, etc.)?
How can we optimize campaigns or SEO to reflect what terms customers are searching for?
Which words are most commonly posted together (e.g. “Game” and “Thrones” as of late)
The list goes on… Let’s walk through how this can be done in Looker without taking up a data scientist’s time (or downloading R / Python ourselves).
In this example, we’ll take descriptions of products and search for top keywords and look for associations between keywords.
Starting Data & Desired End State:
To start, we need to loop through our product name field and break out each word used into its own data entry. Here’s what we have to start:
And here’s what we’d want for analytics:
So that we can create reports like the following word cloud to show trending keywords:
How to Get There:
To accomplish this, we’ll use Looker’s derived tables feature to create an analytical pattern (this example uses Snowflake, but could be done in other syntaxes as well) that unpacks our string field into individual records.
We’ll also make use of Lloyd Tabb’s SQL generate series SQL pattern here to generate a number series that we’ll leverage to simulate a loop that iterates through every word in a string field.
select id, brand,
SPLIT_PART(SPLIT_PART(tablename.description, ' ', numbers.n+1), ' ', -1) as parsed_description
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
) as numbers
INNER JOIN (
SELECT id, brand, name as description
) as tablename
on LENGTH(tablename.description) - LENGTH(REPLACE(tablename.description, ' ', '')) >= numbers.n
order by id, n
Let’s review what’s happening here:
The “numbers” nested table is generating a number series
The “tablename” nested table is selecting the field we’d like to parse and the join key (id in this example) to relate it back to other tables, as needed, for later analysis
The inequality join is telling Snowflake to join the number series to a given description when a number (n) is <= the number of distinct words in that description
The SPLIT_PART function then breaks out each individual word into its own data record
How to Use This Pattern:
Consider underlying data volume and DB performance
Limit to recent events to spot trending topics or keywords (e.g. product X mentioned 50% today than yesterday, with over Y people talking about it)
Use Looker’s Scheduling & Alerting features to proactively monitor and spot trends
Pair with Looker’s Affinity Analysis Block to analyze associations between words (e.g. are customers who mention topic A also more likely to mention topic B?)