Parsing json and array of json is usually a pain, so I just wanted to share what I use to do this. In the following example, I used the dataflow mongoDB template, which is a no code way of piping your data from mongoDB to Bigquery. In this case, the flattened option was not selected.
So this template created a table called delivery_orders in BigQuery with 2 columns, one called id and one called source_data which would include a json blob with the following structure:
{
"id": "gkdlvjhkfdld",
"user_id": "a7087yv9a8fds7vya9,
"billing_address": {
"name": "john doe",
"street": "7 fun plaza",
"number": "193",
"zip": "5123456",
"city": "zombieland"
},
"orders": [
{
"store_id": "9bb9dc1b1a58c5aa7",
"products": [
{
"gtin": "6234623462346",
"quantity": 12,
"price": "11.99",
"status": "DELIVERED"
},
{
"gtin": "4547413453466",
"quantity": 1,
"price": "1.99",
"status": "DELIVERED"
}
],
"total": "13.98"
},
{
"storeId": "a0545518b9adc6",
"products": [
{
"gtin": "4260745675612",
"quantity": 1,
"price": "6.9",
"status": "DELIVERED"
}
],
"total": "6.9"
}
]
}
bWe notice that the delivery_orders blob includes an array (a repetitive pattern of data with the same structure) for all the orders which has a variable size as it can include from 1 to x amount of orders inside it. It starts and ends with a square brackets. And within an order we can also have a variable amount of products, also stored in an array.
When creating the view in Looker, I would get the following:
view: delivery_orders {
sql_table_name: `mongodb.delivery_orders` ;;
dimension: id {
primary_key: yes
hidden: yes
type: string
sql: ${TABLE}.id ;;
}
dimension: source_data {
hidden:yes
type: string
sql: ${TABLE}.source_data ;;
}
}
To get all the first level metrics, like user_id, I just need to add the following in my delivery_orders view:
dimension: user_id {
type: string
sql:REPLACE(JSON_EXTRACT(${source_data}, '$.user_id'),"\"","") ;;
}
The replace is used to remove the double quotes.
Now to parse the array of orders, here is the process: In delivery_orders view, create a new dimension called orders which will include all the orders:
dimension: orders {
hidden: yes
type: string
sql: JSON_EXTRACT_array(${source_data} , '$.orders');;
}
this will just extract the json into an array, which Bigquery understands.
Now to get all the orders form the array, wer are going to create a new view called orders:
view: orders {
sql_table_name: `orders`;;
dimension: products {
hidden: yes
type: string
sql: JSON_EXTRACT_array(${TABLE} , '$.products');;
}
dimension: total {
type: number
sql: cast(REPLACE(JSON_EXTRACT(${TABLE} , '$.total'),"\"","") as float64) ;;
}
dimension: store_id {
type: string
sql: cast(REPLACE(JSON_EXTRACT(${TABLE} , '$.store_id'),"\"","") as string) ;;
}
}
Notice that this view also includes an array for all the products included in an order. I am also creating new dimensions for all the “first level” dimensions like total or store_id
Now to flatten this orders view within my delivery_orders explore, in my model, I write the following code
explore: delivery_orders {
join: orders {
sql: LEFT JOIN unnest(${delivery_orders.store_orders}) orders ;;
relationship: one_to_many
}
}
Make sure the orders at the end before the semicolons matches the name of your view.
Now for the products , I am also creating a new view:
view: products {
sql_table_name: `products` ;;
dimension: gtin {
type: string
sql: cast(REPLACE(JSON_EXTRACT(${TABLE} , '$.gtin'),"\"","") as string) ;;
}
dimension: price {
type: number
sql: cast(REPLACE(JSON_EXTRACT(${TABLE} , '$.pricePerUnit'),"\"","") as float64) ;;
}
dimension: status {
suggest_persist_for: "24 hours"
type: string
sql: cast(REPLACE(JSON_EXTRACT(${TABLE} , '$.status'),"\"","") as string) ;;
}
dimension: quantity {
type: number
sql: cast(REPLACE(JSON_EXTRACT(${TABLE} , '$.quantity'),"\"","") as float64) ;;
}
}
And I can add this to my explore in the model:
explore: delivery_orders {
join: orders {
sql: LEFT JOIN unnest(${delivery_orders.store_orders}) orders ;;
relationship: one_to_many
}
join: products {
view_label: "Products"
sql: LEFT JOIN UNNEST(${orders.products}) products ;;
relationship: one_to_many
}
}
And that's it. Now BigQuery can easily access all the data form your json!