Hi, I am working on a dashboard that is based on BigQuery
there are several conversions I am sending to BigQuery and I am trying to get the session scoped source and medium like analytics provided in the UI, unfortunately when observing the records of my conversion events I see no presence for the session scoped utm's so I am left with null records
is there a solution to this scenraio or a workaround, currently I am measuring things based on first user scoped utm's because I have no other option and it raises a lot of questions when comparing to GA4 UI
thanks in advance 🙂
Solved! Go to Solution.
Dealing with Discrepancies in Direct/None Traffic and Ecommerce Metrics in GA4 BigQuery Queries
Understanding the Challenges:
Strategies to Address These Issues:
1. Understanding Direct/None Traffic
2. Refining Query for Direct/None Traffic
CASE WHEN traffic_source.source IS NULL OR traffic_source.source = '' THEN 'direct' ELSE traffic_source.source END AS source,
CASE WHEN traffic_source.medium IS NULL OR traffic_source.medium = '' THEN '(none)' ELSE traffic_source.medium END AS medium
3. Handling ecommerce.purchase_revenue
4. Query Optimization for Additional Metrics
5. Testing and Validation
6. Advanced Debugging
Step 1: Construct a new table in BigQuery that captures the session-scoped source and medium for each session:
SELECT session_id,
FIRST_VALUE(traffic_source) OVER (PARTITION BY session_id ORDER BY event_timestamp ASC) AS session_scoped_source,
FIRST_VALUE(traffic_medium) OVER (PARTITION BY session_id ORDER BY event_timestamp ASC) AS session_scoped_medium
FROM `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_EVENT_TABLE_ID]`
Step 2: Merge the conversion event table with the newly created session-scoped source and medium table using the session_id column.
Step 3: Incorporate the session_scoped_source and session_scoped_medium columns in your dashboard to evaluate the efficacy of your conversions based on session-scoped source and medium.
For instance:
SELECT event_name,
COUNT(*) AS conversion_count,
session_scoped_source,
session_scoped_medium
FROM `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_EVENT_TABLE_ID]`
JOIN `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_SESSION_SCOPED_SOURCE_MEDIUM_TABLE_ID]`
ON `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_EVENT_TABLE_ID]`.session_id = `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_SESSION_SCOPED_SOURCE_MEDIUM_TABLE_ID]`.session_id
GROUP BY event_name,
session_scoped_source,
session_scoped_medium
event_name | conversion_count | session_scoped_source | session_scoped_medium |
---|---|---|---|
purchase | 100 | organic search | |
purchase | 50 | social media | |
signup | 200 | direct | (direct) / (none) |
Always test the queries on a subset of your data initially to ensure accuracy and optimal performance.
Hi, thank you very much, I will definitely try this workaround
I tried it and I did manage to generate a report but I still got a lot of difference when comparing my data set to the exploration report in GA4, any suggestions?
There are a few possible reasons why you might be seeing differences between your BigQuery report and the exploration report in GA4:
To troubleshoot the differences between your two reports, you can try the following:
Here are some additional tips for comparing your BigQuery report to the exploration report in GA4:
hey,
I am facing a similar issue and I tried many different workarounds to fetch the required results, however it does not work properly. The main issues is that it does not gets direct/ none data correctly and additionally if you add an extra metric such as ecommerce.purchase_revenue the results also get all mixed up. Is there a way to fix these?
Dealing with Discrepancies in Direct/None Traffic and Ecommerce Metrics in GA4 BigQuery Queries
Understanding the Challenges:
Strategies to Address These Issues:
1. Understanding Direct/None Traffic
2. Refining Query for Direct/None Traffic
CASE WHEN traffic_source.source IS NULL OR traffic_source.source = '' THEN 'direct' ELSE traffic_source.source END AS source,
CASE WHEN traffic_source.medium IS NULL OR traffic_source.medium = '' THEN '(none)' ELSE traffic_source.medium END AS medium
3. Handling ecommerce.purchase_revenue
4. Query Optimization for Additional Metrics
5. Testing and Validation
6. Advanced Debugging
@ms4446 Hi, I have a question about "2. Refining Query for Direct/None Traffic".
In my understanding, "traffic_source" does not provide event-scoped traffics, but it provides user-scoped traffics, which provides where the user first came from on the website.
Instead, if I am right, we should use "collected_traffic_source", which is a newly added record on June 2023, or values in parameters like "campaign", "source" or "medium" in event_params record before June 2023 to fetch event-scoped traffics. This is supported in other articles like below.
https://mu-kong.medium.com/get-session-campagin-from-ga4-exported-bigquery-data-fcb568afc44
https://tanelytics.com/ga4-bigquery-session-traffic_source/
I am concerned if my understanding is wrong, so please let me know the correct way to get session-scoped traffics when you finds out.
Thanks.
Your understanding is correct, and your approach aligns with the evolving nature of GA4's data schema and how session-scoped traffic sources are captured in BigQuery.
Initial Limitations:
traffic_source
fields). This shows the first source a user arrived from.Updates and Enhancements:
collected_traffic_source
field. This provides a more accurate way to understand traffic sources for individual sessions or events.Using collected_traffic_source
Event-Scoped Traffic Before the Update:
event_params
to extract event-scoped traffic source information. This involved parsing parameters like campaign, source, or medium. It required more work, but was the main method before collected_traffic_source
.Your Approach
collected_traffic_source
field.event_params
.Example Queries:
Using collected_traffic_source
:
SELECT
event_name,
event_timestamp,
collected_traffic_source.source,
collected_traffic_source.medium,
collected_traffic_source.campaign
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'
Extracting event-scoped traffic from event_params
:
SELECT
event_name,
event_timestamp,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'
Your approach of using collected_traffic_source
for session-scoped traffic analysis and event_params
(for data before the update) is accurate and recommended for detailed traffic source analysis in GA4 BigQuery exports.
I am relieved to hear that my approach is accurate and is a recommended way. Thank you for your quick and detailed answer.
Hi @ms4446 ,
I have tried to count the sessions using
collected_traffic_source.manual_source,
collected_traffic_source.manual_medium,
COUNT(DISTINCT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) sessions_count
on my dataset for event period after June 2023 but the result somehow still have huge discrepancy with the sessions source/medium in GA4 Interface. I have tried to use traffic_source.source and traffic_source.medium but the result still far different especially for Google / Organic and Direct . What is the best practice to get the nearest result as in GA4?
Hello,
Is there a way to get the ecommerce.purchase_revenue and ecommerce.purchase_revenue_in_usd values for non-null source, mediums like google, facebook etc?
Tried using both collected_traffic_source and event_params methods above but all total_revenue and revenue_usd seems to be tagged to null source, medium.
Sample of simple sql below.
SELECT
((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source')) AS source,
((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium')) AS medium,
SUM(ecommerce.purchase_revenue) AS revenue,
SUM(ecommerce.purchase_revenue_in_usd) AS revenue_usd
FROM
`datasource`
GROUP BY
source,
medium
Results look like example below
owsourcemediumrevenuerevenue_usd
1 | null | null | 1000 | 10 |
2 | google | cpc | null | null |
3 | adroll | Email | null | null |
4 | Instagram | display | null | null |
5 | google | organic | null | null |
Thank you in advance!
j
The issue you're encountering, where ecommerce.purchase_revenue
and ecommerce.purchase_revenue_in_usd
are primarily associated with null sources and mediums, highlights a challenge in accurately attributing revenue to specific traffic sources within your GA4 BigQuery data. Here's why this happens:
Refined Approach for Attribution
To get better attribution, focus on session-level or user-level data when analyzing purchase events. Here's what to do:
collected_traffic_source
(Post-June 2023 Data): This field gives more accurate session-level attribution for data collected after June 2023.Example Query
Here's a query demonstrating how to attribute revenue to sessions based on traffic source (assuming you can link purchase events to sessions):
WITH SessionTraffic AS (
SELECT
session_id,
MAX(IF(key = 'source', value.string_value, NULL)) AS source,
MAX(IF(key = 'medium', value.string_value, NULL)) AS medium
FROM `datasource`,
UNNEST(event_params)
WHERE event_name = 'session_start'
GROUP BY session_id ),
PurchaseRevenue AS (
SELECT
session_id,
SUM(ecommerce.purchase_revenue) AS revenue,
SUM(ecommerce.purchase_revenue_in_usd) AS revenue_usd
FROM `datasource`
WHERE event_name = 'purchase'
GROUP BY session_id )
SELECT
st.source,
st.medium,
SUM(pr.revenue) AS revenue,
SUM(pr.revenue_usd) AS revenue_usd
FROM SessionTraffic st
JOIN PurchaseRevenue pr
ON st.session_id = pr.session_id
GROUP BY st.source, st.medium
This is amazing.
Tested it out and while a bulk are still attributed to null (around 60-70%), I can see other source/mediums showing non-zero/null values for revenue now.
Thank you so much!
User | Count |
---|---|
1 | |
1 | |
1 | |
1 | |
1 |