GA4 BigQuery: Mastering event data testing
Discover missing or incorrect data. Analyse cross-device tracking and advanced consent mode behaviour.
In this article, we'll explore how to test event data by focusing on a specific user.
By utilizing BigQuery GA4 export + Looker Studio you can take this process to the next level 🚀.
User ID
When using Advanced Consent Mode, Google Analytics 4 collects hits when consent is “denied” (gcs is G100).
These hits have a lot of parameters stripped off. Identifiers like the Client ID (user_pseudo_id) and the Session ID are null in the BigQuery export.
But surprisingly 👀 User ID is always present in the export with a lot of valuable parameters as well.
To reduce the chance that the user_id is not automatically transferred from the Config tag in Google Tag Manager, sending it with each event as a parameter is recommended.
User_id is a unique identifier, usually tied to the user's e-mail address. It is also used as a key for joining with other tables in BigQuery. Or also as a key for enriching events through Cloud Firestore in near real time in Server-side GTM.
So I suggest encouraging users to log in to your website or application. Start with implementing Google Sign-in on your web or app.
Or try interesting techniques like the one described in this article 👇
Query
WITH user_events AS (
SELECT
event_name,
user_id,
user_pseudo_id,
event_date AS date,
device.category AS device_category,
geo.city,
privacy_info.analytics_storage AS analytics_storage,
privacy_info.ads_storage AS ads_storage,
COUNT(*) AS event_count
FROM
`analytics_XXXXXXX.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20240321' AND '20240321' -- adjust the date range
AND user_id = 'u12345' -- change this to your user_id (you can easily copy it from dataLayer)
GROUP BY ALL
)
SELECT
event_name,
user_id,
user_pseudo_id,
date,
device_category,
city,
analytics_storage,
ads_storage,
SUM(event_count) AS total_event_count
FROM
user_events
GROUP BY ALL
ORDER BY date, total_event_count DESC
Query results
The table contains all events triggered for a specific date range, with important parameters for more in-depth analysis.
User_pseudo_id - is null, because of denied consent - but it is possible to stitch user_id retroactively to client_id (user_pseudo_id) if your Legal / Compliance team approves it.
Device_category - is necessary for cross-device analysis
City - is just out of curiosity, you should discard the user's location in the production data due to GDPR
Analytics and ads_storage - is for testing consent mode behavior
Quick analysis
I immediately noticed that the event job_ application_sent was sent more than once.
That day I sent only one application, so the event should only occur once. However, I had the thank you page in the browser open and refreshed the tab three times.
The devs can adjust the event to push only once. Alternatively, a unique application_id can be used in BigQuery for deduplication since we send it with the event.
Explore with Looker Studio
The data in the results table are often hard to read. Especially if we analyze more data over a longer period.
So instead of writing additional queries, I often use the “Explore with Looker studio” option via direct integration.
Just click on Explore data → Explore with Looker Studio 😎.
In Looker Studio, I used my favorite pivot table to see if cross-device tracking was working correctly.
Notice how the first_visit and session_start events behave. They are triggered with almost every page_view because analytics storage is “denied”. Therefore, I do not prefer counting these events.
During this one day visit, I was recorded 7 times as if I had a first visit. So take this into account when writing queries with session_start and first_visit.
Never save and share such reports in Looker Studio! So you don't have to pay for running the query behind the report. Such a method is only good for AdHoc analysis.
If you want to analyze a specific event with all fields, use this query:
select * from `analytics_XXXXXXX.events_*`
WHERE
event_name IN ('page_view') -- change this to your event
AND _TABLE_SUFFIX BETWEEN '20240820' AND '20240820'
AND user_id = 'XXXXXX'
Summary
BigQuery is amazing! And this is just a small sample of what is possible. Feel free to explore and modify the query as you like. I think this is a much better way to understand and learn how things work in digital analytics nowadays than relying on a black-box Google Analytics 4 UI. Not to mention that BigQuery costs next to nothing for small projects and Looker Studio is free.
All you need to do is invest in quality education 👇
https://www.teamsimmer.com/all-courses/query-ga4-data-in-google-bigquery/