Ahh, Google Analytics 4 – as controversial as it is powerful. Love it or hate it, it’s here to stay along with its quirky way of handling and managing session channel attribution.
And nope… Although UTMs are often the key culprit in breaking attribution reports, this isn’t the case here – there’s another ‘force’ at work that’s responsible for impacting the accuracy of your channel reporting.
Before we dig into how to fix the problem (and help you diagnose it within your own GA4 property), let’s first unpack precisely how GA4 defines session attribution and why it’s likely reporting incorrectly within your properties.
So, how does GA4 define its traffic channel groupings?
Note: If you’re concerned or curious as to how GA4 defines what traffic goes to specific default channel groupings and what each of them means, take a look at their official documentation which details each of them.
Typically it uses variables from the referrer which are source, medium and campaign. The vast majority of channel groupings use the source and medium attributes.
So for example, the organic search channel grouping is where traffic:
- Matches a list of referring websites (including Google.*, Bing, DuckDuckGo etc)
- The medium attribute matches ‘organic’
Comparatively, the Paid Search channel grouping is where traffic:
- Matches a list of referring websites (including Google, Bing, DuckDuckGo etc)
- The medium must contain ‘cp’, ‘ppc’, ‘retargeting’ or begin with ‘paid’
Ultimately, there is a set of rules in GA4 that defines which sessions end up where. All conversion data and other metrics are then ‘stitched’ to that session and the session is then allocated to its respective default channel.
For clarification, a session can only be allocated to a single default channel group, but a user can have multiple sessions with each session being allocated to a different channel.
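To make the two rule sets above concrete, here is a minimal sketch of how a source/medium pair could be mapped to a channel. It is not GA4's actual implementation: the `SEARCH_ENGINES` tuple is a tiny illustrative subset, the function name `classify_channel` is made up for this example, and the medium checks follow the simplified wording above rather than GA4's official definitions.

```python
# Illustrative subset only; GA4 maintains a much longer list of search sites.
SEARCH_ENGINES = ("google", "bing", "duckduckgo")


def classify_channel(source: str, medium: str) -> str:
    """Map a source/medium pair to a channel using the simplified rules above."""
    source = source.lower()
    medium = medium.lower()

    # No referrer information at all: the session lands in Direct.
    if source == "(direct)" and medium in ("(none)", "(not set)"):
        return "Direct"

    is_search_engine = any(engine in source for engine in SEARCH_ENGINES)

    if is_search_engine and medium == "organic":
        return "Organic Search"

    # Simplified reading of the Paid Search medium rule described above.
    is_paid_medium = medium.startswith("paid") or any(
        token in medium for token in ("cp", "ppc", "retargeting")
    )
    if is_search_engine and is_paid_medium:
        return "Paid Search"

    # Every other default channel is out of scope for this sketch.
    return "Other (not modelled here)"


print(classify_channel("google", "cpc"))
print(classify_channel("bing", "organic"))
print(classify_channel("(direct)", "(none)"))
```

The point to notice is that each session gets exactly one label, and that label depends entirely on the source and medium values present when the session starts.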
How can UTMs break session attribution?
It wouldn’t be a thorough dive into GA4’s attribution quirks if we didn’t also talk about how UTMs can and have impacted it.
Note: To learn more about UTMs, how they work and why they’re essential in digital marketing, take a look at this cool article over on Forbes.
First, let’s take a step back to Universal Analytics for just a moment. When partner agencies or internal teams misspelled or misconfigured a source, medium or campaign name within UTM parameters, it would often cause traffic and conversions to be allocated to the incorrect channels or be placed into ‘(other)’.
Now, with UA, we could quickly clean up those attribution errors by tweaking the default channel groupings and we’d wash our hands of the problem. Simple fix! Off to the pub…
When GA4 first launched (cough, prematurely, cough) it was not possible to do that. As a result, there were a large number of GA4 properties where the channel attribution had gone sideways and was wildly different from the attribution data seen in UA.
The only way to fix it at the time? Replace or fix the UTM parameters on the URLs – which wasn’t always simple to do; particularly in large corporate setups.
ℹ️ Note: The good news is that GA4 will now allow you to build custom channel groupings. However, applying those changes to your primary reports is a little trickier.
Here’s the session attribution quirk in GA4 that’s causing problems
Now that we’ve got UTMs out of the way, let’s dive into the quirk within GA4 that could be causing your property to over-report traffic levels for the ‘Direct’ channel.
Let’s document this issue using four steps:
Step 1: A user starts a session
When a visitor hits your website with GA4 installed, a session starts. The user is identified (or GA4 sees the visitor as a new user), the default channel grouping for the session is allocated, and each page view and related event is tracked within GA4 under that session.
In this example, let’s assume that this visitor came from Google Ads and was therefore allocated to the Paid Search default channel grouping for their session.
Step 2: The session goes ‘dark’
Let’s assume your visitor is now distracted and is no longer actively browsing your website. Although your website is still open on their device, there has been no activity for the last 40 minutes. After a set amount of time (the default in GA4 is 30 minutes) GA4 decides that their particular session has now ended.
The problem here is that their original default channel for the session (Paid Search) ends along with it.
Step 3: Your visitor now returns
Your website visitor is now back to browsing your website (via their original browser tab) and GA4 kicks off a fresh session for them as the last one closed due to no activity for 40 mins.
Now, here lies the problem…
This new GA4 session does NOT inherit the default channel and attribution from their first session. In fact, there is no attribution data at all. Ultimately source defaults to (direct) and the medium parameter is empty so it’s (not set).
This forces this session and all subsequent page views during this session to be placed into the Direct default channel grouping. There’s no recollection of the ‘Paid Search’ grouping for this session.
ℹ️ This is what is responsible for the over-reporting of Direct traffic!
Step 4: Your visitor converts
So your visitor decides to complete their purchase/convert. Instead of the sale/conversion being allocated to the original session’s channel which was Paid Search, it has been allocated to Direct.
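The four steps above can be replayed in a few lines of code. This is a toy model of the behaviour described in this article, not GA4's real sessionisation logic: the `Hit` class, the `attribute` function and the minute values are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_TIMEOUT_MIN = 30  # GA4's default session timeout


@dataclass
class Hit:
    minute: float                  # minutes since the first hit
    name: str                      # event name
    channel: Optional[str] = None  # only known when the hit carries referrer/UTM data


def attribute(hits, timeout=SESSION_TIMEOUT_MIN):
    """Return (event name, session channel) pairs for a single visitor."""
    results = []
    last_minute = None
    session_channel = None
    for hit in hits:
        starts_new_session = last_minute is None or hit.minute - last_minute > timeout
        if starts_new_session:
            # A new session with no referrer data falls back to Direct.
            session_channel = hit.channel or "Direct"
        results.append((hit.name, session_channel))
        last_minute = hit.minute
    return results


hits = [
    Hit(0, "page_view", channel="Paid Search"),  # Step 1: arrives via Google Ads
    Hit(5, "page_view"),                         # still browsing
    Hit(45, "page_view"),                        # Step 3: back after 40 idle minutes
    Hit(47, "purchase"),                         # Step 4: converts
]

for name, channel in attribute(hits):
    print(f"{name:<10} -> {channel}")
```

In this model the purchase is credited to Direct, even though the only reason the visitor was on the site was the original Paid Search click.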
Isn’t this a pretty limited edge case?
I know what you might be thinking… This is such a unique combo of events, it’s probably only a small number of sessions that are impacted by this, right?
Well – not quite. At least not in the cases I’ve identified and since fixed. It’s definitely more concerning and recurrent than you might be thinking. Particularly on ecommerce sites.
For one particular client, the above issue caused Direct traffic to be over-reported by 29%.
How to identify if your GA4 is impacted by this problem
For this, we’ll need to build and poke about in an exploration report. You’ll need to build a free form report and import the following dimensions and metrics:
Dimensions to import
- Landing page + query string
- Event name
- Session source/medium
Metrics to import
- Sessions
- Total Users
Filters
- Create a filter where session source / medium is equal to (direct) / (none)
- Create a second filter where event name contains session_start
Values to set
- Sessions
- Total Users
Rows
- Landing page + query string
- Event name
Once you’ve set up the above, you should have the report you need which will look like the report I’ve prepared below:
If (not set) is high up the list for the landing page column (assuming you’ve sorted sessions in descending order), then you likely have an over-representation of Direct traffic!
ℹ️ Note: Calculate the percentage of (not set) landing pages for the session_start event compared to the total. This will give you a rough percentage of how much your Direct traffic is over-reporting by.
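As a rough sketch of that calculation, assuming you have exported or copied the session counts per landing page from the exploration, it boils down to one division. The function name `not_set_share` and the numbers below are invented purely for illustration.

```python
def not_set_share(sessions_by_landing_page: dict[str, int]) -> float:
    """Percentage of filtered Direct sessions whose landing page is (not set)."""
    total = sum(sessions_by_landing_page.values())
    if total == 0:
        return 0.0
    return 100 * sessions_by_landing_page.get("(not set)", 0) / total


# Hypothetical session counts, not real client data.
example_rows = {
    "(not set)": 120,
    "/": 500,
    "/products": 380,
}

print(f"{not_set_share(example_rows):.1f}% of Direct sessions have no landing page")
```

Treat the result as an estimate rather than an exact figure, since it only covers the sessions that match the two filters above.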
How do you fix the problem?
The good news is, there are two ways to address the issue and improve the accuracy of your attribution. The first is more of a ‘quick and dirty’ fix and the second (at least in my case) is much more robust.
Solution A: Extending the session timeout window
If you recall – the reason why this issue exists in the first place is due to an inactive session ending with another starting afterward. The new session loses the channel data from the first session which places the second session within the Direct channel group.
Extending the session timeout window will give the original sessions larger windows before they timeout and lose the initial session’s channel data.
The default setting is 30 minutes and you can extend it up to a maximum of 7 hours and 55 minutes.
My recommendation here is to extend by 30-60 minutes at a time and see how that impacts your Direct traffic levels after a week or so, then extend again if required.
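To get a feel for why a longer window helps, here is a small sketch that counts how many idle gaps would split a session under different timeout values. The gap values are hypothetical and `sessions_split` is a name made up for this example; your own data will look different.

```python
def sessions_split(idle_gaps_min, timeout_min):
    """Count idle gaps long enough to end a session and start a new one."""
    return sum(1 for gap in idle_gaps_min if gap > timeout_min)


# Hypothetical idle gaps (in minutes) between two hits from the same visitor.
idle_gaps = [12, 35, 40, 75, 130]

for timeout in (30, 60, 90):
    print(f"timeout {timeout} min -> {sessions_split(idle_gaps, timeout)} sessions split")
```

Each step up in the timeout rescues some of the shorter gaps, while the very long ones still break the session. That is why extending gradually and re-checking Direct traffic is a sensible approach.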
Solution B: Fire an automated event shortly before the session times out
Firing an event to GA4 via GTM every 29 minutes (1 minute before the session timeout period) resets GA4’s inactivity timer and keeps the original session active, thus avoiding the loss of critical session channel data.
ℹ️ Although this event will show in your event reports within GA4, it can be ignored/disregarded. Ultimately, this is simply a throwaway event that posts to GA4 to keep the original session up and running.
Here’s how to get this set up in GTM. First the trigger:
- You’ll need to leverage a Timer trigger
- Rename the event if you wish. Personally, I’ve called it session_reset_timer for easier recognition
- Set the interval to 1,740,000 (1.74 million milliseconds) which is 29 minutes in milliseconds or simply one minute less than your session timeout window in GA4
- I’ve kept the limit at 10, which is the maximum number of times the timer trigger will fire (once each time the interval expires).
- Set the first condition dropdown to ‘Event’ and enter ‘gtm.load’ into the text box. This means that the timer will start once the GTM container has fully loaded on the page
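The interval and limit values above come from simple arithmetic, sketched here in case your session timeout differs from the default. The helper names are made up for this example and are not part of GTM or GA4.

```python
def keepalive_interval_ms(session_timeout_min: int, margin_min: int = 1) -> int:
    """Timer interval in milliseconds: the session timeout minus a safety margin."""
    return (session_timeout_min - margin_min) * 60 * 1000


def last_fire_minute(session_timeout_min: int, limit: int, margin_min: int = 1) -> int:
    """Minutes after the timer starts at which the final timer event fires."""
    return (session_timeout_min - margin_min) * limit


print(keepalive_interval_ms(30))   # 1740000, the value entered in the trigger
print(last_fire_minute(30, 10))    # 290 minutes after the timer starts
print(keepalive_interval_ms(60))   # 3540000, if the GA4 timeout were 60 minutes
```

With a 30 minute timeout and a limit of 10, the timer stops firing 290 minutes after it starts, so an idle tab is only covered for a little under five hours.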
ℹ️ Don’t forget to apply your user’s cookie preferences to your trigger in order to adhere to your applicable cookie and compliance laws.
Time for the tag – for this, we’ll need to leverage a Google GA4 Event Tag and attach it to the trigger we’ve just built.
For the event name, I’ve specified session_channel_fix. You should call it something memorable and clear so you can remember to ignore/exclude it from various reports etc.
Once you’ve tested and pushed your changes live, a session_channel_fix event will fire a minute before the session is expected to time out, thus resetting the timeout window in GA4 and retaining the initial channel data.
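Here is a toy comparison of the same visit with and without the keep-alive event. It assumes, as described above, that any event sent to GA4 resets the inactivity timer. The function name and minute values are invented for illustration, and this is not how GA4 is implemented internally.

```python
TIMEOUT_MIN = 30  # GA4's default session timeout


def channel_of_last_hit(hit_minutes, first_channel, timeout=TIMEOUT_MIN):
    """Return the channel of the session that contains the final hit."""
    channel = first_channel
    for previous, current in zip(hit_minutes, hit_minutes[1:]):
        if current - previous > timeout:
            # The gap exceeded the timeout: a new session starts without referrer data.
            channel = "Direct"
    return channel


# Page views at minutes 0 and 5, visitor returns at 45 and purchases at 47.
without_fix = [0, 5, 45, 47]
# Same visit, plus a keep-alive event 29 minutes after the page loaded at minute 5.
with_fix = [0, 5, 34, 45, 47]

print("without fix:", channel_of_last_hit(without_fix, "Paid Search"))
print("with fix:   ", channel_of_last_hit(with_fix, "Paid Search"))
```

In this model the keep-alive at minute 34 means no gap ever exceeds 30 minutes, so the purchase stays in the original Paid Search session.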
Bonus solution: A bit of both
Why not extend your timeout window within GA4 in addition to launching the timer and event via GTM?
Doing so will reduce the number of events that fire to GA4. So instead of firing an event every 29 minutes, you could set your timeout in GA4 to say 60 minutes and only fire an event to GA4 every 59 minutes instead.
Whichever solution you select, you’ll likely see a decrease in misallocated Direct traffic and subsequently traffic increases to other channels.
Granted, GA4 isn’t perfect, but when used correctly it is powerful albeit with a handful of customisations and workarounds.
So, there you have it! More accurate session attribution within GA4 🙂