Here’s the situation: You want to run a Random Forest algorithm on your Google Analytics (GA) data to see whether there are any significant patterns in the way users behave on your site. But the data Google lets you retrieve from your free GA account is not nearly fine-grained enough. For one, there is no user ID attached to the data, so you can’t tell which user has done what. Also, there’s no timestamp on the records, so it is very hard to make sense of any of it.
Fortunately, there’s a hack you can apply to GA that fixes both problems. That is what this blogpost is about. (I wrote an introduction to the complete process of getting data out of GA the other day. You can check it out here if you would like.)
Creating custom dimensions
To get a user ID and a timestamp into your GA data, you first need to create two GA custom dimensions. We call them “Client ID” and “Hit timestamp” respectively. Here’s what to do to make it work:
- Log into the Google Analytics account.
- Click on admin.
- Choose an account from the account column.
- Choose a property from the menu within the property column.
- ….
- …
- …
- Confirm “Active” and “Create”.
- The custom dimension now has an index number which can be referred to by a custom JavaScript variable in Google Tag Manager.
- Click on “+ NEW CUSTOM DIMENSION”.
- Add “Timestamp” as “Name” and set “Scope” to “Hit”.
- Confirm “Active” and “Create”.
The complete 12 steps needed to implement the Client ID and Hit timestamp variables are available in our whitepaper “Google analytics, ML and AI”. You can download the whitepaper for free right here.
Got it? Google Analytics should now have the correct setup to receive data.
Making Google Tag Manager collect the data for you
The two custom dimensions are not worth much by themselves, however. For them to serve any purpose, the Client ID and Hit timestamp must also be collected from users as they browse the site. Otherwise there’ll be no data for the custom dimensions to receive. This is done by implementing two tags on your website, one for each of the two variables in question.
The two tags are to be implemented within each of the pageview and event tags already existing on your site, in order to have your new Client ID and Hit timestamp available in all the relevant records in your GA.
You can of course do this by hand, but I definitely do not recommend it. Instead, you should use a tag manager to handle the tags, for example Google’s own Google Tag Manager. Here’s how to set up the two tags in Google Tag Manager:
- Log in to your Google Tag Manager…
- Set “Field Name” equal to “cookieDomain” and the associated “Value” to “auto”.
- Set another “Field Name” equal to “customTask” and its “Value” to advanced.
- Set “Name” to “Send Client ID to Custom Dimension”.
- Set “Type” to “Custom JavaScript”.
- Paste the code below as the “Custom JavaScript”. Note that “customDimensionIndex” must be set to the index number of the relevant custom dimension in Google Analytics.
The full code needed to implement the Client ID variable in Google Tag Manager is available in our whitepaper “Google analytics, ML and AI”. You can download the whitepaper for free right here.
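For readers who want a feel for what such a variable does before opening the whitepaper, here is a minimal sketch of the customTask pattern. It is not the whitepaper's code. It assumes the Universal Analytics (analytics.js) tracker, which passes a model object with `get` and `set` methods to the customTask. The names `makeClientIdTask` and `createFakeModel` and the index value 1 are hypothetical and only used for illustration.

```javascript
// Hypothetical index: replace with the index number GA assigned
// to your "Client ID" custom dimension.
var customDimensionIndex = 1;

// Builds the function that analytics.js will call as customTask.
// In GTM, a Custom JavaScript variable must be an anonymous
// function() { ... } that returns this inner function.
function makeClientIdTask(index) {
  return function (model) {
    // Copy the tracker's client ID into the custom dimension field.
    model.set('dimension' + index, model.get('clientId'));
  };
}

// Stand-in for the analytics.js model, so the sketch can be run
// outside the browser. The real model is supplied by the tracker.
function createFakeModel(fields) {
  return {
    get: function (key) { return fields[key]; },
    set: function (key, value) { fields[key] = value; }
  };
}

var fields = { clientId: '1234567890.0987654321' };
makeClientIdTask(customDimensionIndex)(createFakeModel(fields));
console.log(fields.dimension1);
```

The point of going through customTask is that the client ID is read from the tracker at the moment the hit is built, so the value stored in the dimension should match the one GA itself uses.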
- Under “More Settings” choose “Custom Dimensions”.
- Set “Index” to the index number for “Timestamp” in Google Analytics.
- As “Dimension Value”, add a “Custom JavaScript” variable with the code below.
The full code needed to implement the Hit Timestamp variable in Google Tag Manager is available in our whitepaper “Google analytics, ML and AI”. You can download the whitepaper for free right here.
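As an illustration only, and not the whitepaper's code, a timestamp variable could be sketched like this. It assumes you want an ISO 8601 string in the visitor's local time with the UTC offset appended. The function names `formatHitTimestamp`, `pad2` and `pad3` are hypothetical.

```javascript
// Left-pad numbers to two and three digits.
function pad2(n) { return (n < 10 ? '0' : '') + n; }
function pad3(n) { return ('00' + n).slice(-3); }

// Formats a Date as local time plus UTC offset.
// offsetMinutes follows Date.prototype.getTimezoneOffset():
// minutes behind UTC, so UTC+2 is -120.
function formatHitTimestamp(date, offsetMinutes) {
  // Shift the instant so the UTC getters return local wall-clock values.
  var local = new Date(date.getTime() - offsetMinutes * 60000);
  var sign = offsetMinutes > 0 ? '-' : '+';
  var abs = Math.abs(offsetMinutes);
  return local.getUTCFullYear() +
    '-' + pad2(local.getUTCMonth() + 1) +
    '-' + pad2(local.getUTCDate()) +
    'T' + pad2(local.getUTCHours()) +
    ':' + pad2(local.getUTCMinutes()) +
    ':' + pad2(local.getUTCSeconds()) +
    '.' + pad3(local.getUTCMilliseconds()) +
    sign + pad2(Math.floor(abs / 60)) + ':' + pad2(abs % 60);
}

// In GTM, the Custom JavaScript variable would be an anonymous
// function() { ... } that returns the value computed on this line.
var now = new Date();
console.log(formatHitTimestamp(now, now.getTimezoneOffset()));
```

Keeping the offset in the string means each hit can later be converted to a single common time zone, which matters once you start ordering hits from visitors in different countries.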
- As “Firing Triggers” choose “All Pages”
- Having done the above for each relevant tag, validate the setup through “PREVIEW”.
Now both client ID and hit timestamp will start being sent to the two custom dimensions we created in Google Analytics. And – what’s even more important – the data will also be available in the API provided by Google Analytics. Problem solved: The GA data extracts are now perfectly suited for ML and AI analysis.
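To make concrete what the two new dimensions buy you, here is a small sketch of turning extracted rows into one ordered page path per user. The rows, the field names `clientId`, `hitTimestamp` and `pagePath`, and the function `buildUserPaths` are all made up for illustration. The sketch assumes the timestamps were stored as ISO 8601 strings with a UTC offset.

```javascript
// Hypothetical rows, shaped like an extract that includes
// the two custom dimensions next to a standard page dimension.
var rows = [
  { clientId: 'A', hitTimestamp: '2020-01-15T12:31:00.000+02:00', pagePath: '/pricing' },
  { clientId: 'B', hitTimestamp: '2020-01-15T12:30:30.000+02:00', pagePath: '/' },
  { clientId: 'A', hitTimestamp: '2020-01-15T12:30:05.000+02:00', pagePath: '/' }
];

// Groups hits by client ID and orders each group by timestamp.
function buildUserPaths(rows) {
  var byUser = {};
  rows.forEach(function (row) {
    if (!byUser[row.clientId]) { byUser[row.clientId] = []; }
    byUser[row.clientId].push(row);
  });
  Object.keys(byUser).forEach(function (id) {
    byUser[id].sort(function (a, b) {
      return Date.parse(a.hitTimestamp) - Date.parse(b.hitTimestamp);
    });
    // Keep only the visited pages, now in chronological order.
    byUser[id] = byUser[id].map(function (row) { return row.pagePath; });
  });
  return byUser;
}

var paths = buildUserPaths(rows);
console.log(paths);
```

Without a client ID the grouping step is impossible, and without a hit timestamp the sorting step is. Per-user, time-ordered sequences like these are the raw material for the features a model such as a Random Forest would be trained on.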
The next question is: how do I extract the data from GA? And in particular: is there a workaround that lets you bypass the upper limit on how much data you can extract from GA at a time? Luckily, these two questions can be answered in the affirmative too. How this is done is the subject of my next blogpost: “How to extract unsampled data for ML/AI analysis from Google Analytics through API for Python”. Stay tuned.
Credits: Thanks to Simo Ahava for the great inspiration for this blogpost.
Create your own custom dimensions – download whitepaper
This is the second blogpost about how to do data science based on data extracts from the free edition of Google Analytics. You can read all of the blogposts, including coding examples and a checklist, in our whitepaper “Google analytics, ML and AI”. It is available for free download right here.