
Here’s the situation: You want to run machine learning (ML) algorithms on your Adobe Analytics data, and perhaps even combine this data with data from other sources. But the data that Adobe makes available to you is not nearly fine-grained enough to do the trick. That is: by default, it isn’t. With a little Adobe-massaging, however, a lot can be done. Here’s a guide on how to make this happen.

The other day a client brought me an issue he was facing. The page load speed on his website was unacceptably high, and he was afraid of how his visitors, their user journeys and ultimately his sales were affected by this. To help him get to the bottom of the case, and of what could be done about it, I looked to his Adobe Analytics installation, which stored all activity by all users on his website. Surely, this data would be able to tell us which users actually experienced the delayed page loads, and also how page load time affected the conversion rates of individual users. In particular, we wanted to use ML and AI to analyse the huge amount of data the Adobe system holds (I had my eyes set on the open-source XGBoost for this job).

Adobe data is not fine-grained enough (at least not by default)

To pull tricks like these, you need data that is as fine-grained as possible – that is: at least at the individual user level. Even though the Adobe Analytics Workspace UI is versatile, the Adobe data is always aggregated, summing up the behavior of many different users to deliver metrics such as page views, bounce rates or conversion rates. Consequently, the UI is not well equipped to let you analyze data from each individual user’s perspective: the table we needed simply wasn’t available – certainly not out of the box.

Data Warehouse to the rescue

Fortunately, Adobe lets you extract the huge amounts of data needed through two separate approaches:

  1. The first approach is the Data Warehouse. Here you can select from all breakdowns (dimensions), metrics and segments for any predefined date range. This data has already been preprocessed and aggregated by Adobe.
  2. The second approach is to use Data Feeds. Here you get the partially processed data that has been sent to Adobe. Compared to the Data Warehouse, this is highly granular, hit-level data.

I went for the first method – the Data Warehouse approach – to help my client out. This whitepaper covers what to do to achieve the same.

Any size, right away: Pulling Data Warehouse data from the Adobe Analytics API using R and RSiteCatalyst

The way I like best to pull data out of the Adobe Data Warehouse is to request the export through the Adobe Analytics API using the statistical programming language R. It’s fast, and it provides you with highly granular, high-volume datasets. However, you do need some coding skills to proceed this way. (If you’re not into R, there is another – slightly more limited – way to get hold of the data. I cover this in the next section of this whitepaper.)

I prefer to pull data out with the RSiteCatalyst package, as it enables you to request large amounts of data without dealing with JSON (if you don’t mind JSON, though, feel free to access the Adobe Analytics API via JSON with e.g. Postman). R can be installed on Linux, Windows and Mac OS X from cran.r-project.org. If you want to follow my lead with R, I recommend that you also download RStudio from rstudio.com, as it provides a user-friendly integrated development environment.

Having installed R and RStudio, you need to ensure that your user account has Web Services Access. Afterwards, you can leverage the power of RSiteCatalyst by locating your username and secret within Adobe Analytics:

  1. Go to Admin. From the default landing page in Adobe Analytics, click through to the “Admin” section.
  2. Go to “Analytics Users & Assets”. From the “Admin” section, click through to the “Analytics Users & Assets” section.
  3. Locate yourself. Use the search field to locate your own user account.
  4. Connect to the API. Clicking your “USER ID” opens the details associated with your user account. Under the “Web Service Credentials” header you will find the credentials needed to connect to the API: the “User Name” should be your email followed by the company name, whereas the “Shared Secret” is a 32-character string.

Now that you have located your credentials, you can connect to the API with R using a script that installs and loads RSiteCatalyst and authenticates against the API.
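A minimal sketch of such a script follows; the credential values are placeholders you must replace with the “User Name” and “Shared Secret” you located above:

    # Install and load RSiteCatalyst (the install only needs to run once)
    install.packages("RSiteCatalyst")
    library(RSiteCatalyst)

    # Authenticate with the Web Service Credentials located above.
    # Both values are placeholders - replace them with your own
    # "User Name" (on the form "login:Company Name") and your
    # 32-character "Shared Secret".
    key    <- "user.name@example.com:My Company"
    secret <- "00000000000000000000000000000000"
    SCAuth(key, secret)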

  1. Request a report suites data frame. With a connection to the API established, you can start querying data. First, request a data frame that contains the report suites you want to extract data from (steps 1 through 6 are collected in a code sketch after this list).
  2. Store as vector. You can now open a data frame that contains the report suite ID under the “rsid” header. Store the report suite you want to export data from as a character vector.
  3. Request elements, metrics and segment data frames. Now that you have defined which report suite you will be extracting data from, a sensible next step is to request data frames that contain all relevant elements (dimensions), metrics, segments, props and eVars.
  4. Append IDs. Importantly, neither the Analytics Visitor ID nor the Experience Cloud ID is part of the elements data frame. Consequently, we need to append these dimensions to the elements data frame.
  5. Specify headers. We can now access the data frames and specify which items we want to use for our report. Note that the items must be referenced by their values in the “id” headers. Here we specify the headers used to request the number of page views and average time spent on page, broken out by visitor ID, page name and device.
  6. Make names correspond. As mentioned, the “id” headers do not always have meaningful names; for instance, “evar1” represents the page name. Fortunately, each “id” header has a corresponding “name” header, so a reference data frame containing the corresponding names can be created.

This reference data frame will also come in handy when we create meaningful headers for the export.
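Collected in one place, steps 1 through 6 might look like the sketch below. The report suite ID, the “visitorid” and “mcvisid” element IDs, and the chosen metric and element IDs are assumptions for the sake of illustration – verify the exact IDs against the data frames returned in your own installation:

    # Step 1: request a data frame with all report suites your account
    # can access
    report_suites <- GetReportSuites()

    # Step 2: pick the report suite to export from and store its id as a
    # character vector (the rsid below is a placeholder - find yours
    # under report_suites$rsid)
    report_suite <- "my-report-suite-id"

    # Step 3: request data frames with all relevant elements
    # (dimensions), metrics, segments, props and eVars
    elements <- GetElements(report_suite)
    metrics  <- GetMetrics(report_suite)
    segments <- GetSegments(report_suite)
    props    <- GetProps(report_suite)
    evars    <- GetEvars(report_suite)

    # Step 4: append the Analytics Visitor ID and the Experience Cloud
    # ID, which are not returned by GetElements(). The ids "visitorid"
    # and "mcvisid" are assumptions - verify them for your installation.
    extra_ids <- data.frame(id   = c("visitorid", "mcvisid"),
                            name = c("Visitor ID", "Experience Cloud ID"),
                            stringsAsFactors = FALSE)
    elements  <- rbind(elements[, c("id", "name")], extra_ids)

    # Step 5: specify the report items by their "id" values: page views
    # and average time spent on page, broken out by visitor id, page
    # name (evar1 in this setup) and device. All ids here are
    # illustrative - look up the exact ids in the data frames above.
    used_metrics  <- c("pageviews", "averagetimespentonpage")
    used_elements <- c("visitorid", "evar1", "mobiledevicetype")

    # Step 6: build a reference data frame mapping each "id" to its
    # human-readable "name"
    reference <- rbind(elements[, c("id", "name")],
                       metrics[, c("id", "name")])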

  7. Export data. Having completed the preceding steps, we are now ready to export data with the “QueueDataWarehouse” function. In the export below, we pass nine arguments to the function:

  1. id – the report suite ID stored in the character vector.
  2. from – start date for the report (YYYY-MM-DD).
  3. to – end date for the report (YYYY-MM-DD).
  4. metrics – the metrics specified in the “used_metrics” object.
  5. elements – the elements specified in the “used_elements” object.
  6. granularity – time granularity of the report (year/month/week/day/hour); the default is “day”.
  7. seconds – how long to wait between attempts.
  8. attempts – the number of API attempts before stopping.
  9. enqueueOnly – only enqueue the report without getting the data; returns a report ID, which you can later use to retrieve the data.

By default, the function will run for ten minutes before it stops (120 attempts separated by 5-second pauses). In my experience, these defaults need to be adjusted upward in order to complete requests for larger exports.

Furthermore, it is also possible to simply enqueue the report without actually receiving the data by setting “enqueueOnly” to TRUE.

When the following snippet is run, a request is made for a report with the predefined metrics and elements, with an upwardly adjusted number of attempts and pauses:
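(The snippet is a sketch: in RSiteCatalyst, the “granularity”, “seconds” and “attempts” arguments from the list above are named “date.granularity”, “interval.seconds” and “max.attempts”, and the date range and adjusted values below are illustrative.)

    # Step 7: request the Data Warehouse export, waiting longer and more
    # often than the defaults so larger exports have time to complete
    # (dates and attempt settings are examples - tune to your needs)
    report_data <- QueueDataWarehouse(report_suite,
                                      date.from = "2019-01-01",
                                      date.to   = "2019-01-31",
                                      metrics   = used_metrics,
                                      elements  = used_elements,
                                      date.granularity = "day",
                                      interval.seconds = 30,
                                      max.attempts     = 500,
                                      enqueueOnly      = FALSE)

    # Alternatively, set enqueueOnly = TRUE to only enqueue the report;
    # the returned report id can then be used to fetch the data later.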

  8. Make headers meaningful. Once you have successfully retrieved the data, the reference data frame can be used to map the meaningful header names onto the export data frame. Note that “datetime” will always be the first column (see the sketch after this list).
  9. Excel it. Finally, if you would like to work with the data in Excel, R offers an easy way to export data frames as .csv files.
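Steps 8 and 9 might look like the sketch below; the assumed column order (datetime first, then elements, then metrics) and the file name are illustrative:

    # Step 8: map meaningful header names onto the export. "datetime" is
    # always the first column; the requested elements are assumed to
    # follow, with the metrics last - verify against your own export.
    names(report_data) <- c("datetime",
                            reference$name[match(c(used_elements, used_metrics),
                                                 reference$id)])

    # Step 9: write the data frame to a .csv file that Excel can open
    # (the file name is just an example)
    write.csv(report_data, "adobe_dw_export.csv", row.names = FALSE)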

Of course, this was only an example of the kind of data you can export. In fact, you can export data with many other metrics and elements and transform the data as you see fit.

Long wait, small sizes but no coding: Sending the Data Warehouse request to your email as a .csv file

If you’d rather not embark on a journey into R programming, there’s another way to get hold of the data. All you need to do is go into the Adobe Analytics interface, specify the report details and enter your email address in order to receive the desired data. If the data volume you want to extract can be kept under 10 MB, and if you’re not in a rush to get the data (it’s sent by email and can take several hours to arrive), this method might be just fine for you.

  1. Log into Adobe Analytics.
  2. Hover over the “Tools” header and click on “Data Warehouse”.
  3. Specify the “Request Name”. This lets you locate your request in the “Request Manager” afterwards.
  4. Specify the report details and enter the email address the .csv file should be sent to.
  5. Click “Request this Report” to start scheduling the report.

Now all you have to do is wait for the report to arrive in your inbox (which, as noted, might not happen instantaneously).

Happy truth-hunting

I used the R method to get hold of the client’s data and helped him see how much pain the slow page loads caused his visitors – and his business. I could have used the latter method as well, but the data size cap would have caused us trouble – and so would the timing issues.

If you’re planning to do something similar yourself, either of the two methods will help you towards your goals – and once you have gone through the above steps, you’ll end up with a nice, fine-grained data extract from your Adobe Analytics. A data extract that is just waiting for you to apply your own advanced analysis, find interesting patterns in the data and predict future actions for each user on your website. Good luck hunting for your “truths”.
