Data sampling in Google Analytics can lead to inaccurate reports, which reduces the quality of your insights. We’ll show you two approaches for avoiding sampled data in your Google Analytics reports, even when you’re using the free version of Analytics.
Long time period + lots of dimensions = sampled data
If we want to report over a long time period across many dimensions, we usually run into sampled data. In some cases, e.g. when you look at conversion data, sampling gives you wrong numbers and can lead to wrong decisions. For that reason, we highly recommend always checking the “containsSampledData” field in the API response when you fetch data with the Reporting API. But which workarounds do we have if the data is sampled?
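As a minimal sketch of that check, assume the API response has already been parsed into a Python dictionary. The helper name `is_sampled` and the two sample dictionaries are made up for illustration; only the “containsSampledData” field comes from the API response itself.

```python
from typing import Any, Mapping


def is_sampled(response: Mapping[str, Any]) -> bool:
    """Return True if a parsed API response says it is based on sampled data."""
    # Assumption: a missing field is treated as "not sampled".
    # Raise an error here instead if you prefer to fail loudly.
    return bool(response.get("containsSampledData", False))


# Hypothetical, heavily shortened responses for illustration only.
sampled_response = {"containsSampledData": True, "rows": []}
unsampled_response = {"containsSampledData": False, "rows": []}

for name, response in [("yearly", sampled_response), ("daily", unsampled_response)]:
    if is_sampled(response):
        print(f"{name}: sampled, do not trust these numbers as-is")
    else:
        print(f"{name}: unsampled")
```

Running this check on every response makes it harder for sampled numbers to slip into a report unnoticed.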
Use partitions over time to avoid sampling
Let’s say we want to look at a full year of reporting data, and the API response tells us that the data is sampled. The solution is to make more requests with smaller time periods and sum up all results at the end. In our example, we would split the single request into 365 requests, one per day, and sum up the results. Job done!
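The daily split can be sketched roughly as follows. This is an illustration under assumptions, not a finished loader: `fetch_day` stands for whatever function you use to call the API for a single day, and `fake_fetch_day` is a stand-in that returns fixed numbers so the example runs without credentials.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator


def daily_partitions(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, both inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def sum_over_days(
    fetch_day: Callable[[date], Dict[str, float]],
    start: date,
    end: date,
) -> Dict[str, float]:
    """Call fetch_day once per day and add up the metric values."""
    totals: Dict[str, float] = {}
    for day in daily_partitions(start, end):
        for metric, value in fetch_day(day).items():
            totals[metric] = totals.get(metric, 0) + value
    return totals


def fake_fetch_day(day: date) -> Dict[str, float]:
    """Hypothetical stand-in for a real one-day API request."""
    return {"ga:sessions": 10, "ga:transactions": 1}


totals = sum_over_days(fake_fetch_day, date(2023, 1, 1), date(2023, 12, 31))
print(totals)
```

Keep in mind that summing daily values only works for additive metrics such as sessions or transactions. Metrics like users or conversion rates cannot simply be added up across days.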
Expert tip for the Google Analytics free version: daily loading strategy for accessing raw data in BigQuery
We have already learned that we get unsampled data when we use daily partitions. However, the API still limits us to seven dimensions per report. To get real raw data across all dimensions, we need another tweak: we split the dimensions we need across several requests. To each query, we add the ClientID and a custom dimension holding the timestamp of the hit. With these two keys, we can join the different tables in BigQuery and have raw data for all dimensions. Pretty cool, isn’t it?
Hint: In the past, you also had to write the ClientID in a custom dimension. However, now it’s available in reports with “ga:clientId”. Currently, the API documentation doesn’t tell you about this.
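To illustrate the join on the two keys, here is a small sketch in plain Python that works on rows already loaded from two separate requests. In practice you would do this join in BigQuery; the function name `join_on_keys`, the example rows, and the use of “ga:dimension1” as the custom dimension holding the hit timestamp are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]

# Assumption: the hit timestamp is stored in custom dimension 1.
JOIN_KEYS: Tuple[str, str] = ("ga:clientId", "ga:dimension1")


def join_on_keys(left: Iterable[Row], right: Iterable[Row]) -> List[Row]:
    """Inner-join two result sets on ClientID and hit timestamp."""
    # Index the right-hand rows by key. This assumes each key identifies
    # exactly one hit; with duplicate keys, the last row wins.
    index = {tuple(row[k] for k in JOIN_KEYS): row for row in right}
    joined: List[Row] = []
    for row in left:
        match = index.get(tuple(row[k] for k in JOIN_KEYS))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical rows from two requests that each carry different dimensions.
page_rows = [
    {"ga:clientId": "111.222", "ga:dimension1": "2023-01-01T10:00:00", "ga:pagePath": "/home"},
    {"ga:clientId": "333.444", "ga:dimension1": "2023-01-01T10:05:00", "ga:pagePath": "/pricing"},
]
source_rows = [
    {"ga:clientId": "111.222", "ga:dimension1": "2023-01-01T10:00:00", "ga:source": "google"},
]

joined = join_on_keys(page_rows, source_rows)
print(joined)
```

The second page row has no partner in the other table and is dropped, as in an inner join. If you would rather keep such rows, use a left join when you do this in BigQuery.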
This approach is good for power users with many reports over time and different dimensions.
If you don’t have a developer background and still want unsampled data without doing all of this manually, feel free to get in touch with us.
Key takeaways
Always look at the “containsSampledData” field in the API response when you use the Reporting API to fetch data.
You can access raw data in BigQuery in Analytics 360, but when query statements on nested fields become tricky, it’s more convenient to use the API.