How To Back Up Your Universal Analytics Data To BigQuery With R

September 26, 2023 | Analytics

Universal Analytics (UA) has finally sunset, and our data is now flowing freely into our Google Analytics 4 (GA4) properties. It may be tempting to never look at our UA setups again; however, before we leave UA behind, it is important that we store the data it has already processed, should we need to analyse it in future. For storing your data, we of course recommend BigQuery, Google’s data warehousing service, and in this blog we’re going to show you what data to back up from UA, and how to do it!

To download our data, we are going to make use of the Google Analytics API, writing a script that downloads the necessary data from UA and uploads it to BigQuery all in one go. For this task we highly recommend R, as the googleAnalyticsR and bigQueryR packages make the job very simple, which is why we have written this tutorial in R!

This guide will not cover the more complex steps in getting authentication set up, such as downloading your credentials file. For information on that, and more information on how to upload data to BigQuery, check out our blog on uploading data to BigQuery from R and Python!

Backing Up Your UA Data With R

As usual for any R script, the first step is to load our libraries. For this script we will need the following:

library(googleAuthR)
library(googleAnalyticsR)
library(bigQueryR)

If you haven’t used these libraries before, run install.packages("<PACKAGE NAME>") in the console to install them.
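For example, you can install all three in one go:

install.packages(c("googleAuthR", "googleAnalyticsR", "bigQueryR"))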

We will then need to get all of our various authorisations sorted. To do this, you’ll want to run the following code and follow any instructions that you are given:

googleAuthR::gar_cache_empty()
googleAuthR::gar_set_client("C:\\Users\\Tom Brown\\Documents\\R\\APIs\\credentials.json")
bqr_auth(email = "<your email here>")
ga_id <- <YOUR GA VIEW ID HERE>


The ga_id can be found beneath the name of the view when you select it in the UA interface.
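If you would rather look the ID up from R instead, googleAnalyticsR can list the accounts and views your login has access to. Something along these lines (a quick sketch, run after the authentication above; the exact column names may vary slightly between package versions) will show each view’s name next to its ID:

# list every account, property, and view the authenticated user can see
accounts <- ga_account_list()

# the viewId column holds the number to use as ga_id
accounts[, c("viewName", "viewId")]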

Next, we have to decide what data to actually take from UA. We recommend pulling the following:

| Session-Scoped Dimensions | Event-Scoped Dimensions | Pageview-Scoped Dimensions |
|---------------------------|-------------------------|----------------------------|
| Client ID                 | Client ID               | Page Path                  |
| Timestamp                 | Timestamp               | Timestamp                  |
| Source / Medium           | Event Category          | Source / Medium            |
| Device Category           | Event Action            | Device Category            |
| Campaign                  | Event Label             | Campaign                   |
| Channel Grouping          | Source / Medium         | Channel Grouping           |
|                           | Campaign                |                            |

Putting these into three tables in BigQuery should be sufficient for all of your potential future UA data needs. To pull this data from UA, first you’ll need to specify a date range. Go into the UA platform and have a look at one of your reports to see when data collection first began. Then, specify a date range that runs from then up to the day before you run your script, which is the last day you’ll have a full 24 hours’ worth of data for (and, if you are doing this after UA has been sunset, it will include 100% of your available data anyway). Our data collection started in May 2017, so I wrote:

dates <- c("2017-05-01", as.character(Sys.Date() - 1))
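If you are not sure exactly when collection began, you can also work it out from the API itself rather than clicking through reports. This is just a rough sketch, using a deliberately early start date and keeping only the days that actually recorded sessions:

# pull daily session counts over a very wide range; max = -1 fetches all rows
daily <- google_analytics(ga_id,
                          date_range = c("2005-01-01", as.character(Sys.Date() - 1)),
                          metrics = c("sessions"),
                          dimensions = c("date"),
                          max = -1)

# first day with any recorded traffic
min(daily$date[daily$sessions > 0])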

Now, we need to specify what needs to be pulled from UA according to the table above. For this we will need to call the google_analytics() function three times, since you cannot query dimensions of different scopes together. You can copy the following code exactly:

sessionspull <- google_analytics(ga_id,
                                 date_range = dates,
                                 metrics = c("sessions"),
                                 dimensions = c("clientId", "dateHourMinute",
                                                "sourceMedium", "deviceCategory",
                                                "campaign", "channelGrouping"),
                                 anti_sample = TRUE)

eventspull <- google_analytics(ga_id,
                               date_range = dates,
                               metrics = c("totalEvents", "eventValue"),
                               dimensions = c("clientId", "dateHourMinute",
                                              "eventCategory", "eventAction",
                                              "eventLabel", "sourceMedium", "campaign"),
                               anti_sample = TRUE)

pvpull <- google_analytics(ga_id,
                           date_range = dates,
                           metrics = c("pageviews"),
                           dimensions = c("pagePath", "dateHourMinute",
                                          "sourceMedium", "deviceCategory",
                                          "campaign", "channelGrouping"),
                           anti_sample = TRUE)

This should neatly put all of your data into three data frames titled sessionspull for the session-scoped dimensions, eventspull for the event-scoped dimensions, and pvpull for the pageview-scoped dimensions.
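Before uploading anything, it is worth a quick sanity check that each pull looks complete, for example:

# row counts for each pull
nrow(sessionspull)
nrow(eventspull)
nrow(pvpull)

# eyeball the first few rows of one of them
head(sessionspull)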

We now need to upload the data to BigQuery, the code for which should look something like this, repeated once for each data frame:

bqr_upload_data("<your project>", "<your dataset>", "<your table>", <your dataframe>)

In my case, this means my code reads:

bqr_upload_data("my-project", "test2", "bloguploadRSess", sessionspull)
bqr_upload_data("my-project", "test2", "bloguploadREvent", eventspull)
bqr_upload_data("my-project", "test2", "bloguploadRpv", pvpull)

Once this is all written, you can set your script running, sit back, and relax! When it is done, head over to BigQuery and you should see all of your data where it now belongs!
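If you would like to confirm the upload from R rather than in the BigQuery console, bigQueryR can do that too. A minimal check, reusing the project, dataset, and table names from my example above (the useLegacySql argument assumes a reasonably recent version of bigQueryR):

# confirm the three tables now exist in the dataset
bqr_list_tables("my-project", "test2")

# and check that a row count matches what was pulled
bqr_query("my-project", "test2",
          "SELECT COUNT(*) AS n FROM `my-project.test2.bloguploadRSess`",
          useLegacySql = FALSE)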

With your UA data safely tucked away for a rainy day, you can put your full focus on maximising the potential of your GA4 setup – and Semetrical is here to help with that! Check out our blog for more information on how to get the most out of your data. Or, for more support on all things analytics, check out our web analytics services to find out how we can help you.
