Improve your Digital Marketing with R – An Introduction to your R toolkit

July 14, 2022 | Analytics

The automation of processes within the digital marketing universe is becoming increasingly popular. However, not every marketing process has specialised software designed to help you. Even if said software exists, you bet it’s going to cost a lot. 

While coding may seem slightly overwhelming at first, some very straightforward code can save you hours of work, freeing up time for more important tasks. From reporting to data visualisation, forecasting to machine learning, there is an abundance of use cases for coding in the digital marketing world. In this blog, I will cover some use cases for R in digital marketing, while walking through each line of code step-by-step. 

Why R instead of Python? 

This is not an easy question to answer. I like R because I think it has superior packages for data visualisation and statistical analysis. However, the main reason that I recommend using R is due to the collection of Google packages that exist, which brings us nicely on to the first use case of R in digital marketing…

  1. Google packages

Mark Edmondson has created a collection of cross-compatible packages that use googleAuthR for backend authentication. These packages allow you to connect and interact directly with tools such as Google Analytics, BigQuery, Search Console, Tag Manager and Google Cloud Platform. 

Google Analytics – 

The Google Analytics (GA) package is massively useful for marketers as it allows you to access your data outside of the platform. In this blog, I won’t go into too much detail on this package as we have already covered googleAnalyticsR in other blogs. See those on an introduction to googleAnalyticsR and using Google Trends with googleAnalyticsR for more information.

In many of the other use cases shown below, I will use GA data which has been accessed via the googleAnalyticsR package (e.g. use case 2 sums and groups your GA sessions data). Here is how you can grab that data for your website: just set the View ID and date range, and connect to your account.

library(googleAnalyticsR)

ga_auth()                        # connect to your Google account

set_view_ID <- "YOUR_VIEW_ID"    # enter your GA View ID
date1 <- "YYYY-MM-DD"            # set start date in YYYY-MM-DD format
date2 <- "YYYY-MM-DD"            # set end date in YYYY-MM-DD format

set_date_range <- c(date1, date2)
data <- google_analytics(set_view_ID,
                         date_range = set_date_range,
                         metrics = c("sessions", "bounces"),
                         dimensions = c("date", "deviceCategory", "source"),
                         anti_sample = TRUE)
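
If you want to follow the later use cases without access to a GA account, a small stand-in data frame with the same columns as the query above is enough. This is only a sketch: every value below is invented for illustration and does not come from a real account.

```r
# Hypothetical stand-in for the data frame returned by google_analytics();
# all values are made up for illustration only.
data <- data.frame(
  date           = as.Date(c("2022-01-30", "2022-01-31", "2022-02-01", "2022-02-01")),
  deviceCategory = c("desktop", "mobile", "desktop", "mobile"),
  source         = c("google", "google", "bing", "google"),
  sessions       = c(120, 80, 95, 60),
  bounces        = c(40, 35, 30, 25)
)

str(data)            # check the column types before doing anything else
sum(data$sessions)   # total sessions in the stand-in data
```

Checking the structure first is a cheap way to spot a date column that has been read in as text.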

Search Console – 

The searchConsoleR package easily allows you to grab information on clicks, impressions and rankings for your site’s keywords. See the example code below on how to make your first query. Use case 3 shows how you can select Search Console data for a list of URLs using regexp.

library(searchConsoleR)

scr_auth()   # connect to your Google account

sc_data <- 
  search_analytics("https://www.semetrical.com", 
                   "2021-01-01", "2021-12-31", 
                   c("query", "page"), 
                   dimensionFilterExp = c("device==DESKTOP","country==GBR"), 
                   searchType="web", rowLimit = 100)

BigQuery – 

The bigQueryR package is extremely powerful as it lets you easily interact with the platform. I am currently writing a blog on how to upload and read data to and from BigQuery – keep an eye on my LinkedIn where I will share it soon.

  2. Group and Sum

With the dplyr package you can do a lot of easy data manipulation. Inside dplyr's summarise function, sum gives you the total of a specific metric within your dataset. Suppose you are looking at daily sessions and want to find the total number of sessions across the dataset. 

The %>% operator (known as the pipe) passes the object on its left to the function on its right as its first argument.

library(dplyr)

data %>% summarise(sum(sessions))

Now suppose you want to find the sum for each device. This is where the group_by function comes in. 

data %>% group_by(deviceCategory) %>% summarise(sum(sessions))
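
The same pattern extends to several metrics at once. As a minimal sketch on a made-up data frame (the values and the bounceRate column name are assumptions for illustration), naming the summary columns makes the result easier to reuse in later steps:

```r
library(dplyr)

# Invented example data with the same columns as the GA query
data <- data.frame(
  deviceCategory = c("desktop", "mobile", "desktop", "mobile", "tablet"),
  sessions       = c(120, 80, 95, 60, 10),
  bounces        = c(40, 35, 30, 25, 5)
)

device_totals <- data %>%
  group_by(deviceCategory) %>%
  summarise(sessions = sum(sessions),
            bounces  = sum(bounces)) %>%
  mutate(bounceRate = bounces / sessions) %>%   # derived metric per device
  arrange(desc(sessions))                       # largest device first

print(device_totals)
```

Without the names, the summary columns would be called `sum(sessions)` and `sum(bounces)`, which are awkward to refer to afterwards.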

Other use cases for this would be calculating monthly totals. Notice how we have no ‘month’ column. We can easily create one with the substr function shown in the code below. Here we take the first 7 characters of the date, which gives the year and month (e.g. 2022-07). The 1 tells substr to start at the first character and the 7 tells it to stop at the seventh.

data$monthYear <- substr(data$date, 1, 7)

Now we can group the dataset by not only the device category but also the month.

data %>% group_by(monthYear, deviceCategory) %>% summarise(sum(sessions))
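
Putting the two steps together, here is a minimal sketch on invented data (the dates and session counts are made up) that creates the month column and then totals sessions by month and device:

```r
library(dplyr)

# Invented example data: two January rows and two February rows
data <- data.frame(
  date           = as.Date(c("2022-01-30", "2022-01-31", "2022-02-01", "2022-02-01")),
  deviceCategory = c("desktop", "desktop", "desktop", "mobile"),
  sessions       = c(120, 80, 95, 60)
)

# substr() converts the Date to text, so characters 1 to 7 give "YYYY-MM"
data$monthYear <- substr(data$date, 1, 7)

monthly <- data %>%
  group_by(monthYear, deviceCategory) %>%
  summarise(sessions = sum(sessions), .groups = "drop")

print(monthly)
```

The `.groups = "drop"` argument simply returns an ungrouped result, which avoids surprises if you keep piping the output into other functions.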

  3. Regexp 

Regexp strings appear frequently in digital marketing. They are useful for filtering larger datasets. Suppose you have Search Console clicks and impressions data for every URL on your site, but you want to only look at a set of landing pages. In this case, I have added 2 other blogs on ‘digital marketing with R’ to the regexp string to see data for only them.

regexpString <- "/using-google-analytics-api-with-r/|/using-ga-google-trends-api-with-r/"
filteredData <- sc_data[grepl(regexpString, sc_data$page), ]
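
To see what the filter does, here is a minimal sketch on a made-up sc_data frame. The two blog paths come from the string above, while the /some-other-page/ URL and the click counts are invented for illustration.

```r
# Invented Search Console style data
sc_data <- data.frame(
  page   = c("https://www.semetrical.com/using-google-analytics-api-with-r/",
             "https://www.semetrical.com/using-ga-google-trends-api-with-r/",
             "https://www.semetrical.com/some-other-page/"),
  clicks = c(10, 5, 7)
)

regexpString <- "/using-google-analytics-api-with-r/|/using-ga-google-trends-api-with-r/"

# grepl() returns TRUE for every page that matches either alternative
keep <- grepl(regexpString, sc_data$page)
filteredData <- sc_data[keep, ]

print(filteredData)
```

The | in the string means "or", so adding another landing page is just a matter of appending another |/your-path/ to it.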

  4. Merge, rbind and cbind

With data coming from various sources, marketers are constantly having to merge data, whether via a VLOOKUP in Excel or JOIN functions in BigQuery. 

The example below shows you how you can merge GA and Search Console data on the landing pages to see the total sessions from GA compared to the organic clicks from SC, so that we have an indication of the importance of said search query to the landing pages’ traffic.

First, we group by landing page to sum the GA sessions (this assumes landingPagePath was included in the dimensions of the GA query). Then we create a new column in the SC dataset that strips the domain from each page URL, so that it matches the path format of the GA data for our merge.

Next, we simply apply the ‘merge’ function and dictate on what column we are joining. I recommend reading these programming notes on data reshaping for a full explanation into the different joins.

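# Total GA sessions per landing page (requires the landingPagePath dimension)
ga_landingPages <- data %>% group_by(landingPagePath) %>% summarise(sessions = sum(sessions))

# Strip the domain so SC pages match GA's path format
sc_data$landingPagePath <- gsub("https://www.semetrical.com", "", sc_data$page, fixed = TRUE)

# Join the two datasets on the shared landing page column
merged_data <- merge(sc_data, ga_landingPages, by = "landingPagePath")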
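
By default, merge keeps only the rows that appear in both datasets (an inner join). The sketch below uses invented data to show how the all.x argument keeps every Search Console row even when GA has no matching landing page, and how rbind and cbind stack rows and add columns. The paths, numbers and the clickShare column name are assumptions for illustration.

```r
# Invented example data
sc_data <- data.frame(
  landingPagePath = c("/", "/blog/", "/contact/"),
  clicks          = c(50, 20, 5)
)
ga_landingPages <- data.frame(
  landingPagePath = c("/", "/blog/"),
  sessions        = c(400, 150)
)

# Inner join: only landing pages present in both datasets
inner <- merge(sc_data, ga_landingPages, by = "landingPagePath")

# Left join: every SC row is kept, sessions is NA where GA has no match
left <- merge(sc_data, ga_landingPages, by = "landingPagePath", all.x = TRUE)

# rbind() stacks data frames that share the same columns
stacked <- rbind(ga_landingPages,
                 data.frame(landingPagePath = "/about/", sessions = 30))

# cbind() adds columns side by side (row order must already line up)
with_share <- cbind(inner, clickShare = inner$clicks / inner$sessions)

print(left)
```

A left join is often the safer starting point, because the NA values show you exactly which pages failed to match.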

  5. Date formats

Nothing is more frustrating than a computer not recognising a date. The lubridate package does absolute wonders when applying date functions. When reading in a dataset via an API or via a csv stored on your machine, you should instantly apply a function to the date column to ensure it is considered a date, not a character or factor etc. 

Unfortunately, it seems like there is an infinite number of ways to display the date. The most commonly seen in digital format tends to be either yyyy-mm-dd or dd/mm/yyyy. Based on the dataset you are using and how it is currently displaying the date, try the relevant line of code below. R will then default to displaying your dates in yyyy-mm-dd format. 

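# Use this line if the dates are currently in yyyy-mm-dd format
data$date <- as.Date(data$date, format = '%Y-%m-%d')
# Use this line instead if the dates are currently in dd/mm/yyyy format
data$date <- as.Date(data$date, format = '%d/%m/%Y')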
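
The lines above use base R. As a minimal sketch of the lubridate equivalents (the example dates are made up), the ymd and dmy functions parse the same two formats without a format string:

```r
library(lubridate)

# Invented example dates in the two common formats
iso_dates <- c("2022-07-14", "2022-12-01")
uk_dates  <- c("14/07/2022", "01/12/2022")

parsed_iso <- ymd(iso_dates)   # year-month-day input
parsed_uk  <- dmy(uk_dates)    # day-month-year input

class(parsed_iso)              # both results are Date objects
all(parsed_iso == parsed_uk)   # the two formats parse to the same dates
month(parsed_uk)               # pull out the month number once parsed
```

Once a column is a proper Date, helper functions such as month and year can be used for grouping.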

Summary

After covering various examples, I hope to have convinced you that there is a multitude of benefits to using R to improve your marketing efficiency. The example code should be enough to get you started, but do take advantage of the documentation for all the packages used and of online forums. Please feel free to get in touch with me or the Analytics team if you have any questions on the content covered.
