Creating Sparklines for Blog Posts with R and Plumber

Aug 19, 2018 | 10 min read

One of my favorite R libraries is plumber. It can quickly turn any of my R code into a REST API, allowing anyone to use my functions or access my data in a RESTful manner. I wanted to explore some use cases in the digital analytics realm for creating my own API. One thing I have always been interested in is when a blog or news site shows a trendline or sparkline of the number of readers or pageviews of an article. It is nice to see an organization being transparent with some of the data it is collecting and surfacing it back to the users. As a consumer of content, it helps me understand the popularity of that piece and can help me prioritize where I should start reading. Another great example of data being returned to the user is a site showing how many times a specific line has been highlighted or tweeted.

Being open with your site data puts you at some risk of exposing part of your company strategy, but as a consumer of a lot of articles, I wish I knew more about blog posts, such as how relevant and how popular they are, so I can determine the value of the content. If you are writing good content, you should not be afraid of what giving up some of your data will do.

So in this blog post I will be demonstrating how to connect to the Google Analytics API using googleAnalyticsR, written by Mark Edmondson, and then how to use the plumber library by Jeff Allen to expose some of the data from Google Analytics for each article on your website. I will show three examples of R functions that return JSON, an htmlwidget you can plug into your code, or an image containing a visualization, all through a REST API.

In order to programmatically access the data from Google Analytics, you need to set up a service account and create your own project in the Google Developer Console. I could write a whole blog post on just setting up the authentication, but Mark Edmondson has done several great write-ups on that. Here he explains how to get a .json file from the Google Cloud Console (GCC) that will be used for the credentials. Make sure to add the email that is generated by the GCC as a user on your Google Analytics property.

The most important thing is to make sure the authentication is working before you start up the API. You can test that by using googleAuthR::gar_auth_service() to read the credential file and set the scope. After that, run a function to retrieve some data from the Google Analytics API. If your authentication worked you will get some data; if not, you will get a fairly descriptive error message about what went wrong. Some common errors I ran into were setting the scope incorrectly, the service account not having access to my GA instance, and not giving the GCC project the correct access to the APIs. Now we can move on to setting up our API.
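
A quick smoke test along these lines can confirm the credentials work before any API code is written. This is only a minimal sketch and not part of the original setup: `check_ga_access` is a hypothetical helper that takes any zero-argument function making a small Google Analytics request and reports whether it succeeded.

```r
# Hypothetical helper: runs a small request and reports success or failure.
# `fetch` is any zero-argument function that makes a Google Analytics call.
check_ga_access <- function(fetch) {
  tryCatch({
    fetch()
    message("Authentication looks good.")
    TRUE
  }, error = function(e) {
    # Surface the error text so a scope or permission problem is visible.
    message("Google Analytics request failed: ", conditionMessage(e))
    FALSE
  })
}

# Example usage after gar_auth_service() has been called (not run here):
# check_ga_access(function() googleAnalyticsR::ga_account_list())
```

Passing the request in as a function keeps the check independent of which call you use to test, so the same helper works for an account listing or a small pageview query.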

First we are going to start by loading in the libraries and reading in the token that you downloaded from the GCC.

library(googleAnalyticsR) #reading data from Google Analytics
library(googleAuthR) #authenticating Google apps with service account
library(sparkline) #returning sparkline htmlwidget
library(lubridate) #work with dates easier
library(dplyr) #manipulate data easier
library(ggplot2) #static graphs

#authenticating connection to API
gar_auth_service("google-analytics-sparkline-fbe6d3a4f4c5.json",
                 scope = "https://www.googleapis.com/auth/analytics.readonly") #only giving this token the ability to read data

After loading the libraries and setting up the authentication, I wrote some helper functions that make it easier to call the Google Analytics API and aggregate / shape the data so it is easier for a sparkline to consume. Make sure to get the view id for the specific view you want data from.

#used to pull from specific GA view
#you can pull the view ids from your account list using ga_account_list()$viewId
view_id <- "YOUR-VIEW-ID"

# used to reaggregate the daily data to weekly or monthly.
aggergate_by_time_unit <- function(google_analytics_data, time_unit) {
  google_analytics_data %>% 
    mutate(date = floor_date(date, unit = time_unit)) %>%
    group_by(date) %>%
    summarise(pageviews = sum(pageviews))
}

# returns the pageviews for a specific page in an allotted time range
retrieve_ga_data <- function(start_date, end_date, page_url, view_id) {
  google_analytics(viewId = view_id,
                   date_range = c(start_date, end_date),
                   dimensions = "date",
                   metrics = "pageviews",
                   filtersExpression = paste0("ga:pagePath==", page_url))
}
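
To see what the aggregation helper does to the data, here is a small sketch on made-up daily numbers. It deliberately uses base R rather than dplyr and lubridate so it runs without any packages; the `format()` trick stands in for `floor_date(date, unit = "month")` and `aggregate()` stands in for `group_by()` plus `summarise()`.

```r
# Made-up daily pageviews spanning two months (illustrative values only).
daily <- data.frame(
  date = as.Date(c("2018-07-30", "2018-07-31", "2018-08-01", "2018-08-02")),
  pageviews = c(10, 20, 30, 40)
)

# Base-R stand-in for floor_date(date, unit = "month"):
# snap every date to the first day of its month.
daily$date <- as.Date(format(daily$date, "%Y-%m-01"))

# Sum pageviews within each month, like group_by() + summarise().
monthly <- aggregate(pageviews ~ date, data = daily, FUN = sum)
monthly
```

The four daily rows collapse into two monthly rows, with 30 pageviews for July and 70 for August, which is the shape a sparkline can consume directly.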

After setting up the authentication and helper functions, we can now look at plumber and how we can use it to return data over a REST API. plumber has great documentation and I would recommend reading through it. plumber makes it easy to turn your R code into a REST API that allows any other application to access whatever that R function returns. It is able to return all kinds of data, like JSON, images, and even HTML widgets. Similar to how roxygen2 turns comments into R documentation, plumber uses special comments to create different parts of the API.

I am going to show three ways to return a sparkline to your web pages. Let's start with plumber returning just JSON.

You can think of a plumber endpoint as just an R function that is decorated with some additional parameters that supply input for the API. Our plumber API functions are going to call the Google Analytics API using the helper function, reshape the data by the given time unit, and return the data, htmlwidget, or image. I would recommend at least reading the beginning of the plumber documentation to understand the basics of the library.

The comments you see below consist of a description of the API endpoint, a definition for each parameter, the name of the endpoint, and sometimes the serializer used to tell plumber what type of data that endpoint returns. There are additional special comments that modify the API endpoint further; those can be found here.

This endpoint specifically will return a JSON object of the data from Google Analytics for that specific page.

#' Returns JSON of pageviews of specific page in Google Analytics by given measurement of time.
#' @param start_date date for the beginning of timerange | format(YYYY-MM-DD)
#' @param end_date date for the end of timerange | format(YYYY-MM-DD)
#' @param page_url used as filter to return data for specific page url
#' @param aggergate_by aggregate data by day, week, month, or year
#' @get /sparklines-data
function(start_date, end_date, page_url, aggergate_by){
  data <- retrieve_ga_data(start_date, end_date, page_url, view_id = view_id)
  agg_data <- aggergate_by_time_unit(data, time_unit = aggergate_by)
  agg_data
}

Here is an example of the JSON that will be spit out. Plumber by default will run jsonlite::toJSON on the object you are trying to return. Since the Google Analytics call returns a data frame, the toJSON function will convert that data frame into JSON like this.
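
As a rough illustration of that shape, the sketch below runs jsonlite::toJSON on a made-up data frame with the same two columns the endpoint returns. The numbers are invented, and plumber's own serializer may pass slightly different options to toJSON, so treat this as an approximation of the response rather than its exact bytes.

```r
library(jsonlite)

# Made-up data in the same two-column shape the endpoint returns.
agg_data <- data.frame(
  date = as.Date(c("2018-08-01", "2018-08-02")),
  pageviews = c(120, 95)
)

# A data frame is serialized row by row: one JSON object per row,
# each with a "date" key and a "pageviews" key.
json <- toJSON(agg_data)
print(json)
```

The result is a JSON array with one object per row, which is convenient for most JavaScript charting libraries because each element already carries its own date and value.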

One thing I am still exploring is how to get the data shaped exactly the way I would like it. Converting a data.frame into a JSON object comes with a few challenges, and I need to learn more about how to effectively convert data.frames into easy-to-use JSON. This would be a great time for me to read up on Jenny Bryan’s talks and blog posts on lists. I found this talk in particular interesting for working with lists.

The great part about plumber is you can return almost any data type and whatever structure you want. So if you also wanted to include another piece of data about the traffic sources, you can create nested lists that contain the pageview history and the traffic sources for that specific page. Building an API is as flexible as you want it to be.
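
A nested response like that could be assembled as a named list. This is a sketch with made-up values: the `traffic_sources` data frame, its column names, and the `page_url` value are hypothetical and only show the structure, not a real Google Analytics query.

```r
# Made-up values; `traffic_sources` and its fields are hypothetical.
pageview_history <- data.frame(
  date = as.Date(c("2018-08-01", "2018-08-02")),
  pageviews = c(120, 95)
)
traffic_sources <- data.frame(
  source = c("google", "twitter"),
  sessions = c(80, 35)
)

# A named list nests both pieces under their own keys in the response.
response <- list(
  page_url = "/example-post/",
  pageview_history = pageview_history,
  traffic_sources = traffic_sources
)
str(response, max.level = 1)
```

Returning `response` from an endpoint function would give the caller one JSON object with a key per list element, so a single request can feed both the sparkline and a traffic source breakdown.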

As I stated earlier, you can also return HTML widgets with plumber. You will see all that it requires is adding the #' @serializer htmlwidget comment, then wrapping the data in the sparkline::sparkline() function. This will return a self-contained HTML file with the already processed data.

#' Returns an htmlwidget of a sparkline for a specific page in Google Analytics by given measurement of time.
#' @param start_date date for the beginning of timerange | format(YYYY-MM-DD)
#' @param end_date date for the end of timerange | format(YYYY-MM-DD)
#' @param page_url used as filter to return data for specific page url
#' @param aggergate_by aggregate data by day, week, month, or year
#' @get /sparklines-html
#' @serializer htmlwidget
function(start_date, end_date, page_url, aggergate_by){
  data <- retrieve_ga_data(start_date, end_date, page_url, view_id = view_id)
  agg_data <- aggergate_by_time_unit(data, time_unit = aggergate_by)
  sparkline::sparkline(agg_data$pageviews)
}

Here is an example of that output.

There are several ways to serve this on a web page. One way is to render it in an iframe. Below is some sample code that does that. This code will make a call to your API and display the htmlwidget it returns in an iframe. Iframes are not the easiest thing to work with, but they are an option. If you are having issues with the iframe, I would recommend retrieving the JSON version of this API and then using the corresponding JavaScript library to rebuild the visual.

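<body>
      <iframe id="da-iframe" width="100%"></iframe>
      <script>
        // look up the iframe that will hold the sparkline widget
        var frame = document.getElementById("da-iframe")
        // path of the current page, used as the Google Analytics filter
        var page_url = window.location.pathname
        var start_date = ...
        var end_date = ...
        // point the iframe at the API, passing the values as query parameters
        frame.src = "http://localhost:9999/cool-api-name" +
          "?start_date=" + encodeURIComponent(start_date) +
          "&end_date=" + encodeURIComponent(end_date) +
          "&page_url=" + encodeURIComponent(page_url)
      </script>
 </body>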

If you do not want your visual to be interactive or do not want to render the data using JavaScript, you can also use plumber to return an image. Using the same template, you can have the final part of the API function return a PNG. You can also set the width and height of the image with the #' @png comment.

#' Returns an image of a sparkline for a specific page in Google Analytics by given measurement of time.
#' @param start_date date for the beginning of timerange | format(YYYY-MM-DD)
#' @param end_date date for the end of timerange | format(YYYY-MM-DD)
#' @param page_url used as filter to return data for specific page url
#' @param aggergate_by aggregate data by day, week, month, or year
#' @get /sparklines-ggplot
#' @png (width = 140, height = 40)
function(start_date, end_date, page_url, aggergate_by){
  data <- retrieve_ga_data(start_date, end_date, page_url, view_id = view_id)
  agg_data <- aggergate_by_time_unit(data, time_unit = aggergate_by)
  plot_line <- agg_data %>%
    ggplot() +
    geom_line(aes(date, pageviews), stat = "identity") +
    theme_void()
  print(plot_line)
}

Similar to the htmlwidget example, you can use JavaScript to build the API call and then just use that URL as the src of an image tag.

Now that you have a few endpoints built, you want to make sure your API is available to your site or application. Plumber has some prebuilt functions that make it easy to deploy and test. I would recommend using plumber::do_provision() to create a droplet on DigitalOcean; this will deploy a plumber application to a virtual machine and make it accessible. By default, web pages served from a different origin than the API will not be able to call it from the browser. This has to do with CORS: plumber does not send the headers that allow cross-origin requests unless you add them. To make it easier to prototype, I like to allow my API to be called from anywhere. You can set that up using a custom function demonstrated in the documentation. Before you expose your API to a production instance, I would recommend researching how CORS works and ensuring your application will be secure.
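
For prototyping, the custom function mentioned above is a plumber filter that adds a CORS header to every response. The sketch below follows the pattern shown in the plumber documentation; allowing `*` opens the API to every origin, so it is an assumption suited to local testing only and should be narrowed before production.

```r
#' Allow any origin to call the API (prototype setting only).
#' @filter cors
cors <- function(res) {
  # Tell the browser that pages from any origin may read the response.
  res$setHeader("Access-Control-Allow-Origin", "*")
  # Hand the request on to the next filter or to the endpoint itself.
  plumber::forward()
}
```

Placing this in the same file as the endpoints means it runs before each of them. Swapping `*` for your site's own origin is the usual first step toward a safer setup.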

Here are a few things to think about too.

You can run any functions or libraries on your data before you return it with an API. An example would be running a forecasting function on your pageviews to see how they will perform in the future.
This example can be extended to any analytics system like Adobe Analytics or Snowplow.
You might want to obfuscate your data to hide the actual values. One way you can do this is to divide each row by the total sum of the range to get a percent of the whole. This will give you the same sparkline as if you used the actual values.
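
The percent-of-whole idea from the last point can be checked with a few lines of base R. The pageview numbers are made up, and `rescale` is a small helper defined here only to compare the shapes of the two series.

```r
# Made-up weekly pageviews (illustrative values only).
pageviews <- c(120, 95, 140, 180, 160)

# Share of the period's total: hides raw counts, keeps the shape.
share <- pageviews / sum(pageviews)

# Rescaling both series to a 0-1 range shows that they trace
# exactly the same line, so the sparkline looks identical.
rescale <- function(x) (x - min(x)) / (max(x) - min(x))
all.equal(rescale(pageviews), rescale(share))
```

Because dividing by a constant only changes the scale of the axis, a sparkline drawn from `share` has the same peaks and dips as one drawn from the raw counts.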

Hopefully you found this blog post helpful and it will inspire you to liberate some of your data back to your users. This is an intro to plumber, and there is an almost unlimited number of use cases for turning your R code into an API. Let me know if you found this blog post helpful and what you have built with plumber.

Brett Kobold
