Part of me is quite surprised I am writing a blog post about how to produce an analytics report in a PDF format, but as I talked more about the request with a client, I saw how this could be valuable. After learning how to create a PDF-based report, I discovered that documentation on this topic is quite sparse! So, I put this together for those who come across this same request.
 

Overview

First, here’s a high-level outline on how this report is created so you understand the architecture:

  1. Pull and clean data from a DWH, store in RDS format (R’s native serialized file format) [not covered in this article].
  2. Import data and produce charts. Export charts and additional information into a clean PDF report.
  3. Deploy scripts onto a server and use cron to automatically run the report each week and email it out [will cover in a future article].
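
For step 1, which is otherwise not covered here, a minimal sketch of the RDS round trip might look like the following. The toy data frame, its column names, and the file path are assumptions for illustration; your warehouse extract will differ.

```r
# Toy stand-in for the cleaned warehouse extract (column names are assumptions)
visits <- data.frame(
  year_id = c(2024, 2024),
  week_id = c(1, 2),
  visits  = c(1200, 1350)
)

# Hypothetical location; a real pipeline would write to a shared path
rds_path <- file.path(tempdir(), "visits.rds")

# saveRDS() serializes a single R object to disk, readRDS() restores it
saveRDS(visits, rds_path)
visits_loaded <- readRDS(rds_path)

# The restored object should match the original, column types included
stopifnot(identical(visits, visits_loaded))
```

Unlike a CSV, an RDS file keeps column types such as dates and factors, so the report script does not have to parse them again.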

Without a doubt, the hardest part of this project was learning how to format a PDF. A quick Google search turned up a few parameters that I could pass through the YAML header of my .Rmd file, but I needed much more flexibility than that. What I ended up doing was creating my own .tex file (a LaTeX template), which was then used to create the PDF.
 

The Basic Knit to PDF

First, let’s start with the basics and work our way up from there. I used fake data in this post, but the layout is what I used for the project. If we run the following code, we get something like this:
 

---
title: "Weekly Business Report"
output: pdf_document
---
```{r setup, include=FALSE}
#Load libraries
library(rmarkdown)
library(dplyr)
library(lubridate)
library(ggplot2)
library(scales)
```
``` {r, include=FALSE, echo=FALSE}
#Transform Data
gears <- mtcars %>%
  group_by(gear) %>%
  summarise(mpg = mean(mpg))
```
```{r, echo=FALSE}
ggplot(gears, aes(x=gear, y=mpg, fill=as.character(gear))) +
  geom_bar(stat = "identity")+
  theme_minimal()+
  ggtitle("Plot 1")+
  labs(fill="Gear")+
  theme(legend.position = "bottom")
```


 
Now, the first thing you need to change is the ability to have multiple charts on one page. By editing some of the YAML parameters, you can make this happen. Update your header to look like this and you will see this new output:
 

---
title: "Weekly Business Report"
output:
 pdf_document:
  fig_width: 4
  fig_height: 3.75
---


Now you have to figure out how to edit the margins, as the margin on the left is pushing your charts too far over. You can edit this from the header as well, but might as well dive into the .tex file since that is more fun.

 

Creating a Custom Template for LaTeX

To start, you will need a standard template as a starting point, since you will be replacing the template that R uses with a custom one. You can copy the current template into your working directory with the following command in R (the file name is version-specific, so it may differ or be missing in your installation):
 

file.copy(system.file("rmd/latex/default-1.17.0.2.tex", package ="rmarkdown"), "template.tex")
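
Because system.file() returns an empty string when a file is not found, the one-liner above can fail quietly on an installation that ships different template files. A slightly more defensive sketch is below; the helper name copy_bundled_template is hypothetical, and copying into tempdir() is only so the example is safe to run anywhere.

```r
# Hypothetical helper: copy a file bundled with a package only if it exists
copy_bundled_template <- function(rel_path, dest = "template.tex",
                                  package = "rmarkdown") {
  # system.file() returns "" when the file or the package cannot be found
  src <- system.file(rel_path, package = package)
  if (!nzchar(src)) {
    message("No file at '", rel_path, "' in package '", package,
            "'; nothing copied.")
    return(invisible(FALSE))
  }
  # overwrite = FALSE protects a template you have already edited
  invisible(file.copy(src, dest, overwrite = FALSE))
}

# List whatever LaTeX templates this installation ships (may be empty)
list.files(system.file("rmd/latex", package = "rmarkdown"))

# Try the file name used in this post
copy_bundled_template("rmd/latex/default-1.17.0.2.tex",
                      dest = file.path(tempdir(), "template.tex"))
```

If nothing is copied, pick one of the listed file names instead, or start from another LaTeX template you trust.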

 
After doing that, open up your .tex file and behold the glory! I don’t think .tex files are known for being super readable, but that’s ok 🙂
 
In order to change the margins, we’re going to load a new package and set its margin option at the same time. Add this line alongside the other \usepackage commands to make the margins really small:
\usepackage[margin=0.2in]{geometry}
 
Your new .tex file should look something like this:
 

\documentclass[$if(fontsize)$$fontsize$,$endif$$if(lang)$$babel-lang$,$endif$$if(papersize)$$papersize$paper,$endif$$for(classoption)$$classoption$$sep$,$endfor$]{$documentclass$}
$if(fontfamily)$
\usepackage[$for(fontfamilyoptions)$$fontfamilyoptions$$sep$,$endfor$]{$fontfamily$}
$else$
\usepackage{lmodern}
$endif$
\usepackage[margin=0.2in]{geometry}
...

 
And now before we knit again, we have to tell R to use this new template. Change the header of your .Rmd file to look like this and then hit knit again.
 

---
title: "Weekly Business Report"
output:
 pdf_document:
  fig_width: 4
  fig_height: 3.75
  template: template.tex
---


Much better! Now that we have two charts next to each other, we want to create some summary numbers below these charts to add more clarity on how last week performed. This is where the fun begins!
 
I’m going to move a little faster now that we’ve got our grounding. Next, I’m going to write variables to store “Last Week’s” performance, as well as to calculate week-over-week (WoW) performance, labeled “Wow” in the report. Here is the final code for those changes:
 

---
title: "Weekly Business Report"
output:
 pdf_document:
  fig_width: 4
  fig_height: 3.75
  template: template.tex
---
```{r setup, include=FALSE}
#Load libraries
library(rmarkdown)
library(dplyr)
library(lubridate)
library(ggplot2)
library(scales)
this_week <- lubridate::week(today())
last_week <- lubridate::week(today()) - 1
two_weeks_ago <- lubridate::week(today()) - 2
this_year <- lubridate::year(today())
last_year <- lubridate::year(today()) - 1
this_full_week <- paste(this_year, this_week)
```
``` {r, include=FALSE, echo=FALSE}
visits <- read.csv("visits.csv")
visits$dt <- as.Date(visits$dt, "%m/%d/%y")
visits$fullWeek <- paste(visits$year_id, visits$week_id)
visitsW <- visits %>%
  filter(fullWeek != this_full_week) %>%
  group_by(year_id, week_id) %>%
  summarise(visits = sum(visits))
visits_LW <- visits %>%
  filter(year_id == this_year, week_id == last_week) %>%
  summarize(visits = sum(visits))
visits_Wow <- visits %>%
  filter(year_id == this_year, week_id %in% c(last_week, two_weeks_ago)) %>%
  group_by(year_id, week_id) %>%
  summarise(visits = sum(visits))
```
```{r, echo=FALSE}
ggplot(visitsW, aes(x=week_id, y=visits, color=as.character(year_id), group=as.character(year_id))) +
  geom_line(size=1.1)+
  theme_minimal()+
  ggtitle("Plot 1")+
  labs(color="Year")+
  scale_y_continuous(labels=comma)+
  theme(legend.position = "bottom")
ggplot(visitsW, aes(x=week_id, y=visits, color=as.character(year_id), group=as.character(year_id))) +
  geom_line(size=1.1)+
  theme_minimal()+
  ggtitle("Plot 1")+
  labs(color="Year")+
  scale_y_continuous(labels=comma)+
  theme(legend.position = "bottom")
```
```{r, echo=FALSE}
paste("Last Week:", prettyNum(visits_LW, big.mark = ","))
paste("Wow:", round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2))
paste("Last Week:", prettyNum(visits_LW, big.mark = ","))
paste("Wow:", round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2))
```

 
When I run this, I get the following output:


 
Technically, the data you need is now in your dashboard, but it’s certainly not formatted the way it should be. The text doesn’t look good, the numbers for the second plot aren’t on the right side, etc. So, let’s study our template.tex file more to understand how we can change this.
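
A side note on the setup chunk before moving on: subtracting 1 from the week number gives week 0 during the first week of January, so the filters on last_week would match nothing. One possible workaround, sketched here in base R, is to step back seven days and take the year and week of that date. The helper names and the reference date are hypothetical, and the sketch assumes week_id in your data follows the same definition as lubridate::week() (complete seven-day periods since January 1, plus one).

```r
# Week number as complete 7-day periods since Jan 1, plus one
# (POSIXlt yday is zero-based, so no "- 1" is needed)
week_of_year <- function(d) as.POSIXlt(d)$yday %/% 7 + 1

# Year and week for a date, returned together so they cannot drift apart
year_week <- function(d) {
  lt <- as.POSIXlt(d)
  list(year = lt$year + 1900, week = lt$yday %/% 7 + 1)
}

ref_date <- as.Date("2024-01-03")   # hypothetical run date in week 1

# Naive approach from the setup chunk: week 1 minus 1 is week 0
naive_last_week <- week_of_year(ref_date) - 1

# Date-based approach: seven days earlier falls in the previous year
safe_last_week <- year_week(ref_date - 7)

print(naive_last_week)
print(safe_last_week)
```

With this approach the filter would compare both year_id and week_id against the returned pair rather than against this_year and a subtracted week number.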
 

Understanding Your LaTeX Template

If you are generally familiar with other languages, you will quickly see that your .tex file has a very similar structure to HTML. It provides the instructions for how your PDF gets printed, starting from the header all the way down to the actual content. So, where in your .tex file is the content about last week’s performance?
 
Toward the bottom of your file, you will see a line that says $body$. This is a Pandoc template variable. Just like a layout in a modern web framework, it tells the template “insert everything from this other file here.” What is this other file, you ask?
 
As you can probably guess, it’s your .Rmd file! In order to style this information as you need it, you will write LaTeX code within the .Rmd file, which will then get compiled through Pandoc and eventually end up in a pretty format in your PDF.
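
To make the idea concrete, here is a toy illustration in R of what filling the $body$ slot amounts to. This is not how Pandoc is implemented; the four-line template and the body text are made up for the example.

```r
# A made-up, stripped-down template with a single placeholder line
template <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "$body$",
  "\\end{document}"
)

# Stand-in for the LaTeX that your .Rmd content is converted to
body <- "\\huge Last Week: 1,234"

# Swap the placeholder line for the body, keeping everything else in place
slot <- which(template == "$body$")
rendered <- append(template[-slot], body, after = slot - 1)

writeLines(rendered)
```

Everything outside the placeholder (packages, margins, document class) comes from the template, which is why page-level styling lives in template.tex and content-level styling lives in the .Rmd file.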

 

Writing LaTeX in Your Rmd File

Let’s start with something basic. You want your “Last Week” line to be in a bigger font, so just like in HTML when we say “H2,” you have to write the equivalent in this new language. In between your two code chunks, write the following:
 

scale_y_continuous(labels=comma)+
theme(legend.position = "bottom")
```
\huge Last Week:
```{r, echo=FALSE}
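paste("Last Week:", prettyNum(visits_LW, big.mark = ","))
paste("Wow:", round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2))
paste("Last Week:", prettyNum(visits_LW, big.mark = ","))
paste("Wow:", round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2))
```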

 
When you print this, you get the following:


 
 
The new \huge command changed the font size. The change also trickled down to your next R chunk because you didn’t end it or switch to another size. But you still have those annoying ## marks in front of the output, as well as the unformatted numbers. To truly make this look good, you have to embed the R code within your LaTeX. How do we do that? Check it out!
 
\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
 
By wrapping an expression in backticks and starting it with r, you can put a dynamic value within your static text. Next, make this change for the other lines and see what this looks like. Here’s your new code and output:
 

\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
\Large Wow: `r round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2)`\%
\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
\Large Wow: `r round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2)`\%
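
If you want to preview what these inline expressions will print before knitting, you can evaluate them in the console. The sketch below uses made-up numbers in place of visits.csv, so the data frames and week values are assumptions.

```r
# Made-up stand-ins for the objects built in the data chunk
visits_LW <- data.frame(visits = 1234567)
visits_Wow <- data.frame(week_id = c(1, 2), visits = c(1000000, 1234567))
last_week <- 2
two_weeks_ago <- 1

# Same expressions as the inline code, applied to the visits column
last_week_text <- prettyNum(visits_LW$visits, big.mark = ",")
wow_text <- round(
  (visits_Wow$visits[visits_Wow$week_id == last_week] /
     visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] - 1) * 100,
  2
)

print(last_week_text)  # "1,234,567"
print(wow_text)        # 23.46
```

Checking the values this way is quicker than a full knit when you are only adjusting number formatting.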



 
Much better! Your text looks a lot cleaner, but you still need to figure out how to move your second line to the other side. Let’s explore a new package and method to help you solve this.
 
In the header of your .tex file, include the following package: \usepackage{multicol}. Now, going back to your Rmd file, make the following additions to your code:
 

\begin{multicols}{2}
\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
\newline\Large Wow: `r round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2)`\%
\vfill\null
\columnbreak
\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
\newline\Large Wow: `r round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2)`\%
\end{multicols}

 
The command \begin{multicols}{2} starts a new multicolumn section with 2 columns. The command \newline ensures that the text after it starts on a new line, and the commands:
\vfill\null
\columnbreak

ensure that the next section will end up in the second column. After writing this, knit your new .Rmd.
 

 
Boom! You’re really making progress. Besides the fake data, this should be looking like a decent dashboard. Now from here, you should have the ability to build out whatever you need to build. The basics of LaTeX are:

  1. Find the command needed for whatever you want to do.
  2. Ensure you load the package for said command.
  3. Insert this into either your .tex file or your .Rmd file.

However, I will include one more section in this article for fun: conditionally formatting the Wow numbers so you can quickly see how the week performed.
 

Conditional Formatting Using Basic Logic

LaTeX does have conditionals (for example through the ifthen package), but it is easier to write some R code to do the heavy lifting. Also, I’m sure there’s a much more efficient way to write this code, so if you can clean up my sloppy code please contact me!
 
First, here’s a quick if statement in R which will determine whether or not you want the color of the Wow to be red or green.
 

Color_visits <- if (visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1 > 0) {"ForestGreen"} else {"red"}
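
One way to tidy this up, offered as a sketch rather than the definitive version, is to compute the percentage once and derive the color from it. The function names wow_pct and wow_color are hypothetical, and the example numbers are made up.

```r
# Week-over-week change in percent, rounded to two decimals
wow_pct <- function(current, previous) {
  round((current / previous - 1) * 100, 2)
}

# Green for growth, red otherwise; isTRUE() also sends NA to "red"
wow_color <- function(pct) {
  if (isTRUE(pct > 0)) "ForestGreen" else "red"
}

# Made-up weekly totals
pct_up   <- wow_pct(current = 1100, previous = 1000)
pct_down <- wow_pct(current = 950,  previous = 1000)

print(wow_color(pct_up))
print(wow_color(pct_down))
```

In the report, the two arguments would be the same subsets of visits_Wow used above, and the results could be stored in variables that the inline code refers to.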

 
Next, you need to figure out how to change the color of the text. This can be accomplished through the command \textcolor{red}{text to color}.
 
Now, instead of having a hard-coded value, you’re going to pass in your R variable to the spot that holds the color so that this will change based on weekly performance. Make sure to include the following package. After that, run your script and see your Wow data come to life!
 
\usepackage[dvipsnames]{xcolor}
 
Here’s the final update to our multiple columns:
 

\begin{multicols}{2}
\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
\newline\Large Wow: \textcolor{`r Color_visits`}{`r round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2)`\%}
\vfill\null
\columnbreak
\huge Last Week: `r prettyNum(visits_LW, big.mark = ",")`
\newline\Large Wow: \textcolor{`r Color_visits`}{`r round((visits_Wow$visits[visits_Wow$week_id == last_week] / visits_Wow$visits[visits_Wow$week_id == two_weeks_ago] -1) *100,2)`\%}
\end{multicols}
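
If the inline expressions are getting hard to read, another option is to build the whole LaTeX fragment in R and emit it with a single inline call. This is a sketch under the assumption that knitr inserts character results from inline code as-is; the function name wow_latex is hypothetical.

```r
# Build "\textcolor{<color>}{<pct>\%}" as a plain string
wow_latex <- function(pct, color) {
  # "\\" is one literal backslash, "%%" is one literal percent sign
  sprintf("\\textcolor{%s}{%s\\%%}", color, format(pct, nsmall = 2))
}

# Made-up values; in the report these come from the data chunk
fragment_up   <- wow_latex(10, "ForestGreen")
fragment_down <- wow_latex(-3.5, "red")

# cat() shows the string as LaTeX will see it
cat(fragment_up, "\n")
cat(fragment_down, "\n")
```

The Wow line in the .Rmd file would then shrink to a \newline\Large Wow: prefix followed by one inline call to this helper, with the percentage and color passed in as variables.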



 
Now that’s what I call a dashboard! I’m sure there are tons more you can do, but I’ll stop here. Hopefully this article was helpful. If you have questions, contact us, and we’ll help out wherever we can.

About the author

Jon Boone

Jon is a digital analyst with an exuberant amount of passion for the digital analytics industry. He works with our analytics team to move clients out of reporting and into actionable insights. He believes in the power of measuring results and hopes to one day integrate data-driven philosophies with the potential of social entrepreneurship.


