Introduction
So far, we have learned how to import data in all kinds of format into R, convert it into a tibble that follow the first two tidy principles (one observation per row, one variable per column), and clean up messy strings with the stringr package. These steps are the 80 in the 80/20 rules of data science, which states that 80% of the time is spent preparing the data, and 20% is spent analyzing it.
We are embarking on our journal to draw insights from the data and write clear and compelling data stories. This chapter introduces Quarto, a suite of tools that will enable you to perform all the data processing AND produce beautiful reports in one simple yet powerful interface: RStudio.
Quarto documents
So far, we have been working with R scripts (.r files, like the ones we used for Labs 1 to 3). In R scripts, everything is interpreted by R as code unless we specify that we are writing a comment with #. This works fine, and there is no limit to what we can do with these kinds of R scripts regarding data processing and analysis. However, R scripts are not a great tool for making sharing our data stories with the world.
Quarto brings coding, writing, and formatting together by allowing you to write nicely formatted HTML, Word or PDF documents, with embedded chunks of code that are executed to process the data and create beautiful tables and figures for your document. It uses the markdown (https://en.wikipedia.org/wiki/Markdown) syntax. This course website is an example of the type of output that you can produce with Quarto. It’s a Quarto book.
There is a lot that you can do with Quarto that is beyond the scope of this course, but if you want to explore the possibilities, I recommend that you consult the Quarto website, which contains plenty of documentation.
Creating a Quarto document in RStudio
To create a new Quarto document (.qmd), you can click on file, then new file, then select the second option (Quarto Document…). Then RStudio will allow you to provide some information about the document you are creating.
You can select different types of outputs, like documents, presentations, and Shiny applications. In this course, we will use R Markdown to make documents. You can choose to output an HTML document, a PDF document, or a Word document. I recommend sticking with the default (HTML). Word is also a good option if you want to be able to modify the format or add text in Word. The PDF format is awesome but can sometimes be a little buggy as it requires additional packages and a LaTeX installation, so if you are struggling with PDF rendering, I recommend sticking to HTML documents and Word documents for this course.
When you create a new Quarto document, it comes with some content that can get you started with the components of the document and the syntax. This can give you a quick overview of Quarto and the markdown syntax.
Three components of a Quarto document
There are three components to a Quarto document: Metadata, text, and code. All components are optional. So you can write a R Markdown document without metadata, or include only code or text.
Text
Whatever you write after the metadata statement is text, which can be formatted using markdown syntax. Here is an example, with the code on the left and the HTML output on the right.
Code
What makes Quarto so great is that we can embed code chunks into our document. We have seen a lot of those code chunks throughout this website so far. They start with three backticks (`
) followed by curly brackets with the coding language used in the chunk and end with three backticks.
```{r}
some r code
```
We can also set options for the code chunks to tell R whether to display the code itself, its output, or both. #| echo: FALSE
tells R not to display the code in the published document. #| eval: FALSE
tells R not to execute the code (so the output of the code will not be displayed in the published document because there will be no output). Here is an example of each option. The code is on the left, and the HTML output is on the right.
Other useful code chunk options are #| message: FALSE
, #| warning: FALSE
, and #| error: FALSE
. Sometimes your code will produce error messages, warnings, or other messages. Typically, we do not want these to show in our documents, so setting these options to FALSE is t,he way to go.
Visual mode
The visual mode provides an interface and tools to make writing and formatting your Quarto document easier. Here is the same .qmd file edited in normal mode (left) and visual mode (right).
Rendering your document
When you are ready to produce your HTML, Word or PDF document, you click the Render button in RStudio. This will execute all the code and output the HTML, Word or PDF file in your working directory. You can also choose, in the settings, to preview the output in a popup window or the viewer pane (as in the screenshots above).
Making beautiful tables
A lot of times, we write statements that print tibbles as their output. So part of making beautiful documents is learning how to print our tibbles as nicely formatted tables.
Here are the key rules for table layout proposed by Wilke (2019):
- Do not use vertical lines.
- Do not use horizontal lines between data rows. (Horizontal lines as the separator between the title row and the first data row or as a frame for the entire table are fine.)
- Text columns should be left aligned.
- Number columns should be right-aligned and use the same number of decimal digits throughout.
- Columns containing single characters are centred.
- The header fields are aligned with their data, i.e., the heading for a text column will be left aligned, and the heading for a number column will be right aligned.
- Captions are placed above the table.
keeping it simple
My main advice with tables is to keep them clean, simple, and informative. Usually, simple tables are enough to convey the message efficiently. If you want to impress with fancy tables, this can certainly be done with kableExtra or some other table packages (an overview of some of the main packages is available on this website).
Summary
Quarto documents combine the data processing, analysis and reporting elements of the data science workflow in a single location. The visual tool brings the Quarto document writing experience closer to writing in Word or by hiding the code and letting you visualize the format of your document as you write them.
To get you used to Quarto, we will use it for the remaining course labs. You are also encouraged to use it for your project.
Wilke, C. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. First edition. Sebastopol, CA: O’Reilly Media.