How to create a report/article in Quarto
Workshop organized by VVSOR
1 Introduction
Quarto is a tool that weaves together text, code, tables, graphics and bibliographic content: it lets you produce dynamic documents (i.e., documents that update as soon as something is altered in the underlying data and/or code). It is an example of literate programming1. The code is not limited to R; it can also be Python, Julia or Observable, but only R is discussed in this workshop. In other words, Quarto is generic rather than language-specific, whereas its predecessor, RMarkdown, is an R package (note that Quarto is not an R package but a command-line tool; RStudio takes care of all that). Quarto is easier to use than RMarkdown (i.e., same functionality with less code); it combines the functionality of several earlier packages in one tool. Quarto can be used seamlessly with RStudio (it can also be used from the command line or from other IDEs, but that will not be discussed here)2. To learn more about Quarto see https://quarto.org. The website offers some excellent tutorials. RStudio published a workshop on Getting Started with Quarto: Get Started with Quarto - rstudio::conf 2022 Workshop (rstudio-conf-2022.github.io). There are also several websites available, see for instance this tutorial. With Quarto3 you can produce a multitude of outputs from the same source file: html, pdf, Word docx, presentations (powerpoint, revealjs), blogs. Here we limit ourselves to html, pdf and docx. Of course, this reader is also written in Quarto!
In the following, we will focus on Quarto, but almost everything we describe in this reader can also be done in RMarkdown, except for some new features in Quarto (but there will be dedicated packages to do the same thing in RMarkdown). If you are new to RMarkdown, our advice is to use Quarto rather than RMarkdown. If you are already a little bit familiar with RMarkdown, you will recognize that many things are the same in Quarto, except for some new interesting and relevant features.
1.1 Structure and learning goal of the workshop
This workshop in three parts is meant for students, postdocs and scientists who are interested in using Quarto (or RMarkdown) in their study and/or scientific work. It is a perfect tool to publish reports, papers, a thesis, presentations, and even websites/blogs.
The structure of the workshop is that you will first learn about the basic anatomy of Quarto documents. Then you will apply this to build up a scientific report step by step. We will provide you with instructions for each new step and you will then practice with an exercise. At the end of the first part of the workshop, the result will be a small report containing all the basic elements of a scientific report written in Quarto. In between the first and the second part of the course, you can do some homework by writing two more small exercise reports, either with your own data or with data supplied by us. The second day of the workshop will focus on how papers can be combined in a booklet (like a thesis, for instance). In addition, some more advanced Quarto aspects will be discussed.
The learning goal is that, after this workshop, participants are knowledgeable about the basic principles of Quarto and know how to apply them to their own work when writing scientific reports and compiling these into a (thesis-like) booklet.
1.2 Prerequisites
We assume that you have basic knowledge of how to work in R and RStudio, i.e., how to install and load libraries, how to view data, how to handle dataframes and how to perform basic things like reporting summaries and graphs of simple regression analysis. Some knowledge of the tidyverse ecosystem (Tidyverse packages) will also be very helpful; for a style guide in the tidyverse see Welcome | The tidyverse style guide. There is even an R package called styler that makes code tidy: Non-Invasive Pretty Printing of R Code • styler (r-lib.org). Though this course is not about the tidyverse, some elements of it are used in the exercises. Using the tidyverse will make your scientific life much easier and fits in the Open Science approach. It is about how to store, analyze and process data in a tidy way. It is also very useful for visualization with ggplot, which we will use here. Tidy data are not only useful for R users but also for Excel users, for example, see Broman and Woo (2018). In short, tidy data represent a data matrix in which each column represents a single measurement and each row a single object on which the measurement was made.
R packages are continuously updated and functions in those packages may change. Something that works well at one point may suddenly not work well anymore when packages are updated. Though there may be warnings about this (“function x will be deprecated in version x.x.x”) it can be quite frustrating. There is a package, renv, that keeps track of the versions of all the R packages that are used in a project, so that you can go back to earlier versions if so desired.
On 24 January 2024, a new version of Quarto (1.4) was released with some very nice new features. These are not yet incorporated in this workshop. See Quarto 1.4 for more information.
2 Part 1 of the Quarto workshop
2.1 Anatomy of a Quarto/RMarkdown document
A Quarto/RMarkdown document (which is a plain text file) has the following basic elements:
A YAML. This piece is the start of your document and contains meta-data for the whole of your document. It tells the software basic things like font size, graphics details, type of output desired. There are default settings but you can override them.
Text, i.e., the story you want to tell. For scientific reports this will be the usual format: Introduction, Materials and Methods, Results and Discussion and References. For books and theses chapters are the usual format structure. Of course, Quarto can handle both formats.
Code for data wrangling, calculations, tables and figures. There are two options to do something based upon data: inline code in the text, and separate code chunks. Sometimes it may be needed to insert figures that are not produced by R, which is why the option exists that Figures are imported from outside Quarto. While RMarkdown was basically developed for use with R, Quarto can now also be used with other programming languages such as Python and Julia. Which language to choose is up to the user, of course. In this course, we only use R.
Code to insert equations. Equations can be added to the text and these are coded in LaTeX.
References. These are inserted automatically once you have cited them in the text and they are present in a so-called BibTeX file. Citations and the reference list can be formatted in various styles; these are defined by so-called ‘csl’ (Citation Style Language) files. However, with Quarto you can also refer directly to a DOI (digital object identifier) and once the reference has been found it will be added to your bib file.
The real strength of Quarto (and RMarkdown) is that it integrates all these parts. For instance, suppose you have made a draft report, and then you discover an error in a calculation somewhere, or some data have changed, or some additional data have become available. Once you have incorporated such changes, they will be automatically taken into account in all other related calculations, graphs and Tables. This is Open Science in optima forma! And if you use Git/GitHub you can actually also trace this back in history so that you yourself and your supervisors can see what has changed.
2.2 The YAML
The YAML (acronym for Yet Another Markup Language, or tongue-in-cheek: Yaml Ain’t Markup Language) is where it all begins. Whatever the acronym, it is an essential part with which every Quarto document starts. It is separated from the main text by three dashes at the top and at the end. The YAML tells the Quarto software what to do in general terms with the content. You can specify options further into sub-options; note that the colon and spacing matter: see Figure 1.
You can indicate the title, description of the document, date (today, last-modified, a fixed date), authors of the document, abstract, font size, type of desired output, table of contents, and many more things that you will learn in due course with exercises in this course.
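As a minimal sketch of what such a YAML could look like (the title, author name and abstract text are placeholders, not taken from the workshop files), note how the sub-options are indented under format and how every colon is followed by a space:

```yaml
---
title: "Penguin report"
author: "Your Name"
date: today
abstract: "A short summary of the report."
format:
  html:
    toc: true
    fontsize: 12pt
---
```

Here toc and fontsize are sub-options of the html format; moving them to another indentation level would change their meaning.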
2.3 The Text
Markdown as such is a language, elements of which are used within Quarto. A Quarto file (extension .qmd) containing R code is processed by rendering (Quarto) or knitting (RMarkdown) via the R package knitr into plain markdown text (the files that are then produced have the extension .md; qmd files containing Python code use jupyter instead of knitr). A markdown file is then the basis for other document types like pdf, word, html; this processing is done by pandoc (a universal document conversion tool) in the background: see Figure 2. Normally, you do not notice this (except perhaps for some intermediate messages in the background) but it may be handy to know a little bit of what goes on behind the scenes (not further discussed here).
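If you are curious about the intermediate markdown file, Quarto can be asked to keep it. The sketch below assumes html output; with the keep-md option the .md file produced by knitr should remain next to your .qmd file after rendering instead of being cleaned up:

```yaml
---
format:
  html:
    keep-md: true
---
```

This is only meant for peeking behind the scenes; for normal work the option can be left out.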
Quarto/RMarkdown/RStudio can be used like text editors. You can write text as you would do in Word or any other word processor with more or less the same functionality. There is also a spelling checker. Nowadays, you can use RStudio in two modes, the ‘visual editor mode’ and the ‘source mode’, see Figure 3.
Visual mode supports editing of all pandoc markdown features. Regardless of the mode you use, the result is the same for the output but the visual editor mode is probably easier to use than the source mode; the source mode requires more coding (you can toggle between the two modes without any problem, so you can choose your preference). There are shortcuts and clickable buttons in visual mode to achieve special formatting effects (see Figure 4); these are not available in Source mode, where you need to type code.
Some possible features to modify your text are (to be practiced in the exercises):
Bold text
Italic text
- Bullets (unordered list)
- Numbered lists (ordered list)
(Lists can be with normal spacing (default) or tight where tight means: less vertical spacing between items, see Quarto - Content Editing for details).
Underlined text
Links in text are indicated by: <https://quarto.org>, or: [Quarto](https://quarto.org); links will be recognized by a different font color.
superscripts need to be enclosed between ^ on each side: superscript
subscripts need to be enclosed between ~ on each side: subscript
strikethrough text needs to be enclosed between two tildes (~~) on each side: strikethrough
Headings in various sizes, to be chosen via the button Normal in Visual mode, or by using the hashtag (#, ##, ###) in Source mode.
A hard pagebreak can be achieved with a so-called shortcode:
{{< pagebreak >}}
If you want a linebreak between sentences, finish a sentence with two spaces and hit return.
Mathematical and Greek symbols can be included in the text using LaTeX code enclosed between two $ signs (one on each side). See Figure 5 for some examples.
You may have noticed that some special characters like * are used to produce italics and bold text. If you want to print those characters as they are, you need to ‘escape’ them by putting a backslash (\) before them.
In Visual mode, you can insert many things in the file by clicking on the button ‘Insert’, including hard line breaks, non-breaking spaces and special unicode characters, see Figure 6.
Sections with headings can be automatically numbered (to be discussed later).
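To give an impression of what the features listed above look like in Source mode, here is a small sketch in Pandoc/Quarto markdown. The sentences themselves are made up for illustration; only the markup matters:

```markdown
## A second-level heading

**Bold text**, *italic text*, [underlined text]{.underline} and ~~strikethrough~~.

H~2~O contains a subscript and 10^3^ contains a superscript.

- first bullet
- second bullet

1. first numbered item
2. second numbered item

A link: <https://quarto.org>, or with a name: [Quarto](https://quarto.org).

Inline math such as $\alpha + \beta$ sits between two dollar signs.

An escaped asterisk \* is printed as it is.

{{< pagebreak >}}
```

Typing the same elements in Visual mode and then switching to Source mode should show comparable code.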
Everything you do in Visual mode can also be done in Source mode but that requires more coding. If you are in the Visual editor mode, you can switch to the Source mode to see exactly what the code looks like. Furthermore, it is possible to insert block quotes and call-outs, figures and photos (if you use links, think about copyright!). An example of a block-quote is:
Every student and scientist should use Quarto!
Two examples of a call-out:
Of course, also clickable links to websites can be given where the format is: [name](address of the website).
It is possible to divide the page into text columns, where you can determine the width of each column. A new feature in Quarto is that you can also make remarks on the side (and also figure and table captions can be placed there if so desired).
A very handy tool when using Quarto in RStudio is to type the slash / (if you are at the beginning of a line, or control-slash anywhere in a line) and you will be presented with a lot of options to choose from for formatting, inserting code chunks, equations, call-outs, emoji’s 😀, to name a few.
Basic elements of the language markdown can be found in the following link Quarto-guide
2.4 Data handling, tables, figures
2.4.1 Quarto projects
An essential element of Open Science is that you work in a structured way, not only for potential users of your work but also for future-you. This implies that you should organize your files in directories and subdirectories so that they become portable. RStudio makes this very easy for you if you decide to work in projects. Our advice: make a new project for every new activity that you start. If you do so, RStudio will present you with the option to make a new directory or to use an existing directory; it will then write a small file with the extension .Rproj. Make further subdirectories in this project directory, for instance, data, scripts, figures and tables. And very importantly, do not point at those directories with absolute paths: these will only work on your own computer (and even then, no longer when you obtain a new computer).
here
There is a very nice R package called here that resolves all sub-directories relative to the main project directory, i.e., the directory in which the .Rproj file is present. Never use commands such as getwd() or setwd() in your code; that is not transparent! (If you do and it is discovered, Jenny Bryan from RStudio will come and set your computer on fire, so be warned!).
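A minimal sketch of how here can be used; the subdirectory data and the file name penguins_raw.csv are hypothetical and only serve as an illustration. The function here() builds the path starting from the project root, so the same code should work on any computer on which the project is opened:

```{r}
#| label: here-example
library(here)

# Build a path relative to the project root (the directory with the .Rproj file).
# "data" and "penguins_raw.csv" are hypothetical names used for illustration.
raw_file <- here("data", "penguins_raw.csv")

# Show the full path that here() resolved on this computer
print(raw_file)
```

The resulting path can then be passed to a function that reads the file, without a single absolute address appearing in your code.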
A practical exercise is now started by constructing a report using an existing real data set on penguins. Here we use it merely as a vehicle to illustrate some aspects of Quarto that you need to write a scientific report. For the purpose of the exercise there is no need to dive into the background of the penguin research.
An exercise report has already been produced to show you the type of end product we are aiming for; take a look at ‘penguin_paper.pdf’. However, we are going to build it up from scratch together with you as a practical exercise (our exercise paper is only meant as an example, you do not need to rebuild it exactly like this, you can add your own flavour to it). So, to begin the production of your exercise article: start with defining an R Project.
Make a new R project for every new activity you are going to undertake. It is really a sign of good project management and open science, not only for your peers but also for yourself to keep track of what you have been doing in the past!
Exercise 1: Setting up a Quarto project to produce a report
It is assumed that you have installed the newest version of RStudio (currently v2023.06.1) which includes Quarto; with older versions of RStudio, you need to install Quarto from https://Quarto.org (always use the newest version!). Open RStudio and start by creating a new project in Quarto/RMarkdown from the RStudio menu:
File -> New Project
In a popup menu (Figure 7) select new directory (you can also use an existing directory, but start with a New Directory for this exercise).
In the next menu, select Quarto Project (there are lots of other options) and type a Project Name in the next popup screen (you also have the option to create it as a subdirectory of another folder). RStudio will consider this directory as the root directory for this project. Quarto returns a Quarto document with the same name as the project with a very minimal YAML: see Figure 8.
If you choose Existing Directory, Quarto will use that directory as its root directory for the chosen project. To create a Quarto document, go to:
File -> New File -> Quarto Document
A pop-up screen appears as in Figure 9.
Check the html button, write your name in the Author field, click on Create and a document will appear as in the left panel of Figure 10 if ‘Use visual markdown editor’ remains checked (default); if that button is not checked, the right panel of Figure 10 will appear (source mode). You can toggle between source and visual mode anytime.
Modify the default YAML as indicated below (the colon followed by a white space is essential!):
---
title: Document title
author: your name
date: last-modified
affiliations: your affiliation
format: html
editor: visual
---
Note that the YAML keys, and option values such as html and visual, are all written in lowercase!
Give the document a suitable title and insert your own name as author if you didn’t do that yet and delete the text below the YAML. You can modify and expand the YAML any time. Give the file a name and save it from the RStudio menu:
File -> Save As..
Do not use a period (.) nor a space in the filename, use an underscore (_) or a hyphen (-) if you want to separate words in the filename. Quarto will give the .qmd extension to the file (in RMarkdown it will be .Rmd).
You can change format: html into format: pdf in the yaml if you want pdf output instead of html, or to docx if you want a Word file as output (there are many more options, actually). If you want to produce a pdf, please note that you have to install latex (tinytex is recommended) on your computer if you did not do that already (you only need to do this once).
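As an illustration, the format field of the YAML from this exercise could also list several output formats at once, as sketched below. This is optional and only one possible setup; default simply means that the default settings of each format are used:

```yaml
---
title: Document title
format:
  html: default
  pdf: default
  docx: default
---
```

If you only need one output type, the single-line form (for instance format: pdf) is sufficient.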
When you stop working on your project and you have saved your file, close the project:
File -> Close Project
Whenever you are ready to resume working on it, open first the project again from the RStudio menu:
File -> Open Project..
(Or click on Open Project in the upper right corner of the RStudio menu). You can then continue working on it because RStudio will have opened the project in the right directory. The R package here will automatically take this directory as the root directory for the project, so absolute references to specific file directories on your computer are no longer needed (these would be very much against open science!).
If you want to open a new document in the same project, then go to:
File -> New File -> Quarto Document…
and follow the instructions given above. You can make as many Quarto documents as you want in the same project (but if it is about a different topic, make a new project. Do not clutter different topics in one project!)
End of exercise 1
The project you created, with a .qmd file in it, is now ready for further processing. After having prepared the project in the previous exercise, the next exercise 2 is to do some formatting. Exercise 2 is divided into several different formatting topics.
For scientific output you probably will use pdf in most cases (or perhaps Word, as sometimes is required by colleagues/supervisors and/or journals). However, in the writing phase you may want to use html output at first. It is less sensitive to small mistakes (the pdf engine can be rather picky and comes with rather vague error messages) and html is much faster in producing output. Once you are happy with the output, switch to pdf output.
Exercise 2a: adding and formatting text to the exercise paper
Resume working on the project saved in the previous exercise. Imagine for this assignment that you are going to write a scientific report on penguins using the data that are supplied. This exercise is to write a setup for such a report. Imagine that you are the one who did the measurements and you need to report on it. The format could be that of a standard scientific publication, i.e., Abstract, Introduction, Materials and Methods, Results and Discussion, Acknowledgments, References. So, define headings accordingly. Reproduce roughly the following text.
Abstract
This exercise is based upon the palmerpenguins R package. The exercise contains some of these data in the form of Tables and Figures and performs some data analysis using linear regression and ANOVA.
Introduction
Ecological sexual dimorphism was examined among adult Pygoscelis penguins. Variation in 𝛅13C and 𝛅15N SI signatures of blood tissue was investigated.
Materials and Methods
Ethics statement
Research was conducted in accordance with an Antarctic Conservation Act permit to WRF (2008-020), in addition to the Canadian Committee on Animal Care guidelines. Data management was done accordingly4.
Field methods
Field research was conducted on Pygoscelis penguins nesting on several islands within the Palmer Archipelago west of the AP near Anvers Island. The reduced sample size for chinstraps was due to the overall smaller number of individuals breeding at rookeries on Dream Island.
Statistical methods
Least-squares general linear models were used to examine continuous variation in 𝛅13C and 𝛅15N SI signatures of adult penguin RBCs in relation to three parameters treated as main effects.
Data
The data were obtained from the R package palmerpenguins. Incomplete data reported as ‘NA’ were removed before analysis.
The palmerpenguin data are fantastic to teach Quarto thanks to Allison Horst!
Results and Discussion
The results are divided in several parts:
general impression of the data
results presented in tables
results presented in graphs
Furthermore, the statistical analysis consists of:
Linear regression
Anova
References
Switch between html and pdf in the format YAML field to see the difference in output. Do not forget to save your file once in a while while you are working on it!
If you want pdf as output you need to install tinytex.
Go to the terminal mode in RStudio (click Tools > Terminal) and then type on the command line:
quarto install tool tinytex
(pdf’s are produced from LaTeX with the xelatex engine; in more recent Quarto versions the command is written as quarto install tinytex). Be aware that installation takes quite some time because a lot of files will be downloaded to your computer. Fortunately, you only have to do this once.
If you want to share a html file with others, you will have to produce a portable html file (otherwise it will not be complete with figures and tables). Also, it may be helpful for others to see the code you have used in your Quarto document. Click on the following callout to see how to do that.
During your project development, it can be useful to produce a html file instead of a pdf for several reasons:
a html file is less sensitive to small errors in the code when producing output. Especially when developing draft papers, it can be easier to share a html file with the supervising team than a pdf.
team members can share comments in a html file (see below)
you can make tables and figures interactive in a html file, not in a pdf
WARNING: if you send an html file to someone else, you need to make sure that graphs are included (otherwise the html file will only show the figures on your own computer). Include the following in the YAML (in newer Quarto versions this option is called embed-resources; self-contained is the older name):
---
format:
  html:
    self-contained: true
---
code behind calculations and graphics can be folded: it is then hidden at first but can be made visible by the reader if so desired (the code must be echoed for this, i.e., echo: true). If you want to offer that option to your reader, then the instruction in the YAML needs to be:
---
format:
  html:
    code-fold: true
    code-summary: "Show the code"
    code-overflow: wrap
---
If you want to see these options in action, open this very reader in the html mode. There are many more code block options in html, see Quarto - HTML Code Blocks
End of exercise 2a
2.4.2 Pandoc Divs
Divs are special entities (originating from html language) that allow you to apply identifiers and styles to a block of a document. In Quarto, Divs start and end with at least three colons (:::), more colons can be used if you want to nest them. These are so-called ‘fenced Divs’ from Pandoc, with the fences consisting of the colons mentioned at the beginning and at the end. The content inside the fences will be subject to something special that is defined by a so-called class, and these classes are defined by a css file (css = cascading style sheets). (In html language this is defined as <div> but the Pandoc version is not limited to html and can also be used for pdf and Word output.)
Exercise 2b: css files and Divs
You can produce css files with any text-editor, including RStudio, but be careful to create it as a text file (.txt), see Figure 11.
To actually write a css file, use the syntax as shown in Figure 12.
Once it has been written, store the file with a css extension, see Figure 13. You can store a css file in the project root directory but it may be better to create a separate directory, for instance called css, to keep the overview.
Once stored, you can start using it. Two steps are necessary for that: i) the css file should be called from the YAML from the directory where it is stored, ii) you have to produce a fenced Div (in the text, not in code chunks that will be discussed hereafter) to be able to use what is written in the css file. See Figure 14 for adjusting the yaml.
To place the Div itself in the text, there are two options. In the Visual mode, you can insert a Div via the Insert menu and a screen as in Figure 15 will appear. In Source mode you have to type the Div yourself in the text as indicated in Figure 16.
Upon rendering, the content inside this Div will now be printed as big text with font size 120 px in this example: see Figure 17. Of course, this is just a silly example, but the use of Divs opens up many opportunities to use special effects for certain content and it is relatively simple to apply once you know how to create css files. In this way you could change font color, background color, font size, and much more. CSS files can also contain more than one rule. See the section on CSS styles in this Quarto guide.
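Because the steps above rely on figures, here is a sketch of what the three pieces could look like. The class name bigtext and the file location css/bigtext.css are hypothetical choices for illustration, not the names used in the workshop files. First the css file, which defines one class:

```css
/* css/bigtext.css (hypothetical file name): one class with a very large font */
.bigtext {
  font-size: 120px;
}
```

The qmd file then refers to the css file in the YAML and applies the class to a fenced Div in the text:

```markdown
---
format:
  html:
    css: css/bigtext.css
---

::: {.bigtext}
This sentence is printed in a very large font.
:::
```

Note that a css file defined this way affects html output; other formats need other means of styling.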
Use this Div on some text in your exercise doc and render the doc to see the effect.
End of exercise 2b
Callouts
Quarto already has quite a few of these fenced Divs available so that you do not have to define them yourself. Call-outs, for instance, are actually Divs. There are five different callouts built into Quarto:
note {.callout-note}
warning {.callout-warning}
important {.callout-important}
tip {.callout-tip}
caution {.callout-caution}
Note that there are five types of callouts, including: note, warning, important, tip, and caution.
This is an example of a callout with a title.
This is an example of a ‘folded’ caution callout that can be expanded by the user. You can use collapse="true" to collapse it by default or collapse="false" to make a collapsible callout that is expanded by default.
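In Source mode, the three callout examples above could be written roughly as follows. The titles after ## are illustrative; the class names and the collapse attribute follow the callout syntax described in the Quarto documentation:

```markdown
::: {.callout-note}
Note that there are five types of callouts, including: `note`, `warning`, `important`, `tip`, and `caution`.
:::

::: {.callout-tip}
## Tip with a title

This is an example of a callout with a title.
:::

::: {.callout-caution collapse="true"}
## Expand to learn about collapse

This is an example of a 'folded' caution callout that can be expanded by the user.
:::
```

Each callout is simply a fenced Div whose class determines its type and appearance.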
Exercise 2c: Produce a callout in your exercise qmd file
The easiest way is in the Visual mode: Insert -> Callout and a screen as in Figure 18 will appear; you can give the callout a name in the Caption field (in source mode: start with three colons (:::), followed by the type of callout between curly brackets (e.g., {.callout-note}), type your text and close with three colons. A title for the callout is achieved by supplying two ##). If you start in Visual mode to insert a Div, you can then switch to Source mode to see the actual code.
Produce the following callout somewhere in your exercise doc:
This data set is about measurements done on three penguin species living on various islands in the Palmer Archipelago in Antarctica, data like body weight, bill- and flipper length, species type, island they lived on. Quite a few internet tutorials use this nowadays, so you can practice with it further if you like. See this website for more information.
Render the text to see the effect.
And note that it is possible to collapse the callout so that the reader can open it at his/her own wish by adding within the curly brackets the code: collapse=true. Try this as well.
End of exercise 2c
2.4.3 Pandoc Spans
Similarly to Divs, Spans can be used for selected in-line text (e.g., to give a word or a line a specific color). For instance, if you want to color a word or a sentence, surround the word or sentence by [ ] and write the styles between { } directly after it. To color a word red, for instance, type {style="color: red"}. An example: The color of this word is red. To highlight a sentence, type {style="background-color: yellow"}. An example: This line has a yellow background to highlight it.
If you are working in Visual mode, there is a simple way to achieve this: select the text, and then use the command (see Figure 19):
Format -> Span
In the next window (Figure 20) you can then select what you want to do, for instance, to change the color of a word or a sentence, depending on what you have selected. As described above, you can also introduce own styles by defining a CSS class.
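In Source mode, the Spans used as examples in this section could be typed as sketched below (note the straight quotes around the style). The last line assumes the built-in underline class of Pandoc:

```markdown
The color of this [word]{style="color: red"} is red.

[This line has a yellow background to highlight it.]{style="background-color: yellow"}

This sentence contains an [underlined]{.underline} word.
```

Inline styles like these are mainly intended for html output; for other formats the effect may differ.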
For a handy tool, see the part on Pandoc Divs and Spans in the blog Automate insertion of Pandoc Divs and Spans with snippets if you want to make frequent use of Divs and Spans. A more detailed explanation of the syntax of Divs and Spans can be found in this blog.
Exercise 2d: use of Span
Type a sentence in your exercise file and color a word red by using the Span option. Choose or type another text part and highlight the text using the Span option. You can also underline text with a Span.
End of Exercise 2d
Text in columns
It is possible to format your text in two (or more) columns. This is also achieved with fenced Divs, in this case with nested fenced Divs. In Source mode, start with 4 colons, followed by {.columns} to indicate that what follows is about columns. Then continue with 3 colons followed by {.column}, enter your text, close with three colons, and continue with the second column by repeating this. Close the Div with 4 colons. In this way, the Divs for the two columns are nested within the overall columns Div. See this Quarto tutorial for more options.
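The nesting described above can be sketched as follows; the column widths and the text are arbitrary illustrations:

```markdown
:::: {.columns}

::: {.column width="40%"}
Text in the left column.
:::

::: {.column width="60%"}
Text in the right column.
:::

::::
```

The outer Div uses four colons so that it can be told apart from the inner Divs with three colons.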
Exercise 2e: text in columns
Open your exercise doc and write some text that is separated in two columns. Render the document to see the result.
You can also play with the width of the column by adding additional commands in the curly brackets, for instance, {.column width="40%"}, or play with the font size, for instance, {.column style="font-size: 60%"}. You can also combine such additional commands within {}.
End of exercise 2e
Tabsets
When you are producing html documents, you can organize information in different clickable tabs. The user can then choose to show one or another. This is obviously only possible in html, not in pdf.
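A tabset is again a fenced Div, this time with the class panel-tabset; every level-2 heading inside it starts a new tab. A minimal sketch with made-up tab names and content:

```markdown
::: {.panel-tabset}
## Table

Content shown when the first tab is selected.

## Figure

Content shown when the second tab is selected.
:::
```

In html output the reader can click on Table or Figure to switch between the two pieces of content.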
Exercise 2f: tabsets
Open your exercise doc and write some text that is separated in two tabs. Render the document to see the result.
End of exercise 2f
This ends the exercises on formatting text. There are many more possibilities, see the Quarto website for more options. For this workshop, we will now continue with code chunks, the way to weave code and text together.
2.4.4 Code chunks and inline code
Code chunks
One option to weave R code into the text is to add code chunks, which can be interspersed throughout the text. Code chunks are used to do calculations, data wrangling, or to prepare graphs and tables. The output of the chunk (such as a graph or a table) may, or may not, be put inside the text. A chunk can be labeled so that you can refer to it. If a graph or table is output, a caption can be supplied in the chunk. A chunk can be inserted via the RStudio menu
Insert -> Code Chunk -> R
or via the shortcut Ctrl-Alt-i, in which case r is put between the curly brackets (this can be changed into Python or Julia), or you can insert code chunks manually by enclosing the code in between three backticks at the beginning (followed by {r}) and three backticks at the end. By far the easiest way to insert code chunks in Quarto in RStudio is to type the slash / at the beginning of a new line, which will give you a lot of options to choose from, including code chunks. Always give a short name to a code chunk (syntax: #| label: name; the #| is called the hash pipe; note the spacing after the colon!). Think of a name that gives a hint of what the code chunk is supposed to do. That way you will be able to find it again, which is also handy if errors occurring in a code chunk are reported by the software (and errors will appear for sure!). Valid chunk names to be used with the chunk option label are, for instance:
#| label: chunkname
#| label: mychunkname
#| label: my-chunkname
#| label: mychunk1
Names that cannot be used:
#| label: my_chunkname (do not use underscores ( _ ) in a label, use dashes (-) to separate words, do not use special characters)
#| label: my chunk name (do not use spaces in a label)
A reserved chunk name is ‘setup’, if used it will automatically run before any other code. Chunk names must be unique (i.e., duplicate names are not allowed).
The most important options to control chunks are (default options are true, if you do not want these defaults you need to make them false):
eval: false (prevents code from being evaluated)
include: false (runs the code but does not show code nor results)
echo: false (runs the code, shows the results but does not show the code)
message: false (prevents messages to appear)
warning: false (prevents printing of warnings)
results: false (hides printed output)
If you want code chunk options to be valid for every code chunk in a document, this can be achieved via the YAML with the command ‘execute’ (next to other YAML commands that are not mentioned in this example; note again the indentation and the spacing after the colon!):
---
execute:
  echo: false
  warning: false
  message: false
---
Code chunk options for a specific chunk can override global options mentioned in the YAML. There are many code chunk options, it could look like this:
```{r}
#| label: chunkname
#| warning: true
#| message: false
```
See for a list of possible options: Quarto - Code Cells: Knitr.
A code chunk that will always be needed is one that tells Quarto which packages need to be loaded in R. This obviously depends on the type of analysis you want to do. Some packages you will probably always need, such as the tidyverse and the package here, and for the exercise you could load the package palmerpenguins:
```{r}
#| label: libraries
#| include: false
library(tidyverse)
library(here)
library(palmerpenguins)
```
Another chunk you will need is one that will load the data. Always keep your raw data separate in a subdirectory, for instance in csv format or excel format, and import them into your Quarto file to do further data wrangling if needed. Leave your raw data untouched!
```{r}
#| label: data
# csv files can be read with base R
data_file1 = read.csv(here("data","filename.csv"))
# Excel files need a package such as readxl
data_file2 = readxl::read_excel(here("data","filename.xlsx"))
```
(you can use any name for the file, it does not need to be data_file1).
Exercise 3: insert code chunks
At this stage, you may decide to add some other subfolders to the project, such as the subfolders ‘data’, ‘scripts’, ‘figures’. You can do that inside RStudio (in the Files panel -> New Folder) or outside, for instance by using Windows’ File Explorer. Once you have done that, use the package here to refer to files in these subdirectories. For instance, if you have a .csv file with data in the subdirectory data, and a .png file as a figure in the subdirectory figures, it could look like this:
file_name <- read.csv(here("data","file_name.csv"))
graph <- knitr::include_graphics(here("figures","figure.png"))
The package here takes the directory in which you have defined your Quarto project as the root directory. The advantage is that you do not have to specify absolute paths on your own computer, which would make it impossible for others to run your file without editing it.
Go to your exercise report and insert a code chunk to load the libraries here and tidyverse (you may need to install them first using install.packages() from the R console). Also add a code chunk to load the penguin data. Label the code chunks with a name. For transparency, insert such general code chunks at the beginning of your file.
End of exercise 3
Inline code
The other option to weave R into your document is to insert inline code in the text. It relieves you from the error-prone task to copy and paste data, or parameters derived from your data, into the text (for instance, to write a mean or a standard deviation manually). The big advantage is that, if for whatever reason data change, you do not need to copy new values all over again, which could potentially be a big source of errors. The way to insert R code is to start with a backtick `, followed by the letter r, then the code you need to calculate something, followed by another backtick `:
`r mean(penguins$body_mass_g)`
All kinds of expressions can be used in the inline code, for instance, if you want to round numbers to two decimals:
`r round(mean(penguins$body_mass_g),2)`
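Inline expressions stay readable when the calculation is done in a chunk first and only the finished value is placed in the text. The sketch below assumes a small made-up data frame called penguins_toy (a hypothetical stand-in for the penguin data, the values are illustrative only). Note the na.rm = TRUE argument: mean() returns NA as soon as one value is missing.

```{r}
#| label: inline-values
#| include: false
# Hypothetical toy data standing in for the penguin data set
penguins_toy <- data.frame(
  island      = c("Biscoe", "Biscoe", "Dream", "Torgersen"),
  body_mass_g = c(5000, 4500, NA, 3750)
)

# Precompute the values that will be reported in the text
n_penguins <- nrow(penguins_toy)
mean_mass  <- round(mean(penguins_toy$body_mass_g, na.rm = TRUE), 2)
```

In the text, the objects n_penguins and mean_mass can then be used in inline code in the same way as the expressions shown above.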
Exercise 4: insert inline code
Practice with inserting inline code using the penguin data set in your exercise article. Import the data set penguins.csv into your .qmd file (because there are some missing values in the data, the drop_na() command is added; %>% is the so-called pipe from the tidyverse, you could also use the native R pipe |>, available from R 4.1):
```{r}
#| label: data
penguins <- read_csv("penguins.csv") %>% drop_na()
```
You will need to do some data exploration to be able to do this: study the data file in RStudio to see which variables are used. Use inline code to report in a sentence, for instance:
- how many penguins were measured on each island
- summary statistics: how many penguins were measured
- the average bill-length and bill-depth
Render the file and study the output.
End of exercise 4
2.4.5 Inserting tables and figures
Tables
Compared to RMarkdown, inserting tables in the text has become much easier with Quarto. You can refer to tables produced in a code chunk with the chunk option #| label: tbl-tablename and you can provide a caption to the table with the chunk option #| tbl-cap: “caption text”. The prefix ‘tbl-’ tells Quarto that it is about a table and you can refer to it in the text by typing @tbl-tablename; Quarto will then number the table for you. You can use various table packages inside the code chunk to produce tables from data. In this course we stick to kable, kableExtra and gt with its extensions.
If, for some reason, you want to produce a table from scratch without these table packages, then there is the possibility to do this from the RStudio menu:
Insert -> Table
(Shortcut: Ctrl-Alt-T) and then you will have to fill in the table cells manually. This is not recommended for scientific work (avoid copy-paste actions!) but it can be handy for some simple stand-alone tables that are not based on data, or if you need to produce a table with purely text. Table 1 was produced in this way and may serve as a simple example:
Measurement | Method | Dimension |
---|---|---|
weight | balance | kg |
length | measuring tape | m |
(If you switch to Source mode when you have produced a table in this way, you can actually see a table in Markdown language.) Quarto will provide a Table number and you can cross-reference such tables by typing a colon (:) followed by a caption text below the table, which is followed by {#tbl-tablename}; the cross-reference is then @tbl-tablename. Quarto can even produce tables directly from dataframes without any additional editing. By using the base R function head() (which displays the first six rows of a dataframe or tibble), for instance, the following table is obtained for the penguin data.
head(penguins)
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
# ℹ 2 more variables: sex <chr>, year <dbl>
In general, however, you will most likely build tables from data and then it pays off to use dedicated packages because those will give you many options for formatting. With the kable function (from the R package knitr) inside a code chunk, the first six rows and columns of the penguins data set look as displayed in Table 2:
```{r}
#| label: tbl-two
#| tbl-cap: "Table produced by kable on the first 6 rows of the penguin data"
knitr::kable(penguins[1:6,1:6])
```
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 |
Adelie | Torgersen | NA | NA | NA | NA |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 |
The R package kableExtra gives you many more options to format a table according to your wishes, see Create Awesome HTML Table with knitr::kable and kableExtra (r-project.org). For instance, Table 6 (a) could be changed as shown in Table 3 (see Figure 21 for the code).
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 |
Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 |
Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 |
Adelie | Torgersen | NA | NA | NA | NA |
Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 |
Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 |
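As a hedged sketch of the kind of styling kableExtra offers (this is not the code from Figure 21, and the chunk label tbl-styled is made up for illustration), the chunk below uses kbl(), the kableExtra wrapper around kable, followed by kable_styling() and row_spec(). It assumes the packages kableExtra and palmerpenguins are installed.

```{r}
#| label: tbl-styled
#| tbl-cap: "First six rows of the penguin data, styled with kableExtra"
library(kableExtra)
library(palmerpenguins)

# kbl() builds the table, kable_styling() adds striped rows and hover highlighting
styled_tbl <- kbl(head(penguins[, 1:6])) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  row_spec(0, bold = TRUE)  # row 0 is the header row

styled_tbl
```

The bootstrap options only have an effect in html output; for pdf output kableExtra offers separate LaTeX options.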
You can be more specific in the YAML about how to print using the df-print option (see Quarto - Using R for more details). This includes an option for kable. kable takes a dataframe as input and turns it into a markdown format in the background, which is then rendered in the required output (html, pdf, word).
Probably the most advanced R package to produce tables is gt (which stands for the grammar of tables, from the RStudio team). See Easily Create Presentation-Ready Display Tables • gt (rstudio.com) and links therein for more information. Figure 22 gives a typical workflow for gt and shows schematically the many options to change the appearance of a table.
For instance, Table 4 is the table output with gt for the penguin data, shown here with the default settings of gt, but as mentioned there are many options to modify the format.
library(gt)
head(penguins[,1:6]) %>%
  tibble::rownames_to_column("penguins") %>%
  gt()
penguins | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
---|---|---|---|---|---|---|
1 | Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 |
2 | Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 |
3 | Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 |
4 | Adelie | Torgersen | NA | NA | NA | NA |
5 | Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 |
6 | Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 |
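To give an impression of how the default gt output can be modified, here is a minimal sketch that assumes the packages gt and palmerpenguins are installed. The chunk label, the caption, the column labels and the source note are illustrative choices, not the settings used elsewhere in this reader.

```{r}
#| label: tbl-gt-formatted
#| tbl-cap: "First six rows of the penguin data, formatted with gt"
library(gt)
library(palmerpenguins)

gt_tbl <- head(penguins[, 1:6]) |>
  gt() |>
  # show the bill measurements with one decimal
  fmt_number(columns = c(bill_length_mm, bill_depth_mm), decimals = 1) |>
  # replace the variable names by readable column headers
  cols_label(
    bill_length_mm = "Bill length (mm)",
    bill_depth_mm  = "Bill depth (mm)"
  ) |>
  # print a dash instead of NA
  sub_missing(missing_text = "-") |>
  tab_source_note(source_note = "Data: palmerpenguins")

gt_tbl
```

Each step in the pipeline changes one aspect of the table, which is the general way of working with gt.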
A very interesting package that builds upon gt is the R package gtsummary, see Presentation-Ready Data Summary and Analytic Result Tables • gtsummary (danieldsjoberg.com). It produces ready-to-print tables, for instance with descriptive statistics, or the results of regression models. Table 5 gives an example for the penguins data set using this package:
library(gtsummary)
tbl_summary(penguins)
Characteristic | N = 344¹ |
---|---|
species | |
Adelie | 152 (44%) |
Chinstrap | 68 (20%) |
Gentoo | 124 (36%) |
island | |
Biscoe | 168 (49%) |
Dream | 124 (36%) |
Torgersen | 52 (15%) |
bill_length_mm | 44.5 (39.2, 48.5) |
Unknown | 2 |
bill_depth_mm | 17.30 (15.60, 18.70) |
Unknown | 2 |
flipper_length_mm | 197 (190, 213) |
Unknown | 2 |
body_mass_g | 4,050 (3,550, 4,750) |
Unknown | 2 |
sex | |
female | 165 (50%) |
male | 168 (50%) |
Unknown | 11 |
year | |
2007 | 110 (32%) |
2008 | 114 (33%) |
2009 | 120 (35%) |
¹ n (%); Median (IQR) |
See also the gtsummary cheat sheet available on the internet. Yet another extension to gt is gtExtras, which makes the formatting of tables easy: see Additional features for creating beautiful tables with gt • gtExtras (jthomasmock.github.io).
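tbl_summary() has arguments to split the summary by a grouping variable and to choose the statistics that are reported. The following is a sketch under the assumption that gtsummary and palmerpenguins are installed; the selection of variables and statistics is an illustrative choice and not necessarily the code behind the tables in this reader.

```{r}
library(gtsummary)
library(palmerpenguins)

summary_tbl <- penguins |>
  tbl_summary(
    by = species,                                     # one column per species
    include = c(sex, flipper_length_mm, body_mass_g), # variables to summarise
    statistic = all_continuous() ~ "{mean} ({sd})",   # mean (SD) instead of median (IQR)
    missing = "no"                                    # do not list missing values
  )

summary_tbl
```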
It is also possible to produce two tables side-by-side with the code chunk option #| layout-ncol, see Table 6 for the output:
```{r}
#| label: tbl-three
#| tbl-cap: "Example of two tables next to each other"
#| tbl-subcap: ["penguins", "Just cars"]
#| layout-ncol: 2
# table on the left
knitr::kable(head(penguins[, 1:3]))
# table on the right
knitr::kable(head(penguins))
```
Multiple tables produced within a code cell can be referred to as subtables by adding a tbl-subcap option, as in Table 6.
Table 6: Example of two tables next to each other
species | island | bill_length_mm |
---|---|---|
Adelie | Torgersen | 39.1 |
Adelie | Torgersen | 39.5 |
Adelie | Torgersen | 40.3 |
Adelie | Torgersen | NA |
Adelie | Torgersen | 36.7 |
Adelie | Torgersen | 39.3 |
Characteristic | Adelie, N = 152¹ | Chinstrap, N = 68¹ | Gentoo, N = 124¹ |
---|---|---|---|
sex | | | |
female | 73 (50%) | 34 (50%) | 58 (49%) |
male | 73 (50%) | 34 (50%) | 61 (51%) |
flipper_length_mm | 190 (7) | 196 (7) | 217 (6) |
body_mass_g | 3,701 (459) | 3,733 (384) | 5,076 (504) |
¹ n (%); Mean (SD) |
Being able to format tables based on your own data in a publication-ready way is of course the ultimate goal, so let’s practice with that now in the penguin exercise article.
Exercise 5: inserting tables in the exercise report
Open your exercise manuscript and add manually (via Insert Table) a similar table as in Table 1 shown above. Save and render to study the output.
Next, produce a Table from the penguins data set. Try to reproduce Tables 1, 2 and 3 from the exercise manuscript using kable from knitr and kableExtra (see above for some options you can use), you will need to do some data wrangling. Feel free to experiment with other table producing R packages. Make sure that reference is made in the text to the tables. Render the document and study the output.
End of exercise 5
Figures
Figures can be incorporated in the text in two ways.
Importing figures
Importing figures (not produced by R code) goes via a link in Quarto; this need not be done from a code chunk, it can be done directly in the text. The syntax is: ![caption text](filename.png)
(this is actually markdown code and is the same syntax as used for producing links but preceded by an exclamation mark). Another option is to insert an outside figure from a code chunk, using a function from knitr called “include_graphics()”.
```{r}
#| label: fig-knitrfunction
#| fig-cap: Use of knitr to include files from outside Quarto
knitr::include_graphics("filename.png")
```
Figures can be given captions (#| fig-cap: some-text) and they are numbered automatically by referring to them in the text as @fig-name (this fig-name reference should obviously correspond to the label name of the chunk in which the figure is produced). The prefix ‘fig’ tells Quarto that it concerns a figure. You can also play with their location and alignment. If you would like to put figure captions in the margin, then write in the code chunk options: #| cap-location: margin. Other options are fig-align, fig-height and fig-width (the latter two with values in inches); the default values for figure width and height are 5.5 x 3.5 inches, but you can change that as indicated. For many more details on how to handle figures check out Quarto - Figures. For positioning, aligning and combining graphs produced within code chunks, there is also the option to use the fantastic R package patchwork: The Composer of Plots • patchwork (data-imaginist.com). Let’s practice with adding figures from an outside source in the exercise manuscript.
Exercise 6: import figures from outside in the exercise article
Two outside figures (as *.png files) are provided that explain some of the terms used in the penguin data set, namely penguin.png and culmen_depth.png. Insert them in your doc, make reference to them in the text and make sure that they are numbered. Try both methods, with and without using a code chunk.
End of exercise 6
Producing Figures from data
Producing figures from available data can be done using base R or in the tidyverse with ggplot. This will normally be done in a code chunk. ggplot is a fantastic tool to produce very fancy graphical output (Wickham 2016), much better than the base R plotting possibilities. See also the cheat sheets on data visualization, data-import, data-transformation, and tidyr available on the internet. In the e-book from Hadley Wickham R for data science, Chapter 2 Data visualization, you will find an explanation of ggplot using the penguins data set.
There are endless possibilities in using ggplot to visualize data. A website describing extensively such possibilities is ggplot tutorial. Besides, there are numerous internet sites you can consult, or internet fora where you can ask for help if you are stuck.
The R package patchwork makes it easy to combine and position graphs and to apply annotation. See this tutorial for more information.
It is possible to produce multiple figures in a code chunk and reference them as subfigures (e.g., Figure 1a and Figure 1b) by using the option ‘fig-cap’ for the main caption and ‘fig-subcap’ for subcaptions. See Figure 23 for how to achieve this for two figures in one. You can then refer to the whole figure (@fig-plots) but also to a subfigure, e.g., the second figure (@fig-plots-2). See Quarto - Cross References for details.
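As a hedged sketch of what such a chunk could look like (this is not the code from Figure 23; the captions and the choice of plots are made up for illustration, and the packages ggplot2 and palmerpenguins are assumed to be installed):

```{r}
#| label: fig-plots
#| fig-cap: "Two views of the penguin data"
#| fig-subcap:
#|   - "Body mass per species"
#|   - "Flipper length against body mass"
#| layout-ncol: 2
library(ggplot2)
library(palmerpenguins)

# first plot in the chunk: can be referred to as @fig-plots-1
p1 <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot()
p1

# second plot in the chunk: can be referred to as @fig-plots-2
p2 <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, colour = species)) +
  geom_point()
p2
```

The option layout-ncol: 2 places the two subfigures next to each other; without it they are stacked.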
In terms of sizing of figures, Chunk options are:
fig-width: 6 (6 inches, recommended by Hadley Wickham)
fig-asp: 0.618 (aspect ratio, ratio of height to width, 0.618 recommended by Hadley Wickham)
fig-height (if fig-width and fig-asp are given this needs not to be supplied)
out-width: “70%” (controls the output size as % of line width, 70% recommended by Hadley Wickham)
fig-align: “center” (recommended by Hadley Wickham in combination with 70% out-width)
out-height:
If you want all plots of a chunk to be shown together after the complete code of the chunk (instead of each plot directly after the line that produces it), supply this chunk option:
- fig-show: “hold”
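Put together, the sizing options listed above could be used in a chunk as sketched below. The label, caption and plot are illustrative assumptions; ggplot2 and palmerpenguins are assumed to be installed.

```{r}
#| label: fig-sized
#| fig-cap: "Distribution of flipper length"
#| fig-width: 6
#| fig-asp: 0.618
#| out-width: "70%"
#| fig-align: "center"
library(ggplot2)
library(palmerpenguins)

# the figure is drawn 6 inches wide and 6 * 0.618 inches high,
# then scaled to 70% of the line width in the output
p_hist <- ggplot(penguins, aes(x = flipper_length_mm)) +
  geom_histogram(bins = 30)
p_hist
```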
It may happen that you would like to use inline code inside a code chunk. An option to do that is with the R package glue, with which you can make a so-called glue-chunk. First, you will have to install and load the glue package, of course. The glue-chunk needs the output: asis option, while the R code you want to use needs to be within curly brackets {}. It may look like this:
```{glue}
#| label: glue-example1
#| output: asis
library(glue)
The mean of the penguins body mass = {round(mean(penguins$body_mass_g),2)}
```
This becomes especially useful in a figure caption, for instance to refer to the number of data points without having to type it, reading it from the data instead. This can be achieved by putting !expr glue::glue() directly after fig-cap: (outside the quotes), with the caption text as a quoted string inside glue().
```{r}
#| label: fig-glue-example2
#| fig-cap: !expr glue::glue("The mean of the penguins body mass = {round(mean(penguins$body_mass_g),2)}")
```
With all this information, it is now time to produce graphs in the exercise paper using the penguin data.
Exercise 7: producing graphs from data in the exercise article
There are quite a few options to produce graphs from the penguin data set. Let’s practice a few.
Take a look at the example exercise paper and try to reproduce Figures 2, 3, 4, 5, 6 using ggplot and patchwork and write some text to explain what the figures are showing.
End of exercise 7
2.5 Inserting equations
Depending on your field of research, you may want to show mathematical/statistical equations in your paper. There are three ways to show equations in a Quarto document:
inline equations, enclosed by a $ sign at each side of the equation (such equations will not be numbered)
unnumbered equations separate from the text, enclosed by two $ signs ($$) at each side of the equation (such equations will not be numbered if you do not indicate that)
numbered equations, enclosed by two $ signs ($$) at each side of the equation, to which you can refer by giving them a name
In all cases LaTeX syntax is used, so you will have to learn that first. There are many cheat sheets available on the internet where you can look up codes. Though it may look intimidating at first, once you get to know it, it is actually quite intuitive and not more difficult than, for instance, the equation editor in Word. Moreover, when in Visual mode, Quarto makes life easy by showing the resulting equation while you type LaTeX code, so you can immediately see what the output will look like. Start an equation by clicking on Insert (see Figure 24) and choose either Inline Math (will produce an equation inline) or Display Math (will produce an equation on a separate line).
Quarto will number equations, if they are on a separate line and when you give them a name. In Visual mode, click on the three small dots on the right hand side, edit attributes, and fill in a name preceded by #: see Figure 25 and refer to them in the text as @eq-name (the prefix ‘eq-’ tells Quarto it is about an equation).
(In Source mode, you can achieve the same by adding at the end of the equation {#eq-name} after the two dollar signs.)
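In Source mode, a numbered equation could look like the sketch below. The equation (a sample mean) and the label eq-mean are only illustrative examples.

```latex
$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
$$ {#eq-mean}
```

Writing @eq-mean in the text then produces a numbered reference to this equation.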
The best way to learn to type equations is to just do it, so try it in the next exercise.
Exercise 8: practising writing equations LaTeX style
LaTeX code you will often need:
^ for superscripts
_ for subscripts
{ } to keep expressions together
\ followed by the name of a greek letter will display greek letters in the equation
\ followed by frac will give fractions with the numerator in { } followed by the denominator in { }.
Quarto will put symbols that represent quantities in italics by default, if you want it differently:
\ followed by mathbf, followed by {symbol} will give you a boldface symbol
\ followed by mathrm, followed by {symbol} will give you a roman symbol
Operators such as logarithms, exponentials need to be preceded by a backslash \ to prevent them from being put in italics
Open your exercise article to practice with equations. Try to reproduce the equations given below and refer to them in the text so that they are numbered. To start an equation, click on
Insert -> LaTeX Math -> Inline Math (for an equation in the text)
or:
Insert -> LaTeX Math -> Display Math (for stand-alone equations)
Try to introduce the following equations in your exercise doc, inline:
The famous Einstein equation is \(E=mc^2\) that everyone knows.
Or the same equation as stand-alone and refer to it in the text:
The famous Einstein equation is: \[E=mc^2 \tag{1}\] Everyone knows Equation 1 (but may not understand its implications).
An exercise with some Greek symbols is in Equation 2:
\[ \xi=\frac{n_i-n_0}{\nu_i}=\frac{\Delta n_i}{\nu_i} \tag{2}\]
An equation with an exponent and a fraction is in Equation 3:
\[ r=r_{\mathrm{f}}\left(1-\exp \left(-\frac{\text{A}_{\mathrm{f}}}{R T}\right)\right) \tag{3}\]
A Bayesian expression for a regression equation is in Equation 4:
\[ \begin{split} y_i \sim \mathcal N(\mu,\sigma) \\ \mu = a + b\cdot x \\ a \sim \mathcal N(0,1) \\ b \sim \mathcal N(0,1) \end{split} \tag{4}\]
A covariance matrix looks like in Equation 5:
\[ \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix} \tag{5}\]
End of exercise 8
If you want to write chemical equations, be aware that equations are printed in italics by default, which is not wanted for chemical equations. An easy way to force chemical equations to be printed in roman font is to use the mathrm command:
\[ \mathrm{CO_3^{2-} + H^+ \rightleftharpoons HCO_3^{-}} \]
Another option for chemical equations is to use a LaTeX package called mhchem, which you then have to call in the YAML, see Figure 26. This is an example where a LaTeX package is used within Quarto (to be discussed later in more detail).
[Figure 26: calling mhchem from the YAML]
With the aid of the LaTeX package mhchem it is then easy to write chemical equations. The syntax is to put \ce{here comes your chemical equation}. Like normal equations they can be put inline, \(\ce{CH3CO2H + H2O <=> CH_3CO^-_2 + H3O+}\), like here. They can also be numbered.
\[ \ce{CH3CO2H + H2O <=> CH_3CO^-_2 + H3O+} \]
The same syntax as for mathematical equations can be used, for instance for fractions:
\[ K_c = \frac{\ce{[CH3CO_2^-][H3O+]}}{\ce{[CH3CO2H][H2O]}} \]
There is also a Quarto extension for the LaTeX package mhchem that you might find useful if you want to use different output formats.
There is a very handy snip tool called Mathpix with which you can copy equations, printed or even hand-written, from a digital source; the snip tool will translate it into a LaTeX equation that you can insert in a document. See Image to LaTeX (mathpix.com).
2.6 Numbering sections, table of contents, list of tables and figures
Some journals may ask you to add a list of tables (Quarto YAML code: lot) and a list of figures (Quarto YAML code: lof) used in a manuscript, or you may want to show these in a thesis. You may also want to include a table of contents (Quarto YAML code: toc). If you want sections to be numbered, you have to indicate that with the command number-sections: true. All this is easily achieved by adding to the YAML (next to other commands):
---
toc: true
number-sections: true
lof: true
lot: true
---
And if you want to refer to a section, you can label it by adding a code of the form {#sec-name} after the section heading. For instance, if you label the Introduction as {#sec-intro} (can be any name, not necessarily intro, as long as it starts with sec-) then you can refer to it in the text as @sec-intro and Quarto will number it for you.
Table 7 shows an overview of how to cross reference figures, tables, equations and sections in a Quarto document.
Element | Label prefix | Cross reference in text |
---|---|---|
Figure | fig-name (in chunk) | @fig-name becomes Figure … |
Table | tbl-name (in chunk) | @tbl-name becomes Table … |
Equation | #eq-name (after equation) | @eq-name becomes Equation …. |
Section | #sec-name (after section) | @sec-name becomes Section … |
Exercise 9: introduce numbered sections and lists of tables and figures and a table of contents in the exercise article
Open your exercise article and modify the yaml as indicated above to produce a table of contents, a list of figures and a list of tables. You can also try to cross reference to sections of your document and refer to a particular section in the text. Render and observe the output.
End of exercise 9
2.7 Home work
Based upon what you have learned during the first part of the workshop, you should be able to produce two more exercise reports on your own. That is the assignment for home work so that we can bind the three exercise papers together in a booklet in the second part of the workshop.
2.7.1 Own data?
If you have your own data available that you would like to use in a future manuscript, then you can also practice with that. If you do not have such data, we have provided you with two data sets to practice with, the pottery data set and the wine data set.
2.7.2 The pottery data set
Data were collected from archaeological sites in the UK with the research objective to investigate if a particular chemical composition of pottery discovered at various sites could be linked to a particular site. The original paper in which the data were published is Tubb, Parker, and Nickless (1980). (The data are also used in, and part of, the R package car (companion to applied regression) and SAS.) You can find the data in a file called Pottery_data.csv. What is reported is actually the concentration of the oxides of the elements.
Exercise 10: produce an exercise report on the pottery data
Write a small paper using the data provided with the usual scientific format. Then, import the data into a dataframe called Pottery and explore the data: what is in the rows and what is in the columns? Indicate in the Introduction using inline R code how many sites there are, how many elements were analyzed and what the ranges of the concentrations are.
1. Create boxplots for the element Al for each site using ggplot.
2. Do the same for the other elements
3. You may have noticed that this is not a very efficient way of working: repeating the same action five times. There is a much more efficient way but then you need tidy data. The data in Pottery_data.csv is in the so-called wide format, i.e., one observation row per subject (site) with each measurement (chemical element) present in a different column. You can change a wide format into a long format so as to make the data tidy, i.e., one observation row per measurement and thus multiple rows per subject (site). The tidyverse package contains a command to do this: pivot_longer(). Use this command as follows and observe the output in the new dataframe Pottery_long:
Pottery_long <- Pottery %>% pivot_longer(!Site, names_to="elements", values_to="concentration")
4. Once you have the data in the long format, make use of the ggplot command facet_grid() to produce faceted plots and add some labels to the plots.
5. Apply a linear model for the differences in concentrations and relate them to the different sites and elements. This is achieved by a full factorial ANOVA model with Site and elements as factors with interaction. Write the equation for such a model and make sure the equation will be numbered.
6. Perform linear regression using the lm function from base R and then do an ANOVA analysis and make a Table of the ANOVA results. Make sure that the table is numbered. Comment on the results.
7. A visualisation of the interaction effect can be made using the function emmip() from the R package emmeans (you may have to install the package first). Show this in a figure and make sure it is numbered in the text.
8. Perform pairwise comparisons to test which combinations are significant. Use Tukey’s method to correct for false positives (type 1 errors). Use the emmeans() function from the emmeans package for this. Show the results in a table and make sure it is numbered in the text.
9. Show the results and write some text in which you refer to the figures and tables. Add some references from the exercise.bib file, or other relevant articles you may find on the internet.
End of exercise 10
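To illustrate the wide-to-long step and the factorial model without giving away the exercise, here is a minimal sketch on a made-up data frame (pottery_toy is hypothetical and its values are not the real pottery measurements). It assumes the tidyr package, which is part of the tidyverse, is installed.

```{r}
library(tidyr)

# Hypothetical toy data in wide format: one column per element
pottery_toy <- data.frame(
  Site = c("A", "A", "B", "B"),
  Al   = c(14.4, 13.8, 11.6, 11.1),
  Fe   = c(7.00, 7.08, 5.44, 5.39)
)

# Wide -> long: one row per site/element measurement
pottery_toy_long <- pottery_toy |>
  pivot_longer(!Site, names_to = "elements", values_to = "concentration")

# Full factorial model: both factors plus their interaction
fit <- lm(concentration ~ Site * elements, data = pottery_toy_long)
anova_tbl <- anova(fit)
anova_tbl
```

Passing anova_tbl to knitr::kable() in a chunk with a tbl- label and a tbl-cap option is one way to turn it into a numbered table.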
2.7.3 The wine data set
A large data set is available on quality of wine in relation to several chemical parameters. This dataset is related to Portuguese “Vinho Verde” wine. For more details, consult the reference (Cortez et al. 2009). Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
Exercise 11: produce an exercise report on wine data
Write a short note on the prediction of quality of wine using the variables investigated with the usual format of Introduction, Materials and Methods, Results and Discussion and References.
- Use inline code to indicate how many different samples of wine are available and how many variables.
- Spend some words on exploratory data analysis. Create boxplots to show how the variables are distributed and comment on it.
- Perform a linear regression of the dependent variable sweet taste versus the other variables to see which variables are significantly related to sweet taste.
- Before putting some trust in this model, you need to check if there are collinearity problems. One way of doing this is by calculating the VIF values for each variable: as long as the VIF values are below 10, there is no collinearity problem.
- Not all independent variables are relevant. Produce a table that shows which variables have a significant effect and which not.
- It is also important to get an impression whether or not the assumption of normality is reasonable. Make a QQ plot of the residuals to check this.
- Use inline code in the text to report an R2 value.
End of exercise 11
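The model checks mentioned in this exercise can be sketched in base R as follows. The built-in mtcars data set is used as a stand-in here (it is not the wine data, and the choice of variables is arbitrary); the VIF of a predictor is calculated as 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors.

```{r}
# Stand-in data: the built-in mtcars data set, NOT the wine data
fit <- lm(mpg ~ disp + hp + wt, data = mtcars)

# VIF per predictor: regress it on the other predictors and use 1 / (1 - R^2)
predictors <- c("disp", "hp", "wt")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 / (1 - r2)
})
vif

# Normal QQ plot of the residuals to judge the normality assumption
qqnorm(resid(fit))
qqline(resid(fit))

# R-squared of the model, rounded for use in inline code
r2_model <- round(summary(fit)$r.squared, 2)
```

The object r2_model can then be reported in the text with inline code. Dedicated functions also exist, for instance vif() in the R package car.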
Appendix: A very brief introduction to Git and GitHub
What is Git and what is GitHub?
Open and reproducible science requires a paradigm shift in the sense that handling projects and files needs to be done differently. Git and GitHub are tools that can be used, together with RStudio and R. However, open and reproducible science is more than R and RStudio; they are useful but not exclusive tools to reach it. Open and reproducible science requires a strict data analysis workflow. The fact that R is a code-based program does help in this respect because it forces you to make explicit what you are doing (in contrast to a click-and-run approach in some software packages where things are running hidden behind the screen). RStudio makes it much easier to work with R and it greatly facilitates project management. Moreover, it offers a very nice and easy-to-use interface with Git and GitHub. Remember that it is important to not only report on the results of scientific research but also on how the results were obtained. Project management and version control are then important tools to achieve that. Version control is a different approach from what most scientists practice, so it takes time to get used to it.
Version control systems
Every scientist will eventually publish research results in some format. In doing so, various versions of a manuscript will be written: manuscriptv1.docx, then, after comments and review procedures, manuscriptv2.docx, and so on. At some point it becomes difficult to track what has changed in version x compared to version 1. This is a manual, non-reproducible way of doing version control. Why not use software that was developed to keep track of the changes? Software for version control was designed first and foremost for software developers. Software is usually developed by teams, and it is then necessary to keep team members informed about who has done what. Version control software offers that opportunity. As it happens, this approach is also very useful for scientists who cooperate, but it requires some practice to get used to the terminology and the types of action.
A very popular version control system is Git. It tracks changes at the project level. A project tracked by Git is called a repository (in short: repo; it can be compared to a folder with its subfolders). Git stores the version history in a .git folder inside the project (this folder may be hidden). Files that should NOT be tracked (such as temporary files) are listed in the .gitignore file. Git takes a snapshot of the changes made to files whenever the user makes a so-called commit: a commit records which files were changed, what was changed and who changed it, and it is accompanied by a short description of the change.
Git operates on local computers. However, that alone does not make a workflow reproducible, because the project cannot be accessed by others. Therefore, there are services that store a project history in the cloud. A well-known one is GitHub, another is GitLab. These are companies that host Git repositories. While Git has to be installed on a local computer, GitHub and the like are websites where one can open an account. GitHub is widely used in the R community and increasingly also in the scientific community to make data and code accessible. Git repos can be synchronized with cloud services like GitHub and GitLab. The terms used are ‘to push’, which basically means uploading, and ‘to pull’, which means downloading. Another term used is ‘cloning’, which means copying a repository that is stored on GitHub to your own computer.
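To make the terminology above concrete, here is a minimal sketch that runs a local Git workflow from within R. It assumes that Git is installed and available on the system path. The helper function `git()`, the folder name `demo-project`, the file names and the user name and e-mail are hypothetical and chosen only for illustration. In practice the same steps can be carried out from the Git pane in RStudio.

```r
# Hypothetical helper: run a git command inside the folder `repo`
# and return whatever git prints
git <- function(repo, ...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

# Throwaway project folder, used for illustration only
repo <- file.path(tempdir(), "demo-project")
dir.create(repo, showWarnings = FALSE)

git(repo, "init")  # turns the folder into a repository (creates .git)

writeLines("*.tmp", file.path(repo, ".gitignore"))   # files NOT to be tracked
writeLines("x <- 1", file.path(repo, "analysis.R"))  # a file to be tracked
writeLines("scratch", file.path(repo, "notes.tmp"))  # ignored via .gitignore

git(repo, "add", ".")  # stage all files that are not ignored

# Commit: a snapshot with a short description; name and e-mail are
# passed here only so that the sketch does not depend on a global Git setup
git(repo, "-c", "user.name=Example", "-c", "user.email=example@example.org",
    "commit", "-m", shQuote("Add first analysis script"))

tracked <- git(repo, "ls-files")          # files under version control
history <- git(repo, "log", "--oneline")  # one line per commit

# Synchronizing with GitHub requires a remote repository and an account,
# so these calls are shown but not run:
# git(repo, "push", "origin", "main")   # upload local commits
# git(repo, "pull")                     # download commits made by others
```

After running this, `tracked` lists `analysis.R` and `.gitignore` but not `notes.tmp`, and `history` contains the single commit with its description.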
How to get Git and GitHub to work?
Git needs to be installed on your local computer, and you need to create an account on GitHub. Git is free, open-source software, and a basic GitHub account is free as well. There are many tutorials on the internet that show how to do this. Git can be downloaded from git-scm.com, and GitHub is found at github.com.
Resources on Git and GitHub
There are many good tutorials available on the internet. Probably the most thorough source is the e-book by Jenny Bryan: Happy Git and GitHub for the useR (happygitwithr.com).
Very good tutorials are the Reproducible Research in R courses (rostools.org): an introductory course (including how to work with R), a follow-up intermediate course and an advanced course. Another one is Git and Github for Advanced Ecological Data Analysis (afredston.github.io). See also Versionner facilement vos scripts R avec git on DellaData for very clear instructions (in French, but the many screenshots showing how to put it to work are in English).
3 References
Footnotes
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
RStudio as a company changed its name to Posit in late 2022, but for the moment the IDE (Integrated Development Environment) software application is still called RStudio.
Quarto is the format of a book or pamphlet produced from full sheets printed with eight pages of text, four to a side, then folded twice to produce four leaves. The format of the first published European book (in 1452) was quarto.