---
title: "Summarytools in R Markdown Documents"
author: "Dominic Comtois"
date: "`r Sys.Date()`"
output:
  html_document:
    fig_caption: false
    toc: true
    toc_depth: 1
    css: assets/vignette.css
    df_print: default
vignette: >
  %\VignetteIndexEntry{Summarytools in R Markdown Documents}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{kableExtra}
  %\VignetteDepends{magrittr}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r, echo=FALSE, results='asis'}
summarytools::st_css(main = TRUE, global = TRUE)
```

```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(comment = NA, prompt = FALSE, cache = FALSE, echo = TRUE,
               results = 'asis')
library(summarytools)
st_options(bootstrap.css     = FALSE,       # Already part of the theme so no need for it
           plain.ascii       = FALSE,       # One of the essential settings
           style             = "rmarkdown", # Idem.
           dfSummary.silent  = TRUE,        # Suppresses messages about temporary files
           footnote          = NA,          # Keeping the results minimalist
           descr.silent      = TRUE,        # To avoid messages when building / checking
           subtitle.emphasis = FALSE)       # For the vignette theme, this gives better results.
                                            # For other themes, using TRUE might be preferable.
```

# 1. Introduction

This document mainly contains examples showing how best to use
**summarytools** in *R Markdown* documents. For a more in-depth view of the
package's features, please see `vignette("introduction", "summarytools")`;
the online version can be found
[here](https://cran.r-project.org/package=summarytools/vignettes/introduction.html).

## 1.1 Methods vs Styles

Every time we display **summarytools** objects with `print()`, `view()`, or
`stview()`, we pick -- explicitly or not -- one of several display methods.
The possible display methods are *pander*, *render*, *viewer*, and *browser*.
The method is one of the parameters of `print.summarytools()` and `view()`
(alias: `stview()`).

Since methods *viewer* and *browser* are mostly meant for interactive work and
rely on the same underlying code as *render*, we will assume for the purpose of
this document that there are really only two methods: *pander* and *render*.

### Only the *pander* Method Uses Styles

The *pander* method is used by default when results are automatically printed
to the console, or when we use `print()` without an explicit `method` argument.
The *style* parameter is passed on to **pander** (see `?pander::pander` or
visit its [GitHub page](https://github.com/Rapporter/pander) to learn more
about this very useful package).

![](assets/lightbulb.svg) When we use any of the *viewer*, *browser*, or *render* methods, the package uses **htmltools** to generate results; any specified *styles* are thus ignored.
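
As a minimal sketch of the distinction above, the same object can be displayed with both methods. It assumes the `tobacco` data frame that ships with **summarytools** (used throughout this document); the object name `gender_freq` is only illustrative.

```r
library(summarytools)

# Build the frequency table once, then display it with each method.
gender_freq <- freq(tobacco$gender)

# pander method: `style` and `plain.ascii` are passed on to pander.
print(gender_freq, method = "pander", style = "rmarkdown", plain.ascii = FALSE)

# render method: the output is generated as html, so no style is specified.
print(gender_freq, method = "render")
```

In an *R Markdown* chunk, both calls expect `results = 'asis'` to be in effect.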

### **summarytools** Styles Are **pander** Styles

Available styles are the ones supported by **pander**:

- simple (default, used mainly in the R console)
- rmarkdown (used by all core functions except `dfSummary()`)
- grid (mainly used with `dfSummary()`)
- multiline (can be used with `dfSummary()` if you use *ascii* graphs only)
- jira (more recent addition, not thoroughly tested)

## 1.2 General Guidelines

**Always** set `results = 'asis'`, either explicitly on a chunk-by-chunk basis
or by including `opts_chunk$set(results = 'asis')` in your setup chunk.

Also, don't forget to specify **`plain.ascii = FALSE`** in all function calls
using the *pander* method. It is advised to set this option, as well as the
`style` option, in the setup chunk:

```{r, eval=FALSE}
st_options(plain.ascii = FALSE, style = "rmarkdown")
```

![](assets/exclamation-diamond.svg) If you get repeated, unhelpful warnings, use chunk options `message = FALSE` and/or `warning = FALSE`. Another option is to use the argument `silent = TRUE` to the `print()` method or `view()` / `stview()` functions. See `?st_options` to set this globally for individual functions.
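
The two ways of silencing messages mentioned in the note above can be sketched as follows. This is an illustration only: it assumes the `tobacco` data frame shipped with **summarytools**, and it uses the two function-specific options that appear in this document's own setup chunk (`dfSummary.silent` and `descr.silent`).

```r
library(summarytools)

# One-off: pass silent = TRUE when printing a single object.
print(freq(tobacco$gender), method = "render", silent = TRUE)

# Global: set the function-specific options once, typically in the setup chunk.
st_options(dfSummary.silent = TRUE, descr.silent = TRUE)
```

Chunk options `message = FALSE` and `warning = FALSE` remain available when a message does not originate from **summarytools** itself.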

The following table indicates which method / style is better suited for each
**summarytools** function in the context of R Markdown documents:

| Function    | render method | pander method | pander style |
|:------------|:-------------:|:-------------:|:-------------|
| freq()      |       ✓       |       ✓       | rmarkdown    |
| ctable()    |       ✓       |  Sub-optimal  | rmarkdown    |
| descr()     |       ✓       |       ✓       | rmarkdown    |
| dfSummary() |       ✓       |       ✓       | grid         |

**Recommended Style When Using *pander* method**

For `freq()`, `descr()`, and `ctable()`, _rmarkdown_ style is recommended. For
`dfSummary()`, _grid_ is recommended. Note that _multiline_ can also be used,
but only _ascii_ graphs will be displayed.

Starting with `freq()`, we'll now review the recommended methods and styles to
get satisfying results in *R Markdown* documents.

--------------------------------------------------------------------------------

# 2. Using freq() in R Markdown

`freq()` works best with the default *pander* method and `style = "rmarkdown"`;
*html* rendering is also possible.

## 2.1 Pander Style for freq()

With `method = "pander"`, `style = "rmarkdown"` is the easy winner. Since
"pander" is the default method, you can usually omit the call to `print()`.
But to make things as clear as possible, we'll include it here.

```{r}
print(freq(tobacco$gender, plain.ascii = FALSE, style = "rmarkdown"),
      method = "pander")
```

## 2.2 HTML Rendering for freq()

There are rarely any problems when using the *render* method to display
`freq()` results.

```{r}
print(freq(tobacco$gender), method = "render")
```

If you find a rendered table too large, you can use
`table.classes = "st-small"`, shown here with `descr()`:

```{r, message=FALSE}
print(descr(tobacco), method = "render", table.classes = "st-small")
```

--------------------------------------------------------------------------------

# 3. Using ctable() in R Markdown

## 3.1 Rmarkdown Style for ctable()

Tables with multi-row headings are not fully supported in *markdown* (yet),
but the result is close to acceptable.
This, however, is not true for all themes. That is why the *render* method is
preferred.

```{r}
ctable(tobacco$gender, tobacco$smoker, plain.ascii = FALSE, style = "rmarkdown")
```

## 3.2 HTML Rendering for ctable()

For best results, use this method.

```{r ctable_html}
print(ctable(tobacco$gender, tobacco$smoker), method = "render")
```

--------------------------------------------------------------------------------

# 4. Using descr() in R Markdown

`descr()` gives good results with both `style = "rmarkdown"` and *html*
rendering.

## 4.1 Rmarkdown Style for descr()

```{r}
descr(tobacco, plain.ascii = FALSE, style = "rmarkdown")
```

## 4.2 HTML Rendering for descr()

We'll use `table.classes = "st-small"` to show how it affects the table's size,
compared to the `freq()` table rendered earlier. We'll also use
`message = FALSE` as a chunk option to avoid the message saying that
non-numerical variables have been ignored.

```{r, message=FALSE}
print(descr(tobacco), method = "render", table.classes = "st-small")
```

--------------------------------------------------------------------------------

# 5. Using dfSummary() in R Markdown

To get optimal results, whichever method you choose, it is always best to omit
at least one, and if possible two, columns from the output. Also, choose the
value of the `graph.magnif` parameter carefully.

## 5.1 Grid Style for dfSummary()

Don't forget to specify `plain.ascii = FALSE` (or set it as a global option
with `st_options(plain.ascii = FALSE)`), or you won't get good results.

(Note: The following output is an image (screenshot). This is because CRAN
doesn't allow writing in "/tmp" or any directory other than R's temp directory,
which would pose problems in terms of column widths. The introductory vignette
explains this issue in more detail.)

```{r dfs_grid, eval=FALSE}
dfSummary(tobacco,
          plain.ascii  = FALSE,
          style        = "grid",
          graph.magnif = 0.75,
          varnumbers   = FALSE,
          valid.col    = FALSE,
          tmp.img.dir  = "/tmp")
```

## 5.2 HTML Rendering for dfSummary()

This method works really well, and not having to specify the `tmp.img.dir`
parameter is a plus.

```{r}
print(dfSummary(tobacco,
                varnumbers   = FALSE,
                valid.col    = FALSE,
                graph.magnif = 0.75),
      method = "render")
```

## 5.3 Managing Lengthy dfSummary() Outputs in R Markdown Documents

For data frames containing numerous variables, we can use the `max.tbl.height`
argument to wrap the results in a scrollable window having the specified
height, in pixels.

```{r}
print(dfSummary(tobacco,
                varnumbers   = FALSE,
                valid.col    = FALSE,
                graph.magnif = 0.75),
      max.tbl.height = 300,
      method = "render")
```

![](assets/exclamation-diamond.svg) Some users reported getting repeated X11 warnings; those can easily be avoided by using the following chunk header: `{r, results="asis", warning=FALSE}`.
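
When several `dfSummary()` calls share the same column and graph settings, the defaults can be set once instead of being repeated in every call. The sketch below is one possible arrangement, not a required setup; it relies on the `dfSummary.valid.col` and `dfSummary.graph.magnif` options that also appear in the PDF recipe of this document, and assumes the `tobacco` data frame shipped with the package.

```r
library(summarytools)

# Set once, typically in the setup chunk.
st_options(dfSummary.valid.col    = FALSE,
           dfSummary.graph.magnif = 0.75)

# Later calls pick up these defaults; varnumbers is still passed explicitly.
print(dfSummary(tobacco, varnumbers = FALSE), method = "render")
```

Arguments passed directly to `dfSummary()` still apply to that call, so individual summaries can deviate from the defaults where needed.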

--------------------------------------------------------------------------------

# 5. Using Other Formatting Packages

As explained in the introductory vignette, `tb()` can be used to convert
**summarytools** objects created with `freq()` and `descr()` to simple
*tibbles*, which packages specialized in table formatting will be able to
process. This is particularly helpful with `stby` objects:

```{r}
library(kableExtra)
library(magrittr)
stby(iris, iris$Species, descr, stats = "fivenum") %>%
  tb() %>%
  kable(format = "html", digits = 2) %>%
  collapse_rows(columns = 1, valign = "top")
```

Using `tb(order = 3)` flips the order of the grouping variable(s) and the
reported variable(s):

```{r}
stby(iris, iris$Species, descr, stats = "fivenum") %>%
  tb(order = 3) %>%
  kable(format = "html", digits = 2) %>%
  collapse_rows(columns = 1, valign = "top")
```

--------------------------------------------------------------------------------

# 6. Including dfSummaries in PDF Documents

Here is a recipe for including fully formatted data frame summaries in *pdf*
documents. There is some work involved, but carefully following the
instructions given here should give the expected results.

There are basically two parts to this: first, you must create a preamble *tex*
file. Second, you must indicate in the *YAML* section of your document where to
find this file.

### Included Preamble *Tex* File

This is the $\LaTeX$ content that needs to be included as a preamble. You can
either copy this into your own *tex* file, or use the file that is now included
in **summarytools** (as of version 1.0), following the instructions provided
below.

````
\usepackage{graphicx}
\usepackage[export]{adjustbox}
\usepackage{letltxmacro}
\LetLtxMacro{\OldIncludegraphics}{\includegraphics}
\renewcommand{\includegraphics}[2][]{\raisebox{0.5\height}%
  {\OldIncludegraphics[valign=t,#1]{#2}}}
````

If you choose to create a *tex* file from the above content, the name of the
file is arbitrary -- you can use whatever name you want. Its location is also
up to you. I suggest you put it in the same location as your *Rmd* file.

Along with the `graph.magnif` parameter for `dfSummary()`, you might need to
adjust the `0.5` value used as the `raisebox` parameter in the preamble.

### The YAML Section

Your document should start with a *YAML* header like this one:

```
---
title: "My PDF With Data Frame Summaries"
output:
  pdf_document:
    latex_engine: xelatex
    includes:
      in_header:
        - !expr system.file("includes/fig-valign.tex", package = "summarytools")
---
```

If you need to customize the content of the preamble, then your header will
look something like this (assuming the modified file is in the same directory
as your *Rmd* document):

```
---
title: "My PDF With Data Frame Summaries"
output:
  pdf_document:
    latex_engine: xelatex
    includes:
      in_header: fig-valign-modified.tex
---
```

![](assets/lightbulb.svg) The *xelatex* engine option is not mandatory, but there are several advantages to it. I use it systematically and recommend you do the same.

### R Code

Here is an example setup chunk:

````
```{r, message=FALSE}`r ''`
library(summarytools)
st_options(
  plain.ascii = FALSE,
  style = "rmarkdown",
  dfSummary.style = "grid",
  dfSummary.valid.col = FALSE,
  dfSummary.graph.magnif = .52,
  subtitle.emphasis = FALSE,
  tmp.img.dir = "/tmp"
)
```
````

And here is a chunk actually creating the summary:

````
```{r, results='asis', message=FALSE}`r ''`
define_keywords(title.dfSummary = "Data Frame Summary in PDF Format")
dfSummary(tobacco)
```
````

### Remarks

Since we redefined the $\LaTeX$ command `includegraphics`, all images included
using `![](some-image.png)` will be impacted. In some cases, this could pose a
problem. Eventually, we hope to find a more robust solution, without such
side effects. (If you are well versed in $\LaTeX$ and think you can solve this
problem, please get in touch.)

--------------------------------------------------------------------------------

# 7. This Vignette's Setup

This vignette uses theme `rmarkdown::html_vignette`. Its *YAML* section looks
like this:

```
---
title: "Summarytools in R Markdown Documents"
author: "Dominic Comtois"
date: "`r Sys.Date()`"
output:
  html_document:
    fig_caption: false
    toc: true
    toc_depth: 1
    css: assets/vignette.css
vignette: >
  %\VignetteIndexEntry{Summarytools in R Markdown Documents}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{magrittr}
  %\VignetteDepends{kableExtra}
---
```

The *vignette.css* file is copied from the installed **rmarkdown** package's
'templates/html_vignette/resources' directory.

### Global Options

The following **global options** for **knitr** and **summarytools** have been
set. Other options might also be useful to optimize content, but this is a good
place to start from.

````
```{r setup, include=FALSE}`r ''`
library(knitr)
opts_chunk$set(comment=NA, prompt=FALSE, cache=FALSE, echo=TRUE, results='asis')
st_options(bootstrap.css     = FALSE,       # Already part of the theme
           plain.ascii       = FALSE,       # Essential setting for Rmd
           style             = "rmarkdown", # Essential setting for Rmd
           dfSummary.silent  = TRUE,        # Hides redundant messages
           footnote          = NA,          # Keeping the results minimal
           subtitle.emphasis = FALSE)       # For the vignette theme,
                                            # this gives better results.
                                            # For other themes, using
                                            # TRUE might be preferable.
```
````

Finally, **summarytools CSS** has been included in the following manner,
before the setup chunk:

````
```{r, echo=FALSE, results='asis'}`r ''`
summarytools::st_css(main = TRUE, global = TRUE)
```
````

--------------------------------------------------------------------------------

# 8. Final Notes

This is by no means a definitive guide; depending on the themes you use, you
could find that other settings yield better results. If you are looking to
create a _Word_ or a _PDF_ document, you might want to try different
combinations of options.

If you find problems with the recommended settings or if you find better
combinations, you are welcome to
[open an issue on GitHub](https://github.com/dcomtois/summarytools/issues) to
suggest modifications or make a pull request with your own improvements to this
vignette.