class: center, middle, inverse, title-slide

.title[
# Programming Tools in Data Science
]
.subtitle[
## Lecture #2: RMarkdown
]
.author[
### Samuel Orso
]
.date[
### 28 September 2023
]

---
# What is RMarkdown?

<img src="images/markdown.png" style="width:150px; position:absolute; top:7%; left:45%" />

* RMarkdown: `R` + `markdown`
* `markdown` contrasts with `markup` languages (e.g. HTML), which require syntax that can be quite difficult for the uninitiated to decipher.
* RMarkdown is a framework that provides a literate programming format for data science.
* **Literate programming**: programmers interleave narrative context with code, so that the program and its documentation are produced simultaneously.
* **Reproducible research**: the whole process (collecting data, performing analysis, producing output, ...) can be reproduced the same way by someone else.

---
# Is there a reproducibility crisis?

<img src="images/reproducibility.png" width="453" height="385" style="display: block; margin: auto;" />

---
# What is RMarkdown?

> In a nutshell, R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on).

.right[-- <cite>R Markdown: The Definitive Guide</cite>]

<img src="images/workflow.png" width="631" height="300" style="display: block; margin: auto;" />

---
# Create an RMarkdown document

Within RStudio, click `File` → `New File` → `R Markdown`.

<img src="images/rmd_new.png" width="750" height="450" style="display: block; margin: auto;" />

---
# Important features of `markdown`

* Three aspects: YAML metadata, text, and code chunks

---
# Important features of `markdown`

## 1. YAML Ain't Markup Language

* Header of the document, where options are defined.
* Surrounded by `---`
* Options include: author, date, output format, table of contents, themes, code folding, ...

---
# Important features of `markdown`

## 2. Text

* Core body, essential for explaining your analysis.
* Markdown syntax comprises:
    * emphasis (`*italics*`, `**bold**`, or `` `code style` ``)
    * headers (`#`, `##`, `###`)
    * lists (`*`, `-`, `+` for unnumbered and `1.` for numbered)
    * hyperlinks (`<url>` or `[text](url)`)
    * blockquotes (`>`)
    * pictures/GIFs (`![alt text](path)`)
    * tables (usually better to use `knitr::kable()`)

---
# Important features of `markdown`

## 2. Text

* `\(\LaTeX\)` in RMarkdown using the syntax `$math expression$`
* Cross-referencing sections using the syntax `\@ref(label)`
* Citations and bibliographies can be generated automatically with RMarkdown
* **You can always use HTML**

---
# Important features of `RMarkdown`

## 3. Code Chunks

Code chunks are specific to `RMarkdown`. They let you embed `R` code within your document. To insert a chunk in your RMarkdown file, use one of:

- the keyboard shortcut Ctrl + Alt + I (OS X: Cmd + Option + I)
- the Add Chunk command in the editor toolbar
- typing the chunk delimiters ` ```{r label, some option}` and ` ``` `

---
# Important features of `RMarkdown`

## 3. Code Chunks

Most common chunk options:

- `eval`: (TRUE; logical) whether to evaluate the code chunk;
- `echo`: (TRUE; logical or numeric) whether to include R source code in the output file;
- `warning`: (TRUE; logical) whether to preserve warnings (produced by `warning()`) in the output, as when running R code in a terminal (if FALSE, all warnings are printed in the console instead of the output document);
- `cache`: (FALSE; logical) whether to "*cache*" a code chunk, which can save time by avoiding re-running the computations;
- `dependson`: (NULL; character or numeric) labels of the chunks that a cached chunk depends on, so that it is re-run when they change.

---
# Important features of `RMarkdown`

## 3. Code Chunks

Plot figure options:

- `fig.path`: ('figure/'; character) prefix to be used for figure filenames (`fig.path` and chunk labels are concatenated to make filenames);
- `fig.show`: ('asis'; character) how to show/arrange the plots;
- `fig.width`, `fig.height`: (both are 7; numeric) width and height of the plot in inches, to be used in the graphics device;
- `fig.align`: ('default'; character) alignment of figures in the output document (possible values are left, right and center);
- `fig.cap`: (NULL; character) figure caption to be used in a figure environment.

---
class: sydney-blue, center, middle

# Play 5 minutes with `https://tinyurl.com/RMdown`

<img src="images/qrcode_data-analytics-lab.shinyapps.io.png" width="300" height="300" style="display: block; margin: auto;" />

<!--- <iframe src="https://data-analytics-lab.shinyapps.io/rmarkdown/" width="100%" height="400px" data-external="1"></iframe> -->

---
# Printing an output as a table with `knitr::kable()`

```r
data("iris")
# Print the first five rows of iris as a formatted table
knitr::kable(iris[1:5, ])
```

| Sepal.Length| Sepal.Width| Petal.Length| Petal.Width|Species |
|------------:|-----------:|------------:|-----------:|:-------|
|          5.1|         3.5|          1.4|         0.2|setosa  |
|          4.9|         3.0|          1.4|         0.2|setosa  |
|          4.7|         3.2|          1.3|         0.2|setosa  |
|          4.6|         3.1|          1.5|         0.2|setosa  |
|          5.0|         3.6|          1.4|         0.2|setosa  |

Many more options are available in the `knitr` and `kableExtra` packages to produce particularly good-looking tables ([click here for detailed documentation](https://bookdown.org/yihui/rmarkdown-cookbook/kable.html)).
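
As a minimal sketch of a few such options, the call below uses the `col.names`, `digits`, and `caption` arguments of `knitr::kable()`. The column labels and the caption text are illustrative assumptions, not taken from the slides.

```r
# Minimal sketch: labels, rounding, and caption are illustrative choices.
tab <- knitr::kable(
  iris[1:5, ],
  # Replace the default dotted column names with readable labels
  col.names = c("Sepal length", "Sepal width", "Petal length",
                "Petal width", "Species"),
  digits = 1,                                   # round numeric columns to 1 decimal
  caption = "First five rows of the iris data"  # hypothetical caption text
)
tab
```

Inside a code chunk, printing `tab` (or calling `kable()` directly) inserts the table into the knitted document.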
---
# Extended table options with `knitr::kable()` and `kableExtra`

```r
library(kableExtra)
mtcars[1:3, 1:8] %>%
  kbl() %>%
  kable_paper(full_width = FALSE) %>%
  # Column 2 (mpg): colour the text by value and turn each cell into a link
  column_spec(2, color = spec_color(mtcars$mpg[1:3]),
              link = "https://haozhu233.github.io/kableExtra/") %>%
  # Column 6 (drat): white text on a value-dependent background, with a popover
  column_spec(6, color = "white",
              background = spec_color(mtcars$drat[1:3], end = 0.7),
              popover = paste("am:", mtcars$am[1:3]))
```

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; width: auto !important; margin-left: auto; margin-right: auto;'>
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:right;"> mpg </th>
   <th style="text-align:right;"> cyl </th>
   <th style="text-align:right;"> disp </th>
   <th style="text-align:right;"> hp </th>
   <th style="text-align:right;"> drat </th>
   <th style="text-align:right;"> wt </th>
   <th style="text-align:right;"> qsec </th>
   <th style="text-align:right;"> vs </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Mazda RX4 </td>
   <td style="text-align:right;color: rgba(68, 1, 84, 1) !important;"> <a href="https://haozhu233.github.io/kableExtra/" style="color: rgba(68, 1, 84, 1) !important;"> 21.0 </a> </td>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 160 </td>
   <td style="text-align:right;"> 110 </td>
   <td style="text-align:right;color: white !important;background-color: rgba(67, 191, 113, 1) !important;" data-toggle="popover" data-container="body" data-trigger="hover" data-placement="right" data-content="am: 1"> 3.90 </td>
   <td style="text-align:right;"> 2.620 </td>
   <td style="text-align:right;"> 16.46 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Mazda RX4 Wag </td>
   <td style="text-align:right;color: rgba(68, 1, 84, 1) !important;"> <a href="https://haozhu233.github.io/kableExtra/" style="color: rgba(68, 1, 84, 1) !important;"> 21.0 </a> </td>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 160 </td>
   <td style="text-align:right;"> 110 </td>
   <td style="text-align:right;color: white !important;background-color: rgba(67, 191, 113, 1) !important;" data-toggle="popover" data-container="body" data-trigger="hover" data-placement="right" data-content="am: 1"> 3.90 </td>
   <td style="text-align:right;"> 2.875 </td>
   <td style="text-align:right;"> 17.02 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Datsun 710 </td>
   <td style="text-align:right;color: rgba(253, 231, 37, 1) !important;"> <a href="https://haozhu233.github.io/kableExtra/" style="color: rgba(253, 231, 37, 1) !important;"> 22.8 </a> </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 108 </td>
   <td style="text-align:right;"> 93 </td>
   <td style="text-align:right;color: white !important;background-color: rgba(68, 1, 84, 1) !important;" data-toggle="popover" data-container="body" data-trigger="hover" data-placement="right" data-content="am: 1"> 3.85 </td>
   <td style="text-align:right;"> 2.320 </td>
   <td style="text-align:right;"> 18.61 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>

---
# Mathpix to easily insert math equations in `\(\LaTeX\)`

<blockquote>
Mathpix Snip digitizes handwritten or printed text, and copies outputs to the clipboard that can be pasted into LaTeX editors like Overleaf, Markdown editors like Typora, Microsoft Word, and more.
.right[-- <cite>Mathpix Snip</cite>]
</blockquote>

<div align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/Pc_6aKPYBwQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>

Find more information [here](https://mathpix.com/).
---
# `xaringan::infinite_moon_reader()` for live preview of your document

Instant preview without fully rebuilding the HTML, plus linked navigation

<img src="images/gif_inf_m_r.gif" width="606" height="360" style="display: block; margin: auto;" />

---
# From RMarkdown to Quarto

<img src="images/quarto.png" style="width:250px; position:absolute; top:9%; left:55%" />

- Like RMarkdown, Quarto can render documents that contain code in R, Python, Julia, ...
- It combines the functionality of RMarkdown and other packages into a **single system**, which is very useful for collaborating with people who write in a different programming language from you (and do not necessarily have R/RStudio).

> Like R Markdown, Quarto uses Knitr to execute R code, and is therefore able to render most existing Rmd files without modification.
> <cite>quarto.org/</cite>

---
class: sydney-blue, center, middle

# Questions?

.pull-down[
<a href="https://ptds.samorso.ch/">
.white[<svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M369.9 97.9L286 14C277 5 264.8-.1 252.1-.1H48C21.5 0 0 21.5 0 48v416c0 26.5 21.5 48 48 48h288c26.5 0 48-21.5 48-48V131.9c0-12.7-5.1-25-14.1-34zM332.1 128H256V51.9l76.1 76.1zM48 464V48h160v104c0 13.3 10.7 24 24 24h104v288H48z"></path></svg> website]
</a>
<a href="https://github.com/ptds2023/">
.white[<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> GitHub]
</a>
]

---
# In-class exercise (10 minutes)

Basic manipulations:

1. Create an RMarkdown HTML document in `RStudio` and "`knit`" it.
1. Create a new header of type 2.
1. Fit a linear regression with "Sepal Length" as the response and "Sepal Width" as the explanatory variable from the `iris` dataset and save the result.
1. Highlight the code with the `monochrome` style.
1. Print the summary of the linear regression.
1. Include the Q-Q plot from the linear regression, using filled dots.
1. Print the head of the `iris` dataset with `kable`.
1. Remove the `.` from the column labels (click [here](https://bookdown.org/yihui/rmarkdown-cookbook/kable.html#change-column-names)).

---
# In-class exercise (10 minutes)

More advanced manipulations:

1. Install `kableExtra` and reproduce the examples shown in the slides with the `iris` dataset.
1. Using Mathpix, reproduce equation (6.1) of the paper [https://arxiv.org/abs/math/0303109](https://arxiv.org/abs/math/0303109)
1. Add the reference and cite it in the RMarkdown document.
1. Convert your RMarkdown document into a Quarto document.
---
# To go further

- <https://www.markdownguide.org/>
- <https://rmarkdown.rstudio.com/>
- [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/) by Yihui Xie, Christophe Dervieux, Emily Riederer
- [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) by Yihui Xie, J. J. Allaire, Garrett Grolemund
- Visit [bookdown.org](https://bookdown.org/)