---
title: "qwraps2: Formatted Summary Statistics"
author: "Peter E. DeWitt"
output:
 rmarkdown::html_vignette:
   toc: true
   number_sections: true
vignette: >
 %\VignetteIndexEntry{qwraps2: Formatted Summary Statistics}
 %\VignetteEngine{knitr::rmarkdown}
 %\VignetteEncoding{UTF-8}
---

```{r label = "setup", include = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
```

```{r}
set.seed(42)
library(qwraps2)

# define the markup language we are working in.
# options(qwraps2_markup = "latex") is also supported.
options(qwraps2_markup = "markdown")
```

# Introduction

It is common for a manuscript to require a data summary table. The table might
include simple summary statistics for the whole sample and for subgroups.
There are several tools available to build such tables. In my opinion, though,
most of those tools have nuances imposed by the creators/authors such that other
users must not only understand the tool but also think like its authors.
I wrote this package to be as flexible and general as possible. I hope you like
these tools and will be able to use them in your work.

This vignette presents the use of the
`r paste0(backtick(summary_table), ",") `
`r paste0(backtick(qsummary), ",") `
and
`r backtick(qable) `
functions for quickly building data summary tables. We will also use summary
statistic functions from `r CRANpkg(qwraps2) `, including
`r paste0(backtick(mean_sd), ",") `
`r paste0(backtick(median_iqr), ",") `
`r paste0(backtick(n_perc), ",") `
and others.

## Prerequisites: Example Data Set

```{r}
library(qwraps2)
```

We will use the data set `r backtick(mtcars2) ` for the examples throughout this vignette.
`r backtick(mtcars2) ` is a modified and extended version of the base R data set
`r paste0(backtick(mtcars), ".") `
For details on the construction of the `r backtick(mtcars2) ` data set please
view the vignette: `r backtick(vignette('mtcars', package = "qwraps2")) `

```{r}
data(mtcars2)
str(mtcars2)
```

# Review of Summary Statistic Functions and Formatting

## Means and Standard Deviations

`r backtick(mean_sd) ` returns the (arithmetic) mean and standard deviation of a
numeric vector as a formatted character string. For example,
`r backtick(mean_sd(mtcars2$mpg)) ` returns the formatted string
`r paste0(mean_sd(mtcars2$mpg), ".") `
Other options are available for formatting the character string:

```{r}
mean_sd(mtcars2$mpg)
mean_sd(mtcars2$mpg, denote_sd = "paren")
```

## Mean and Confidence Intervals

If you need the mean and a confidence interval there is the function
`r paste0(backtick(mean_ci), ".") `
It returns a `r backtick(qwraps2_mean_ci) ` object, which is a named vector with
the mean, the lower confidence limit, and the upper confidence limit. The
printing method for `r backtick(qwraps2_mean_ci) ` objects is a call to the
`r backtick(frmtci) ` function. You can modify the formatting of the printed
result by adjusting the arguments passed to
`r paste0(backtick(frmtci), ".") `

```{r}
mci <- mean_ci(mtcars2$mpg)
str(mci)
mci
print(mci, show_level = TRUE)
```

## Median and Interquartile Range

Similar to the `r backtick(mean_sd) ` function, `r backtick(median_iqr) `
returns the median and the interquartile range (IQR) of a data vector.

```{r}
median_iqr(mtcars2$mpg)
```

## Count and Percentages

The `r backtick(n_perc) ` function is the workhorse.
`r backtick(n_perc0) ` is also provided for ease of use in the same way that
base R has `r backtick(paste) ` and
`r paste0(backtick(paste0), ".") `
`r backtick(n_perc) ` returns the n (%) with the percentage sign in the string;
`r backtick(n_perc0) ` omits the percentage sign from the string.
The latter is good for tables; the former for in-line text.

```{r}
n_perc(mtcars2$cyl == 4)
n_perc0(mtcars2$cyl == 4)

n_perc(mtcars2$cyl_factor == 4)  # this returns 0 (0.00%)
n_perc(mtcars2$cyl_factor == "4 cylinders")
n_perc(mtcars2$cyl_factor == levels(mtcars2$cyl_factor)[2])

# The count and percentage of 4 or 6 cylinders vehicles in the data set is
n_perc(mtcars2$cyl %in% c(4, 6))
```

## Geometric Means and Standard Deviations

Let $\left\{x_1, x_2, x_3, \ldots, x_n \right\}$ be a sample of size $n$ with
$x_i > 0$ for all $i.$ Then the geometric mean, $\mu_g,$ and geometric standard
deviation, $\sigma_g,$ are

$$
\begin{equation}
  \mu_g = \left( \prod_{i = 1}^{n} x_i \right)^{\frac{1}{n}} =
  b^{ \frac{1}{n} \sum_{i = 1}^{n} \log_{b} x_i },
\end{equation}
$$

and

$$
\begin{equation}
  \sigma_g = b ^ {
  \sqrt{
    \frac{\sum_{i = 1}^{n} \left( \log_{b} \frac{x_i}{\mu_g} \right)^2}{n}
  }
  }
\end{equation}
$$

or, for clarity,

$$
\begin{equation}
  \log_{b} \sigma_g =
  \sqrt{ \frac{\sum_{i = 1}^{n} \left( \log_{b} \frac{x_i}{\mu_g} \right)^2}{n}}
\end{equation}
$$

When looking for the geometric standard deviation in R, the simple
`r backtick(exp(sd(log(x)))) ` is not exactly correct. The geometric standard
deviation uses $n,$ the full sample size, in the denominator, whereas the
`r backtick(sd) ` and `r backtick(var) ` functions in R use the denominator
$n - 1.$ To get the geometric standard deviation, adjust the result by
multiplying the variance of the log-transformed values by $(n - 1) / n$ or their
standard deviation by $\sqrt{(n - 1) / n}.$ See the example below.
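
The adjustment factor follows directly from the definitions above. As a brief
sketch, let $s_b$ (a symbol introduced here only for illustration) denote the
sample standard deviation of the log-transformed values, that is, what
`sd(log(x, base = b))` computes with the $n - 1$ denominator. Since
$\log_{b} \mu_g$ is the arithmetic mean of the $\log_{b} x_i,$

$$
\begin{equation}
  \log_{b} \mu_g = \frac{1}{n} \sum_{i = 1}^{n} \log_{b} x_i,
  \qquad
  s_b^2 = \frac{1}{n - 1} \sum_{i = 1}^{n} \left( \log_{b} x_i - \log_{b} \mu_g \right)^2,
\end{equation}
$$

and because $\log_{b} \frac{x_i}{\mu_g} = \log_{b} x_i - \log_{b} \mu_g,$

$$
\begin{equation}
  \log_{b} \sigma_g =
  \sqrt{ \frac{\sum_{i = 1}^{n} \left( \log_{b} \frac{x_i}{\mu_g} \right)^2}{n}} =
  \sqrt{ \frac{(n - 1) \, s_b^2}{n} } =
  \sqrt{ \frac{n - 1}{n} } \; s_b.
\end{equation}
$$

With the natural logarithm this gives
$\sigma_g = \exp\left( \sqrt{(n - 1) / n} \; s_e \right),$ which is the last
expression evaluated in the code that follows.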
```{r}
x <- runif(6, min = 4, max = 70)

# geometric mean
mu_g <- prod(x)^(1 / length(x))
mu_g
exp(mean(log(x)))
1.2^mean(log(x, base = 1.2))

# geometric standard deviation
exp(sd(log(x)))  ## This is wrong

# these equations are correct
sigma_g <- exp(sqrt(sum(log(x / mu_g)^2) / length(x)))
sigma_g

exp(sqrt((length(x) - 1) / length(x)) * sd(log(x)))
```

The functions
`r paste0(backtick(gmean), ",") `
`r paste0(backtick(gvar), ",") `
and
`r backtick(gsd) `
provide the geometric mean, variance, and standard deviation for a numeric
vector.

```{r}
gmean(x)
all.equal(gmean(x), mu_g)

gvar(x)
all.equal(gvar(x), sigma_g^2)  # This is supposed to be FALSE
all.equal(gvar(x), exp(log(sigma_g)^2))

gsd(x)
all.equal(gsd(x), sigma_g)
```

`r backtick(gmean_sd) ` provides a quick way to report the geometric mean and
geometric standard deviation in the same way that `r backtick(mean_sd) ` does
for the arithmetic mean and arithmetic standard deviation:

```{r}
gmean_sd(x)
```

# Building a Data Summary Table

The function `r backtick(summary_table) ` appears to be the most widely used
tool provided by the qwraps2 package. As such, that function has earned its own
vignette.

```{r eval = FALSE}
vignette("qwraps2-summary-table")
```

# Session Info

```{r}
print(sessionInfo(), locale = FALSE)
```