The targets package for academic research

This document provides an example of how the targets package can benefit academic projects, where the goal is to write a journal article or related report. It illustrates how targets works better than caches in Rmarkdown and knitr, and it provides a brief discussion of how to get started.

Categories: pipelines, targets, statistics, R

Published: February 29, 2024

I had heard about the targets package (Landau 2021b) a few times, most notably in Will Landau’s presentation at StanCon 2023. As an academic researcher whose projects often lead to journal articles, I did not immediately see the advantage of targets over dynamic report generation packages like Rmarkdown and knitr. This case study illustrates how targets provides useful features that dynamic report generation alone does not offer.

Illustrating the Cache Problem

The targets overview vignette states that the package “reduces the burdens of repeated computation and manual data micromanagement.” To me, that sounded similar to using Rmarkdown with cache = TRUE: I cache the results when I compile the Rmarkdown document, so that I do not have to repeatedly run the same computations.

But the catch is that the Rmarkdown cache is not intelligent: by default, it cannot tell when a change to one code chunk influences another code chunk. This leads to stale cached results, meaning that they no longer match what you would get if you re-ran the entire Rmarkdown document from scratch.

To see what I mean, consider the following three code chunks that could plausibly appear in an Rmarkdown file:

```{r}
set.seed(311)
x <- 1:10
y <- x + rnorm(10, sd = .5)
```

```{r, out.height = "2.5in", fig.align = "center"}
plot(x, y)
```

```{r, cache = TRUE}
summary(lm(y ~ x))
```

Notice that the third chunk is cached, while the other two chunks are not.

Imagine that I compile these three code chunks to PDF or docx or some other format. Later, while editing, I discover that I am missing a negative sign in my first code chunk; that chunk should actually be:

```{r}
set.seed(311)
x <- 1:10
y <- -x + rnorm(10, sd = .5)
```

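When I make this change and recompile, the plot will be updated but the regression summary will not. This is because the chunk containing the regression summary was already cached and its own code did not change, so the cached result is reused even though y is now different.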
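
The staleness can be reproduced outside of Rmarkdown with a toy cache. The sketch below is only an illustration and is not knitr's actual implementation: the helper `run_chunk()` and the `cache` environment are hypothetical, and the cache key is simply the text of the chunk, under the assumption that a chunk-level cache does not look at what earlier chunks computed.

```r
# Toy cache keyed only by a chunk's own source text.
# (Hypothetical helper for illustration; not how knitr is implemented.)
cache <- new.env()

run_chunk <- function(code, env) {
  key <- code
  if (!exists(key, envir = cache, inherits = FALSE)) {
    # Cache miss: evaluate the chunk and store its result.
    assign(key, eval(parse(text = code), envir = env), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

# The "cached chunk": extract the slope from the regression.
fit_code <- "coef(lm(y ~ x))[['x']]"

# First compile: y increases with x (the version with the missing sign).
env1 <- new.env()
set.seed(311)
env1$x <- 1:10
env1$y <- env1$x + rnorm(10, sd = .5)
slope_first <- run_chunk(fit_code, env1)

# Second compile: the sign is fixed upstream, but the chunk text is unchanged.
env2 <- new.env()
set.seed(311)
env2$x <- 1:10
env2$y <- -env2$x + rnorm(10, sd = .5)
slope_cached <- run_chunk(fit_code, env2)                  # served from the cache
slope_fresh  <- eval(parse(text = fit_code), envir = env2) # what a clean run gives
```

Here `slope_cached` is still the positive slope from the first compile, while `slope_fresh` is negative: the cached value no longer matches the data.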

Solution

At this point, you might say that I was stupid to cache the regression summary, if it was possible for the data to change in the future. That is true, but for larger documents with many code chunks (like journal articles), it is difficult to anticipate what may change in the future, because data and analyses might be added and removed over the life of a project. We could decide to never cache anything, but then it can take a really long time to compile the document. It is better to have a system that will re-run code only when needed, and this is where targets helps.
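
As a rough sketch of what this looks like, the three chunks above could be expressed as a `_targets.R` file along the following lines. The function names (`make_data`, `fit_model`) and target names (`dat`, `fit`) are my own hypothetical choices for this illustration, and the fixed seed is kept only to mirror the original chunk.

```r
# _targets.R: a minimal sketch of the three chunks above as a pipeline.
# make_data, fit_model, dat, and fit are hypothetical names.
library(targets)

# Simulate the data; this is where the sign error would be fixed.
make_data <- function() {
  set.seed(311)
  x <- 1:10
  data.frame(x = x, y = -x + rnorm(10, sd = .5))
}

# Fit the regression and return its summary.
fit_model <- function(dat) {
  summary(lm(y ~ x, data = dat))
}

# The pipeline: fit depends on dat, so a change to the data
# invalidates the regression summary as well.
list(
  tar_target(dat, make_data()),
  tar_target(fit, fit_model(dat))
)
```

With this file in the project root, `tar_make()` builds the targets, `tar_outdated()` lists the targets that are no longer up to date, and `tar_read(fit)` retrieves the stored summary. After an edit to `make_data()` that changes the data, a second `tar_make()` should rebuild both `dat` and `fit` instead of reusing a stale summary.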

Getting Started

The targets walkthrough is helpful and illustrates the general process for using targets with your project. But for academic documents, the one thing missing from the walkthrough is the ability to funnel your results into an Rmd or Rnw file. This missing functionality is found in the tar_render() function from the tarchetypes package (Landau 2021a), and an example of how to use it is at this link. The tar_render() function allows you to pass results to an Rmd file for automatic document generation. And when the project changes, targets will know which steps need to be re-run and which do not.
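
To give a flavor of how tar_render() fits in, here is a hedged sketch of a `_targets.R` file that ends in a rendered report. It assumes a file named `report.Rmd` exists in the project root and that functions called `make_data()` and `fit_model()` are defined in `R/functions.R`; all of these names and paths are hypothetical.

```r
# _targets.R: sketch of a pipeline that ends in a rendered report.
# Assumes report.Rmd exists and that make_data() and fit_model() are
# defined in R/functions.R (hypothetical names and paths).
library(targets)
library(tarchetypes)

source("R/functions.R")

list(
  tar_target(dat, make_data()),
  tar_target(fit, fit_model(dat)),
  # tar_render() scans report.Rmd for tar_read()/tar_load() calls and
  # treats the targets they mention as upstream dependencies.
  tar_render(report, "report.Rmd")
)

# Inside report.Rmd, a chunk would then pull in the stored result with
#   targets::tar_read(fit)
# instead of recomputing it.
```

Because the report is itself a target, `tar_make()` re-renders it only when `report.Rmd` or one of the targets it reads has changed.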

Conclusion

A good deal of recent attention has been devoted to errors in academic articles. While the focus is often on statistical methodology, cheating, and experimental design, a more mundane source of errors involves outdated and incoherent results. These happen when one part of a project changes, the changes have further implications for downstream results, and the researcher doesn’t fully realize or address the further implications. I speculate that these mundane errors are common because, when the goal is to publish as quickly as possible, errors are bound to slip through the cracks. In this context, we could view targets as an extra research assistant who double-checks that all project results are up to date.

As with other software tools, you have to invest some time in setting up targets for a project. You need to create the _targets.R file, which defines all the targets (steps) of your project. I also found it helpful to put free-standing code inside functions so that the code can be called more easily. These extra steps will undoubtedly deter some researchers from regularly using targets, and it may not be worth it for some short-term analyses. But the targets setup is like compound interest for project efficiency, making your life easier as the project drags on through journal submissions, rejections, revisions, and resubmissions.

License

The code on this page is copyrighted by Edgar Merkle and licensed under the GPLv3 license:

https://www.gnu.org/licenses/gpl-3.0.en.html

The text and figures on this page are copyrighted by Edgar Merkle and licensed under the CC BY-NC 4.0 license:

https://creativecommons.org/licenses/by-nc/4.0/

Computing Environment

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3 
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Chicago
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] targets_1.3.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         knitr_1.45        rlang_1.1.3      
 [5] xfun_0.42         processx_3.8.3    jsonlite_1.8.8    data.table_1.15.0
 [9] glue_1.7.0        backports_1.4.1   htmltools_0.5.7   ps_1.7.6         
[13] fansi_1.0.6       rmarkdown_2.25    evaluate_0.23     tibble_3.2.1     
[17] base64url_1.4     fastmap_1.1.1     yaml_2.3.8        lifecycle_1.0.4  
[21] compiler_4.3.2    codetools_0.2-19  igraph_2.0.2      htmlwidgets_1.6.4
[25] pkgconfig_2.0.3   digest_0.6.34     R6_2.5.1          tidyselect_1.2.0 
[29] utf8_1.2.4        pillar_1.9.0      callr_3.7.5       magrittr_2.0.3   
[33] tools_4.3.2       bspm_0.5.5       

References

Landau, William Michael. 2021a. tarchetypes: Archetypes for Targets.
———. 2021b. “The targets R Package: A Dynamic Make-like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.

Citation

BibTeX citation:
@online{merkle2024,
  author = {Merkle, Edgar C.},
  title = {The *Targets* Package for Academic Research},
  date = {2024-02-29},
  url = {https://ecmerkle.github.io/cs/targets.html},
  langid = {en}
}
For attribution, please cite this work as:
Merkle, Edgar C. 2024. “The *Targets* Package for Academic Research.” February 29, 2024. https://ecmerkle.github.io/cs/targets.html.