EPI XXX: Practical Computing for Population Health Research — Session 01
March 16, 2026
By the end of this lecture, you should be able to:
Use here::here() to construct file paths instead of setwd().
Best Practices (2014)
Good Enough (2017)
This course focuses on the “good enough” practices and will sometimes expose you to the aspirational “best practices.”
Every manuscript integrates hundreds of decisions — which records to exclude, how to handle missing data, which model specification you settled on after trying alternatives.
These decisions are the intellectual core of the analysis. In many projects, they exist only in the analyst’s memory or uncommented scripts.
A research project includes the code, the data, the computational environment, and the documentation — the manuscript is one output among several.
Scripts break when they move between computers. The script might depend on:
These are not exotic bugs. They are the regular consequences of conflating your personal computing environment with the project’s requirements.
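One low-cost way to make those hidden dependencies visible is to record the computing environment alongside the results. The sketch below uses base R only. The file name `session_info.txt` and the temporary directory are assumptions for illustration; in a real project the file would more plausibly live in `output/`.

```r
# Record the R version and attached packages next to the analysis output.
# Assumption: a throwaway directory stands in for the project's output/ folder.
output_dir <- file.path(tempdir(), "output")
dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

session_file <- file.path(output_dir, "session_info.txt")  # hypothetical file name

# capture.output() turns the printed sessionInfo() into a character vector
session_lines <- capture.output(sessionInfo())
writeLines(session_lines, session_file)

# Anyone rerunning the script can compare their environment to the recorded one
recorded <- readLines(session_file)
head(recorded, 3)
```

This does not fix a mismatch, but it turns "it worked on my machine" into something a collaborator can check.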
Sandve et al. distill reproducibility to a handful of rules:
These rules sound obvious but are rarely applied.
Well-organized ✅
mortality_analysis/
├── mortality_analysis.Rproj
├── README.md
├── config.yml
├── renv.lock
├── .gitignore
├── code/
│ ├── 01_ingest_raw_data.R
│ ├── 02_create_analytic_data.R
│ ├── 03_fit_models.R
│ ├── 04_fig1_trends.R
│ └── utils.R
├── data/
├── data_raw/
├── data_private/
├── output/
├── plots/
├── qmd/
├── lit/
└── manuscript/
Disorganized ❌
stuff/
├── analysis_FINAL.R
├── analysis_FINAL_v2.R
├── analysis_FINAL_v2_ACTUALLY_FINAL.R
├── data.csv
├── data2.csv
├── data_new.csv
├── fig1.png
├── Untitled.R
└── notes.docx
The first directory tells you what the project contains, how the code should be executed, and where to look for results. The second tells you almost nothing.
setwd() anti-pattern
If the first line of your R script is
setwd("C:\Users\jenny\..."), I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
rm(list = ls()) myth
rm(list = ls()) clears objects but does NOT:
Detach packages (library() calls persist)
Reset options() you changed
Restart R instead: Ctrl+Shift+F10 (Windows/Linux) or Cmd+Shift+F10 (macOS)
RStudio setting
Go to Tools → Global Options → General: uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to Never. This ensures every R session starts clean.
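The point about rm(list = ls()) can be checked directly. This is a minimal sketch using only packages that ship with R; the object name `scratch_value` and the choice of `splines` and `digits` are arbitrary illustrations, not part of the lecture.

```r
# Change some session state: an option, an attached package, and an object
options(digits = 4)
library(splines)          # ships with R, so no installation is needed
scratch_value <- 42       # hypothetical object name

# The "clean slate" idiom
rm(list = ls())

# The object is gone...
object_gone <- !exists("scratch_value")

# ...but the package is still attached and the option is still changed
package_still_attached <- "package:splines" %in% search()
digits_still_changed <- getOption("digits") == 4

c(object_gone, package_still_attached, digits_still_changed)

# Put the option back to R's default so the demo leaves no trace
options(digits = 7)
```

All three values come back TRUE, which is why restarting R is the only reliable way to get a fresh session.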
.Rproj files
.Rproj files set the working directory for you. Open one, and RStudio:
No hard-coded paths needed.
To create one: File → New Project → New Directory (or Existing Directory). The .Rproj file sits at the root of your project directory and acts as an anchor.
Tip
Open RStudio, create a new project, and confirm here::here() returns the project root. Follow along.
here package
library(here)
# Finds the project root automatically
here::here()
#> [1] "/Users/matt/projects/mortality_analysis"
# Build paths relative to the project root
dat <- readr::read_csv(here::here("data_raw", "raw_deaths.csv"))
# Same pattern for saving
readr::write_csv(result_df, here::here("data", "cleaned_deaths.csv"))
How it works: here::here() walks up the directory tree to find an anchor file (.Rproj, .here, .git, among others) and builds paths from there.
Every file path in your scripts should use here::here(). Every project should have an .Rproj file.
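To make the "walk up the directory tree" idea concrete, here is a simplified base R sketch. It is not the here package's actual implementation, and `find_project_root()` is a hypothetical helper written for this illustration; it checks only three of the anchor types mentioned above.

```r
# Hypothetical helper: climb from `start` until a directory contains an anchor.
find_project_root <- function(start = getwd(),
                              anchor_pattern = "(\\.Rproj$|^\\.here$|^\\.git$)") {
  current <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    # all.files = TRUE so that dotfiles such as .here and .git are listed
    entries <- list.files(current, all.files = TRUE, no.. = TRUE)
    if (any(grepl(anchor_pattern, entries))) {
      return(current)
    }
    parent <- dirname(current)
    if (identical(parent, current)) {
      stop("No anchor file found above ", start)  # reached the filesystem root
    }
    current <- parent
  }
}

# Demo in a throwaway project so nothing on your machine is touched
demo_root <- file.path(tempdir(), "demo_project")
dir.create(file.path(demo_root, "code", "helpers"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, "demo_project.Rproj"))

# Start two levels down; the search still lands on the project root
found_root <- find_project_root(file.path(demo_root, "code", "helpers"))
raw_path <- file.path(found_root, "data_raw", "raw_deaths.csv")
```

Because the path is built from the discovered root, the same script works regardless of which subdirectory it is run from or whose computer it is on.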
Your workflow is personal and ephemeral — which text editor you use, how you organize your desktop, where on your hard drive you keep your projects.
Your product is what you share with the world — the R scripts, the data, the README, the manuscript.
Your product should not depend on your workflow. If your script requires knowledge of your personal file system layout to run, you have embedded your workflow into your product.
A well-organized project separates files by function:
| Directory | Purpose |
|---|---|
| code/ | R scripts, numbered for execution order |
| data_raw/ | Raw input data, never modified |
| data/ | Processed, shareable intermediate datasets |
| data_private/ | Restricted data under DUA (gitignored) |
| output/ | Tables, model objects, logs |
| plots/ | Publication-ready figures (PDF, PNG) |
| qmd/ | Quarto / R Markdown documents |
| lit/ | Reference PDFs (gitignored) |
| manuscript/ | Manuscript drafts |
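The layout in the table can be scaffolded in a few lines of base R. This is a sketch under assumptions: it builds the skeleton inside a temporary directory rather than a real project location, and the two `.gitignore` entries simply mirror the directories the table marks as gitignored.

```r
# Assumption: a temporary directory stands in for wherever you keep projects
project_root <- file.path(tempdir(), "mortality_analysis")

subdirs <- c("code", "data", "data_raw", "data_private",
             "output", "plots", "qmd", "lit", "manuscript")

# Create each directory; recursive = TRUE also creates project_root itself
for (d in subdirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Start an empty README and ignore the two directories flagged in the table
file.create(file.path(project_root, "README.md"))
writeLines(c("data_private/", "lit/"), file.path(project_root, ".gitignore"))

# Confirm what was created
created <- list.dirs(project_root, full.names = FALSE, recursive = FALSE)
sort(created)
```

Running the same scaffold for every new project means collaborators always know where to look, whatever the topic.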
Each script should do exactly one thing. Read inputs, do the work, save outputs.
02_clean_data.R produces data/clean_deaths.RDS, then 03_fit_models.R reads that file
Four benefits:
targets, Session 19)
The number prefix handles order. The slug handles content.
Vague slugs (avoid)
01_data.R
02_analysis.R
03_results.R
04_figure.R
Descriptive slugs (prefer)
01_download_mortality_data.R
02_clean_county_covariates.R
03_fit_apc_models.R
04_fig_trends_by_state.R
Verb-noun pattern: the verb says what the script does; the noun says to what. The good slugs read like a pipeline summary — download, clean, model, plot.
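A practical payoff of the number prefix is that the execution order can be recovered from the file names alone. The sketch below creates placeholder scripts in a temporary directory (their contents are stand-ins, not real analysis code) and runs them in numbered order; unnumbered helpers such as `utils.R` are skipped by the pattern.

```r
# Assumption: placeholder scripts in a throwaway directory
code_dir <- file.path(tempdir(), "naming_demo", "code")
dir.create(code_dir, recursive = TRUE, showWarnings = FALSE)

script_names <- c("03_fit_apc_models.R", "01_download_mortality_data.R",
                  "utils.R", "04_fig_trends_by_state.R",
                  "02_clean_county_covariates.R")
for (s in script_names) {
  # Each placeholder just announces itself when sourced
  writeLines(sprintf('message("running %s")', s), file.path(code_dir, s))
}

# Two digits, an underscore, a slug: only numbered pipeline steps match
pipeline <- sort(list.files(code_dir, pattern = "^[0-9]{2}_.+\\.R$"))

# Strip the prefix and extension to read the slugs as a pipeline summary
steps <- sub("^[0-9]{2}_(.+)\\.R$", "\\1", pipeline)

# Run the steps in order
for (s in pipeline) {
  source(file.path(code_dir, s))
}
```

Zero-padded prefixes matter here: without them, a script numbered 10 would sort before one numbered 2.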
## 01_ingest_raw_data.R ----
##
## Download raw NCHS mortality data from CDC WONDER and save
## as a compressed RDS file. Requires internet access.
## Input: CDC WONDER API
## Output: data_raw/raw_deaths_1999_2020.RDS
## Imports ----
library(tidyverse)
library(here)
## Constants ----
START_YEAR <- 1999
END_YEAR <- 2020
## Download ----
# ... (download code would go here)
## Save ----
saveRDS(raw_df, here::here("data_raw", "raw_deaths_1999_2020.RDS"),
        compress = "xz")
Conventions: header block with inputs/outputs, ## Section ---- markers for RStudio’s outline (Ctrl+Shift+O / Cmd+Shift+O), UPPER_SNAKE_CASE constants, here::here() paths.
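The same conventions carry over to how scripts hand work to each other: each step reads a file, does one job, and writes a file. The sketch below compresses three such steps into one runnable block. It is illustrative only: the data values are invented, the file names are hypothetical, and `file.path()` on a temporary directory stands in for `here::here()` so the example runs outside a project.

```r
## Setup ----
# Assumption: a throwaway directory plays the role of the project root
project_root <- file.path(tempdir(), "pipeline_demo")
dir.create(file.path(project_root, "data_raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "data"), showWarnings = FALSE)

RAW_FILE   <- file.path(project_root, "data_raw", "raw_deaths.RDS")
CLEAN_FILE <- file.path(project_root, "data", "clean_deaths.RDS")

## Step 1: ingest (stand-in for a download script) ----
# Invented toy values, not real mortality counts
raw_df <- data.frame(year = c(1999, 2000, 2001), deaths = c(120, NA, 135))
saveRDS(raw_df, RAW_FILE)

## Step 2: clean (reads only the raw file, writes only the clean file) ----
raw_in <- readRDS(RAW_FILE)
clean_df <- raw_in[!is.na(raw_in$deaths), ]
saveRDS(clean_df, CLEAN_FILE)

## Step 3: model (knows nothing about steps 1-2 except the file they left) ----
model_input <- readRDS(CLEAN_FILE)
nrow(model_input)
```

Because the only contract between steps is a file on disk, any step can be rerun or rewritten without touching the others.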
A .csv from 1995 is still readable today.
Try that with .sav (SPSS 12) or .xls (Excel 2003).
.R scripts run on macOS, Windows, Linux.
.csv files work in R, Python, Stata, SAS, Excel.
Git tracks line-by-line changes in text files.
Binary files (.docx, .xlsx) are opaque to Git.
Your primary workflow should be plain text. Binary formats are not always wrong — but they should be outputs, not inputs.
Every project needs a README.md. It answers three questions: what does this project do, how do I run it, where is the data?
# Mortality Trends Analysis
## Overview
Analysis of US county-level mortality trends, 1999-2020.
## Requirements
- R >= 4.3.0
- See `renv.lock` for package dependencies
## Reproducing the Analysis
Run scripts in `code/` in numbered order:
1. `01_ingest_raw_data.R`: downloads and caches raw NCHS data
2. `02_create_analytic_data.R`: cleans and reshapes
3. `03_fit_models.R`: fits age-period-cohort models
4. `04_fig1_trends.R`: generates Figure 1
Twenty lines orient a reader completely. Update continuously as the project evolves.
.qmd: plain text, version-controllable, reproducible
Your code will break. When it does, a minimal reproducible example (reprex) is the fastest path to effective help.
The key word is reproducible. If the person helping you cannot reproduce the problem, they cannot diagnose it.
reprex package
Why? Comparisons with NA using == or != always return NA, not TRUE/FALSE.
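The NA behavior is easy to confirm in a few lines of base R, which is also roughly the size a good reprex should be. The `ages` vector is a toy example invented for illustration.

```r
# Toy values invented for illustration only
ages <- c(34, NA, 51)

eq_result  <- ages == 51   # FALSE NA TRUE: the NA propagates
neq_result <- ages != 51   # TRUE NA FALSE
na_vs_na   <- NA == NA     # NA, not TRUE

# Test for missingness with is.na(), never with == NA
missing_flag <- is.na(ages)   # FALSE TRUE FALSE

# Filtering: handle NA explicitly so it cannot leak into the subset
ages_51 <- ages[!is.na(ages) & ages == 51]   # 51
```

Everything needed to reproduce the behavior is in the block itself: the data are created inline, no files are read, and no packages are required.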
A good reprex gets useful answers almost anywhere:
The venue matters less than the quality of your question.
Session 02: Your Computer and the Shell
Reading for Session 02:
Note
No lab this week. The hands-on project setup lab begins in Session 04 after we cover Git and GitHub.
here::here() resolves paths