Practical Computing for Population Health Research
EPI XXX | Fall 2099 | Stanford University
Instructor
Mathew Kiang | Department of Epidemiology and Population Health | mkiang@stanford.edu
WARNING: This is not a real course
This is a fictional course. I spent about 18 months interviewing PhD students, gathering notes, and cataloging gaps in what other Stanford courses cover. I wanted to design a class that would equip students with the basic computational skills I look for in a prospective research assistant in my lab. The goal is a class that teaches the meta-skills of computational work specific to population health research.
I ended up being asked to teach a different course, so this is just a place for me to organize my thoughts and share them. I still think the material will be useful to new members of my lab.
It is also an exercise in incorporating AI into my workflow. I’m using AI to sort through my notes, flesh out my outlines, find areas of my lecture notes that lack clarity, draft Mermaid diagrams, check that the URLs still work, convert my .Rmd files to .qmd, and so on. That said, I review everything, so all mistakes are my own.
I have never actually taught any of this. I don’t know if the lectures are too long, the problem sets too easy, the slides too fast, or the notes in the wrong order. I’ll keep updating as I assemble materials, but consider yourself warned.
Use at your own risk.
Course Description
This course teaches the practical computing skills that epidemiologists and population health researchers need for transparent, reproducible, and efficient work. It is not a statistics course or a methods course — it covers the workflows, tooling, and project infrastructure that make good science possible. Students learn to organize projects, write clean code, work with data at scale, communicate results, and collaborate using modern version control. R is the primary language but the principles generalize.
Prerequisites: Comfort with R, at least introductory statistics, and basic epidemiology
Schedule
(Materials will be updated as I make them.)
| Week | Session | Type | Topic |
|---|---|---|---|
| 1 | 01 | Lecture | The Scientific Computing Workflow |
| 1 | 02 | Lecture | Your Computer and the Shell |
| 2 | 03 | Lecture | Reproducible Research and Version Control with Git |
| 2 | 04 | Lab | Git/GitHub Setup Lab |
| 3 | 05 | Lecture | R Data Types, Structures, and Control Flow |
| 3 | 06 | Lab | R Foundations Lab |
| 4 | 07 | Lecture | Tidy Data and the Split-Apply-Combine Strategy |
| 4 | 08 | Lab | Tidy Data Lab |
| 5 | 09 | Lecture | Writing Functions and Abstraction |
| 5 | 10 | Lab | Functions Lab |
| 6 | 11 | Lecture | Data Visualization and Scientific Communication |
| 6 | 12 | Lab | Visualization Lab |
| 7 | 13 | Lecture | Working with Data Larger Than Memory |
| 7 | 14 | Lab | Big Data Lab |
| 8 | 15 | Lecture | Debugging, Profiling, and Speeding Up Your Code |
| 8 | 16 | Lab | Debugging Lab |
| 9 | 17 | Lecture | Working with Messy and Non-Tabular Data |
| 9 | 18 | Lab | Messy Data Lab |
| 10 | 19 | Lecture | Pipelines, HPC, and Putting It All Together |
| 10 | 20 | Lab | Pipelines Lab |
Weekly Materials
Week 1, Session 01: The Scientific Computing Workflow
Notes · Slides · Problem Set (coming soon)
Read before class:
- Wilson et al. (2017). Good Enough Practices in Scientific Computing (§1–2)
- Bryan & Hester. What They Forgot to Teach You About R, Ch. 3: Project-oriented workflow
- Wickham, Çetinkaya-Rundel, & Grolemund (2023). R for Data Science (2e), Ch. 6: Workflow: scripts and projects
Additional reading:
- Bryan (2017). Project-oriented workflow — Short, opinionated argument against `setwd()`
- Wickham, Çetinkaya-Rundel, & Grolemund (2023). R for Data Science (2e), Ch. 8: Workflow: getting help — The `reprex` package and how to ask for help
- Healy (2019). The Plain Person’s Guide to Plain Text Social Science — The case for plain text workflows in research
- Noble (2009). A Quick Guide to Organizing Computational Biology Projects — The canonical directory structure paper
- Wilson et al. (2014). Best Practices for Scientific Computing — The predecessor to “Good Enough Practices,” aimed at computationally intensive researchers
- Sandve et al. (2013). Ten Simple Rules for Reproducible Computational Research — Checklist-style reproducibility guide
Week 1, Session 02: Your Computer and the Shell
Coming soon.
Resources
- R for Data Science (2e) — Wickham, Çetinkaya-Rundel, & Grolemund
- Advanced R (2e) — Wickham
- Happy Git and GitHub for the useR — Bryan, the STAT 545 TAs, & Hester
- What They Forgot to Teach You About R — Bryan & Hester
- Modern Plain Text Computing — Healy (course site, Duke SOCIOL 703)
- Quarto Documentation
Built with
All materials are authored in Quarto using RStudio. Lecture slides use the quarto-revealjs-clean extension by Grant McDermott.