An R package for working with multiple cause of death micro-data.
This package is still in the alpha stage.
We cannot emphasize this enough. Nothing is guaranteed to work. Please submit an issue if you find a bug.
Certain types of deaths, including drug overdoses or opioid-related deaths, are defined by an ICD code in both the underlying cause field and one of the twenty possible contributory cause fields. Therefore, to tabulate these deaths, researchers cannot use compressed mortality files (CMF), which contain only the underlying cause of death, and must instead use multiple cause of death (MCOD) data.
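To make the two-field definition concrete, here is a minimal base R sketch on made-up records. It assumes ICD-10 years and the commonly used definition of an opioid-involved overdose: an underlying cause of X40-X44, X60-X64, X85, or Y10-Y14, plus a contributory cause of T40.0-T40.4 or T40.6. The column names (`ucod`, `record_1`, `record_2`) and the data are illustrative assumptions. This is not necessarily how the package's own functions implement the flag.

```r
# Toy records: one underlying cause (ucod) plus two contributory cause
# fields. Names and values are made up for illustration only.
deaths <- data.frame(
    id       = 1:4,
    ucod     = c("X42", "I219", "X44", "X42"),
    record_1 = c("T401", "T401", "T509", "T404"),
    record_2 = c("T509", NA, NA, "T402"),
    stringsAsFactors = FALSE
)

# Underlying cause: drug poisoning of any intent
# (X40-X44, X60-X64, X85, Y10-Y14).
drug_ucod <- grepl("^(X4[0-4]|X6[0-4]|X85|Y1[0-4])", deaths$ucod)

# Contributory cause: opioid T-codes (T40.0-T40.4, T40.6), assumed to be
# stored without the decimal point. grepl() returns FALSE for NA.
opioid_pattern <- "^T40[0-46]"
record_cols <- grep("^record_", names(deaths), value = TRUE)
opioid_contrib <- Reduce(
    `|`,
    lapply(deaths[record_cols], function(col) grepl(opioid_pattern, col))
)

# A death counts only when BOTH conditions hold.
deaths$opioid_death <- drug_ucod & opioid_contrib
sum(deaths$opioid_death)
```

Record 2 has an opioid T-code but a non-overdose underlying cause, and record 3 is an overdose with no opioid T-code, so neither is counted. A file with only the underlying cause could not make this distinction.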
This simple package aims to make common operations on (inherently messy) MCOD data, such as downloading, munging, and cleaning, easier.
Additionally, this package includes data necessary for calculating rates: standard populations and annual US population counts from 1979 to 2015. Note that if you are only using data from 1990 onward, the NVSS Bridged Race files are preferred.
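As a rough illustration of how standard populations and population counts combine into a rate, here is a base R sketch of direct age standardization. All numbers, age groups, and column names are made up. The layout of the package's own population data may differ, so treat this as the arithmetic only.

```r
# Hypothetical inputs for one year: deaths and population by age group,
# plus standard population counts for the same age groups.
rates_df <- data.frame(
    age_group = c("0-24", "25-44", "45-64", "65+"),
    deaths    = c(10, 50, 40, 20),
    pop       = c(100000, 100000, 80000, 40000),
    std_pop   = c(350000, 300000, 220000, 130000)
)

# Age-specific death rates (deaths per person).
age_specific <- rates_df$deaths / rates_df$pop

# Standard population weights, which sum to 1.
std_weights <- rates_df$std_pop / sum(rates_df$std_pop)

# Crude rate ignores age structure; the adjusted rate is the weighted
# average of the age-specific rates. Both are per 100,000.
crude_rate    <- sum(rates_df$deaths) / sum(rates_df$pop) * 1e5
adjusted_rate <- sum(age_specific * std_weights) * 1e5

c(crude = crude_rate, adjusted = adjusted_rate)
```

With these toy numbers the crude rate is 37.5 and the age-adjusted rate is 36 per 100,000. The gap reflects only the difference between the observed and standard age distributions.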
This package is largely the result of our internal code getting reused for multiple papers, so the scope and usefulness of the code are likely limited. We’re releasing it publicly in the hopes that other researchers will learn from our mistakes.
This package is not available on CRAN. Use devtools to install:
Ten lines of code to load packages, download the csv file, load it, and calculate the number of US residents who died from opioids, by sex, in 2015.
```r
library(tidyverse)
library(narcan)

download_mcod_csv(2015, "./temp_data")
mcod_2015 <- read_csv("./temp_data/mort2015.csv.zip")

mcod_2015 %>%
    subset_residents() %>%
    unite_records() %>%
    flag_opioid_deaths() %>%
    group_by(sex) %>%
    summarize(deaths_involving_opioids = sum(opioid_death))

# # A tibble: 2 x 2
#     sex deaths_involving_opioids
#   <chr>                    <dbl>
# 1     F                    11420
# 2     M                    21671
```
More examples soon.
Standard populations are held in the std_pops dataframe while annual population estimates (by race, sex, and age) from 1979 to 2015 are held in the
There are also several wiki examples on how to use
It is worth noting that there are several important irregularities in the data. This package addresses some while others are simply the way the data are.
rnifla_ column indicates a nature of injury flag for the corresponding
N code (nature of injury) while a
0 represents all other codes (e.g.,
E for external causes or
rnifla_ while others call it
csv files from NBER contain encoding errors. We suggest downloading files as dta for ICD-9 years and csv files for ICD-10 years.
F and others as
Multiple cause of death data (in multiple formats), documentation, dictionaries, and other information are stored on the National Bureau of Economic Research (NBER) website.
Standard populations are stored on the Surveillance, Epidemiology, and End Results (SEER) section of the National Cancer Institute website.
The annual US population estimates come from the United States Census Bureau’s Population Estimates Program (PEP).