Welcome

This website contains the materials for the Biostatistics Workshop at the 2022 SSC Annual Meeting (online).

In this workshop, we introduce methods for phenotyping with electronic health record (EHR) data.

Electronic health records phenotyping [Slides]

Slides and code for an example phenotyping problem are available below.

Module 1 Introduction [Slides] [Rmd]
Module 2 Supervised learning [Slides] [Rmd]
Module 3 Semi-supervised learning (PheCAP) [Slides] [Rmd]
Module 4 Alternative approaches [Slides] [Rmd]

PheCAP

We use real EHR data from PheCAP, an R package that implements high-throughput phenotyping using a common automated pipeline.
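As a quick orientation, the sketch below previews the example data before starting the modules. It assumes that the installed PheCAP package exports a dataset named `ehr_data`. If the package or the dataset is unavailable, the preview is skipped rather than failing. The names `ehr_summary` and `env` are illustrative only.

```r
# Assumption: PheCAP is installed and provides a dataset called `ehr_data`.
# The guards below skip the preview if either assumption does not hold.
ehr_summary <- NULL

if (requireNamespace("PheCAP", quietly = TRUE)) {
  # Load the dataset into a private environment to avoid cluttering the workspace.
  env <- new.env()
  suppressWarnings(data("ehr_data", package = "PheCAP", envir = env))

  if (exists("ehr_data", envir = env)) {
    ehr <- get("ehr_data", envir = env)
    # Record the dimensions and the first few column names.
    ehr_summary <- list(
      n_rows = nrow(ehr),
      n_cols = ncol(ehr),
      first_columns = head(colnames(ehr))
    )
    print(ehr_summary)
  } else {
    message("Dataset `ehr_data` was not found in PheCAP; skipping the preview.")
  }
} else {
  message("PheCAP is not installed; skipping the data preview.")
}
```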

Required packages

Please copy and paste the following code into the R console to check for, install, and load the required packages. We use R version \(\geq\) 3.6.0.

# Specify the packages used in the workshop.
# Most are installed from CRAN; "parallel" ships with base R.
packages <- c(
  "PheCAP",
  "PheNorm",
  "MAP",
  "glmnet",
  "tidyverse",
  "ggplot2",
  "data.table",
  "mltools",
  "pROC",
  "parallel",
  "randomForestSRC",
  "SVMMaj"
)

# Load the packages.
# Any missing package is installed first and then loaded.
package.check <- lapply(
  packages,
  FUN = function(x) {
    if (!require(x, character.only = TRUE)) {
      install.packages(x, dependencies = TRUE)
      library(x, character.only = TRUE)
    }
  }
)
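After running the code above, it can help to confirm that the R version meets the stated minimum and that every package is available. The following is a minimal sketch using base R only. `is_installed` is a hypothetical helper introduced for illustration, and the fallback value for `packages` is there only so the snippet also runs on its own.

```r
# Minimum R version stated in the text above.
min_version <- "3.6.0"
if (getRversion() < min_version) {
  stop("R >= ", min_version, " is required; found ", as.character(getRversion()))
}

# Hypothetical helper: returns a named logical vector, TRUE for each
# package that can be found in the current library paths.
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# `packages` is the character vector defined above; fall back to a short
# illustrative subset of base packages if it is not in the workspace.
if (!exists("packages")) packages <- c("stats", "parallel")

status <- is_installed(packages)
missing_pkgs <- names(status)[!status]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```

If any package is reported as not installed, rerunning the installation step for that package alone usually shows the underlying error message.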

Implementation

To get started, clone the repository to your local machine.

git clone git@github.com:jlgrons/EHR-Phenotyping-Workshop.git

Resources

Publicly available EHR datasets
- PhysioNet
- National NLP Clinical Challenges (n2c2) NLP competitions
NLP software

Acknowledgments

This website was built with the distill package and draws mostly on Silvia Canelón’s Sharing Your Work with xaringan.
