Classifying clinical notes with pain assessment using machine learning Article

Fodeh, SJ, Finch, D, Bouayad, L et al. (2018). Classifying clinical notes with pain assessment using machine learning . 56(7), 1285-1292. 10.1007/s11517-017-1772-1

cited authors

  • Fodeh, SJ; Finch, D; Bouayad, L; Luther, SL; Ling, H; Kerns, RD; Brandt, C

fiu authors


  • Pain is a significant public health problem, affecting millions of people in the USA. Evidence has highlighted that patients with chronic pain often suffer from deficits in pain care quality (PCQ) including pain assessment, treatment, and reassessment. Currently, there is no intelligent and reliable approach to identify PCQ indicators inelectronic health records (EHR). Hereby, we used unstructured text narratives in the EHR to derive pain assessment in clinical notes for patients with chronic pain. Our dataset includes patients with documented pain intensity rating ratings > = 4 and initial musculoskeletal diagnoses (MSD) captured by (ICD-9-CM codes) in fiscal year 2011 and a minimal 1 year of follow-up (follow-up period is 3-yr maximum); with complete data on key demographic variables. A total of 92 patients with 1058 notes was used. First, we manually annotated qualifiers and descriptors of pain assessment using the annotation schema that we previously developed. Second, we developed a reliable classifier for indicators of pain assessment in clinical note. Based on our annotation schema, we found variations in documenting the subclasses of pain assessment. In positive notes, providers mostly documented assessment of pain site (67%) and intensity of pain (57%), followed by persistence (32%). In only 27% of positive notes, did providers document a presumed etiology for the pain complaint or diagnosis. Documentation of patients’ reports of factors that aggravate pain was only present in 11% of positive notes. Random forest classifier achieved the best performance labeling clinical notes with pain assessment information, compared to other classifiers; 94, 95, 94, and 94% was observed in terms of accuracy, PPV, F1-score, and AUC, respectively. Despite the wide spectrum of research that utilizes machine learning in many clinical applications, none explored using these methods for pain assessment research. In addition, previous studies using large datasets to detect and analyze characteristics of patients with various types of pain have relied exclusively on billing and coded data as the main source of information. This study, in contrast, harnessed unstructured narrative text data from the EHR to detect pain assessment clinical notes. We developed a Random forest classifier to identify clinical notes with pain assessment information. Compared to other classifiers, ours achieved the best results in most of the reported metrics. [Figure not available: see fulltext.].

publication date

  • July 1, 2018

Digital Object Identifier (DOI)

start page

  • 1285

end page

  • 1292


  • 56


  • 7