Predictive Modeling for Differential Diagnosis and Mortality Risk Assessment

Lindsey, Tony; Vega, Saul; Veazey, Sena; Salinas, Jose

The prevalence of electronic health record (EHR) systems has created substantial opportunities in biomedical informatics. Automated machine learning methods can use such data effectively and have become common tools for healthcare predictive modeling. Research in medical informatics has explored the potential of deep learning and classical models in emergent care scenarios. In particular, predicting differential diagnoses for admissions has proven useful in decreasing unnecessary lab tests and improving inpatient triage decision-making. Moreover, identifying patients at high risk of in-hospital mortality is vitally important for optimizing the allocation of medical resources.

The Medical Information Mart for Intensive Care (MIMIC-III) database, which contains de-identified critical care inpatient data, was used in our study. This data set captures hospital patient laboratory measurements, pharmacologic prescriptions, diagnostic data and procedure event recordings. After restricting to adult patients and excluding admissions with an ICU length of stay of less than 24 hours, there were 37,787 unique admissions and 30,414 total patients. We examined the 25 most prevalent ICD-9 group-level disease categories in MIMIC-III using a multi-label classification model. In-hospital mortality was modeled as binary classification, with 4,155 (13%) adult patients who expired, of whom 3,138 (75.5%) were in the ICU setting. Prediction performance was measured by the AUC, F1 score, sensitivity and specificity calculated for each disease label.

The use of ICD-9 group codes reduced the feature dimension from 14,567 to 942 and greatly improved the distribution of patient diagnostic categories. Disease temporal patterns were captured by considering the 6 vital signs and 13 laboratory values that were most frequently sampled. Missing data were imputed at each time-stamp.
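
The reduction from full ICD-9 codes to group-level codes can be illustrated with a minimal sketch. It assumes that "group level" means the three-character ICD-9 category (four characters for E codes), which may differ from the grouping actually used in the study. The function names `icd9_group` and `top_groups` and the toy admissions are hypothetical and are not taken from the study.

```python
from collections import Counter


def icd9_group(code: str) -> str:
    """Collapse a full ICD-9-CM diagnosis code to a category-level prefix.

    Assumption: the group is the first 3 characters of the code, or the
    first 4 characters for external-cause (E) codes.
    """
    code = code.strip().upper().replace(".", "")
    if code.startswith("E"):
        return code[:4]
    return code[:3]


def top_groups(admissions, k=25):
    """Return the k most prevalent groups, counting each group once per admission."""
    counts = Counter()
    for codes in admissions:
        counts.update({icd9_group(c) for c in codes})
    return [group for group, _ in counts.most_common(k)]


# Hypothetical toy admissions; codes are written without decimal points.
admissions = [
    ["4280", "42731", "5849"],
    ["4019", "4280", "25000"],
    ["E8859", "V1582", "4019"],
]

# Distinct group-level codes observed across the toy admissions.
groups = sorted({icd9_group(c) for adm in admissions for c in adm})
print(groups)
print(top_groups(admissions, k=2))
```

In a multi-label setup of this kind, each admission would then receive one binary indicator per retained group, so the label vector has as many entries as there are groups kept.
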
Raw hourly average time-series values were converted into 5 summary features (mean, standard deviation, number of observations, minimum and maximum). Patient demographic variables such as age, gender, marital status and ethnicity were also included in the modeling. Choi et al. showed that contextual embedding of medical data, using diagnostic and procedural codes alone, can predict future diagnoses with sensitivity as high as 0.79. We used an embedding technique called word2vec, which allowed sparse representations of medical history to be transformed into dense word vectors. The mappings captured contextual information by treating each admission as a sentence and learning the most likely neighboring words in a sliding-window fashion. Binary and multi-label classification was achieved via collapse models, which do not consider temporal information, as well as recurrent neural networks with regularization, a softmax output layer activation, and categorical cross-entropy as the loss function.
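
The five summary features named above can be sketched for a single hourly series as follows. This is an illustration under stated assumptions, not the study's implementation: the function name `summarize` is hypothetical, `None` is used to mark an hour without a measurement, the count covers observed values only, and the population standard deviation is used because the abstract does not say which estimator was applied.

```python
import math
from statistics import mean, pstdev


def summarize(hourly):
    """Collapse one hourly series into mean, std, count, min and max.

    Assumptions: None marks a missing hour, missing hours are skipped
    rather than imputed, and std is the population standard deviation.
    """
    observed = [v for v in hourly if v is not None]
    if not observed:
        # No measurements at all: only the count is well defined.
        return {"mean": math.nan, "std": math.nan, "count": 0,
                "min": math.nan, "max": math.nan}
    return {
        "mean": mean(observed),
        "std": pstdev(observed),
        "count": len(observed),
        "min": min(observed),
        "max": max(observed),
    }


# Hypothetical heart-rate series with one missing hour.
heart_rate = [80.0, None, 90.0, 100.0]
feats = summarize(heart_rate)
print(feats)
```

If every one of the 6 vital signs and 13 laboratory values were summarized this way, each admission would contribute 19 x 5 = 95 numeric features to a model that ignores temporal order.
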

Document ID: 20190030812

Acquisition Source: Ames Research Center

Document Type: Abstract

Date Acquired: September 16, 2019

Publication Date: August 19, 2019

Meeting Information

Meeting: Military Health System Research Symposium (MHSRS)

Location: Kissimmee, FL

Country: United States

Start Date: August 19, 2019

End Date: August 22, 2019

Sponsors: Army Medical Research and Development Command

Distribution Limits: Public; Public Use Permitted.

Technical Review: Single Expert
