{"id":1521,"date":"2020-01-24T22:29:22","date_gmt":"2020-01-24T22:29:22","guid":{"rendered":"https:\/\/www.danielparente.net\/en\/2020\/01\/24\/benchmarking-deep-learning-architectures-for-predicting-readmission-to-the-icu-and-describing-patients-at-risk\/"},"modified":"2020-01-24T22:29:22","modified_gmt":"2020-01-24T22:29:22","slug":"benchmarking-deep-learning-architectures-for-predicting-readmission-to-the-icu-and-describing-patients-at-risk","status":"publish","type":"post","link":"https:\/\/www.danielparente.net\/en\/2020\/01\/24\/benchmarking-deep-learning-architectures-for-predicting-readmission-to-the-icu-and-describing-patients-at-risk\/","title":{"rendered":"Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk"},"content":{"rendered":"<div id=\"Sec2-content\">\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec3\">Study population<\/h3>\n<p>The algorithms were evaluated on the publicly available MIMIC-III data set (ethics approval was not required)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 32\" title=\"Johnson, A. E. et al. MIMIC-III, a freely accessible critical care database. Scientific data 3, 160035, &#010;https:\/\/doi.org\/10.1038\/sdata.2016.35&#010;&#010; (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR32\" id=\"ref-link-section-d10732e558\" target=\"_blank\" rel=\"noopener\">32<\/a><\/sup>. This data set comprises deidentified health data associated with 61,532 ICU stays and 46,476 critical care patients at Beth Israel Deaconess Medical Center in Boston, Massachusetts between 2001 and 2012.<\/p>\n<p>The supervised learning task consists of predicting, for a given ICU stay, whether the patient will be readmitted to the ICU within 30 days from discharge. 
Patients were excluded if they died during the ICU stay (N\u2009=\u20094,787 ICU stays), were younger than 18 years at the time of discharge (N\u2009=\u20098,129 ICU stays), or died within 30 days from discharge without being readmitted to the ICU (N\u2009=\u20093,318 ICU stays). The final data set comprised 45,298 ICU stays for 33,150 patients, labelled as either positive (N\u2009=\u20095,495) or negative (N\u2009=\u200939,803) depending on whether the patient was readmitted within 30 days from discharge. To develop and evaluate the algorithms, patients were randomly split into a combined training and validation set (90%) and a test set (10%). The split was based on patient identifiers rather than ICU stay identifiers to prevent information leaks between data sets (since the prediction is based on the entire clinical history of a patient).<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec4\">Model variables<\/h3>\n<p>The electronic medical record (EMR) of a patient can be represented as a set of static variables and timestamped codes. In the present study, static variables included the patient\u2019s gender, age, ethnicity, insurance type, marital status, the location of the patient prior to arriving at the hospital (admission location), and whether the patient was admitted for elective surgery. Both length of ICU stay and length of hospital stay prior to ICU admission were recorded. An additional static variable was the number of ICU admissions in the year preceding the index ICU stay.<\/p>\n<p>Timestamped codes comprised International Classification of Diseases and Related Health Problems (ICD-9) diagnosis and procedure codes, prescribed medications, and patient vital signs. All diagnosis and procedure codes in the clinical history of a patient were considered for predictive purposes; however, prescribed medications and recorded vital signs were restricted to the ICU stay of interest. 
Following the OASIS severity of illness score<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Johnson, A. E., Kramer, A. A. &amp; Clifford, G. D. A new severity of illness scale using a subset of acute physiology and chronic health evaluation data elements shows comparable predictive accuracy. Critical care medicine 41, 1711&#x2013;1718 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR29\" id=\"ref-link-section-d10732e576\" target=\"_blank\" rel=\"noopener\">29<\/a><\/sup>, assessed vital signs comprised the Glasgow Coma Scale score (sum of eye response, verbal response, motor response components), heart rate, mean arterial pressure, respiratory rate, body temperature, urine output, and whether the patient necessitated ventilation. Continuous measurements of vital signs were categorised in the same manner as in OASIS and assigned corresponding codes<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 29\" title=\"Johnson, A. E., Kramer, A. A. &amp; Clifford, G. D. A new severity of illness scale using a subset of acute physiology and chronic health evaluation data elements shows comparable predictive accuracy. Critical care medicine 41, 1711&#x2013;1718 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR29\" id=\"ref-link-section-d10732e580\" target=\"_blank\" rel=\"noopener\">29<\/a><\/sup>. To reduce redundant information, whenever the same vital sign-related code was recorded consecutively more than once, only the latest observation was kept in the data.<\/p>\n<p>Elapsed times, measured in days, associated with diagnosis and procedure codes were based on the date and time of discharge from the corresponding hospital admission. 
Elapsed times, measured in hours, associated with medications and vital signs were based on the date and time of prescription start and measurement, respectively. In the present study, the simplifying assumption is made that diagnosis and procedure codes are available immediately at the time of discharge from the ICU. Categorical values of static variables or timestamped codes associated with fewer than 100 ICU stays were re-labelled as \u201cother\u201d.<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec5\">Artificial neural network architectures<\/h3>\n<p>Several \u201cdeep\u201d neural network architectures for predicting a patient\u2019s risk of readmission to the ICU were implemented and compared. To make this comparison as fair as possible, all architectures shared a similar high-level structure: (1) timestamped codes were mapped to vector embeddings; (2) numerical scores associated with diagnosis and procedure codes, and with medication and vital sign codes, were computed using attention mechanisms and\/or recurrent layers; (3) these scores were concatenated with the static variables and passed on to a \u201clogistic regression layer\u201d (i.e. a fully connected layer with a sigmoid activation function). Further details about individual network components are reported in the following sections.<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec6\">Embeddings<\/h4>\n<p>Diagnosis and procedure codes, as well as medication and vital sign codes, were mapped to corresponding \u201cembeddings\u201d (real-valued vectors). 
The size of these embeddings was set proportional to the fourth root of the total number of codes in the dictionary (diagnoses\/procedures and medications\/vital signs were processed separately since they were measured on different time scales)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine 1, 18, &#010;https:\/\/doi.org\/10.1038\/s41746-018-0029-1&#010;&#010; (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR7\" id=\"ref-link-section-d10732e602\" target=\"_blank\" rel=\"noopener\">7<\/a><\/sup>. Time-aware code embeddings were computed in three different manners. A first approach used MCEs with time-aware attention<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 19\" title=\"Cai, X. et al. Medical concept embedding with time-aware attention. arXiv preprint arXiv:1806.02873 (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR19\" id=\"ref-link-section-d10732e606\" target=\"_blank\" rel=\"noopener\">19<\/a><\/sup>. MCEs are based on the continuous bag-of-words model<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 33\" title=\"Mikolov, T., Chen, K., Corrado, G. &amp; Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR33\" id=\"ref-link-section-d10732e610\" target=\"_blank\" rel=\"noopener\">33<\/a><\/sup>, but instead of using fixed-sized temporal windows to determine a code\u2019s context, attention mechanisms learn the temporal scope of a code together with its embedding. 
A second approach optimised an embedding matrix at the same time as the other parameters of the network and, optionally, concatenated the elapsed times to the resulting vectors. A third approach optimised an embedding matrix at the same time as the other parameters of the network and modelled the dynamics in time of the computed embeddings using neural ordinary differential equations (ODEs)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen, T. Q., Rubanova, Y., Bettencourt, J. &amp; Duvenaud, D. K. In Advances in neural information processing systems. 6571&#x2013;6583 (2018).\" href=\"https:\/\/www.nature.com\/#ref-CR16\" id=\"ref-link-section-d10732e614\" target=\"_blank\" rel=\"noopener\">16<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Rubanova, Y., Chen, R. T. &amp; Duvenaud, D. Latent odes for irregularly-sampled time series. arXiv preprint arXiv:1907.03907 (2019).\" href=\"https:\/\/www.nature.com\/#ref-CR17\" id=\"ref-link-section-d10732e614_1\" target=\"_blank\" rel=\"noopener\">17<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Dupont, E., Doucet, A. &amp; Teh, Y. W. Augmented neural odes. arXiv preprint arXiv:1904.01681 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR18\" id=\"ref-link-section-d10732e617\" target=\"_blank\" rel=\"noopener\">18<\/a><\/sup>. In more detail, the embedding of a code at time zero (i.e. 
at the time of discharge from the ICU) was stored in the embedding matrix whereas the embedding of a code recorded before discharge was computed by solving an initial value problem where derivatives with respect to time were approximated by a multilayer perceptron.<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec7\">Attention and\/or recurrent layers<\/h4>\n<p>The sequence of code embeddings associated with a patient is usually of arbitrary length and needs to be integrated into a fixed-size vector for further processing. Attention mechanisms, such as dot-product attention<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 34\" title=\"Yang, Z. et al. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1480&#x2013;1489 (2016).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR34\" id=\"ref-link-section-d10732e629\" target=\"_blank\" rel=\"noopener\">34<\/a><\/sup>, compute a weighted average of the code embeddings, where a higher weight is assigned to the most relevant codes. Alternatively, recurrent layers iteratively process an input sequence of codes and, at each iteration, update an internal memory state and generate an output vector<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 8\" title=\"Hochreiter, S. &amp; Schmidhuber, J. Long short-term memory. 
Neural computation 9, 1735&#x2013;1780, &#010;https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735&#010;&#010; (1997).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR8\" id=\"ref-link-section-d10732e633\" target=\"_blank\" rel=\"noopener\">8<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR9\" id=\"ref-link-section-d10732e636\" target=\"_blank\" rel=\"noopener\">9<\/a><\/sup>. Information may be integrated for further processing by using either the final memory state of the recurrent cell or by applying an attention mechanism to the set of output vectors<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Choi, E. et al. In Advances in Neural Information Processing Systems. 3504&#x2013;3512 (2016).\" href=\"https:\/\/www.nature.com\/#ref-CR5\" id=\"ref-link-section-d10732e640\" target=\"_blank\" rel=\"noopener\">5<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Zhang, J., Kowsari, K., Harrison, J. H., Lobo, J. M. &amp; Barnes, L. E. Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record. IEEE Access, &#010;https:\/\/doi.org\/10.1109\/ACCESS.2018.2875677&#010;&#010; (2018).\" href=\"https:\/\/www.nature.com\/#ref-CR6\" id=\"ref-link-section-d10732e640_1\" target=\"_blank\" rel=\"noopener\">6<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 7\" title=\"Rajkomar, A. et al. 
Scalable and accurate deep learning with electronic health records. npj Digital Medicine 1, 18, &#010;https:\/\/doi.org\/10.1038\/s41746-018-0029-1&#010;&#010; (2018).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR7\" id=\"ref-link-section-d10732e643\" target=\"_blank\" rel=\"noopener\">7<\/a><\/sup>. Specifically, in this work, recurrent cells were implemented using bi-directional gated recurrent units<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 9\" title=\"Cho, K. et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR9\" id=\"ref-link-section-d10732e647\" target=\"_blank\" rel=\"noopener\">9<\/a><\/sup>. Time-related information was taken into account by concatenating the time differences between observations to the embedding vectors, by applying an exponential decay proportional to the time differences between observations to the internal memory state of the recurrent cell<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Che, Z., Purushotham, S., Cho, K., Sontag, D. &amp; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Scientific reports 8, 6085 (2018).\" href=\"https:\/\/www.nature.com\/#ref-CR13\" id=\"ref-link-section-d10732e651\" target=\"_blank\" rel=\"noopener\">13<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Cao, W. et al. In Advances in Neural Information Processing Systems. 
6775&#x2013;6785 (2018).\" href=\"https:\/\/www.nature.com\/#ref-CR14\" id=\"ref-link-section-d10732e651_1\" target=\"_blank\" rel=\"noopener\">14<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 15\" title=\"Mozer, M. C., Kazakov, D. &amp; Lindsey, R. V. Discrete event, continuous time rnns. arXiv preprint arXiv:1710.04110 (2017).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR15\" id=\"ref-link-section-d10732e654\" target=\"_blank\" rel=\"noopener\">15<\/a><\/sup> or by modelling the dynamics in time of the internal memory state using neural ODEs<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Chen, T. Q., Rubanova, Y., Bettencourt, J. &amp; Duvenaud, D. K. In Advances in neural information processing systems. 6571&#x2013;6583 (2018).\" href=\"https:\/\/www.nature.com\/#ref-CR16\" id=\"ref-link-section-d10732e659\" target=\"_blank\" rel=\"noopener\">16<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" title=\"Rubanova, Y., Chen, R. T. &amp; Duvenaud, D. Latent odes for irregularly-sampled time series. arXiv preprint arXiv:1907.03907 (2019).\" href=\"https:\/\/www.nature.com\/#ref-CR17\" id=\"ref-link-section-d10732e659_1\" target=\"_blank\" rel=\"noopener\">17<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 18\" title=\"Dupont, E., Doucet, A. &amp; Teh, Y. W. Augmented neural odes. arXiv preprint arXiv:1904.01681 (2019).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR18\" id=\"ref-link-section-d10732e662\" target=\"_blank\" rel=\"noopener\">18<\/a><\/sup>. 
To aid subsequent interpretation without altering network capacity, the fixed-size vectors produced by attention mechanisms and\/or recurrent layers were reduced to two scalar-valued scores (one related to diagnoses\/procedures and one related to medications\/vital signs) using fully connected layers with a linear activation function.<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec8\">Logistic regression layer<\/h4>\n<p>The computed diagnoses\/procedures and medications\/vital signs scores were concatenated to the vector of static variables and passed to a fully connected layer with a sigmoid activation function. The output of the network corresponds to the risk of readmission to the ICU within 30 days from discharge.<\/p>\n<h4 class=\"c-article__sub-heading u-h3 c-article__sub-heading--light\" id=\"Sec9\">Architectures<\/h4>\n<p>The following neural network architectures were compared for predicting readmission to the ICU:<\/p>\n<ul class=\"u-list-style-bullet\">\n<li>\n<p><b>ODE\u2009+\u2009RNN\u2009+\u2009Attention:<\/b> dynamics in time of embeddings are modelled using neural ODEs, embeddings are passed to RNN layers, dot-product attention is applied to RNN outputs.<\/p>\n<\/li>\n<li>\n<p><b>ODE\u2009+\u2009RNN:<\/b> dynamics in time of embeddings are modelled using neural ODEs, embeddings are passed to RNN layers, the final memory states are used for further processing.<\/p>\n<\/li>\n<li>\n<p><b>RNN (ODE time decay)\u2009+\u2009Attention:<\/b> embeddings are passed to RNN layers with dynamics in time of the internal memory states modelled using neural ODEs, dot-product attention is applied to RNN outputs.<\/p>\n<\/li>\n<li>\n<p><b>RNN (ODE time decay):<\/b> embeddings are passed to RNN layers with dynamics in time of the internal memory states modelled using neural ODEs, the final memory states are used for further processing.<\/p>\n<\/li>\n<li>\n<p><b>RNN (exp time decay)\u2009+\u2009Attention:<\/b> embeddings are passed 
to RNN layers with internal memory states decaying exponentially over time, dot-product attention is applied to RNN outputs.<\/p>\n<\/li>\n<li>\n<p><b>RNN (exp time decay):<\/b> embeddings are passed to RNN layers with internal memory states decaying exponentially over time, the final memory states are used for further processing.<\/p>\n<\/li>\n<li>\n<p><b>RNN (concatenated \u0394time)\u2009+\u2009Attention:<\/b> embeddings are concatenated with time differences between observations and passed to RNN layers, dot-product attention is applied to RNN outputs.<\/p>\n<\/li>\n<li>\n<p><b>RNN (concatenated \u0394time):<\/b> embeddings are concatenated with time differences between observations and passed to RNN layers, the final memory states are used for further processing.<\/p>\n<\/li>\n<li>\n<p><b>ODE\u2009+\u2009Attention:<\/b> dynamics in time of embeddings are modelled using neural ODEs, dot-product attention is applied to the embeddings.<\/p>\n<\/li>\n<li>\n<p><b>Attention (concatenated time):<\/b> embeddings are concatenated with elapsed times, dot-product attention is applied to the embeddings.<\/p>\n<\/li>\n<li>\n<p><b>MCE\u2009+\u2009RNN\u2009+\u2009Attention:<\/b> MCE is used to compute the embeddings, embeddings are passed to RNN layers, dot-product attention is applied to RNN outputs.<\/p>\n<\/li>\n<li>\n<p><b>MCE\u2009+\u2009RNN:<\/b> MCE is used to compute the embeddings, embeddings are passed to RNN layers, the final memory states are used for further processing.<\/p>\n<\/li>\n<li>\n<p><b>MCE\u2009+\u2009Attention:<\/b> MCE is used to compute the embeddings, dot-product attention is applied to the embeddings.<\/p>\n<\/li>\n<\/ul>\n<p>The dimension of the internal memory state of RNN cells was set equal to the dimension of the input embeddings. Similarly, the dimension of the hidden representation of embeddings when computing dot-product attention was left unchanged. 
Derivatives with respect to time used to implement neural ODEs were approximated by a multilayer perceptron with three hidden layers of constant width equal to the size of the input. The Euler method was used as ODE solver.<\/p>\n<p>An overview of the considered neural network architectures is presented in Fig.\u00a0<a data-track=\"click\" data-track-label=\"link\" data-track-action=\"figure anchor\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#Fig1\" target=\"_blank\" rel=\"noopener\">1<\/a>. For completeness, the deep learning approaches were also compared with a <b>logistic regression<\/b> model using all static variables and the most recent vital signs for each patient as covariates.<\/p>\n<div class=\"c-article-section__figure js-c-reading-companion-figures-item\" data-test=\"figure\" data-container-section=\"figure\" id=\"figure-1\">\n<figure><figcaption><b id=\"Fig1\" class=\"c-article-section__figure-caption\" data-test=\"figure-caption-text\">Figure 1<\/b><\/figcaption><div class=\"c-article-section__figure-content\">\n<div class=\"c-article-section__figure-item\"><a class=\"c-article-section__figure-link\" data-test=\"img-link\" data-track=\"click\" data-track-category=\"article body\" data-track-label=\"image\" data-track-action=\"view figure\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z\/figures\/1\" rel=\"nofollow noopener\" target=\"_blank\"><picture><source type=\"image\/webp\" srcset=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-020-58053-z\/MediaObjects\/41598_2020_58053_Fig1_HTML.png?as=webp\"\/><img decoding=\"async\" aria-describedby=\"figure-1-desc\" src=\"https:\/\/media.springernature.com\/lw685\/springer-static\/image\/art%3A10.1038%2Fs41598-020-58053-z\/MediaObjects\/41598_2020_58053_Fig1_HTML.png\" alt=\"figure1\" loading=\"lazy\"\/><\/picture><\/a><\/div>\n<div class=\"c-article-section__figure-description\" data-test=\"bottom-caption\" 
id=\"figure-1-desc\">\n<p>Overview of the considered neural network architectures.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec10\">Interpretation of attention-based models<\/h3>\n<p>For the proposed neural network architectures, the weights of the final fully connected layer can be used to determine the impact of static variables and timestamped codes on estimated risk. As in traditional logistic regression, these weights can be interpreted as increases in log-odds for unplanned early ICU readmission if the corresponding static variables or scores are increased by one unit.<\/p>\n<p>It is also of interest to determine which codes (i.e. diagnoses, procedures, medications, vital signs) are associated with a prediction of high risk. Dot-product attention computes a weighted average of embedded codes; fully connected layers are then used to output scores associated with diagnoses\/procedures and medications\/vital signs. By passing single codes (i.e. the rows of the embedding matrix) to the fully connected layers computing these scores, it is possible to associate each code with a score. The higher the score, the higher the risk of ICU readmission when a patient\u2019s EMR contains that code.<\/p>\n<p>To estimate Bayesian credible intervals around network weights and computed risk scores, the posterior distribution of weights was approximated using stochastic variational inference with mean-field approximation<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 35\" title=\"Hinton, G. &amp; Van Camp, D. In Proc. of the 6th Ann. ACM Conf. on Computational Learning Theory. 
(1993).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR35\" id=\"ref-link-section-d10732e784\" target=\"_blank\" rel=\"noopener\">35<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 36\" title=\"Graves, A. In Advances in neural information processing systems. 2348&#x2013;2356 (2011).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR36\" id=\"ref-link-section-d10732e787\" target=\"_blank\" rel=\"noopener\">36<\/a><\/sup>. In the present study, the variational posterior is assumed to be a diagonal Gaussian distribution and is estimated using the Bayes by Backprop algorithm<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Blundell, C., Cornebise, J., Kavukcuoglu, K. &amp; Wierstra, D. Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR37\" id=\"ref-link-section-d10732e791\" target=\"_blank\" rel=\"noopener\">37<\/a><\/sup>. Following the original paper, a priori sparsity of the network weights is encouraged by formulating the prior distribution as a scale mixture of two zero-mean Gaussian densities with standard deviations of \u03c3<sub>1<\/sub>\u2009=\u20091 and \u03c3<sub>2<\/sub>\u2009=\u2009<i>e<\/i><sup>\u22126<\/sup>, respectively, and mixture weight \u03c0\u2009=\u20090.5<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 37\" title=\"Blundell, C., Cornebise, J., Kavukcuoglu, K. &amp; Wierstra, D. Weight uncertainty in neural networks. arXiv preprint arXiv:1505.05424 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR37\" id=\"ref-link-section-d10732e804\" target=\"_blank\" rel=\"noopener\">37<\/a><\/sup>. 
After the posterior distribution has been computed, 95% credible intervals around network weights (or combinations thereof) can be estimated by repeated sampling. Sampling of network weights may also be used to compute credible intervals around the risk prediction for a given patient.<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec11\">Training<\/h3>\n<p>To compare the classification accuracy of the considered neural network architectures, maximum likelihood estimates of network parameters were obtained using a log-loss cost function on the training data, dropout with 50% probability after each embedding, RNN, and attention layer<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 38\" title=\"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. &amp; Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1929&#x2013;1958 (2014).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR38\" id=\"ref-link-section-d10732e817\" target=\"_blank\" rel=\"noopener\">38<\/a><\/sup>, and stochastic gradient descent with the Adam optimiser (batch size of 128 and learning rate of 0.001)<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 39\" title=\"Kingma, D. P. &amp; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR39\" id=\"ref-link-section-d10732e821\" target=\"_blank\" rel=\"noopener\">39<\/a><\/sup>. 
Class imbalance was taken into consideration by assigning a proportionally higher cost of misclassification to the minority class<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 40\" title=\"Weiss, G. M., McCarthy, K. &amp; Zabar, B. Cost-sensitive learning vs. sampling: Which is best for handling unbalanced classes with unequal error costs? Dmin 7, 24 (2007).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR40\" id=\"ref-link-section-d10732e825\" target=\"_blank\" rel=\"noopener\">40<\/a><\/sup>. Training was terminated after 80 epochs since overfitting of the training data started to become apparent with additional training epochs (based on average precision on the validation data). For interpretation purposes, Bayes by Backprop was used to train the \u201cAttention (concatenated time)\u201d neural network architecture on the entire data set, terminating if the loss function (the expected lower bound) did not decrease for 10 consecutive epochs.<\/p>\n<h3 class=\"c-article__sub-heading u-h3\" id=\"Sec12\">Statistical analysis<\/h3>\n<p>Baseline characteristics were determined for the analysed patient population. The prediction accuracy of each considered algorithm was evaluated based on average precision, AUROC, F<sub>1<\/sub>-Score, sensitivity, and specificity. Average precision may reflect algorithmic performance on imbalanced data sets better than AUROC as it does not reward true negatives<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 41\" title=\"Davis, J. &amp; Goadrich, M. In Proceedings of the 23rd international conference on Machine learning. 
233&#x2013;240 (2006).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR41\" id=\"ref-link-section-d10732e839\" target=\"_blank\" rel=\"noopener\">41<\/a>,<a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 42\" title=\"Saito, T. &amp; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PloS one 10, e0118432 (2015).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR42\" id=\"ref-link-section-d10732e842\" target=\"_blank\" rel=\"noopener\">42<\/a><\/sup>. The F<sub>1<\/sub>-Score was maximised over different threshold values on risk predictions. Sensitivity and specificity were computed by maximising Youden\u2019s J statistic<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 43\" title=\"Youden, W. J. Index for rating diagnostic tests. Cancer 3, 32&#x2013;35 (1950).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR43\" id=\"ref-link-section-d10732e848\" target=\"_blank\" rel=\"noopener\">43<\/a><\/sup>. 95% confidence intervals associated with each metric were computed by bootstrapping, i.e. by sampling the test set with replacement 100 times and re-evaluating the models each time<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 44\" title=\"Efron, B. &amp; Tibshirani, R. J. An introduction to the bootstrap. (CRC press, 1994).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR44\" id=\"ref-link-section-d10732e852\" target=\"_blank\" rel=\"noopener\">44<\/a><\/sup>. 
Since the bootstrap estimator assumes the resampling of independent events, sampling was based on patient identifiers rather than on ICU stay identifiers.<\/p>\n<p>Training the \u201cAttention (concatenated time)\u201d network using Bayes by Backprop allowed computation of odds ratios (OR) associated with static variables and ranking of the timestamped codes (diagnoses, procedures, medications, and vital signs) according to their associated average scores (a high positive score corresponds to increased risk of readmission to the ICU); corresponding 95% credible intervals were determined using 10,000 network samples.<\/p>\n<p>Software was implemented in Python using Scikit-learn<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 45\" title=\"Pedregosa, F. et al. Scikit-learn: Machine learning in Python. Journal of machine learning research 12, 2825&#x2013;2830 (2011).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR45\" id=\"ref-link-section-d10732e862\" target=\"_blank\" rel=\"noopener\">45<\/a><\/sup> and PyTorch<sup><a data-track=\"click\" data-track-action=\"reference anchor\" data-track-label=\"link\" data-test=\"citation-ref\" aria-label=\"Reference 46\" title=\"Paszke, A. et al. In NIPS 2017 Workshop. 
(2017).\" href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z#ref-CR46\" id=\"ref-link-section-d10732e866\" target=\"_blank\" rel=\"noopener\">46<\/a><\/sup>; the developed algorithms are publicly available at <a href=\"https:\/\/github.com\/sebbarb\/time_aware_attention\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/sebbarb\/time_aware_attention<\/a>.<\/p>\n<\/div>\n<p>[ad_2]<br \/>\n<br \/><a href=\"https:\/\/www.nature.com\/articles\/s41598-020-58053-z\" target=\"_blank\" rel=\"noopener\">Original article by  Sebastiano Barbieri, James Kemp, Oscar Perez-Concha, Sradha Kotwal, Martin Gallagher, Angus Ritchie, Louisa Jorm  <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>[ad_1] Study population The algorithms were evaluated on the publicly available MIMIC-III data set (ethics approval was not required)32. This data set comprises deidentified health data associated with 61,532 ICU stays and 46,476 critical care patients at Beth Israel Deaconess Medical Center in Boston, Massachusetts between 2001 and 2012. 
The supervised learning task consists of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":1522,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","jetpack_post_was_ever_published":false},"categories":[94,92,98],"tags":[],"class_list":["post-1521","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-data-science","category-machine-learning"],"blocksy_meta":[],"jetpack_featured_media_url":"https:\/\/e928cfdc7rs.exactdn.com\/info\/uploads\/sites\/3\/2020\/01\/Benchmarking-Deep-Learning-Architectures-for-Predicting-Readmission-to-the-ICU.png?strip=all","jetpack_shortlink":"https:\/\/wp.me\/p2TFCd-ox","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts\/1521","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/comments?post=1521"}],"version-history":[{"count":0,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/posts\/1521\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/media\/1522"}],"wp:attachment":[{"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/media?parent=1521"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/categories?post=1521"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.danielparente.net\/en\/wp-json\/wp\/v2\/tags?post=1521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","template
d":true}]}}