The early prediction of deterioration could have an important role in supporting healthcare professionals, as an estimated 11% of deaths in hospital follow a failure to promptly recognize and treat deteriorating patients [1]. Achieving this goal requires predictions of patient risk that are continuously updated and accurate, and delivered at an individual level with sufficient context and enough time to act. Here we develop a deep learning approach for the continuous risk prediction of future deterioration in patients, building on recent work that models adverse events from electronic health records [2–17] and using acute kidney injury, a common and potentially life-threatening condition [18], as an exemplar. Our model was developed on a large, longitudinal dataset of electronic health records that cover diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. Our model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of all acute kidney injuries that required subsequent administration of dialysis, with a lead time of up to 48 h and a ratio of 2 false alerts for every true alert. In addition to predicting future acute kidney injury, our model provides confidence assessments and a list of the clinical features that are most salient to each prediction, alongside predicted future trajectories for clinically relevant blood tests [9]. Although the recognition and prompt treatment of acute kidney injury is known to be challenging, our approach may offer opportunities for identifying patients at risk within a time window that enables early treatment.
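Early prediction plays an important role in medical decision-making: about 11% of in-hospital deaths follow a failure to recognize deterioration promptly and treat it accordingly. Achieving this requires patient risk predictions that are continuously updated and accurate, delivered at the individual level with sufficient context and enough time to act. We developed a deep learning model for the continuous prediction of the risk of patient deterioration. The model builds on recent work on modelling adverse events from electronic health records and uses acute kidney injury, a common and potentially life-threatening condition, as an exemplar. It was developed on a large longitudinal electronic health record dataset covering diverse clinical environments, comprising 703,782 adult patients across 172 inpatient and 1,062 outpatient sites. The model predicts 55.8% of all inpatient episodes of acute kidney injury, and 90.2% of those that required subsequent dialysis, with a lead time of up to 48 hours and two false alerts for every true alert. Besides predicting future acute kidney injury, the model provides confidence assessments and a list of the clinical features most salient to each prediction, together with predicted future trajectories for clinically relevant blood tests. Although recognizing and promptly treating acute kidney injury is known to be challenging, this approach may offer a way to identify patients at risk within a time window that allows early treatment.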
Adverse events and clinical complications are a major cause of mortality and poor outcomes in patients, and substantial effort has been made to improve their recognition [18,19]. Few predictors have found their way into routine clinical practice, because they either lack effective sensitivity and specificity or report damage that already exists [20]. One example relates to acute kidney injury (AKI), a potentially life-threatening condition that affects approximately one in five inpatient admissions in the United States [21]. Although a substantial proportion of cases of AKI are thought to be preventable with early treatment [22], current algorithms for detecting AKI depend on changes in serum creatinine as a marker of acute decline in renal function. Increases in serum creatinine lag behind renal injury by a considerable period, which results in delayed access to treatment. This supports a case for preventative 'screening'-type alerts but there is no evidence that current rule-based alerts improve outcomes [23]. For predictive alerts to be effective, they must empower clinicians to act before a major clinical decline has occurred by: (i) delivering actionable insights on preventable conditions; (ii) being personalized for specific patients; (iii) offering sufficient contextual information to inform clinical decision-making; and (iv) being generally applicable across populations of patients [24].
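Adverse events and clinical complications are a leading cause of death and poor outcomes in patients, and clinicians have long worked to improve their recognition. Few predictors have entered routine clinical practice, either because they lack adequate sensitivity and specificity or because they report damage that has already occurred. One example is acute kidney injury (AKI), a potentially life-threatening condition that affects roughly one in five inpatient admissions in the United States. Although a substantial proportion of AKI cases are thought to be preventable with early intervention, current algorithms for detecting AKI rely on changes in serum creatinine as a marker of acute decline in renal function. Rises in serum creatinine lag renal injury by a considerable period, which delays treatment. This supports the case for preventative 'screening'-type alerts, but there is no evidence that current rule-based alerts improve clinical outcomes. To be effective, predictive alerts must help clinicians act before a major decline has occurred by: (i) providing actionable insights on preventable conditions; (ii) being personalized to specific patients; (iii) supplying enough contextual information to inform clinical decisions; and (iv) being generally applicable across patient populations.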
Promising recent work on modelling adverse events from electronic health records [2–17] suggests that the incorporation of machine learning may enable the early prediction of AKI. Existing examples of sequential AKI risk models have either not demonstrated a clinically applicable level of predictive performance [25] or have focused on predictions across a short time horizon that leaves little time for clinical assessment and intervention [26].
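Recent work on modelling adverse events from electronic health records suggests that incorporating machine learning may enable the early prediction of AKI. Existing sequential AKI risk models have either not reached a clinically applicable level of predictive performance or have focused on short prediction horizons that leave clinicians little time for assessment and intervention.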
Our proposed system is a recurrent neural network that operates sequentially over individual electronic health records, processing the data one step at a time and building an internal memory that keeps track of relevant information seen up to that point. At each time point, the model outputs a probability of AKI occurring at any stage of severity within the next 48 h (although our approach can be extended to other time windows or severities of AKI; see Extended Data Table 1). When the predicted probability exceeds a specified operating-point threshold, the prediction is considered positive. This model was trained using data that were curated from a multi-site retrospective dataset of 703,782 adult patients from all available sites at the US Department of Veterans Affairs, the largest integrated healthcare system in the United States. The dataset consisted of information that was available from hospital electronic health records in digital format. The total number of independent entries in the dataset was approximately 6 billion, including 620,000 features. Patients were randomized across training (80%), validation (5%), calibration (5%) and test (10%) sets. A ground-truth label for the presence of AKI at any given point in time was added using the internationally accepted 'Kidney Disease: Improving Global Outcomes' (KDIGO) criteria [18]; the incidence of KDIGO AKI was 13.4% of admissions. Detailed descriptions of the model and dataset are provided in the Methods and Extended Data Figs. 1–3.
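The system we propose is a recurrent neural network that processes each electronic health record sequentially, one step at a time, building an internal memory that tracks the relevant information seen so far. At each time point, the model outputs the probability of AKI of any severity occurring within the next 48 hours (the approach can be extended to other time windows or AKI severities; see Extended Data Table 1). When the predicted probability exceeds a specified operating-point threshold, the prediction is considered positive. The model was trained on a multi-site retrospective dataset of 703,782 adult patients from all available sites of the US Department of Veterans Affairs, the largest integrated healthcare system in the United States. The dataset consists of information available in digital form from hospital electronic health records. It contains approximately 6 billion independent entries, including 620,000 features. Patients were randomly assigned to training (80%), validation (5%), calibration (5%) and test (10%) sets. Ground-truth labels for the presence of AKI at any given time point were added using the internationally accepted KDIGO criteria; the incidence of KDIGO AKI was 13.4% of admissions. Detailed descriptions of the model and dataset are given in the Methods and Extended Data Figs. 1–3.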
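The ground-truth labelling described above can be illustrated with a minimal Python sketch. It assumes a simplified, creatinine-only reading of the KDIGO staging rules (urine-output and dialysis criteria are left out), and the helper names `kdigo_stage` and `label_next_window`, the baseline handling, and the exact window convention are hypothetical choices for illustration, not the authors' labelling pipeline.

```python
from typing import List, Optional, Sequence


def kdigo_stage(creatinine: float, baseline: float,
                rise_48h: Optional[float] = None) -> int:
    """Simplified creatinine-only KDIGO stage (0 = no AKI); values in mg/dl.

    Assumption: `baseline` is a reference creatinine chosen elsewhere and
    `rise_48h` is the absolute increase over the previous 48 hours, if known.
    """
    if baseline <= 0:
        raise ValueError("baseline creatinine must be positive")
    ratio = creatinine / baseline
    # AKI requires a relative rise of 1.5x or an absolute rise of 0.3 mg/dl.
    meets_aki = ratio >= 1.5 or (rise_48h is not None and rise_48h >= 0.3)
    if not meets_aki:
        return 0
    if ratio >= 3.0 or creatinine >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    return 1


def label_next_window(times_h: Sequence[float], stages: Sequence[int],
                      horizon_h: float = 48.0) -> List[int]:
    """Label each time step 1 if any AKI stage >= 1 occurs in (t, t + horizon]."""
    labels = []
    for t in times_h:
        hit = any(t < u <= t + horizon_h and s >= 1
                  for u, s in zip(times_h, stages))
        labels.append(1 if hit else 0)
    return labels
```

For example, with observations at hours 0, 24, 48, 72 and 96 and an AKI stage of 1 recorded only at hour 72, the steps at hours 24 and 48 receive a positive 48-hour label under this convention.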
Fig. 1
Figure 1 shows the use of our model. At every point throughout an admission, the model provides updated estimates of future AKI risk along with an associated degree of uncertainty. Providing the uncertainty associated with a prediction may help clinicians to distinguish ambiguous cases from those predictions that are fully supported by the available data. Identifying an increased risk of future AKI sufficiently far in advance is critical, as longer lead times may enable preventative action to be taken. This is possible even when clinicians may not be actively intervening with, or monitoring, a patient. Supplementary Information section A provides more examples of the use of the model.
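Figure 1 illustrates the use of the model. At every point during an admission, the model provides an updated estimate of future AKI risk together with the associated uncertainty. Reporting this uncertainty may help clinicians distinguish ambiguous predictions from those well supported by the available data. Identifying an increased risk of future AKI far enough in advance is critical, because longer lead times allow preventative action to be taken. This is possible even when clinicians are not actively intervening with, or monitoring, the patient. Supplementary Information section A gives further examples of the model in use.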
With our approach, 55.8% of inpatient AKI events of any severity were predicted early, within a window of up to 48 h in advance and with a ratio of 2 false predictions for every true positive. This corresponds to an area under the receiver operating characteristic curve of 92.1%, and an area under the precision–recall curve of 29.7%. When set at this threshold, our predictive model would—if operationalized—trigger a daily clinical assessment in 2.7% of hospitalized patients in this cohort (Extended Data Table 2). Sensitivity was particularly high in patients who went on to develop lasting complications as a result of AKI. The model provided correct early predictions in 84.3% of episodes in which administration of in-hospital or outpatient dialysis was required within 30 days of the onset of AKI of any stage, and in 90.2% of cases in which regular outpatient administration of dialysis was scheduled within 90 days of the onset of AKI (Extended Data Table 3). Figure 2 shows the corresponding receiver operating characteristic and precision–recall curves, as well as a spectrum of operating points of the model. An operating point can be chosen to further increase the proportion of AKI that is predicted early or to reduce the percentage of false predictions at each step, according to clinical priority (Fig. 3). Applied to stage 3 AKI, 84.1% of inpatient events were predicted up to 48 h in advance, with a ratio of 2 false predictions for every true positive (Extended Data Table 4). To respond to these alerts on a daily basis, clinicians would need to attend to approximately 0.8% of in-hospital patients (Extended Data Table 2).
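With this approach, 55.8% of inpatient AKI events of any severity were predicted early, up to 48 hours in advance, with two false predictions for every true positive. This corresponds to an area under the receiver operating characteristic (ROC) curve of 92.1% and an area under the precision–recall (PR) curve of 29.7%. At this threshold, the model would, if deployed, trigger a daily clinical assessment in 2.7% of hospitalized patients in this cohort (Extended Data Table 2). Sensitivity was particularly high in patients who went on to develop lasting complications of AKI. The model gave correct early predictions in 84.3% of episodes that required in-hospital or outpatient dialysis within 30 days of AKI onset, and in 90.2% of cases in which regular outpatient dialysis was scheduled within 90 days of AKI onset (Extended Data Table 3). Figure 2 shows the corresponding ROC and PR curves, along with the range of operating points of the model. Depending on clinical priority, an operating point can be chosen to increase the proportion of AKI predicted early or to reduce the percentage of false predictions at each step (Fig. 3). For stage 3 AKI, 84.1% of inpatient events were predicted up to 48 hours in advance, with two false predictions for every true positive (Extended Data Table 4). Responding to these alerts daily would require clinicians to attend to approximately 0.8% of in-hospital patients (Extended Data Table 2).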
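A ratio of two false predictions per true positive corresponds to a precision of 1/(1+2), about 33%. The Python sketch below shows that arithmetic and one simple way an operating point could be picked on held-out scores so that the ratio is respected. The function names and the toy scores are hypothetical, and the alert rule (score at or above the threshold, counted per time step) is an assumption of this sketch rather than the evaluation protocol of the paper.

```python
from typing import Sequence, Tuple


def precision_from_ratio(false_per_true: float) -> float:
    """Precision implied by a given number of false alerts per true alert."""
    return 1.0 / (1.0 + false_per_true)


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   max_false_per_true: float = 2.0) -> Tuple[float, float]:
    """Return (threshold, sensitivity) for the lowest threshold whose alerts
    contain at most `max_false_per_true` false positives per true positive."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive labels")
    # Scan candidate thresholds from low to high; the first one that meets
    # the ratio has the highest sensitivity among those that meet it.
    for thr in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= thr]
        tp = sum(flagged)
        fp = len(flagged) - tp
        if tp > 0 and fp <= max_false_per_true * tp:
            return thr, tp / total_pos
    raise ValueError("no threshold meets the requested ratio")


# Toy data: eight scored time steps, two of which precede an AKI event.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
threshold, sensitivity = pick_threshold(toy_scores, toy_labels)
```

Raising the threshold trades sensitivity for fewer false alerts, which is the choice of operating point the text describes.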
The model correctly identifies substantial future increases in 7 auxiliary biochemical tests in 88.5% of cases (Supplementary Information, section B), and provides information about the factors that are most salient to the computation of each risk prediction. The greatest saliency was identified for laboratory tests that are known to be relevant to renal function (Supplementary Information, section C). The predictive performance of our model was maintained across time and hospital sites, as demonstrated by additional experiments that show generalizability to data acquired at time points after the model was trained (Extended Data Table 5).
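The model correctly identified substantial future increases in 7 auxiliary biochemical tests in 88.5% of cases (Supplementary Information, section B) and reports the factors most salient to each risk prediction. The greatest saliency was found for laboratory tests known to be relevant to renal function (Supplementary Information, section C). Predictive performance was maintained across time and hospital sites, as shown by additional experiments demonstrating generalizability to data acquired after the model was trained (Extended Data Table 5).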
Our approach significantly outperformed (P
The approach significantly outperformed existing state-of-the-art baseline models (P
Of the false-positive alerts made by our model, 24.9% were positive predictions that were made even earlier than the 48-h window in patients who subsequently developed AKI (Extended Data Fig. 4). Of these, 57.1% occurred in patients with pre-existing chronic kidney disease, who are at a higher risk of developing AKI. Of the remaining false-positive alerts, 24.1% were trailing predictions that occurred after an AKI episode appeared to have resolved; alerts such as these can be filtered out in clinical practice. For positive risk predictions in which no AKI was subsequently observed (in this retrospective dataset), it is probable that many occurred in patients at risk of AKI to whom appropriate preventative treatment was administered—which would have averted subsequent AKI. In addition to these early and trailing predictions, 88% of the remaining false-positive alerts occurred in patients with severe renal impairment, known renal pathology or evidence in the electronic health record that the patient required clinical review (Extended Data Fig. 4).
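Of the false-positive alerts raised by the model, 24.9% were positive predictions made even earlier than the 48-hour window in patients who subsequently developed AKI (Extended Data Fig. 4). Of these, 57.1% occurred in patients with pre-existing chronic kidney disease, who are at higher risk of AKI. Of the remaining false-positive alerts, 24.1% were trailing predictions made after an AKI episode appeared to have resolved; such alerts can be filtered out in clinical practice. For positive risk predictions that were not followed by AKI (in this retrospective dataset), it is likely that many occurred in at-risk patients who received appropriate preventative treatment, which averted the AKI. Apart from these early and trailing predictions, 88% of the remaining false-positive alerts occurred in patients with severe renal impairment, known renal pathology, or evidence in the electronic health record that the patient required clinical review (Extended Data Fig. 4).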
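The percentages in this breakdown are nested, so it can help to express them as shares of all false-positive alerts. The short Python sketch below does that arithmetic under one stated assumption: each "remaining" percentage is taken relative to the alerts left after the previous category has been removed. If the source intends a different nesting, the shares would change. The function name is hypothetical.

```python
def false_positive_breakdown(early: float = 0.249,
                             trailing_of_rest: float = 0.241,
                             explained_of_rest: float = 0.88) -> dict:
    """Convert the nested percentages into shares of all false-positive alerts.

    Assumption: `trailing_of_rest` applies to alerts that are not early, and
    `explained_of_rest` applies to alerts that are neither early nor trailing.
    """
    trailing = (1.0 - early) * trailing_of_rest
    rest = 1.0 - early - trailing
    explained = rest * explained_of_rest
    return {
        "early": early,
        "trailing": trailing,
        "explained": explained,
        "other": rest - explained,
    }


shares = false_positive_breakdown()
```

Under that reading, trailing predictions make up about 18.1% of all false-positive alerts, alerts in patients with renal impairment, known pathology or a need for review about 50.2%, and about 6.8% fall into none of the named categories.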
Our aim is to provide risk predictions that enable personalized preventative action to be delivered at a large scale. The way these predictions are used may vary by clinical setting: a trainee doctor could be alerted in real time to each patient under their care, and specialist nephrologists or rapid-response teams [27] could identify high-risk patients to prioritize their response. This is possible because performance was consistent across multiple clinically important groups, notably those at an increased risk of AKI (Supplementary Information, section G). Our model is designed to complement existing routine care, as it is trained specifically to predict episodes of AKI that happened in this retrospective dataset despite existing best practices.
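Our aim is to provide risk predictions that enable personalized preventative action to be delivered at scale. How the predictions are used may vary by clinical setting: a junior doctor could be alerted in real time about each patient under their care, and nephrologists or rapid-response teams could identify high-risk patients and prioritize their response. This is possible because performance was consistent across multiple clinically important patient groups, notably those at increased risk of AKI (Supplementary Information, section G). The model is designed to complement existing routine care, as it is trained specifically to predict AKI episodes that occurred in this retrospective dataset despite existing best practice.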
Fig. 2
Although we demonstrate a model that is trained and evaluated on a clinically representative set of patients from the entire US Department of Veterans Affairs healthcare system, this demographic is not representative of the global population. Female patients comprised 6.38% of patients in the dataset, and model performance was lower for this demographic (Extended Data Table 6). Validating the predictive performance of the proposed system on a general population would require training and evaluating the model on additional representative datasets. Future work will need to address the under-representation of sub-populations in the training data [28] and overcome the effect of potential confounding factors that relate to hospital processes [29]. KDIGO is an indicator of AKI that has a long lag time after the initial renal impairment, and model performance could be enhanced by improvements in the ground-truth definition of AKI and in data quality [30].
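Although our model was trained and evaluated on a clinically representative set of patients from the entire US Department of Veterans Affairs healthcare system, this population is not representative of patients worldwide. Female patients made up only 6.38% of the dataset, and model performance was lower in this group (Extended Data Table 6). Validating the predictive performance of the system in a general population would require training and evaluating it on additional representative datasets. Future work will need to address the under-representation of sub-populations in the training data and overcome the effect of potential confounding factors related to hospital processes. KDIGO is an indicator of AKI that lags the initial renal impairment by a long time, and model performance could be improved by better ground-truth definitions of AKI and better data quality.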
Despite the state-of-the-art retrospective performance of our model compared to existing literature, future work should now prospectively evaluate and independently validate the proposed model to establish its clinical utility and effect on patient outcomes, as well as explore the role of the model in researching strategies for delivering preventative care for AKI.
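Although our model achieves state-of-the-art retrospective performance relative to the existing literature, future work should prospectively evaluate and independently validate it to establish its clinical utility and effect on patient outcomes, and explore its role in researching strategies for delivering preventative care for AKI.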
Fig. 3
In summary, we demonstrate a deep learning approach for the continuous prediction of AKI within a clinically actionable window of up to 48 h in advance. We report performance on a clinically diverse population and across a large number of sites to show that our approach may allow for the delivery of potentially preventative treatment—before the physiological insult itself, in a large number of the cases. Our results open up the possibility for deep learning to guide the prevention of clinically important adverse events. With the possibility of risk predictions delivered in clinically actionable windows, alongside the increasing size and scope of electronic health record datasets, we now shift to a regime in which the role of machine learning in clinical care can grow rapidly, supplying tools for enhancing patient and clinician experiences and potentially becoming a ubiquitous and integral part of routine clinical pathways.
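In summary, we demonstrate a deep learning approach for the continuous prediction of AKI within a clinically actionable window of up to 48 hours in advance. We evaluated it on a clinically diverse patient population across a large number of sites, and the results show that the approach may allow potentially preventative treatment to be delivered before the physiological insult itself in a large number of cases. These results suggest that deep learning could guide the prevention of clinically important adverse events. As risk predictions within clinically actionable windows become possible and electronic health record datasets grow in size and scope, we are moving to a regime in which the role of machine learning in clinical care can grow rapidly, providing tools that enhance the experience of patients and clinicians and potentially becoming a ubiquitous and integral part of routine clinical pathways.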
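Translator
Zhang Linlin (張琳琳), associate chief physician
Doctor of medicine with postdoctoral training in the United States; master's supervisor; associate chief physician at Beijing Tiantan Hospital, Capital Medical University. Also a youth committee member of the Chinese Medical Association's Society of Parenteral and Enteral Nutrition, vice-chair of the youth committee of the Critical Care Physicians Branch of the Shandong Medical Doctor Association, and a member of the youth committee of the Critical Care Branch of the Shandong Medical Association.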
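Translation: Zhang Linlin (張琳琳)
Editing: Ding Ruiqi (丁瑞琪) / Song Xuan (宋璇)
Review: Zhang Jicheng (張繼承) / Wang Chunting (王春亭)
Rights to the original article belong to its authors; copyright in the translation above belongs to the translator. To request permission to repost, leave a message for the 雲ICU account.
Notes:
Zhang Linlin: Beijing Tiantan Hospital, Capital Medical University
Wang Chunting: Shandong Provincial Hospital
Zhang Jicheng: Shandong Provincial Hospital
Ding Ruiqi: Shandong Provincial Hospital
Song Xuan: Liaocheng Heart Hospital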