Head and neck cancer predictive risk estimator to determine control and therapeutic outcomes of radiotherapy (HNC-PREDICTOR): development, international multi-institutional validation, and web implementation of clinic-ready model-based risk stratification for head and neck cancer

Background: Personalised radiotherapy can improve treatment outcomes of patients with head and neck cancer (HNC), where currently a ‘one-dose-fits-all’ approach is the standard. The aim was to establish individualised outcome prediction based on multi-institutional international ‘big-data’ to facilitate risk-based stratification of patients with HNC. Methods: The data of 4611 HNC radiotherapy patients from three academic cancer centres were split into four cohorts: a training (n = 2241), independent test (n = 786), and external validation cohorts 1 (n = 1087) and 2 (n = 497). Tumour- and patient-related clinical variables were considered in a machine learning pipeline to predict overall survival (primary end-point) and local and regional tumour control (secondary end-points); serially, imaging features were considered for optional model improvement. Finally, patients were stratified into high-, intermediate-, and low-risk groups. Results: Performance score, AJCC8th stage, pack-years, and Age were identified as predictors for overall survival, demonstrating good performance in both the training cohort (c-index = 0.72 [95% CI, 0.66–0.77]) and in all three validation cohorts (c-indices: 0.76 [0.69–0.83], 0.73 [0.68–0.77], and 0.75 [0.68–0.80]). Excellent stratification of patients with HNC into high, intermediate, and low mortality risk was achieved; with 5-year overall survival rates of 17–46% for the high-risk group compared to 92–98% for the low-risk group. The addition of morphological image feature further improved the performance (c-index = 0.73 [0.64–0.81]). These models are integrated in a clinic-ready interactive web interface: https://uic-evl.github.io/hnc-predictor/ Conclusions: Robust model-based prediction was able to stratify patients with HNC in distinct high, intermediate, and low mortality risk groups. This can effectively be capitalised for personalised radiotherapy, e.g., for tumour radiation dose escalation/de-escalation.


Introduction
Head and neck cancer (HNC) affects almost 650,000 individuals and causes 350,000 deaths worldwide annually [1]. Historically, the main etiological HNC risk factor was smoking; hence, HNC incidence rates were expected to decrease along with the decline in societal smoking [2][3][4][5]. Yet, HNC cases increased due to a relatively new epidemiological subtype, human papilloma virus (HPV)-related HNC, which affects relatively younger patients and is associated with much better prognosis compared to HPV-negative HNC [6,7].
Radiotherapy is a cornerstone for curative HNC treatment. To date, a 'one-dose-fits-all' approach is deployed, i.e., all patients receive roughly similar tumour radiation dose prescription based mainly on historic pre-HPV clinical trials. Currently, personalising radiation dose to optimise tumour control is relatively unexplored. For instance, only tumour stage (i.e., early stage versus locally advanced) is used to select eligible patients in recent dose-escalation clinical trials, aiming to improve treatment control by increasing the radiation tumour dose [8][9][10][11]. The risk of severe radiation-induced sequelae from dose-escalation [10] makes improved selection a vital unmet need. On the other hand, patients with a low risk of treatment failure might benefit from de-intensified treatment, e.g., MR-guided dose de-escalation [12]. To date, attempts at therapeutic de-intensification in large heterogenous cohorts without patient-specific criteria have been uncompelling [13][14][15]; consequently, granular treatment outcome estimation for directed dose modification remains a substantive opportunity for HNC treatment personalisation.
Robust treatment outcome prediction based on multifactorial clinical variables is thus crucial to improve treatment success and establish effective personalised radiotherapy [16,17]. While clinical models have been developed [18][19][20][21], they are largely unused; clinical implementation has been hampered due to the lack of clinically useful prediction tools that are backed by large representative multi-institutional dataset for training and validation. Additionally, radiomics features -tumour-specific characteristics quantified from medical images -have been shown to improve HNC treatment outcome prediction [22][23][24]. An approach to add imaging features to well-established clinical models is needed for robust radiomics applications.
The main aim was to establish a large-scale multi-institutional standard for a more individualised outcome prediction in patients with HNC of overall survival (OS) and oncologic outcomes (i.e., local [LC] and regional control [RC]) following radiotherapy using large high-quality international datasets (>4500 patients with HNC). Additionally, an interactive web-based risk prediction tool was pursued to make the models direct clinically actionable for clinicians. Finally, we present a serial prediction model approach, where the clinical models can be enriched by an optional imaging component (Fig. 1A).

Patient considerations
The MD Anderson Cancer Center (MDACC) Big Data Radiotherapy HNC collection effort has been initiated for this study. The prospective and retrospective data collection was approved by the MDACC Institutional Review Board [PA14-0947/RCR03-0800]. This dataset was used for training and independent validation. Prospectively collected data from the University Medical Center Groningen (UMCG) were used for external validation (Standardized Follow-up Program: NCT02435576). The publicly available data from Princes Margret Hospital (PMH) on The Cancer Imaging Archive (TCIA) were used for additional external validation [25].
Inclusion criteria for all cohorts included: (1) proven squamous cell carcinoma of the head and neck, (2) treatment with definitive or adjuvant radiotherapy with/without chemotherapy, and (3) no prior head and neck radiation. Patients were treated from 2001 to 2019, 2007 to 2020, and 2005 to 2010 at MDACC, UMCG, and PMH, respectively. Prescribed tumour doses were 60-72 Gy, as detailed previously by each institution [23,26,27]. was measured from start of radiotherapy until the event, alternatively data were censored at last follow-up date. Systematic follow-up was part of the standard of care in both treatment centres: every 3 months in year 1, followed by every 6 months thereafter.

Statistical analysis
The MDACC dataset was split into a training and independent validation cohort for the clinical model development (Fig. 1B). The data with all variables collected (i.e., complete cases) were split with a 60:40 ratio into training:validation data. Cases with missing variables (i.e., partial cases) were added to the training set. Only complete cases were considered for the independent and external validation cohorts.
Step-wise forward variable selection was employed to select variables for the Cox regression OS, LC, and RC model based on likelihood ratio test with a Bonferroni-corrected significance level of p < 0.005. Repeated selection was performed on 10 imputed datasets using Multivariate Imputation by Chained Equations (R-package 'mice' v3.13.0) with predictive mean matching across 25 iterations [29]. Based on the variable selection and intervariable correlation results, potential models were tested in the validation cohorts. The final models were used for patient stratification. The final OS model was compared with a model based on AJCC 8th alone with the likelihood ratio test.

Risk-based patient stratification
Patients were stratified into high-, intermediate-, and low-risk groups based on the predicted 2-year mortality risk derived from the Cox regression clinical models. These 2-year mortality risk thresholds were visually determined in the training cohort by evaluating the Kaplan-Meier curves for the different risk groups.

Imaging prediction component
For a subset of patients with available pre-treatment contrast-enhanced CT scans, image characteristics of the primary tumour were quantified in geometric and texture radiomics features using previously developed libraries [30,31], according to the Image Biomarker Standardisation Initiative [32]. Features were selected with bootstrapped forward stepwise variable selection (1000 samples). Subsequently, model improvement was tested for the addition of these features to the clinical risk prediction (i.e., linear predictor).

Association of clinical variables and treatment outcome
For OS, univariable analyses showed that all clinical variables were significant (p < 0.0001), except gender (eTable 1). For LC or RC, all variables were significant, except age and gender (p > 0.106), and N-stage for LC (p = 0.189).
For comprehensive multivariable model analyses and iterations, please refer to eResults 1.
For OS, the final model included the following clinicodemographic variables: performance score, AJCC 8th stage, pack-years, and age (

Model-based patient stratification
The survival curves of patients stratified based on their model-based predicted 2-year mortality risk (2y-risk) are shown in Fig. 2. Based on the training cohort, the best separation was seen for predicted 2y-risk lower than 5% (low-risk), between 5 and 25% (intermediaterisk), and higher than 25% (high risk Prediction based on AJCC 8th staging alone gives a single 2y-risk per category (x-axis Fig. 3-left), while a sizeable spread can be seen per category in 2y-risk calculated by the clinical model (y-axis). Fig. 3 shows that only a select portion of the Stage I is low risk (2y-risk<5%), and limited number of Stage III-IV patients are high risk (2y-risk>25%). The 'by-the-model-identified' high-risk patients were correctly classified as the majority of these patients died ( Fig. 3-right).

Web interface prediction and stratification tool
The clinically usable prediction tool was implemented in an interactive web interface https://uic-evl.github.io/hnc-predictor/employing the final clinical models. Here, the clinical variables of a new patient (e.g., age) can be interactively submitted, whereafter the patientspecific predicted OS, LC, or RC curves can be calculated. Finally, by submitting the desired 2-y risk threshold, the new patient is stratified into being low, intermediate, high risk of OS, LC, and/or RC.

Models in tumour site sub cohorts
The clinical models performed well in two largest sub-cohorts: Overall, the calibration of the models was good, yet the actual mortality risk was higher than predicted for the OPC and oral cavity patients (HosmereLemeshow p-value<0.05), which was comparable to the total cohort. The number of hypopharynx (n = 136), nasopharynx (n = 56), and unknown primary (n = 73) patients was too low to draw reliable conclusions (eFig. 3).

Discussion
The clear stratification of patients with HNC into high, intermediate, and low risk of mortality (Fig. 2) by the models can be effectively used for personalised radiotherapy, e.g., selecting high-risk patients for tumour radiation dose escalation or low-risk patients for dose de-escalation. The impressive survival differences for patients who are nominally in the same AJCC (including HPV) risk category allows for more directive and granular patientby-patient risk differentiation. For example, OPC HPV-positive patients are considered for de-escalation trials [13][14][15], yet our findings show that 4% and 14% of these patients have a 2-y mortality of >25% and >15%, respectively, for which dose de-escalation may not be advisable. By using this international big dataset of more than 4500 patients, this study establishes a benchmark for robust OS, LC, and RC prediction in patients with HNC.  [36]. While the modelling procedure was quite different, similar input predictors were identified: T, N-stage, HPV status, age, smoking status; notably, tobacco pack-years and performance score were not included. Overall, these findings suggest that despite distinct modelling approaches and datasets, convergent phenomena have been observed.
For the LC prediction, T-stage, HPV status, performance score, and pack-years were selected. Since HPV status was highly correlated to tumour site (Rho = 0.89; p < 0.0001), it is difficult to determine the impact of tumour location on LC. In contrast, for RC, tumour site showed added predictive value to AJCC 8th staging, which is interesting as it based on the tumour site. This is likely due to the difference of the lymphatic tumour spread per tumour location [37].
Outcome prediction was robust across multi-institutional cohorts, even though they had distinct patient demographic profiles (Table 1); particularly, the HPV-positive HNC incidence was substantially lower in European compared to the North American cohorts. and unknown primary cancer sub-cohorts, caution is advised when applying these models due to the sparse patient numbers.
The serial approach of building the prediction model presented in this study (Fig. 1A) allows for flexible addition of imaging features. Higher OS risk was associated with larger minor axis length of the tumour [32], which represents an intuitive metric for tumour size. Previous studies showed the relation between OS and features indicating larger or more irregular tumours [22,23]. Texture features, in contrast to prior works [22][23][24], failed to improve our model discrimination (eResults 2.2); similar to a previous study [38]. This may be due to the sensitivity of intensity/texture features to image acquisition discrepancies [39], arguing for improved image harmonisation, standardisation, and image quality.
Limitations of the study cohort are that the majority of tumour locations were OPC, larynx, and oral cavity, underrepresenting hypopharynx, nasopharynx, and unknown primary cases. While this is a representative of the HNC clinical incidence, this may mean that the presented models are not sufficiently tested for under-represented tumour sites. Another challenge is the definition of local and RC, for which an event was broadly defined as recurrent, progressive, or residual disease. The detection residual/returning disease can be challenging [40] and is further complicated when no salvage treatments are available or when patients are lost from follow-up, and thus, no pathologic conformation, clinical progression, or imaging can be obtained. This may therefore potentially result in an underdetection bias of disease control in the cohorts, which can influence accuracy of the LC and RC models.
As with multi-site data aggregation and risk modelling efforts at large scale, there are intrinsic limitations as function of data availability, e.g., anaemia identified by Beesley et al. was not recorded in these datasets [36]. Consequently, the utility of this (or any) predictive model is necessarily predicated on input variables and could be modified or altered with updated or augmented data. Moreover, stage migration considerations between AJCC 7th and 8th edition should be noted; for example, extranodal extension was not always specified/recorded as a formalised component of AJCC 7th ed. and may have been obscured. Improved incorporation could improve the models, or alternatively, it could be added as a separate variable [36]. While we focus on OS, LC, and RC, future work will focus on predicting distant metastases and disease-free survival.
Nonetheless, this study is to our knowledge based on the largest head and neck extant multi-site dataset, which allowed for the development of statistically robust and clinic-ready HNC risk models. This provides a benchmark platform for extended future developments of image-incorporating prediction methods, such as deep learning. Moreover, the enduser-enabled web interface (GUI) provides an accessible decision support tool for patientindividual risk stratification for therapeutic selection.

Conclusion
Developed and assessed in this international 'big-data'-set, our prediction models presented excellent capacity to stratify patients with HNC at high, intermediate, and low mortality risk -outperforming AJCC 8th staging. This work sets a benchmark for robust OS, LC, and RC risk prediction in radiotherapy HNC patients, which can effectively be capitalised for personalised radiotherapy with the clinic-ready web-based tool prediction tool for new patients that does not require under-the-hood knowledge of model mechanics (https://uicevl.github.io/hnc-predictor/).

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.    Demographics for training, independent validation, and two validation cohorts.  Table 2 Clinical model parameters and c-index model performance.