Incorporating progesterone receptor expression into the PREDICT breast prognostic model

Background: Predict Breast (www.predict.nhs.uk) is an online prognostication and treatment benefit tool for early invasive breast cancer. The aim of this study was to incorporate the prognostic effect of progesterone receptor (PR) status into a new version of PREDICT and to compare its performance to the current version (2.2). Method: The prognostic effect of PR status was based on the analysis of data from 45,088 European patients with breast cancer from 49 studies in the Breast Cancer Association Consortium. Cox proportional hazard models were used to estimate the hazard ratio for PR status. Data from a New Zealand study of 11,365 patients with early invasive breast cancer were used for external validation. Model calibration and discrimination were used to test the model performance. Results: Having a PR-positive tumour was associated with a 23% and 28% lower risk of dying from breast cancer for women with oestrogen receptor (ER)-negative and ER-positive breast cancer, respectively. The area under the ROC curve increased with the addition of PR status from 0.807 to 0.809 for patients with ER-negative tumours (p = 0.023) and from 0.898 to 0.902 for patients with ER-positive tumours (p = 2.3 × 10⁻⁶) in the New Zealand cohort. Model calibration was modest with 940 observed deaths compared to 1151 predicted. Conclusion: The inclusion of the prognostic effect of PR status to PREDICT Breast has led to an improvement of model performance and more accurate absolute treatment benefit predictions for individual patients. Further studies should determine whether the baseline hazard function requires recalibration.


Introduction
Accurate predictions of individualised survival estimates and benefits of adjuvant therapy following surgery are essential for clinical decision-making for patients with early invasive breast cancer. PREDICT Breast (www.breast.predict.nhs.uk) is an online prognostication and treatment benefit tool to aid clinical decision-making for adjuvant therapy after surgery for patients with early invasive breast cancer [1]. The model uses information about age at diagnosis and tumour characteristics to predict 5-, 10- and 15-year mortality and the benefit of treatment of adjuvant cytotoxic chemotherapy, hormone therapy, trastuzumab and/or bisphosphonate therapy. The clinico-pathological factors used in the current version (v2.2) are tumour size, tumour grade, number of positive lymph nodes, oestrogen receptor (ER) status, human epidermal growth factor receptor 2 (HER2) status, KI67 status and mode of detection [1-3]. PREDICT Breast was developed using cancer registry data from 5694 women diagnosed in East Anglia, United Kingdom, between 1999 and 2003 [4]. Separate breast cancer-specific mortality models were derived for ER-negative tumours and ER-positive tumours. The survival for patients with breast cancer is estimated by the hazard ratios of the risk factors in combination with the baseline survival function derived from a Cox proportional hazards regression model. It is possible to include additional prognostic factors into the model, even if data on those factors were not available in the data used to derive the model, by applying the external estimates of prognostic effects to the baseline hazard function. This approach was used to incorporate HER2 status and KI67 status, which led to an improvement in predictive performance [2,3].
Progesterone receptor (PR) status is a biomarker that has been shown to be prognostic in early invasive breast cancer in a large number of studies [5-11]. It is usually assessed by immunohistochemistry and, in combination with ER status and HER2 status, can be used to classify the breast carcinoma subtype [7]. Furthermore, the expression levels of PR predict clinical outcomes and the beneficial effect of adjuvant hormonal treatments [6,8-10]. Thus, the addition of PR status to the PREDICT Breast model has the potential to improve the discrimination of the model and improve its clinical utility.
We had two specific aims. The first was to obtain estimates of the relative hazard for breast cancer-specific mortality associated with PR status after adjusting for the prognostic factors included in PREDICT Breast v2.2. The second was to incorporate this hazard ratio estimate into the PREDICT Breast model and compare the performance of the new model against the current model (PREDICT Breast version 2.2).

Prognostic effect of biomarker PR status
We evaluated the prognostic effect of PR status using data on patients with breast cancer of European ancestries collected by 49 studies in the Breast Cancer Association Consortium (BCAC). We estimated the hazard ratio for PR-positive tumours compared with PR-negative tumours using a Cox proportional hazards model for time to death from breast cancer, stratified by study and adjusted for the PREDICT Breast v2.2 prognostic score. The PREDICT Breast v2.2 prognostic score (a log hazard ratio) was calculated for each case according to the formula reported in Candido dos Reis et al. (Table 1) [1]. Follow-up time was defined as the time from diagnosis to last follow-up, death from breast cancer or 15 years after diagnosis, whichever came first. In order to account for prevalent cases, time at risk started at study entry (left truncation). This provides an unbiased estimate of the hazard ratio [12]. Separate models were derived for ER-negative breast cancer cases and ER-positive breast cancer cases.
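The follow-up definition above can be illustrated with a minimal sketch. The function name, argument names and the choice to drop patients who enter a study more than 15 years after diagnosis are assumptions made for illustration, not details taken from the analysis code. Times are in years since diagnosis, so a patient recruited two years after diagnosis only contributes time at risk from year 2 onwards.

```python
def at_risk_interval(entry_years, exit_years, died_of_breast_cancer, max_years=15.0):
    """Return (start, stop, event) on the time-since-diagnosis scale.

    entry_years: time from diagnosis to study entry (left truncation point).
    exit_years: time from diagnosis to last follow-up or death.
    died_of_breast_cancer: True if follow-up ended with a breast cancer death.
    Returns None when the patient contributes no time at risk (assumed handling).
    """
    if entry_years < 0 or exit_years < entry_years:
        raise ValueError("expected 0 <= entry_years <= exit_years")
    # Follow-up is administratively censored at max_years after diagnosis.
    stop = min(exit_years, max_years)
    # A death only counts as an event if it happened within the window.
    event = bool(died_of_breast_cancer) and exit_years <= max_years
    if entry_years >= stop:
        return None
    return (entry_years, stop, event)


# Hypothetical patients, for illustration only.
print(at_risk_interval(0.5, 6.0, True))    # incident-like case with an event
print(at_risk_interval(2.0, 20.0, True))   # prevalent case, censored at 15 years
```

Intervals of this (start, stop, event) form are what a Cox model with delayed entry consumes; the stratification by study and the adjustment for the prognostic score are not shown here.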

Incorporation of PR status into PREDICT breast
The absolute risk of breast cancer-specific mortality is estimated in PREDICT Breast by applying the prognostic score to an estimate of the baseline hazard that was developed using a cohort of breast cancer cases with unknown information on PR status. Thus, the underlying baseline hazard represents breast cancer cases with an average PR status. The estimates of the prognostic effects of PR status for ER-negative tumours and ER-positive tumours were, therefore, rescaled to give an average hazard ratio of unity using a prevalence of PR positivity of 14% in ER-negative cases and 83% in ER-positive tumours.
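As a worked illustration of the rescaling, the sketch below assumes that "average hazard ratio of unity" means the prevalence-weighted arithmetic mean of the two hazard ratios equals 1. The unscaled hazard ratios of 0.77 (ER-negative) and 0.72 (ER-positive) are inferred from the 23% and 28% risk reductions quoted in the abstract, and the function name is hypothetical. Under these assumptions the arithmetic reproduces the rescaled values reported in the Results (1.03, 0.80, 1.30 and 0.94).

```python
def rescale_hazard_ratios(hr_pr_positive, prevalence_pr_positive):
    """Rescale PR hazard ratios so their prevalence-weighted mean is 1.

    hr_pr_positive: hazard ratio for PR-positive vs PR-negative (reference = 1).
    prevalence_pr_positive: proportion of cases that are PR-positive.
    Returns (rescaled HR for PR-negative, rescaled HR for PR-positive).
    """
    # Average hazard ratio in a population with the given PR prevalence,
    # relative to a PR-negative case (assumed arithmetic mean).
    mean_hr = prevalence_pr_positive * hr_pr_positive + (1.0 - prevalence_pr_positive)
    return 1.0 / mean_hr, hr_pr_positive / mean_hr


# Inputs inferred from the abstract (hazard ratios) and the text (prevalences).
for label, hr, prevalence in [("ER-negative", 0.77, 0.14), ("ER-positive", 0.72, 0.83)]:
    hr_neg, hr_pos = rescale_hazard_ratios(hr, prevalence)
    print(f"{label}: PR-negative {hr_neg:.2f}, PR-positive {hr_pos:.2f}")
# ER-negative: PR-negative 1.03, PR-positive 0.80
# ER-positive: PR-negative 1.30, PR-positive 0.94
```

The ratio between the two rescaled values is unchanged by the rescaling, so the relative effect of PR status is preserved while the baseline hazard keeps its interpretation as the hazard for a case of average PR status.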

Validation study population
Data from a New Zealand population-based cancer registry were used for model validation [13]. Data were available on 11,365 patients with early invasive breast cancer (2194 ER-negative and 9171 ER-positive) diagnosed between 2000 and 2014 after the exclusion of cases with metastasis at diagnosis (639), those younger than 25 or older than 85 years old (524), tumour diameter larger than 20 cm (5), more than 20 positive lymph nodes (232), inconsistent follow-up time information (2) and those that did not undergo primary surgery (938).
Information on adjuvant systemic cancer treatments (chemotherapy and hormone therapy) was also recorded. The New Zealand cohort did not include information on specific chemotherapy regimens. To derive the prognostic score, we assumed that patients who underwent chemotherapy before 2010 were treated with an anthracycline-based regimen, and for those treated after this time, we assumed a taxane-based regimen. This is based on data for the most commonly used regimen in New Zealand (Mark Elwood, personal communication).
Table 2. Hazard ratios (95% C.I.) for progesterone receptor (PR) status and other prognostic factors for breast cancer-specific mortality, stratified by oestrogen receptor (ER) status and study, derived from the BCAC data for European ancestries.
In addition, information on the use of trastuzumab was not collected during follow-up. We assumed that patients with a HER2-positive tumour who were diagnosed after 2010 received trastuzumab.
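The treatment assumptions described above amount to a small rule set, sketched here for clarity. The function name, the returned labels and the handling of the boundary year are assumptions: the text does not say how patients treated or diagnosed in 2010 itself were classified, so this sketch places 2010 in the taxane era for chemotherapy and follows the literal wording "after 2010" for trastuzumab. Year of diagnosis is also used as a stand-in for year of treatment.

```python
def assumed_treatment(year_of_diagnosis, had_chemotherapy, her2_positive, cutoff_year=2010):
    """Impute unrecorded treatment details from the year of diagnosis.

    Returns a dict with the assumed chemotherapy regimen (or None) and
    whether trastuzumab is assumed to have been given.
    """
    if had_chemotherapy:
        # Assumption: the cutoff year itself is treated as taxane era.
        regimen = "anthracycline" if year_of_diagnosis < cutoff_year else "taxane"
    else:
        regimen = None
    # Literal reading of "diagnosed after 2010": strictly later than the cutoff.
    trastuzumab = bool(her2_positive) and year_of_diagnosis > cutoff_year
    return {"chemo_regimen": regimen, "trastuzumab": trastuzumab}


# Hypothetical patients, for illustration only.
print(assumed_treatment(2005, had_chemotherapy=True, her2_positive=True))
print(assumed_treatment(2012, had_chemotherapy=True, her2_positive=True))
```

Making the cutoff a parameter keeps it easy to vary these assumptions, which matters because the calibration results are reported to be sensitive to them.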
The dates and causes of death were extracted from the hospital records and from mortality records until 31st December 2014 and all patients were censored after this date. The primary end-point was breast cancer-specific survival. The expected survival probability for each patient was based on a follow-up time that was different for each patient up to a maximum of 15 years. For patients who survived, follow-up was from the date of diagnosis until the date of last follow-up. For patients who died, potential follow-up time was calculated as if the patient had survived to the end of the study, that is, from the date of diagnosis until 31st December 2014.
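The follow-up time used for each patient's expected survival probability can be sketched as follows. The function and constant names are hypothetical, and converting days to years by dividing by 365.25 is an assumption made for illustration.

```python
from datetime import date

STUDY_END = date(2014, 12, 31)   # censoring date stated in the text
MAX_FOLLOW_UP_YEARS = 15.0       # maximum prediction horizon stated in the text


def prediction_horizon_years(diagnosis, last_follow_up, died):
    """Follow-up time (years) over which a patient's predicted risk is accumulated.

    Survivors: diagnosis to last follow-up (never beyond the study end).
    Deaths: potential follow-up, as if the patient had survived to the study end.
    """
    end = STUDY_END if died else min(last_follow_up, STUDY_END)
    years = (end - diagnosis).days / 365.25
    return min(years, MAX_FOLLOW_UP_YEARS)


# Hypothetical patients diagnosed on the same day, for illustration only.
print(prediction_horizon_years(date(2010, 1, 1), date(2012, 1, 1), died=False))
print(prediction_horizon_years(date(2010, 1, 1), date(2012, 1, 1), died=True))
```

Using potential rather than actual follow-up for patients who died avoids truncating the predicted risk at the time of death, which would otherwise understate the expected number of deaths.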
For each patient, breast cancer risk predictions were estimated using two models: PREDICT version 2.2 and PREDICT version 2.2 with the inclusion of PR status (v2.3). Model calibration was assessed to investigate the accuracy of the mortality estimates predicted by each model compared to the observed mortality rate. Additionally, a Chi-square goodness-of-fit test (1 d.f.) was used to compare the number of observed events with the number of predicted events. Model discrimination was evaluated through the calculation of the AUC (area under the receiver-operator-characteristic curve) for up to 15-year breast cancer mortality. The AUC was used to measure the accuracy of the classification of cases and non-cases for the two prediction models and to test for any beneficial effect of the addition of PR status to PREDICT Breast. The comparison of AUCs was done using the method of DeLong et al. [14] implemented in the R package pROC. All analyses were conducted using R v4.1.2 in the R Studio environment.
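The goodness-of-fit test can be illustrated with a short sketch. The exact form of the statistic is not given in the text, so this assumes the usual (observed − predicted)² / predicted with 1 degree of freedom; the function name is hypothetical. Applied to the overall counts reported in the Results (940 observed, 1151 predicted breast cancer deaths), it gives a p-value of about 5 × 10⁻¹⁰, in line with the reported value.

```python
import math


def goodness_of_fit(observed, predicted):
    """Chi-square goodness-of-fit test (1 d.f.) for observed vs predicted events.

    Returns (statistic, p_value). For 1 d.f. the upper tail probability of a
    chi-square variable x equals erfc(sqrt(x / 2)).
    """
    statistic = (observed - predicted) ** 2 / predicted
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def relative_over_estimation(observed, predicted):
    """Over-estimation of events as a fraction of the observed number."""
    return (predicted - observed) / observed


# Overall breast cancer deaths in the New Zealand cohort, as reported.
statistic, p_value = goodness_of_fit(observed=940, predicted=1151)
print(f"chi-square = {statistic:.2f}, p = {p_value:.1e}")
print(f"over-estimation = {relative_over_estimation(940, 1151):.1%}")
```

Only the standard library is used here; the comparison of AUCs with the DeLong method is not reproduced in this sketch.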

Results
The 49 BCAC studies included 45,088 eligible European patients of whom 13,706 (30%) had PR-negative tumours and 31,382 (70%) had PR-positive tumours (Table 1). During follow-up, there were 6974 recorded deaths with approximately 11 breast cancer deaths per 1000 person-years. The patient characteristics of the New Zealand cohort were very similar to those in the studies of BCAC apart from the proportion of patients that underwent chemotherapy (35%), which was lower than that for BCAC (46%). Initial analyses were restricted to patients of European ancestries. In univariate analyses, PR expression was associated with a better prognosis, with the magnitude of the effect being greater in ER-positive disease (Table 2). The effect of PR expression was attenuated after adjusting for other prognostic factors. We evaluated whether the effect of PR varied by age or HER2 status by including an interaction term in the multi-variable model. There was little evidence for an interaction with either age at diagnosis (p = 0.65 in ER-positive and p = 0.43 in ER-negative) or HER2 status (p = 0.36 in ER-positive and p = 0.91 in ER-negative).
We also assessed between-study heterogeneity and plotted the estimated beta coefficient of PR status per study adjusted for the prognostic index (Supplementary Fig. S1). There was no evidence of heterogeneity in the ER-negative model (p = 0.99) or in the ER-positive model (p = 0.26).
Visual examination of plots of log-cumulative hazard against log-time and of the Schoenfeld residuals against time showed no serious violation of the proportional hazards assumption (Supplementary Figs. S2 and S3). The hazard ratios for the other prognostic factors from the multivariable model that included each prognostic factor separately were slightly different to those in the PREDICT model. Of particular note is that, in the BCAC dataset, a significant association was observed for the mode of detection in ER-negative disease. Mode of detection has previously been reported to be associated with outcome only in ER-positive tumours.
In order to apply the PR hazard ratio to the PREDICT Breast baseline hazard, it needed to be rescaled such that the mean hazard ratio was unity, so that the reference category for the hazard ratio is a hypothetical case with average PR status. The proportion of cases that are PR-positive used for rescaling was the average from the combined BCAC studies (14% for ER-negative and 83% for ER-positive cases). The rescaled hazard ratios were 1.03 for PR-negative/ER-negative, 0.80 for PR-positive/ER-negative, 1.30 for PR-negative/ER-positive and 0.94 for PR-positive/ER-positive. The hazard ratios for all the other prognostic factors were kept as specified in PREDICT Breast v2.2.
Table 3. The discrimination for up to 15-year breast cancer-specific mortality in the New Zealand validation cohort.
The discrimination for up to 15-year breast cancer-specific mortality of PREDICT as measured by the AUC increased from 0.807 to 0.809 (p = 0.023) for patients with ER-negative breast cancer and from 0.898 to 0.902 (p = 2.3 × 10⁻⁶) for ER-positive cases (Table 3). The calibration of the model was modest, with 1151 breast cancer deaths predicted compared to 940 that were observed during a 15-year follow-up (goodness-of-fit Chi-squared test p = 5.0 × 10⁻¹⁰) (Table 4). Over-estimation was worse in European patients with ER-negative tumours (366 predicted compared with 281 observed, p = 8.9 × 10⁻⁶) than in European patients with ER-positive tumours (442 predicted compared to 414 observed, p = 0.183). Across ethnicities, the model performs better in ER-positive cases than in ER-negative cases. Fig. 1 shows the calibration of PREDICT Breast including PR status across the quintiles of predicted risk.
The numbers of observed and predicted deaths from other causes and deaths from all causes in the New Zealand cohort are shown in Tables 5 and 6. Overall, PREDICT Breast with the inclusion of PR status appears to be well calibrated in predicting non-breast-cancer-specific mortality, with an over-estimation of 0.4% (670 predicted compared with 667 observed, p = 0.908). The model slightly over-estimates the number of non-breast cancer deaths in patients of European descent, by 6.8% (546 predicted compared with 511 observed).
Table 4. Cumulative observed versus predicted breast cancer deaths at up to 15 years follow-up by ethnicity in the New Zealand cohort.
We then carried out a sensitivity analysis using alternative assumptions for chemotherapy and trastuzumab treatment. Table S2 shows the predicted breast cancer deaths with the assumption that patients who underwent chemotherapy were treated with an anthracycline-based regimen (second-generation regimen). Table S3 shows the predicted breast cancer deaths with the assumptions that all patients with HER2-positive tumours were treated with trastuzumab, and that patients who underwent chemotherapy were treated with an anthracycline-based regimen if diagnosed before 2010 and with a taxane-based regimen if diagnosed after this time. The model appears to be miscalibrated and the results show that the calibration is sensitive to the treatment assumptions made prior to the analyses.
In order to determine the clinical impact of the small improvement in discrimination, we estimated the reclassification of risk for PREDICT v2.2 + PR compared to PREDICT v2.2 based on classifying cases from the New Zealand cohort into three categories of breast cancer-specific mortality at ten years: less than 15%, 15% to less than 20% and 20% or greater. These thresholds are approximately equivalent to the thresholds for the absolute risk reduction of chemotherapy of 3% and 5% used by the Cambridge Breast Unit Multidisciplinary Team for clinical decision-making [15]. Table 7 shows that in total 4.2% of cases changed risk category, of which 2.4% changed from a lower risk category to a higher risk category.
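The reclassification summary can be sketched as follows. The thresholds are those given in the text; the function names and the example risks are made up for illustration and are not data from the New Zealand cohort.

```python
def risk_category(mortality_10y):
    """Map predicted 10-year breast cancer-specific mortality to a category index.

    0: less than 15%, 1: 15% to less than 20%, 2: 20% or greater.
    """
    if mortality_10y < 0.15:
        return 0
    if mortality_10y < 0.20:
        return 1
    return 2


def reclassification(old_risks, new_risks):
    """Proportion of cases moving up, moving down or staying in the same category."""
    if len(old_risks) != len(new_risks) or not old_risks:
        raise ValueError("expected two non-empty lists of equal length")
    up = down = 0
    for old, new in zip(old_risks, new_risks):
        change = risk_category(new) - risk_category(old)
        if change > 0:
            up += 1
        elif change < 0:
            down += 1
    n = len(old_risks)
    return {"up": up / n, "down": down / n, "unchanged": (n - up - down) / n}


# Made-up predicted risks from the two model versions for four patients.
v22 = [0.10, 0.16, 0.25, 0.14]
v22_plus_pr = [0.12, 0.21, 0.19, 0.16]
print(reclassification(v22, v22_plus_pr))
```

Reporting upward and downward moves separately matters clinically, because the two directions have different implications for whether chemotherapy would be offered.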

Discussion
The primary aim of this study was to estimate the prognostic effect, expressed as the relative hazard, of PR expression in breast cancer after adjusting for the other prognostic factors incorporated in the PREDICT Breast prognostic tool. Importantly, the effects of other prognostic factors were constrained to the same effect sizes as used in the PREDICT Breast model. This enabled us to incorporate progesterone receptor expression into PREDICT Breast by applying the relative hazard to the baseline hazard which is specified in the PREDICT Breast model. The BCAC data set on which this analysis was based is large, with over 45,000 cases of European ancestries from 49 separate studies from around the world and over 3500 deaths from breast cancer during follow-up. In addition to the large sample size, the heterogeneity inherent in combining data from multiple studies is a strength, as the findings should be robust and widely generalisable. While a large number of cases of south Asian ancestries were also available from the BCAC data set, there was only a small number of breast cancer deaths during follow-up and the impact of ancestry on the association between PR expression and prognosis could not be reliably assessed.
The heterogeneity of study design and conduct is also reflected in the measurement of the prognostic factors included in the analyses. In particular, different studies used different data sources to determine ER, HER2 and PR status, including clinical records and research data. Consequently, different studies used slightly different definitions to classify ER, HER2 and PR status and these data could not be fully harmonised across studies. Any measurement error resulting from this is likely to have biased the association of PR status with survival towards the null, but any such bias is expected to be small. Our results are broadly similar to the extensive published data [5-11,16,17] and show that patients with a PR-positive tumour have better survival than patients with a PR-negative tumour regardless of their ER status. There was little difference in the relative hazard estimates after adjusting for a prognostic index constrained to the effect size used in the PREDICT Breast model or in a full multi-variable model that allowed the hazard ratios for the other prognostic factors to fit the data. Previous reports have shown that the prognostic effect of PR status varies with age at diagnosis, with a bigger effect being observed in younger patients [16,17], particularly during the first five years of follow-up in one of the studies [16]. However, we found little evidence for a difference in the effect with age.
Table 6. Cumulative observed versus predicted all-cause deaths at up to 15 years follow-up by ethnicity in the New Zealand cohort.
Table 7. Reclassification of predicted breast cancer-specific mortality following the inclusion of PR status into PREDICT Breast.
We used the relative hazard estimates to incorporate progesterone receptor expression into the PREDICT Breast model and compared the performance of the modified model with that of the current version of PREDICT Breast as used in the online web tool (v2.2). This was done using a completely independent data set from New Zealand. The addition of a single prognostic factor to a multi-variable prediction model would not be expected to improve the performance of the model substantially. Nevertheless, the addition of PR status resulted in a small but statistically significant improvement in the discrimination of PREDICT Breast compared with the current version. Similarly, the small proportion of patients reclassified when using clinically relevant categories of risk was as expected. The calibration of the modified version of PREDICT Breast would not be expected to change much, as calibration is primarily dependent on the baseline hazard, which was the same in the modified and current models, and otherwise depends on the assumption about the proportion of cases that are PR-positive used to rescale the hazard ratios as described in the methods. The calibration of the modified model in an independent data set was modest, with the number of breast cancer deaths in the New Zealand cohort being over-estimated by 22%. This was, as expected, similar to, albeit slightly worse than, the calibration of the current model. The miscalibration was similar for all ancestries and was worse in patients with ER-negative tumours. PREDICT Breast has previously been shown to be well calibrated in case series from the UK, Canada, the Netherlands and Malaysia, and the reasons for the poorer performance in the New Zealand data set are not clear. One possible explanation is that the baseline hazard for PREDICT is based on a cohort of patients from the UK diagnosed from 1999 to 2004 whereas the New Zealand cohort was diagnosed from 2000 to 2014.
There have been improvements in prognosis over time and so some over-estimation of deaths is expected. This is supported by the observation that the calibration of PREDICT Breast including PR status improves when the analysis is restricted to patients diagnosed between 2000 and 2004, with an over-estimation in breast cancer deaths of 7.7% in all patients and 3.6% in European patients, compared to 22.4% and 16.3% for patients diagnosed between 2000 and 2014. Some of these improvements are the result of the introduction of newer therapies such as bisphosphonates, increased duration of hormone therapies and improvements in the management of disease at the time of relapse. However, information on these therapies was not available for the validation data and so could not be accounted for in the analyses. A simple country-specific recalibration of the baseline hazard function or a re-estimation of the baseline hazard using more contemporaneous data would improve the model performance.
The expression of biomarkers such as ER, HER2 and PR is continuous but then dichotomised based on a threshold for use in clinical practice. For ER and HER2 status, this is primarily done to facilitate decision-making for specific adjuvant therapies. There is good evidence that the prognostic effect of these biomarkers varies with the level of expression [18-20] and the inclusion of a multi-category ordinal scale or a continuous measure of expression in the model has the potential to improve model performance.
In conclusion, the incorporation of the prognostic effect of PR status into PREDICT Breast has resulted in a small, statistically significant improvement in discrimination with some reclassification across clinically relevant risk thresholds. On the other hand, the calibration of the modified PREDICT model in an independent data set was slightly poorer. The improvement in discrimination is likely to be generalisable across diverse case cohorts as it is primarily dependent on the magnitude of the hazard ratio associated with progesterone receptor status, which is likely to be robustly estimated. In contrast, calibration is dependent on the baseline hazard, which may vary across different populations and time periods, as well as on the distribution of the biomarker in different populations. Thus, progesterone receptor expression will be included in a new version of PREDICT Breast (v2.3) based on the improvement in discrimination and the reclassification. Further studies should investigate the potential improvement that recalibrating the baseline hazard function could have on country-specific model performance.