New benchmarks to design clinical trials with advanced or metastatic liposarcoma or synovial sarcoma patients: An EORTC – Soft Tissue and Bone Sarcoma Group (STBSG) meta-analysis based on a literature review for soft-tissue sarcomas

Background: Recently, we performed a meta-analysis based on a literature review for soft-tissue sarcoma (STS) trials (published 2003–2018, ≥10 adult patients) to update long-standing reference values for leiomyosarcomas. This work is extended here to liposarcomas (LPS) and synovial sarcomas (SS).


Introduction
Soft-tissue sarcomas (STS) are very heterogeneous rare mesenchymal malignancies that account for about 1% of all adult tumours. Over the years, more than 100 histologic subtypes have been recognised with a widely varying presentation, sensitivity to treatment and long-term outcomes [1]. The prognosis for advanced STS is poor, with median overall survival (OS) now ranging from 12 to 18 months [2]. The most common site of metastasis is the lungs, but other (intraabdominal, bone) locations are not uncommon [1–3]. Systemic treatment represents the mainstay for the management of locally advanced or metastatic disease. For first-line treatment of STS, doxorubicin alone or in combination with ifosfamide has been considered the most active drug (combination) for several decades [2]. After first-line drugs, subsequent treatments depend on subtype. Among the most used ones in second- and further-line are gemcitabine with/without docetaxel, trabectedin, pazopanib, and dacarbazine with/without gemcitabine, which have been associated with a progression-free survival (PFS) benefit in doxorubicin-treated patients [4]. The combination of olaratumab + doxorubicin appeared to show a survival benefit compared with doxorubicin alone in a randomised phase II study [5], but eribulin is the only drug to have shown a survival benefit, although curiously no benefit in PFS.
Liposarcomas (LPS), one of the most common STS types (15–20% of all STS), are complex and diverse neoplasms [6]. These tumours can be separated into three biological subtypes based on specific genetic alterations: well-differentiated/dedifferentiated LPS, which is the most common (~70%), myxoid LPS (~20%), and pleomorphic LPS (~5%), which has the worst prognosis [7,8]. Currently available systemic therapies include anthracycline-based treatment for first-line, typically with doxorubicin (with/without ifosfamide), and trabectedin or eribulin after anthracycline failure.
A somewhat less common STS type with varying clinical behaviour and response to treatment is synovial sarcoma (SS; 5–10% of STS) [9,10]. Patients with SS have a relatively young age at diagnosis (mean 39 years) [10]. These tumours are either monophasic (pure sarcomas), biphasic (epithelioid and sarcomatous components combined), or poorly differentiated and have a unique biology among STS characterised by SYT-SSX1, 2 or 4 translocations [11]. In the advanced/metastatic setting, SS usually shows a higher chemosensitivity compared to other STS histotypes. SS is commonly treated with anthracyclines and/or ifosfamide in first-line, while high-dose continuous infusion ifosfamide, pazopanib, and trabectedin represent the most used agents in pre-treated patients [10,11].
In 2002, Van Glabbeke et al. [12] published a pooled analysis with individual patient data calculating progression-free rates for first-line or pre-treated STS patients who had been included in phase II trials of the European Organisation for Research and Treatment of Cancer (EORTC) – Soft Tissue and Bone Sarcoma Group (STBSG) database. Efficacy thresholds were estimated in order to make a distinction between active and inactive antineoplastic agents. In first-line, a 6-month rate of 30–56% was considered as a reference value for drug activity depending on histology. For the pre-treated population, a 3-month rate ≥40% suggested drug activity and ≤20% inactivity for any histologic subtype. These values have been applied extensively (>420 citations) to design new studies for all STS.
In a previous study by our group (Kantidakis et al., 2021) [13], we collected summary estimates from an extensive literature review of phase II, III, or IV studies published between 2003 and 2018 on advanced or metastatic STS to provide an update for leiomyosarcomas (LMS), the most frequently appearing histologic type in these publications. The primary endpoint was defined as progression-free survival rate at 3 or 6 months (PFSR; counting any death as an event), which is nowadays a preferred and more popular endpoint than progression-free rate (censoring deaths not related to disease). Drugs were classified as recommended or not based on the European Society for Medical Oncology (ESMO) 2018 guidelines [14]. Since the differences between recommended and non-recommended agents were not significant, the overall pooled PFSR was used as a reference. The ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS) [15] pinpointed the treatment effects to target for a clinically relevant benefit in future phase II trials. For first-line LMS, a PFSR at 6 months ≥70%, and for the pre-treated population, a 3-month PFSR ≥62% or a 6-month PFSR ≥44% would suggest drug activity.
Historically, the majority of STS trials have been designed with a one-size-fits-all principle mixing several histologic types. However, our recent study is in accordance with a trend towards histology-specific tailored research [1,3]. Importantly, the 2002 efficacy thresholds should be updated and recalibrated for prevalent advanced/metastatic STS types to reflect modern clinical practice, as future agents should perform better than currently available standards of care. Here, the aim is to extend our 2021 study to advanced/metastatic LPS and SS, the second and third most common types in our literature review (2003–2018), a ranking that differs from real-life incidence [16], to provide benchmarks for designing new phase II studies with PFSR as the primary endpoint.

Search strategy and selection criteria for the literature review
The literature review and meta-analyses were conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [17]. An electronic search was performed in PubMed for phase II, III, or IV clinical trials with advanced/metastatic or non-operable STS patients. Studies were published in English between January 1, 2003 and December 31, 2018. Eligible study designs included non-randomised trials, randomised controlled trials, and prospective real-life studies. Study domain included first, second, or later line systemic therapy. Papers with retrospective clinical data, case-control studies, early phase trials, pooled analyses, and reports were excluded as well as those devoted to bone sarcomas, GIST, or paediatric population.
A two-step procedure was performed by three authors (G.K., A.N., and M.V.) to construct the database. More information about the trial selection can be found in Ref. [13] and in Appendix pp. 3–4.

Extracting information for the meta-analyses
For this work, the focus was on the second and third most prevalent STS in the database: LPS and SS. Two meta-analysis databases were designed with a row per treatment arm and treatment line (first-line versus pre-treated population). For each of them, G.K. extracted the number of evaluable LPS or SS patients for PFSR (those included in the efficacy dataset based on the statistical plan's criteria), the PFSR at 3 and 6 months together with the 95% confidence intervals (95%-CIs), and the year of study activation. Placebo/best supportive care arms, arms with <10 patients, mixed treatment lines, and studies activated before 2000 were removed from the database. When summary PFS estimates (at 3 and 6 months) could not be retrieved from a publication, they were requested from first authors and/or study sponsors.

Statistical methods
In both databases (for LPS and SS), a random-effects model was employed to estimate the overall PFSR at 3 and 6 months per line of treatment (first-line versus pre-treated). The DerSimonian and Laird method was used to estimate the between-study variance in clinical trials [18,19]. The inverse variance method was used to pool treatment-specific PFS estimates (more weight is given to larger studies). For each treatment arm, the number of cases (patients alive and progression-free) was approximated based on the total number of evaluable patients and the recorded PFS estimate; the equivalent PFS proportion is defined as cases/evaluable patients. The calculated number of cases was employed under a binomial distribution to estimate the variance (unknown quantity) for each drug or combination and the 95%-CIs [20,21]. The treatment-specific PFS estimates are presented on forest plots. The overall heterogeneity between studies is provided by the I² statistic (variability between the study-specific effect sizes which cannot be explained by random variation) [22].
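The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analyses: it pools on the raw proportion scale without any transformation, it assumes every arm has a proportion strictly between 0 and 1, and the function name and the arm-level numbers are hypothetical.

```python
import math

def pool_dersimonian_laird(pfs_rates, evaluable):
    """Random-effects pooling of PFS proportions (DerSimonian-Laird).

    Assumption: each proportion is strictly between 0 and 1, so the
    binomial variance p(1-p)/n is non-zero.
    """
    # Approximate the number of progression-free patients per arm, then
    # recompute the proportion as cases / evaluable patients.
    cases = [round(p * n) for p, n in zip(pfs_rates, evaluable)]
    props = [c / n for c, n in zip(cases, evaluable)]
    variances = [p * (1 - p) / n for p, n in zip(props, evaluable)]

    # Fixed-effect (inverse variance) estimate and Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    df = len(props) - 1

    # DerSimonian-Laird moment estimator of the between-study variance.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled estimate, and 95% confidence interval.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)

    # I^2: share of total variability not explained by random variation.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, tau2, i2

# Hypothetical treatment arms (invented numbers, for illustration only).
pooled, ci, tau2, i2 = pool_dersimonian_laird(
    [0.60, 0.55, 0.70, 0.45], [50, 40, 60, 35]
)
print(f"pooled PFSR = {pooled:.1%}, I2 = {i2:.0f}%")
```

Larger arms receive more weight through the inverse variance, and the between-study variance is added to every arm's variance before the final pooling, which is what widens the interval when heterogeneity is high.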
The ESMO 2021 guidelines [23] were used to classify each drug (or drug combination) as recommended treatment (R-T) or non-recommended treatment (NR-T) per treatment line and histologic subtype. The difference in PFS between the two groups of drugs (R-T versus NR-T) was formally compared using meta-regression (subgroup meta-analysis) with a chi-square statistic. The effect of other predictors on PFS (phase of the trial, study design, year of activation, sample size) was also tested in univariate models to assess whether they could explain part of the residual heterogeneity. Funnel plots and formal regression tests were used to assess the risk of publication bias [24–26]. Potentially influential studies and studies contributing to heterogeneity were detected with Baujat plots [27]. The choice of the therapeutic benefit to target in future trials was guided by the ESMO-MCBS [15]. Analyses were performed using packages metafor and meta in R version 4.1.2 [28,29]. Reported p-values are two-sided. Further methodological details can be found in Appendix pp. 12–14.

Clinical trials included
The study selection is provided in Fig. 1. Collecting extra information (PFS estimates at 3–6 months) was of paramount importance because of the very limited data availability before the enrichment of the databases by the sponsors (PFS estimates could only be recovered from the publications for two studies with LPS and one study with SS patients).
Table 1. Main characteristics of all studies included in the LPS or SS meta-analyses. Studies in the SS database are presented in shade. Treatments were classified as recommended (yes or no) according to ESMO 2021 guidelines [23]. Study period = period of first to last patient accrual. Evaluable patients were those who satisfied the study's statistical plan criteria for inclusion in efficacy datasets. Trabectedin 24h = trabectedin 24-h infusion treatment schedule. The 3-h infusion treatment arm was excluded from the LPS meta-analysis due to the limited number of patients (n = 6 < 10). The Gelderblom study (2014) [54] contained two treatment arms: doxorubicin and brostallicin. The doxorubicin arm was excluded from the LPS meta-analysis because it did not reach the predetermined number of patients (n = 9 < 10). In the Blay study (2014) [53], the control arm (doxorubicin + ifosfamide) was removed from the SS meta-analysis as it did not reach the required sample size (n = 9 < 10). Placebo/best supportive care arms were also not included (Mir et [...]
[...] [23]. Heterogeneity refers to variability in outcomes (PFS proportions) between the studies that cannot be attributed to random variation. A PFSR is the proportion × 100.

Characteristics of trials
A total of 1030 patients were evaluable for the LPS meta-analysis (range 10–93 patients per trial, Table 1) and 348 for the SS meta-analysis (range 10–46, Table 1).
In first-line, the most common regimens were doxorubicin alone or in combination with ifosfamide (eight times) for LPS and doxorubicin monotherapy or in combination with evofosfamide or ifosfamide (four times) for SS. In the pre-treated population, eribulin and trabectedin were the most common drugs for LPS (three times) and pazopanib for SS (two times).

Risk of publication bias
The contour-enhanced funnel plots did not indicate systematic asymmetry between the studies included in the LPS or SS meta-analyses, with the exception of the pre-treated LPS population at 6 months. Tests for funnel plot asymmetry indicated a low risk of publication bias in the databases for SS and first-line LPS, but a high risk of bias for pre-treated LPS at 6 months (see Appendix sections 2.3 and 2.4). However, publication bias cannot be excluded for first-line SS patients because of the very limited number of studies (three trials, five treatment regimens).

Sensitivity meta-analyses
Regarding LPS (see Appendix section 2.3), Baujat plots detected 'Blay 2014: Trabectedin' [53] as a potentially influential treatment regimen for first-line at 3 and 6 months (overall pooled PFSR decreased 2% and 3% after the exclusion of this treatment regimen). Overall heterogeneity slightly decreased. For patients previously treated with systemic therapy, 'Samuels 2017: pazopanib' [44] was identified by Baujat plots and diagnostics (overall PFSR decreased 2% and 1% at 3 and 6 months but heterogeneity did not go down). Results were robust to the candidate outlier in the pre-treated setting and less robust in the first-line setting. Secondly, for SS (see Appendix section 2.4), the plots and diagnostics for first-line agents pointed out 'Judson 2014: doxorubicin + ifosfamide' [32] as the most influential study (overall pooled PFSR decreased 6% at 3 months and 10% at 6 months after removing it from the database). Overall heterogeneity dropped substantially, which could be expected because of the limited studies here (three clinical trials, five regimens). For the pre-treated population, the treatment regimen of 'Robbins 2015: cyclophosphamide + fludarabine + TCR transduced cells' [57] was detected as an outlier (overall rate decreased 2% and 1% but heterogeneity did not change substantially). Findings showed that the meta-analyses were robust in the pre-treated setting but not in the first-line setting (because only five treatment regimens were available in total).

New benchmarks
Similar to our previous LMS meta-analysis [13], the overall pooled PFSRs at 3 or 6 months are used as the reference values for the parameter P0 (null hypothesis). To elaborate on this, for all LMS, PFS rates did not differ significantly between the two groups of drugs (R-T, NR-T) for first or further lines of treatment. Here, results for LPS and SS were concordant, with the exception of previously treated LPS patients, where differences between R-T and NR-T were significant. For the sake of consistency, it was decided to use the overall pooled rates to guide P0. To calculate the reference values of the parameter P1 (alternative hypothesis), the ESMO-MCBS suggestions [15] in an advanced/metastatic setting were employed assuming an exponential PFS curve. The tool recommends a hazard ratio (HR) ≤0.65 (scale evaluation form 2b).
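Under an exponential PFS curve, multiplying the hazard by a constant HR raises the survival function to the power HR, so the target rate follows from the reference rate as P1 = P0^HR. The short sketch below illustrates this conversion; the function name is hypothetical, and the input of 49% is the 3-month reference value for pre-treated LPS quoted elsewhere in this article.

```python
def p1_from_p0(p0: float, hazard_ratio: float = 0.65) -> float:
    """Target PFS rate under H1 given the reference rate under H0.

    Assumes exponential PFS: S0(t) = exp(-lambda * t) and
    S1(t) = exp(-HR * lambda * t) = S0(t) ** HR at the same time point.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be a proportion strictly between 0 and 1")
    return p0 ** hazard_ratio

# Pre-treated LPS at 3 months: P0 = 49% and HR = 0.65.
print(f"P1 = {p1_from_p0(0.49):.0%}")
```

With P0 = 49% the result rounds to 63%, the corresponding P1 reported for pre-treated LPS at 3 months.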
Parameters P0 and P1 are provided in Table 2 per treatment line and analysed group. For LPS, the minimum values to reach for suggesting drug activity in first-line patients are 79% and 69% (82% and 69% for SS) at 3 and 6 months. For pre-treated patients, recommended rates are 63% and 44% (60% and 41% for SS), respectively. Owing to the limited number of studies and the differences between primary and sensitivity analyses, benchmarks for first-line SS patients have to be interpreted with caution. Please see Fig. 6 of Ref. [13] for further details on how to use these benchmarks (P0, P1) to aid the design of new phase II studies.
Suppose that we would like to calculate the sample size for a new phase II trial with pre-treated LPS patients and a single-stage A'Hern design [58], given the new thresholds, assuming that the primary endpoint is PFSR at 3 months (i.e. P0 = 49%, P1 = 63%). The power and sample size are computed under the alternative hypothesis that P = P1. For a type I error of 10% (α = 0.10) and 80% power (β = 0.20), a total of 60 eligible patients will need to be treated and followed for the assessment of the primary endpoint. This design would then require 35 patients alive and progression-free to justify further drug investigation.
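The operating characteristics of such a single-stage design can be checked with exact binomial tail probabilities. The sketch below is a generic exact-binomial search written for illustration; it is not taken from A'Hern's tables, the function names are hypothetical, and the published tables may apply conventions that this simple search does not reproduce.

```python
from math import comb

def upper_tail(n: int, r: int, p: float) -> float:
    """Exact P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

def single_stage_design(p0, p1, alpha, power, max_n=500):
    """Smallest n, with its cut-off r, meeting both error constraints.

    The drug is declared promising when at least r of n patients are
    alive and progression-free at the landmark time.
    """
    for n in range(1, max_n + 1):
        for r in range(n + 1):
            # First r whose type I error under P0 does not exceed alpha.
            if upper_tail(n, r, p0) <= alpha:
                # Power only falls as r grows, so this r is the best for n.
                if upper_tail(n, r, p1) >= power:
                    return n, r
                break
    return None

# Check the design quoted in the text: n = 60 patients, cut-off r = 35.
print(f"type I error: {upper_tail(60, 35, 0.49):.3f}")
print(f"power:        {upper_tail(60, 35, 0.63):.3f}")
```

Calling `single_stage_design(0.49, 0.63, 0.10, 0.80)` runs the search itself. Because exact binomial error rates move in a sawtooth pattern as n increases, the smallest n that satisfies both constraints is not guaranteed to coincide with a tabulated value.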

Discussion
This research project yielded efficacy thresholds to design new phase II clinical trials for advanced/metastatic LPS or SS patients with PFSR at 3 or 6 months as the primary endpoint, based on meta-analyses of summary data collected from sponsors and published papers (2003–2018). Reference values were estimated for the parameter of the null hypothesis (P0) as the overall pooled PFSR per treatment line, and new values were calculated for the parameter of the alternative hypothesis (P1) using the treatment effects recommended as targets by the ESMO-MCBS [15].
Two decades ago, the Van Glabbeke study [12] suggested benchmarks for various STS patients who participated in phase II clinical trials of the EORTC-STBSG database for treatment with inactive (used for P0) or active agents (used for P1). Hereto, the authors performed an individual patient data (IPD) meta-analysis using progression-free rate as the principal endpoint. In the first-line setting with anthracycline-containing regimens, a rate of 64% or 55% suggested drug activity for LPS (n = 110), whereas for SS (n = 115), a rate of 77% or 56% at 3 or 6 months, respectively. On the other hand, in the pre-treated setting, reference values for activity were calculated based on 146 patients from all STS subgroups (39% or 14% at 3 and 6 months, respectively). These values have now been updated and re-evaluated for LPS and SS, per treatment line, to reflect current practice (see Table 2). However, a direct comparison is not meaningful since here we used summary estimates (and not IPD) of a larger number of patients per histotype: 1030 LPS (310 first-line), 348 SS (106 first-line) from phase II, III, or even IV clinical trials, defined benchmarks separately for first-line or pre-treated LPS or SS patients by employing the overall pooled PFSR as P0 (based on inactive and active agents) and the ESMO-MCBS tool to target P1, and used PFSR (any death counted as an event) instead of progression-free rate as the primary endpoint.
Table 2. Treatment effect (PFSR) for the null hypothesis (H0) parameter P0 and the alternative hypothesis (H1) parameter P1 of a study for LPS or SS patients. Note: LPS, liposarcoma. SS, synovial sarcoma. PFSRs for SS are presented in shade. The overall pooled PFSRs at 3 and 6 months were used to provide reference values for P0. Using the recommended treatment effect for PFS by the ESMO Magnitude of Clinical Benefit Scale (ESMO-MCBS), minimum values to target for P1 were calculated.
Based on our sensitivity meta-analyses, the new thresholds were shown to be robust (stable) in pre-treated LPS and SS patients. However, values were less robust for first-line LPS, and not robust for first-line SS. Removing one outlier decreases the 6-month PFSR by 10% for first-line SS. This indicates an unstable estimate in that setting, which was expected due to the very limited number of studies (three clinical trials, five treatment regimens). Publication bias was not observed based on the tests except for pre-treated LPS patients at 6 months. A high risk of publication bias could lead to a biased estimate of the summary effect. This is a further reason to push publication of trials regardless of their results. Heterogeneity between studies was moderate to high for first-line LPS or SS patients (I² > 40%), as well as high for pre-treated LPS or SS (I² > 60%). Note that a (very) high overall heterogeneity (I²) indicates a large variation between study-specific effect sizes, which could challenge the validity of the meta-analyses. In particular, results for pre-treated subjects should be interpreted with caution due to substantial variability. Heterogeneity could not be explained by meta-regressions (subgroup meta-analyses). Findings of excessive heterogeneity are consistent with our previous work for all LMS (Kantidakis et al., 2021 [13]). Further research is needed to better address this heterogeneity.
Benchmarks provided in this manuscript are directly comparable with those for LMS [13] since they are based on the same literature review for STS and estimated using the same methodology. For first-line treatment, to suggest drug activity, the proposed 3-month PFSRs are slightly higher for all LMS/SS (82% for both) versus LPS (79%). Differences at 6 months are minimal (70% for all LMS, versus 69% for LPS/SS). For second or later lines, values to reach for LPS (63% and 44% at 3 and 6 months, respectively) and all LMS (62% and 44%) are a bit higher than those recommended for SS patients (60% and 41%). Thus, a need to raise the bar of thresholds for the commonest STS types in future phase II trials is indicated by both of our studies, which aligns with the perspective of the American Society of Clinical Oncology [59]. The cost-benefit of new systemic therapies for cancer should be balanced against the societal resources in this era of rapidly rising healthcare costs.
These manuscripts share a number of limitations. First, the large majority of the trials were designed for several STS types and are therefore underpowered for specific subgroup analyses (i.e. here for LPS and SS). This could explain the non-significant difference between recommended and non-recommended treatments based on the standard ESMO guidelines [23] for first-line LPS/SS and pre-treated SS patients (and also for all LMS in the previous study). Secondly, PFSRs were calculated based on summary estimates per treatment arm and treatment line, which are less reliable than IPD but require less time to collect from the different study sponsors. Thirdly, LPS were addressed as a single disease while it is known that there are three different LPS histologic subtypes (i.e. well-differentiated/dedifferentiated, myxoid, or pleomorphic) that exhibit different clinical behaviour and sensitivity to treatments. Yet, in older studies, such information might not have been collected at the subtype level. Moreover, the meta-analytic assumption that effect sizes are independent may be violated for arms of the same randomised trial, as each treatment regimen entered the random-effects model separately. We observed a high unexplained overall heterogeneity indicative of a large variation between effect sizes, which may limit our meta-analytic results. Finally, as emphasised in our previous meta-analysis for LMS [13], strong surrogacy properties between PFS and OS are questionable based on two meta-analyses of randomised studies with advanced STS [60,61]. Thus, PFS might lead to exaggerated enthusiasm for a new anticancer therapy (see Refs. [5,62]). As such, PFS can be used as the primary endpoint in phase II trials or as a futility endpoint in phase III trials, but OS should remain the optimal primary endpoint in phase III trials.
For instance, the sample size of the EORTC 1202 study of cabazitaxel for second-line patients with metastatic or inoperable locally advanced dedifferentiated LPS [63] was calculated based on a Simon two-stage optimal design (α = β = 0.10) [64] and the Van Glabbeke rules (P0 = 20%, P1 = 40%). Stage one required 4/17 eligible patients progression-free, and stage two required 11/37 eligible patients progression-free at 12 weeks. Hence, according to these rules, the 1202 study has met its primary endpoint (21/38 or 55.3% of patients progression-free at 12 weeks), indicating activity of cabazitaxel. Nevertheless, according to the new values (i.e. P0 = 49%, P1 = 63%, see Table 2), it may be challenging to obtain a significant and relevant improvement over a standard of care in a prospective randomised phase III trial. Note that our new benchmarks might require relatively large sample sizes for new phase II studies because of the smaller target difference between P0 and P1 compared to the ones previously proposed. This could be overcome by targeting a larger treatment difference, e.g. P1 = 69% instead of 63% for a P0 = 49%, or by choosing PFSR at 6 months as the endpoint, where the differences between P0 and P1 are larger. Our analyses clearly show that the cut-offs provided by Van Glabbeke et al. are suboptimal (3- and 6-month rates of P1 = 64%, 55% for LPS, and 77%, 56% for SS in first-line, 40% and 20% for any histologic subtype in the pre-treated setting); they can no longer pave the way to a new standard of care. Our benchmarks are setting the bar higher, aiming to identify earlier in the drug development process compounds which have a higher chance to impact clinical practice. If traditional clinical trial designs are deemed unfeasible, more complex and flexible options (e.g. adaptive designs) could be considered.
Especially in ultra-rare sarcomas or when accrual is particularly demanding in terms of numbers or timeframe, recruitment challenges could be overcome through international, multi-centre trials.
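To make the contrast between the 2002 and the updated reference values concrete for the EORTC 1202 example, the sketch below computes the one-sided exact binomial probability of observing at least 21 of 38 progression-free patients under P0 = 20% and under P0 = 49%. It is purely illustrative: it ignores the two-stage structure of the actual design, and the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(n: int, r: int, p: float) -> float:
    """Exact P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

observed, evaluable = 21, 38  # progression-free at 12 weeks in EORTC 1202

# Same observed result, judged against the old and the new reference rate.
for label, p0 in (("Van Glabbeke P0 = 20%", 0.20), ("updated P0 = 49%", 0.49)):
    tail = binomial_upper_tail(evaluable, observed, p0)
    print(f"{label}: P(X >= {observed}) = {tail:.2g}")
```

The tail probability is far larger under the updated reference value, which is the quantitative counterpart of the statement that a result clearing the older thresholds does not necessarily stand out against current standards of care.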
There are certain LPS subtypes that could benefit from non-licensed agents. For instance, trabectedin was shown to be highly active for first-line myxoid LPS in the Blay 2014 study [53] (3- and 6-month PFSR of 96%), and pembrolizumab is currently used on an individual basis for dedifferentiated LPS, but as they are not formally approved for front-line treatment of STS, they are not recommended for first-line treatment of LPS according to the ESMO 2021 clinical practice guidelines. Prospective data to support emerging agents are currently lacking, and this is preventing their adoption in practice. Even if randomised controlled trials are the gold standard, real-world evidence or single-arm phase I/II trials can be helpful for cancer types with rare/ultra-rare indications, including many STS, to accelerate the development and approval of new anticancer treatments [65].
Mesenchymal tumours (i.e. STS) are regarded as one of the most challenging fields of diagnostic pathology [66]. An accurate diagnosis is laborious for non-specialised pathologists. Data have indicated a diagnostic error rate of 25–40% in STS [6,67]. It may also be challenging to obtain the correct classification within a histological type (e.g., well-differentiated could be re-graded as dedifferentiated LPS) [68]. Patients should have computed tomography (CT) scans performed within reference sarcoma centres to improve diagnosis and to tailor treatment allocation [11]. Furthermore, STS have demonstrated a tremendous heterogeneity (genetic and histologic diversity, clinical prognosis, metastatic patterns, etc.) [69]. Therefore, the management of adult STS requires a multidisciplinary approach where collaboration is key to allow sufficiently large studies [3].
In advanced/metastatic STS, therapeutic options beyond first-line (anthracyclines) are increasingly driven by histology. An urgent need remains for the development of individualised treatment plans such as targeted therapy to move away from the conventional chemotherapy options. This work provides modern thresholds for suggesting drug activity, this time for LPS and SS patients, to aid the design of new histology-tailored phase II trials using PFSR at 3 or 6 months as endpoint. We hope that phase II studies which meet the updated thresholds for these histotypes will then lead to higher success rates in new prospective phase III trials to avoid the large costs associated with their failure.

Role of the funding source
This work was supported by the European Organisation for Research and Treatment of Cancer – Soft Tissue and Bone Sarcoma Group (EORTC-STBSG). The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
L.D'.A. has reported advisory board roles for PSI CRO Italy, GSK, AstraZeneca, and Eisai; received travel support for meeting participation from Pharmamar, Eli Lilly, Celgene, and AstraZeneca; all outside the submitted article. W.G. has reported an advisory role for Bayer, GSK, Springworks and PTC Therapeutics; received research grant support to the Institute from Eli Lilly; all outside the submitted work. All remaining authors have declared no conflicts of interest.