
13:00-14:30 Session IS3 (INVITED): Optimal design of longitudinal cluster randomised trials
Faster and more agile designs: speeding up the stepped wedge with batched designs

ABSTRACT. Stepped wedge designs are an increasingly popular variant of the longitudinal cluster randomised trial. These designs roll an intervention out across clusters in a randomised but step-wise fashion, and gain power over standard cluster randomised trials through within-cluster comparisons. However, the standard stepped wedge design is typically neither fast nor agile: all clusters must start and end trial participation at the same time, implying that ethics approvals and data collection procedures must be in place in every cluster before the trial can start in any of them. Hence, although stepped wedge designs are useful for testing the impacts of many cluster-based interventions on outcomes, this requirement can cause lengthy delays before a trial can commence.

In this talk we will discuss the “batched” stepped wedge design. Batched variants allow clusters to come online to the study in batches, instead of all at once, and thus can be deployed more quickly. However, like the standard stepped wedge, the batched stepped wedge rolls the intervention out to all clusters in a randomised and step-wise fashion. Provided that the effect of time is appropriately included in the regression model for the outcome, sample size calculations are straightforward and the power of the study is robust to delays in the start-up of batches. Researchers can also modify sample size calculations to accommodate adaptations such as early stopping for futility or success, or sample size re-calculation.
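The rollout scheme can be illustrated with a small sketch (our illustration, with arbitrary batch sizes and start offsets, not the specific designs from the talk): each batch runs its own stepped wedge, shifted along a common calendar timescale.

```python
# Illustrative sketch: build the cluster-by-period treatment indicator
# matrix for a batched stepped wedge, where each batch is itself a small
# stepped wedge offset by its start time on the calendar timescale.

def stepped_wedge(n_sequences):
    """Standard stepped wedge: one sequence crosses to intervention per step."""
    n_periods = n_sequences + 1
    return [[1 if t > s else 0 for t in range(n_periods)]
            for s in range(n_sequences)]

def batched_stepped_wedge(n_batches, sequences_per_batch, batch_offsets):
    """Concatenate batch-level stepped wedges on a common calendar timescale."""
    base = stepped_wedge(sequences_per_batch)
    width = len(base[0])
    total = max(batch_offsets) + width
    design = []
    for b in range(n_batches):
        off = batch_offsets[b]
        for row in base:
            # periods before/after a batch's window are None (not yet recruiting)
            design.append([None] * off + row + [None] * (total - off - width))
    return design

# two batches of three sequences; the second batch starts two periods later
design = batched_stepped_wedge(2, 3, [0, 2])
for row in design:
    print(row)
```

Each row is one randomised sequence (0 = control, 1 = intervention); the second batch can obtain its approvals while the first is already running.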

Longitudinal cluster randomised trials with continuous recruitment

ABSTRACT. When a stepped wedge or other longitudinal cluster randomised trial recruits/identifies a consecutive sample of participants from a continuous stream presenting at clusters over a given calendar period, it is quite a different prospect to sampling in a series of discrete, cross-sectional slices. For one thing, introducing an intervention mid-stream to a cluster could contaminate participants recently recruited under the routine care condition. For another, it is inadequate to speak of distinct time “periods”: two individuals recruited at either end of the “same” period may have less in common than two individuals recruited just on either side of a “division” between periods. A continuous timescale also offers a continuously adaptable framework for designing a longitudinal trial: timing when to intervene, and when to start or stop recruitment.

This talk focuses particularly on maximising statistical efficiency in two very different design problems. In the simple case of a trial randomising clusters to two groups, intervention and routine care, with an initial, prospective baseline period during which all clusters receive routine care, I show how close-to-optimal efficiency is generally obtained either with no baseline period at all, or with a baseline period that divides the available time in half. This finding is robust to the form of the underlying, fixed effect of time, assuming this is correctly specified in the analysis model. (I hope to have simulation results looking at how well different approaches to analysis fare under misspecification.)

At the other end of the spectrum of design complexity is the case of a longitudinal cluster randomised trial where we choose when each cluster crosses from routine care to the intervention along a continuous timescale, and try to achieve the required statistical power by recruiting the smallest number of participants out of the total presenting at all clusters over the calendar period – an incomplete stepped wedge design. Search algorithms identify surprising solutions – in some instances resembling a series of before-and-after studies rather than concurrent comparisons of intervention and control – though for a design robust to the form of the underlying time effect a smooth “staircase” design may be preferable.

Optimal design of cluster randomized trials with baseline data comparing routine care to a new intervention

ABSTRACT. Background: In cluster randomised trials (CRTs) it is sometimes possible to choose a different cluster size (number of individuals measured per cluster) between trial arms, or between baseline and endline, e.g. in the SNEHA-TARA trial, where clusters are large communities and only a sample of individuals are surveyed. In most trials clusters can be allocated unequally to arms if desired. An optimal design minimises the total number of measurements required for a given number of trial clusters. For CRTs with cross-sectional data and a continuous outcome, it is known how to (i) optimally allocate measurements between baseline and endline when the cluster autocorrelation (CAC) is the same across trial arms [1], and (ii) optimally allocate clusters and measurements when the variance or intra-cluster correlation coefficient (ICC) is affected by the intervention [2]. Objective: To extend previous work to trials comparing routine care to a new intervention, assuming a similar ICC and variance for both trial arms at baseline and in the routine care arm at endline, and that the intervention is likely to reduce the CAC and may affect the ICC and variance. Results: We present algebraic results, and graphical methods, to help identify optimal designs for this setting. The reduction in the number of measurements required compared to the standard design, where clusters are allocated equally to arms and the cluster size is equal over time and between trial arms, can be substantial where cluster sizes or ICC values are large. If the intervention reduces the CAC, but does not affect the variance or ICC, then the optimal design will typically involve (i) a smaller cluster size in the intervention arm compared to routine care at both baseline and endline, and (ii) more clusters allocated to the intervention arm.
Conclusions: Optimal designs can save resources but designs must be chosen to maintain power across plausible ranges for the correlation and variance parameters which will often be wide. We recommend trialists report these parameters separately by arm to inform the design of future trials.

13:00-14:30 Session OC3A: Propensity score in causal studies
Multiple imputation in propensity score matching: obtaining correct confidence intervals

ABSTRACT. Propensity score matching is a popular method for handling confounding in analyses of observational data. This method builds a matched dataset in which baseline covariates are balanced, exploiting the balancing properties of the propensity score, and sometimes discards many subjects from the final analysis. From this matched dataset, an average treatment effect can be estimated, and under some conditions it is an unbiased estimate of the true causal treatment effect.

One non-negligible challenge when applying a propensity score matching procedure is the presence of missing data. A classic approach in this case is to use multiple imputation to build several imputed datasets. From each of these imputed datasets, propensity scores are computed and matching is performed, leading to several completed matched datasets. From each of these matched datasets, an average treatment effect is estimated. These treatment effects are then aggregated using Rubin’s rules (Rubin, 1987) to compute an aggregated average treatment effect and its variance.
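The aggregation step is the standard Rubin's (1987) pooling; a minimal sketch:

```python
def rubins_rules(estimates, variances):
    """Pool M imputation-specific estimates and variances with Rubin's rules.
    Returns the pooled estimate, total variance, and degrees of freedom
    for the t reference distribution."""
    M = len(estimates)
    qbar = sum(estimates) / M                               # pooled point estimate
    ubar = sum(variances) / M                               # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (M - 1)   # between-imputation variance
    t = ubar + (1 + 1 / M) * b                              # total variance
    r = (1 + 1 / M) * b / ubar                              # relative increase in variance
    df = (M - 1) * (1 + 1 / r) ** 2                         # Rubin's degrees of freedom
    return qbar, t, df
```

The over-coverage discussed below arises not from these formulas themselves, but from applying them after discarding unmatched individuals who contributed to the imputation model.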

Previous work by Reiter et al., undertaken in the area of measurement error, highlighted a previously unrecognized phenomenon: using individuals (or units) who are not subsequently included in the analysis to develop an imputation model can result in over-coverage of confidence intervals obtained using Rubin’s rules. We show, via simulation studies, that using a sample to perform multiple imputation and then matching, thereby potentially discarding a substantial number of individuals from the final analysis, leads to this phenomenon. We find the coverage of the standard application of Rubin’s rules to be often above 99%. A simulation study evaluating Reiter’s procedure gives very satisfying results, with most coverage rates around the nominal value of 95%. Finally, this result is illustrated through an application to real data on lung cancer.

Effectiveness of screening colonoscopy in reducing colorectal cancer incidence: emulated target trials from German claims data

ABSTRACT. Introduction: Observational studies suggest a strong effect of screening colonoscopy in reducing both colorectal cancer (CRC) incidence and mortality. The preventive effect appears more pronounced for distal vs. proximal CRC (Brenner et al. 2014), but there is conflicting evidence regarding the size of this difference. Interpretation of available observational studies reporting site-specific effects is often hampered by small sample sizes or by statistical analyses that do not explicitly address avoidable biases, while ongoing randomized controlled trials (RCTs) on screening colonoscopy are not powered to assess site-specific effects. Methods: The emulation of target trials is a framework for analysing observational data with the aim of drawing causal conclusions when an RCT is not available or desirable. Explicitly describing the target study protocol ensures a clear formulation and communication of the research question and avoids self-inflicted biases, such as immortal time bias. Building on and extending the approach of García-Albéniz et al. (2016), we emulate target trials using health claims data from Germany, where screening colonoscopy is routinely offered to individuals aged ≥55 years (the database GePaRD covers 20% of the German population). In contrast to García-Albéniz et al. (2016), our database includes individuals younger than 65 years, and we assess not only overall CRC incidence but also proximal and distal CRC incidence by estimating event-specific cumulative incidence functions (CIFs). This is implemented using flexible pooled logistic models, avoiding an implausible proportional hazards assumption. CIFs will be compared between subjects undergoing screening at baseline and subjects not undergoing screening at baseline, corresponding to an intention-to-screen effect. Confounding will be adjusted for by inverse probability of treatment weighting.
Confidence intervals will be estimated using subject-level bootstrapping to account for repeated inclusion of subjects. Results: We will estimate covariate-adjusted event-specific cumulative incidence curves, assessing the screening effect over 11 years of follow-up. Moreover, we will discuss the strengths and weaknesses of the target trial emulation framework for assessing screening effectiveness from observational data.

Variance estimators for weighted and stratified linear dose-response function estimators using generalized propensity score

ABSTRACT. Propensity score methods are widely used in observational studies for evaluating marginal treatment effects. The generalized propensity score (GPS) is an extension of the propensity score framework, historically developed in the case of binary exposures, for use with quantitative or continuous exposures. In this paper, we proposed variance estimators for treatment effect estimators on continuous outcomes. Dose-response functions (DRF) were estimated through weighting on the inverse of the GPS, or using stratification. Variance estimators were evaluated using Monte Carlo simulations. Despite the use of stabilized weights, the variability of the weighted estimator of the DRF was particularly high, and none of the variance estimators (a bootstrap-based estimator, a closed-form estimator especially developed to take into account the estimation step of the GPS, and a sandwich estimator) were able to adequately capture this variability, resulting in coverages below the nominal value, particularly when the proportion of the variation in the quantitative exposure explained by the covariates was large. The stratified estimator was more stable, and variance estimators (a bootstrap-based estimator, a pooled linearized estimator, and a pooled model-based estimator) more efficient at capturing the empirical variability of the parameters of the DRF. The pooled variance estimators tended to overestimate the variance, whereas the bootstrap estimator, which intrinsically takes into account the estimation step of the GPS, resulted in correct variance estimations and coverage rates. These methods were applied to a real data set with the aim of assessing the effect of maternal body mass index on newborn birth weight.
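The weighting estimator described above can be sketched in a toy form (ours, not the authors' implementation): with a single covariate and a normal exposure model, the stabilized weight is the marginal exposure density divided by the conditional (GPS) density.

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def stabilized_gps_weights(a, x):
    """Stabilized inverse-GPS weights for a continuous exposure `a`.
    Toy version: normal exposure model with one covariate `x`, fitted by
    least squares; the numerator is the marginal exposure density."""
    n = len(a)
    # marginal (numerator) model: a ~ N(mu0, sd0)
    mu0 = sum(a) / n
    sd0 = math.sqrt(sum((ai - mu0) ** 2 for ai in a) / (n - 1))
    # conditional (denominator / GPS) model: a ~ N(b0 + b1*x, sdr)
    xbar = sum(x) / n
    b1 = (sum((xi - xbar) * (ai - mu0) for xi, ai in zip(x, a))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = mu0 - b1 * xbar
    resid = [ai - (b0 + b1 * xi) for ai, xi in zip(a, x)]
    sdr = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    return [normal_pdf(ai, mu0, sd0) / normal_pdf(ai, b0 + b1 * xi, sdr)
            for ai, xi in zip(a, x)]
```

The paper's finding is precisely that even with such stabilization, when the covariates explain much of the exposure variation the denominator density becomes sharply peaked and the weights, and hence the weighted DRF estimator, become highly variable.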

Confounder selection strategies targeting stable treatment effect estimators

ABSTRACT. Clinical research problem and statistical challenges: Inferring the causal effect of a treatment on an outcome in an observational study requires adjusting for observed baseline confounders to avoid bias. However, adjusting for all observed baseline covariates, when only a subset are confounders of the effect of interest, is known to yield potentially inefficient and unstable estimators of the treatment effect. Furthermore, it raises the risk of finite-sample bias and bias due to model misspecification. For these reasons, confounder (or covariate) selection is commonly used to determine a subset of the available covariates that is sufficient for confounding adjustment.

Objective: In this article, we propose a confounder selection strategy that focuses on stable estimation of the treatment effect. In particular, when the propensity score model already includes covariates that are sufficient to adjust for confounding, then the addition of covariates that are associated with either treatment or outcome alone, but not both, should not systematically change the effect estimator.

Statistical Methods: The proposal therefore entails first prioritizing covariates for inclusion in the propensity score model, then using a change-in-estimate [1] approach to select the smallest adjustment set that yields a stable effect estimate. It explicitly assesses the stability of the treatment effect estimator across different (nested) covariate subsets as a selection criterion.
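The stability criterion can be sketched as follows (a simplified illustration, not the authors' algorithm): covariates arrive pre-prioritized, and the adjustment set grows until adding the next covariate no longer moves the estimate. Here `estimate_fn` is a hypothetical placeholder for whatever adjusted estimator (e.g. propensity score based) is used.

```python
def change_in_estimate_selection(covariates, estimate_fn, tol=0.10):
    """Grow the adjustment set over pre-prioritized `covariates` until the
    next addition changes the effect estimate by less than `tol` (relative).
    `estimate_fn(subset)` returns the treatment effect adjusted for `subset`."""
    selected = []
    current = estimate_fn(selected)
    for cov in covariates:
        new = estimate_fn(selected + [cov])
        if current != 0 and abs(new - current) / abs(current) < tol:
            return selected, current  # estimate has stabilized
        selected.append(cov)
        current = new
    return selected, current
```

A real application must also account for this data-driven selection when making inference, which is what the randomization-inference component of the proposal addresses.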

Results: The ability of the proposal to correctly select confounders, and to ensure valid inference of the treatment effect following data-driven covariate selection, is assessed empirically and compared with existing methods using simulation studies. We demonstrate the procedure using three different publicly available datasets commonly used for causal inference.

Conclusion: The proposal was demonstrated empirically to yield approximately valid inference following a data-driven selection of covariates through the combined use of (i) double selection for prioritizing the covariates, (ii) stability-based assessment to select covariates for confounding adjustment, and (iii) randomization inference using full matching to control the type I error when testing the null of no (individual) treatment effect.

[1] Greenland S, Daniel R, Pearce N. Outcome modelling strategies in epidemiology: traditional methods and basic alternatives. Int J Epidemiol. 2016;45(2):565-575.

Causal inference for combining RCTs and observational studies: methods comparison and medical application

ABSTRACT. The simultaneous availability of observational and experimental data for the same medical question of a treatment effect is both an opportunity and a theoretical and methodological challenge. In this work we address how to leverage the advantages of each data source, and how to address their shortcomings, to improve the validity and scope of treatment effect estimates. This work is motivated by the analysis of a large prospective database of over 20,000 severely traumatized patients in France and a multi-centre international randomized controlled trial (RCT) studying the effect of tranexamic acid administration on mortality among patients with traumatic brain injury. We first discuss identification and estimation methods that improve the generalizability of RCTs using the representativeness of observational data. Classical estimators include weighting, the difference between conditional outcome models, and doubly robust estimators, especially calibration weighting. We then discuss methods that combine RCTs and observational data to improve the (conditional) average treatment effect estimation, handling possible unmeasured confounding in the observational data. We compare the methods with extensive simulations and highlight the very good behaviour of calibration weighting. Additionally, we provide an implementation of the different methods as analysis pipelines for reproducible data analyses. Finally, we propose to combine the structural causal model and potential outcomes frameworks to provide a complete workflow for analyzing both data sources. The analysis shows that both the RCT and the observational data conclude on a zero effect of the drug. The same conclusion is obtained when generalizing the effect of the RCT to the observational Traumabase data while taking into account the distributional shift.
In the proposed analysis we also discuss additional challenges such as missing values and mixed data and propose several solutions to tackle them.

13:00-14:30 Session OC3B: COVID-19 modelling
Factors involved in COVID-19 prognosis of patients hospitalized in Campania Region. Findings from COVOCA Study

ABSTRACT. Italy was the first Western country heavily affected by COVID-19 and among the pioneers of the pandemic’s clinical management. Although investigated, the association between pre-existing comorbidities and clinical outcome remains controversial [1]. Identification of patients at the highest risk seems mandatory to improve the outcome. We thus aimed to identify comorbidities/clinical conditions upon admission associated with in-hospital mortality in Campania Region (Italy) during the first peak. COVOCA is a multicentre retrospective observational cohort study, involving 18 COVID Centres, with data from patients who completed their hospitalization between March and June 2020. The primary endpoint was in-hospital mortality. Data were described both in the overall population and by comorbidity status (sum of comorbidities, from 0 to 3+). The association between the presence/absence of each pair of comorbidities was assessed by calculating the phi coefficient. Univariable/multivariable logistic regression models were used to evaluate the association between in-hospital mortality and exposure variables. Models were compared, to evaluate improvement and significance, using the likelihood-ratio test for nested models or the Bayesian Information Criterion (BIC) for non-nested models, preferring lower-BIC models. As a sensitivity analysis, Firth’s correction, as described by van Smeden et al., was used to increase the accuracy of the beta estimates [2]. The cumulative incidence function (CIF) showed cumulative failure rates over time due to in-hospital mortality, with discharge as competing event. Among the 618 hospitalized COVID-19 patients included (62% males, mean age 60 yrs.), 143 in-hospital mortality events were recorded (CIF 23%).
At multivariable analysis, male sex (OR 2.63, 95%CI 1.42-4.90), chronic liver disease (OR 5.88, 95%CI 2.39-14.46) and malignancies (OR 2.62, 95%CI 1.21-5.68) showed an independent association with a poor prognosis, as did the need for non-invasive ventilation or intubation. Higher Glasgow Coma Scale values were instead associated with a better prognosis. The sensitivity analysis further supported these findings. Mortality of patients hospitalized for COVID-19 appears strongly affected by clinical conditions on admission and by comorbidities. Notably, we observed a very poor outcome in subjects with chronic liver disease, alongside an increase in hepatic damage. Our findings underline the fundamental importance of early identification of high-risk patients at hospitalization to improve the pandemic’s clinical management.
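The competing-risks quantity used here (cumulative failure due to in-hospital death, with discharge as competing event) can be sketched with a bare-bones Aalen-Johansen-type estimator. This toy version assumes every subject experiences one of the two events and ignores censoring, which a real analysis must handle.

```python
def cumulative_incidence(times, causes, cause=1):
    """Nonparametric cumulative incidence for `cause` (e.g. 1 = death,
    2 = discharge as competing event), no censoring for simplicity.
    Returns a list of (time, CIF) points at the observed event times."""
    data = sorted(zip(times, causes))
    n = len(data)
    surv = 1.0   # overall event-free survival just before t
    cum = 0.0    # cumulative incidence of the cause of interest
    cif = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        d_cause = d_all = 0
        while i < n and data[i][0] == t:      # handle tied event times
            d_all += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        cum += surv * d_cause / at_risk       # hazard of the cause x survival
        surv *= 1 - d_all / at_risk           # update all-cause survival
        cif.append((t, cum))
    return cif
```

Unlike 1 minus a cause-specific Kaplan-Meier, this quantity correctly treats discharged patients as no longer at risk of in-hospital death.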

Laplace approximations for fast Bayesian inference of the time-varying reproduction number under misreported epidemic data

ABSTRACT. In epidemic models, the effective reproduction number Rt is of central importance to dynamically assess the transmission mechanism of an infectious disease and to orient health intervention strategies. Publicly shared data during an outbreak often suffer from two sources of misreporting (underreporting and delay in reporting) that should not be overlooked when estimating Rt. The main statistical challenge in models that intrinsically account for a misreporting process lies in the joint estimation of the time-varying reproduction number and the delay/underreporting parameters. Existing Bayesian methods typically rely on Markov chain Monte Carlo (MCMC) methods [1], which are extremely costly from a computational perspective. We propose a much faster alternative based on Laplace-P-splines (LPS) [2] that combines Bayesian penalized B-splines for flexible and smooth estimation of Rt with Laplace approximations to selected posterior distributions. Assuming a known generation interval distribution, the incidence at a given calendar time is governed by the epidemic renewal equation, and the delay structure is specified through a composite link framework. Laplace approximations to the conditional posterior of the spline vector are obtained from analytical versions of the gradient and Hessian of the log-likelihood, implying a drastic speed-up in the computation of posterior estimates. Furthermore, the proposed LPS approach can be used to obtain point estimates and approximate credible intervals for the delay and underreporting parameters. Simulations of epidemics with different combinations of underreporting rates and delay patterns (one-day, two-day and weekend delays) show that the proposed LPS methodology delivers fast and accurate estimates, highlighting its added value from a computational point of view. Finally, we conclude by illustrating the use of LPS on a real case study of an epidemic outbreak.
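For orientation, the renewal equation underlying the model is I_t = R_t * sum_s g_s * I_{t-s}, with g the generation interval distribution. A naive plug-in estimator of R_t (no smoothing, no misreporting correction, both of which the LPS method supplies) is:

```python
def rt_from_renewal(incidence, gen_interval):
    """Naive plug-in R_t from the renewal equation:
    R_t = I_t / sum_{s>=1} g_s * I_{t-s}.
    `gen_interval[s-1]` is the probability of a generation interval of s days."""
    rts = []
    for t in range(1, len(incidence)):
        lam = sum(gen_interval[s - 1] * incidence[t - s]
                  for s in range(1, min(t, len(gen_interval)) + 1))
        rts.append(incidence[t] / lam if lam > 0 else float("nan"))
    return rts
```

With underreported or delayed counts this estimator is biased day by day, which is exactly the motivation for jointly modelling R_t and the misreporting process.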

[1] Azmon, A., Faes, C., & Hens, N. (2014). On the estimation of the reproduction number based on misreported epidemic data. Statistics in medicine, 33(7), 1176-1192.

[2] Gressani, O., & Lambert, P. (2021). Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines. Computational Statistics & Data Analysis, 154, 107088.

Evaluating the effectiveness of local tracing partnerships on NHS Test and Trace for COVID-19

ABSTRACT. In the UK, the NHS Test & Trace programme (TT) was developed to ensure that individuals who test positive for COVID-19, and their close contacts, are notified that they must self-isolate in order to stop further spread of the virus. Since August 2020, several local authorities have introduced local tracing partnerships (LTPs) to assist TT. In this work, we are interested in evaluating the impact that LTPs had on the effectiveness of TT in terms of case completion, timeliness (within 48 hours) of case completion, and average number of contacts obtained. Further, we are interested in identifying effect modifiers, that is, variables that affect the magnitude of the effect of an LTP within each area. Causal inference in the setting outlined above is typically carried out through factor analysis (FA) or synthetic control methods. However, these methods are not appropriate for count data, particularly when the counts are low, as in our context. Further, some of these methods can only be implemented on a single outcome at a time and thus do not allow sharing of information between the outcomes. To overcome these limitations, we propose a Bayesian multivariate FA model for mixed outcomes and show how this model can be adapted to account for effect modification. Application of our methods to the motivating NHS TT dataset provides valuable insights regarding the effectiveness of LTPs.

Accounting for time-dependent confounding variables in mechanistic ODE models: simulations and application to a vaccine trial
PRESENTER: Melanie Prague

ABSTRACT. Mechanistic models based on ordinary differential equations (ODEs) represent an alternative approach to causal modelling (Commenges, J. R. Statist. Soc. B, 2009). The model is defined in three parts: (i) a structural model defined by interacting compartments; (ii) a statistical model defining how the model parameters vary across units/individuals; (iii) an observation model relating the observed quantities to the compartments. It can be used to analyze the within-host response to a vaccine in experimental studies. A structural mathematical model is defined for the dynamics of the virus, which infects susceptible cells that, once infected, produce viral particles. A statistical model is then defined to explain the variation of parameters between individuals, which can be explained by explanatory variables (X) or captured through random effects. The explanatory variables can be fixed (e.g. experimental groups) or time-varying, such as immunological markers measuring the response to the vaccine. The parameters (fixed effects and variances of random effects) are estimated using standard approaches (a stochastic EM algorithm). This approach has been applied to a recent study evaluating a vaccine against SARS-CoV-2 in 18 macaques (6 vaccinated) who were experimentally infected. The estimation of model parameters using the repeated measurements of viral load showed a 99% decrease in viral infectivity in the vaccinees. The next step was to explain the decrease in viral infectivity by various immunological markers repeatedly measured over time, such as the neutralizing and binding antibody titres, as time-varying explanatory variables (X). However, this approach for analyzing the influence of X on some of the model parameters does not take into account that a given model compartment (e.g. the virus) may itself influence X. Ignoring this relationship between the virus and X may lead to biased estimates of the effect of X.
An alternative approach is to model X as a compartment of the structural model, which makes it possible to capture an effect of the virus (V) on X and vice versa. We explore the impact of the two approaches on the validity of the estimates through simulations, before applying them to the real dataset.
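The structural part described above (virus infecting susceptible cells that produce viral particles) is the standard target-cell-limited model; a minimal simulation sketch, with arbitrary placeholder parameter values rather than estimates from the macaque study:

```python
def simulate_tiv(beta, delta, p, c, t0=1e4, v0=1.0, dt=0.01, days=20):
    """Euler integration of the target-cell-limited model:
      T' = -beta*T*V      (target cells infected at rate beta)
      I' = beta*T*V - delta*I   (infected cells die at rate delta)
      V' = p*I - c*V      (virions produced at rate p, cleared at rate c)
    Returns the viral load trajectory [(time, V), ...]."""
    T, I, V = t0, 0.0, v0
    traj = [(0.0, V)]
    for k in range(1, int(days / dt) + 1):
        dT = -beta * T * V
        dI = beta * T * V - delta * I
        dV = p * I - c * V
        T += dT * dt
        I += dI * dt
        V += dV * dt
        traj.append((k * dt, V))
    return traj

# a 99% reduction in infectivity, as reported for the vaccinees,
# corresponds to multiplying beta by 0.01 in this parameterization
```

In the "alternative approach" of the abstract, a marker X would be added as a fourth state variable with its own differential equation coupled to V, instead of entering only as an external covariate on beta.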

Interventions to control nosocomial transmission of SARS-CoV-2: a modelling study

ABSTRACT. Background: The emergence of more transmissible SARS-CoV-2 variants requires more efficient control measures to limit nosocomial transmission and maintain healthcare capacities during pandemic waves. Yet the relative importance of different strategies is unknown. Methods: We developed an agent-based model and compared the impact of personal protective equipment (PPE), screening of healthcare workers (HCWs), contact tracing of symptomatic HCWs, and HCW cohorting on nosocomial SARS-CoV-2 transmission. The model was fitted to hospital data from the first wave in the Netherlands (February to August 2020) and assumed that HCWs used 90% effective PPE in COVID-19 wards and self-isolated at home for seven days immediately upon symptom onset. We accounted for infectiousness and diagnostic test sensitivity that vary with time since infection. Intervention effects on the effective reproduction number (R), HCW absenteeism and the proportion of infected individuals among tested individuals (positivity rate) were estimated for a more transmissible variant. Results: Introduction of a variant with 56% higher transmissibility increased R, all other variables kept constant, from 0.4 to 0.65 (+63%) and nosocomial transmissions by 303%, mainly because of more transmissions caused by pre-symptomatic patients and HCWs. Compared to baseline, PPE use in all hospital wards (assuming 90% effectiveness) reduced R by 85% and absenteeism by 57%. Screening HCWs every three days with perfect test sensitivity reduced R by 67%, yielding a maximum test positivity rate of 5%. Screening HCWs every three or seven days assuming time-varying test sensitivities reduced R by 9% and 3%, respectively. Contact tracing reduced R by at least 32% and achieved higher test positivity rates than the screening interventions. HCW cohorting reduced R by 5%. Sensitivity analyses for 50% and 70% PPE effectiveness did not change the interpretation.
Implications: In response to the emergence of more transmissible SARS-CoV-2 variants, PPE use in all hospital wards might be considered to effectively prevent nosocomial transmission. Regular screening and contact tracing of HCWs may also increase the effectiveness of control strategies, but their impact depends critically on the sensitivity of the diagnostic test used.

13:00-14:30 Session OC3C: Dynamic prediction
Individual dynamic prediction of clinical endpoint from large dimensional longitudinal biomarker history: a landmark approach

ABSTRACT. The individual data collected throughout patient follow-up constitute crucial information for assessing the risk of a clinical event, and eventually for adapting a therapeutic strategy. Joint models and landmark models have been proposed to compute individual dynamic predictions from repeated measures of one or two markers. However, they hardly extend to the case where the complete patient history includes many more repeated markers. Our objective was thus to propose a solution for the dynamic prediction of a health event that can exploit repeated measures of a possibly large number of markers.

We combined a landmark approach extended to endogenous marker histories with machine learning methods adapted to survival data. Each marker trajectory is modeled using the information collected up to the landmark time, and summary variables that best capture the individual trajectories are derived. These summaries and additional covariates are then included in different prediction methods. As we need to handle a possibly large-dimensional history, we rely on machine learning methods adapted to survival data, including regularized regressions and survival random forests, to predict the event from the landmark time, and we show how they can be combined into a superlearner. The performance of the prediction tools is evaluated by cross-validation using estimators of the Brier score and the area under the ROC curve adapted to censored data.
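The summary-extraction step can be illustrated with a deliberately simple sketch (per-subject least-squares summaries rather than the mixed-model-based summaries of the talk): each marker's history up to the landmark is reduced to a few numbers, which are then stacked into one feature vector per subject for the downstream survival learner.

```python
def trajectory_summaries(times, values):
    """Per-subject summaries of one marker's history up to the landmark:
    last observed value, least-squares slope and intercept."""
    n = len(times)
    tbar = sum(times) / n
    vbar = sum(values) / n
    sxx = sum((t - tbar) ** 2 for t in times)
    slope = (sum((t - tbar) * (v - vbar) for t, v in zip(times, values)) / sxx
             if sxx > 0 else 0.0)
    intercept = vbar - slope * tbar
    return {"last": values[-1], "slope": slope, "intercept": intercept}

def build_features(marker_histories):
    """Flatten summaries of all markers into one feature vector per subject,
    ready for a regularized regression or a survival random forest.
    `marker_histories` maps marker name -> (times, values)."""
    feats = []
    for name, (times, values) in sorted(marker_histories.items()):
        s = trajectory_summaries(times, values)
        feats.extend([s["last"], s["slope"], s["intercept"]])
    return feats
```

With many markers this yields a high- but fixed-dimensional predictor matrix, which is what makes penalized regressions and random survival forests applicable from the landmark time onward.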

We demonstrate in a simulation study the benefits of machine learning survival methods over standard survival models, especially in the case of numerous and/or nonlinear associations between the predictors and the event. We illustrate the methodology in a public health context with the prediction of death in the general elderly population at different ages using multiple markers of aging (depressive symptoms, cognitive functions, dependency, ...).

Our methodology enables the prediction of an event using the entire individual longitudinal history, even when the number of repeated markers is large. Although introduced with mixed models for the repeated markers and survival methods for a single right-censored time-to-event, the same methodology can be used with any other appropriate modeling technique for the markers and can be easily extended to the competing risks setting.

Comparison of multiple dynamic predictive accuracies

ABSTRACT. In the clinical environment, thanks to recent technological advances, more and more information can be collected during patient follow-up. Dynamic models are particularly well adapted to the analysis of this type of data, allowing potential changes during follow-up to be taken into account. In particular, this makes it possible to obtain more accurate predictions by updating the available information throughout patient monitoring. Blanche et al. (a) developed mathematical tools to quantify and compare the effectiveness of dynamic predictions: dynamic versions of the area under the ROC curve (AUC) and the Brier score are used for quantification, and tests are provided for comparison. Nevertheless, only two predictions can be compared, which may be too restrictive in a clinical context. Here, we propose a new procedure, based on the dynamic AUC or Brier score of Blanche et al. (a), which allows multiple comparisons. First, we assess whether the accuracy of at least one of the considered dynamic predictions differs from the others at a fixed prediction horizon. Under the null, we proved that our test statistics converge to a gamma distribution. Then, if the test is significant, post-hoc tests are conducted to find where the differences occur, which entails a multiplicity issue. To address this point, following Blanche et al. (b), Shaffer’s procedure is used to strongly control the family-wise error rate (FWER). The performance of our new testing procedure was assessed by simulations. For two predictions, a power close to that of Blanche et al. (a) was reached. In all studied scenarios, the FWER control was confirmed. Moreover, a motivating application in hepatology will be presented: we aim to identify the most appropriate biomarker to predict liver-related complications in patients with liver fibrosis. The new procedure will select it among a set of candidate biomarkers while controlling the probability of making at least one false discovery.
Finally, this work allows to compare more than two dynamic predictive accuracies and will available through an R package as soon as these results are published.
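As background for the post-hoc step: Shaffer's procedure is a refinement of Holm's step-down method that sharpens the multipliers using logical constraints among the pairwise hypotheses. A minimal sketch of the plain Holm adjustment (without Shaffer's refinement; the function name and interface are illustrative, not the authors' code):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values: visit p-values in increasing order,
    multiply each by the number of hypotheses not yet rejected, and enforce
    monotonicity of the adjusted values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running)
    return adjusted
```

Shaffer's procedure replaces the factor (m - rank) with the maximum number of pairwise hypotheses that could still be true given the rejections so far; this factor is never larger, so the procedure is uniformly more powerful while still controlling the FWER.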

Spatio-temporal score driven modeling of resting state fMRI data

ABSTRACT. Resting state functional Magnetic Resonance Imaging (R-fMRI) data are receiving increasing attention in both clinical and statistical research. Indeed, R-fMRI signals represent the neuronal activity that is intrinsically generated by the brain (Fox and Raichle, 2007) and have already been described as the candidate tool for gaining deeper insight into spontaneous neural activations that cannot be explained by external stimuli or structural connectivity. However, statistical models capable of detecting spontaneous activation are still limited. Furthermore, the available studies, and a relevant part of the fMRI literature, rely on the often untested Gaussian assumption (Eklund et al., 2016).

In this work, we provide a novel statistical model that goes beyond the Gaussian assumption while addressing the complex nature of R-fMRI data. Specifically, we introduce a spatial simultaneous autoregressive score driven model with multivariate Student-t distributed errors. The model specification rests on a novel spatio-temporal filter that delivers robust estimates of the time-varying blood oxygenation level dependent (BOLD) signal. As a by-product of the model, we develop a procedure for detecting spontaneous activations, based on the assumption that they correspond to residual peaks (in line with the clinical literature) of a possibly heavy-tailed distribution. Importantly, the proposed model collapses to a classical spatial autoregressive (SAR) model with Gaussian distributed errors as the Student-t degrees of freedom parameter tends to infinity. Inference is based on maximum likelihood, and asymptotic theory is developed. We evaluate the whole procedure through an extensive simulation study.

To conclude, we apply the proposed model to the R-fMRI data from the pilot study of the Enhanced Nathan Kline Institute-Rockland Sample project. The data consist of multi-subject brain imaging data (fMRI and Diffusion Tensor Imaging, DTI) collected on 70 regions of interest based on the Desikan atlas. We exploit the information on structural connectivity by defining a subject-specific spatial weight matrix based on DTI. We run subject-specific analyses and present the results via dynamic brain activation images.

Breast cancer risk prediction in mammography screening cohorts: an approach based on modeling tumor onset and growth

ABSTRACT. Mammography screening programmes aim to reduce mortality due to breast cancer by detecting tumours at an early stage. There is currently interest in moving away from age-based screening programmes and towards personalised screening based on individual risk factors. To accomplish this, risk prediction models for breast cancer are needed to determine who should be screened, and when. We use a Swedish cohort to predict the short-term risk of breast cancer, based on a number of established risk factors, using a (random effects) continuous growth model. It jointly models breast cancer tumour onset, tumour growth rate, symptomatic detection rate, and screening sensitivity. Unlike existing breast cancer prediction models, this approach can account for each woman's individual screening history in the prediction. In addition to predicting the short-term risk of breast cancer, the model can make separate predictions for specific tumour sizes and for the mode of detection (e.g. detected at screening, or through symptoms between screenings). It can also predict how these risks change depending on whether or not a woman attends her next screening. In our study, we predict that the probability of a tumour being less than 10mm in diameter when detected is increased by 140%, on average, if a woman in the cohort attends her next screening. This indicates that the model can also be used to evaluate the short-term benefit of screening attendance at an individual level.

Accounting for improvements in survival when developing risk prediction models in a competing risks setting

ABSTRACT. Introduction: Risk prediction models are often developed with data from patients diagnosed across a long time period. If there have been improvements in survival over this time, not accounting for the temporal trend can lead to predictions which over-estimate the risk for recently diagnosed patients.

Methods: Temporal recalibration was proposed to address this issue. This method involves first developing the risk prediction model using the standard approach, and then using delayed entry techniques to re-estimate the baseline using only the most recent window of data. This allows improvements in baseline survival to be captured.

Here we show an example of how this method can be applied in a competing risks setting by fitting a cause-specific hazard model to each cause of death and temporally recalibrating each model separately. This allows more up-to-date cause-specific cumulative incidence functions to be estimated for each cause of death, improving the risk predictions for new patients.
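The recalibration step can be viewed as re-estimating the Breslow cumulative baseline hazard with the fitted linear predictor held fixed, using delayed entry so that only the recent window contributes to the risk sets. This is only a minimal sketch under that interpretation, not the authors' implementation; all names are illustrative:

```python
import math

def breslow_baseline(entry, time, event, lp, window_start=0.0):
    """Breslow cumulative baseline hazard with the Cox linear predictor lp held
    fixed. Delayed (left-truncated) entry restricts estimation to the recent
    window: a subject is at risk at t if max(entry, window_start) < t <= time."""
    ev_times = sorted(set(t for t, d in zip(time, event) if d and t > window_start))
    H, out = 0.0, []
    for t in ev_times:
        risk = sum(math.exp(l) for e, s, l in zip(entry, time, lp)
                   if max(e, window_start) < t <= s)
        d = sum(1 for s, dd in zip(time, event) if dd and s == t)
        H += d / risk          # Breslow increment: events / risk-set weight
        out.append((t, H))
    return out
```

Raising window_start recalibrates the baseline to the most recently diagnosed patients while leaving the covariate effects untouched; in the competing risks setting, this would be applied to each cause-specific model separately.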

This process is illustrated using an example of survival following a diagnosis of colon cancer, where the event of interest is death from colon cancer and the competing event is death from other causes. These models are fitted using cancer registry data from the United States Surveillance, Epidemiology and End Results (SEER) database.

Results: Using the standard approach and not accounting for the improvements in baseline survival led to predictions which over-estimated the risk of death from cancer and other causes for more recently diagnosed patients. However, the calibration of the risk predictions for these patients was improved by using temporal recalibration.

Conclusion: Temporal recalibration can easily be applied in a competing risks setting by updating the baseline of each cause-specific hazard model to take account of improvements in survival for each cause. This can lead to more up-to-date and accurate risk predictions for patients who are currently being diagnosed.

References: Booth, S.; Riley, R. D.; Ensor, J.; Lambert, P. C.; Rutherford, M. J. Temporal recalibration for improving prognostic model development and risk predictions in settings where survival is improving over time. International Journal of Epidemiology 2020; 49(4):1316-1325.

13:00-14:30 Session OC3D: Meta-analysis for prediction models
COVID-PRECISE: A living methodological review of prediction models for diagnosis and prognosis of covid-19

ABSTRACT. Objective Critically appraise all diagnostic and prognostic models for individualized prediction of covid-19 risk, with a focus on methodology and statistics.

Methods A living systematic review [1] of studies that developed or validated a multivariable covid-19 related prediction model for diagnosis or prognosis purposes, using any combination of predictors including demographic, clinical or imaging input data. Data sources included PubMed and Embase through Ovid, arXiv, medRxiv, and bioRxiv. At least two authors independently extracted data using CHARMS (critical appraisal and data extraction for systematic reviews of prediction model studies); risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool) [1].

Results 37,420 titles were screened, and 170 studies describing 236 prediction models were included. We identified 11 models for identifying people at risk in the general population; 118 diagnostic models for detecting covid-19 (75 based on medical imaging, 10 for diagnosing disease severity); and 107 prognostic models for predicting mortality risk, ICU admission, or other adverse outcomes. The models were built using logistic regression (34%), neural networks/deep learning (32%), tree-based methods (7%), Cox regression (6%), support vector machines (4%), or other methods (17%). Predictive performance of the 212 newly developed models was evaluated with internal validation only (53%), external validation (24%), or neither (24%). 24 studies independently validated an existing prediction model. C-indexes ranged from 0.54 to 0.99. Risk of bias was low in 4 models, unclear in 6 models, and high in 226 models. The most common reasons for high risk of bias were insufficient data for the chosen modelling strategy (70%), inappropriate or incomplete evaluation of discrimination and calibration (68%), and inappropriate handling of overfitting and optimism (53%).

Conclusion Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. Most proposed prediction models are at high risk of bias, and their reported performance is probably optimistic. To date, six promising models warrant further validation. External validation using individual patient data from multiple cohorts is ongoing research by the COVID-PRECISE consortium. The review will be updated before the conference.

A Bayesian model for heterogeneous treatment effects on the additive risk scale in meta-analysis

ABSTRACT. Faced with a newly diagnosed patient, clinicians consider which of the available treatments will provide the largest absolute risk reduction in their individual patient. To answer this question requires statistical methods that quantify heterogeneous ‘personalized’ treatment effects on the clinically relevant scale.

We propose a Bayesian (meta-)regression model for binary outcomes on the additive risk scale. The model allows treatment effects, covariate effects, interactions and variance parameters to be estimated directly on the scale of clinical interest. The model was applied in single-trial analysis, meta-analysis and network meta-analysis of the TherapySelector (TS) dataset, containing 5,842 hepatitis C patients from 20 randomized trials. We compared our model to two other approaches: an alternative additive risk model (Warn et al., 2002) and a logistic model that transforms predictions back to the natural scale after regression (Chalkou et al., 2020).

Some trials in the TS database have cure rates close to 100%, which illuminates the main differences between the approaches. Our model is very sensitive to the effect of treatment at the boundaries of the risk parameter support [0,1]. This can be both a strength and a weakness; on the TS data it was helpful. Patients with predicted risks close to 0 or 1 contribute little to posterior precision in the model of Warn et al., making it less suitable for trial arms with ~100% successes. In such cases, the logistic model sometimes produces extreme effect estimates, leading to instability in the network setting. A conceptual advantage of the additive models is that their variance parameters capture heterogeneity on the scale of interest. On the other hand, it can be argued that meta-analysis should be done on the scale with the least heterogeneity, which for the TS data is the log(odds) scale.

Their respective characteristics make the compared models suitable for different analysis settings. Therefore, our proposed model is a useful addition to the available statistical methods to model heterogeneous treatment effects on the additive risk scale.

Assessing risk of bias in individual participant data meta-analyses for prediction model research

ABSTRACT. Background: Assessing the risk of bias and applicability (RoB) of included studies is critical for interpreting meta-analysis results. In meta-analyses of multivariable prediction models, RoB can be assessed using PROBAST. However, individual participant data meta-analyses (IPDMAs) differ from aggregate-data MAs in that, in IPDMAs, datasets may include additional information, eligibility criteria may differ from the original publications, definitions of predictors and outcomes can be standardized across studies, and analysis methods can be improved. Therefore, a tailored RoB tool may be needed.

Objectives: To review how RoB is currently assessed in IPDMAs of multivariable diagnostic or prognostic prediction model studies, and to preliminarily examine PROBAST, with the goal of developing an IPDMA extension (PROBAST-IPD).

Methods: We reviewed RoB assessments in IPDMAs of prediction model studies published from January 2018 to May 2020. We then examined how PROBAST items might be evaluated in an IPDMA context, noting which items might be removed, edited, or added; and we hypothesized how results may be incorporated into IPDMA analyses.

Results: Twenty-five prediction model IPDMAs were included. We observed that current IPDMAs rarely and inconsistently evaluate the RoB of included IPD, and most do not incorporate RoB judgements into analyses. Our findings support using PROBAST to assess RoB in the IPD datasets themselves, rather than solely on the basis of study publications. As initial considerations for developing PROBAST-IPD, we propose that certain items need to be evaluated and coded at the participant level (e.g., timing between predictor assessment and outcome determination), whereas others (e.g., quality of assessment tools) may apply uniformly to an included study. Most analysis items (e.g., pre-specification of variables for analysis) are no longer relevant, as the IPDMA researchers perform the analyses themselves. RoB results may be incorporated into analyses by conducting subgroup analyses among studies and participants with overall low RoB, or by conducting formal interaction analyses with item-level RoB responses.

Conclusions: Development and dissemination of PROBAST-IPD will allow improved RoB assessments in IPDMAs of prediction model studies.

Using meta-analysis for external validation of prediction models in big data, accounting for competing risks

ABSTRACT. Background: In prediction modelling, any event occurring prior to the event of interest, and thus preventing that event from happening, is known as a competing risk. Prediction models must be properly developed to account for competing risks to avoid overestimating the risk for the event of interest. Upon validation of such prediction models, competing risks should also be accounted for to ensure reliable performance estimation.

Aims: To describe the methods used to externally validate a competing risks model in a large primary care database, and to discuss how the competing risk was accounted for in the meta-analysis of each performance statistic across GP practices.

Methods: We externally validated a multivariable model predicting Acute Kidney Injury (AKI) at 10 years, while accounting for the competing risk of death, in 3,805,366 eligible patients from the CPRD Aurum database. Missing data were multiply imputed. Meta-analysis techniques were used to examine heterogeneity in model performance across GP practices, where case-mix and outcome prevalence varied. Predictive performance was assessed using calibration plots and measures of discrimination (D-statistic and time-dependent AUC), calibration (observed/expected ratio) and clinical utility (net benefit).

Results: Accounting for the competing risk in model predictions required incorporating the log-log baseline cumulative incidence function (CIF) of AKI at 10 years across imputations from the development data. Central to the validation analysis was incorporating the observed CIF from the validation data for comparison. This required calculating imputation-specific CIFs to obtain performance statistics prior to pooling. Calibration plots additionally required observed CIFs to be generated within subgroups: by magnitude of predicted probabilities for standard plots, and within individual GP practices for practice-level calibration. Random-effects meta-analysis of GP-level performance estimates showed considerable heterogeneity for calibration, with over-prediction of AKI at 10 years on average. Discrimination, however, was more consistent, with both the D-statistic and the C-statistic showing low heterogeneity and narrower distributions in funnel plots of estimates against practice size.

Conclusions: When validating competing risks models, the competing event must be accounted for by properly incorporating the CIF for the event of interest in all analyses. Meta-analysis techniques are then helpful to summarise predictive performance across data clusters.
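The observed CIF referred to above can be estimated nonparametrically with an Aalen-Johansen-type estimator. The sketch below (illustrative code, not the authors' implementation) computes the CIF for one cause from right-censored data, with censoring coded as cause 0:

```python
def cumulative_incidence(time, cause, k):
    """Nonparametric cumulative incidence function for cause k.
    cause: 0 = censored, otherwise the event type. Returns (time, CIF) pairs."""
    n = len(time)
    data = sorted(zip(time, cause))
    S, F = 1.0, 0.0          # overall event-free survival, CIF for cause k
    at_risk = n
    out, i = [], 0
    while i < n:
        t = data[i][0]
        d_k = d_all = c = 0
        while i < n and data[i][0] == t:      # aggregate ties at time t
            if data[i][1] == 0:
                c += 1
            else:
                d_all += 1
                if data[i][1] == k:
                    d_k += 1
            i += 1
        if d_all:
            F += S * d_k / at_risk            # mass entering state k at t
            S *= 1 - d_all / at_risk          # update event-free survival
        out.append((t, F))
        at_risk -= d_all + c
    return out
```

Unlike one minus the cause-specific Kaplan-Meier, this estimator correctly treats deaths from other causes as removing patients from the possibility of the event of interest, so the cause-specific CIFs sum to the all-cause cumulative incidence.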

13:00-14:30 Session OC3E: Multi-state model
MSMplus: A dynamic interactive web tool for presentation of multi-state model analysis results

ABSTRACT. Multi-state models are used in complex disease pathways to describe a process in which an individual moves from one state to the next, taking into account competing states during each transition. In a multi-state setting, there are various measures to be estimated that are of great epidemiological importance. However, the increased complexity of the multi-state setting, and predictions over time for individuals with different covariate patterns, can make it harder to communicate the estimated measures. The need for easy and meaningful communication of analysis results motivated the development of a web tool to address these issues. MSMplus is a publicly available web tool, developed in RShiny, primarily targeted at researchers conducting multi-state model analyses. The results from any multi-state model analysis are uploaded to the application in a pre-specified format. Through a variety of user-tailored interactive graphs, the application contributes to improved communication, reporting and interpretation of multi-state analysis results, as well as comparison between different approaches. The predicted measures supported by MSMplus include, among others, the transition probabilities, the transition intensity rates, the length of stay in each state, the probability of ever visiting a state, and user-defined measures. Representation of differences, ratios and confidence intervals of the aforementioned measures is also supported. MSMplus is a useful tool that enhances communication and understanding of multi-state model analysis results. Further use and development of web tools should be encouraged in the future as a means to communicate scientific research.
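Several of the measures listed above derive from the transition probability matrix, which the Aalen-Johansen estimator builds as a finite product-integral over observed transition times. A hedged sketch of that computation (plain Python with an illustrative interface, not MSMplus code):

```python
def matmul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transition_matrix(increments, n_states):
    """Product-integral P(s, t) = prod over jump times of (I + dA(u)).
    Each element of `increments` is an n_states x n_states matrix of
    Nelson-Aalen hazard increments (off-diagonals >= 0, rows summing to 0)."""
    P = [[float(i == j) for j in range(n_states)] for i in range(n_states)]
    for dA in increments:
        step = [[(1.0 if i == j else 0.0) + dA[i][j] for j in range(n_states)]
                for i in range(n_states)]
        P = matmul(P, step)
    return P
```

Row i of the result gives the probability of being in each state at time t given state i at time s; length of stay and visiting probabilities follow by summing or integrating such quantities.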

Statistical models for the natural history of breast cancer, with application to data from a Milan cohort study

ABSTRACT. We develop a new class of multi-state models for the natural history of breast cancer, in which the main events of interest are the start of asymptomatic detectability of the disease and the start of symptomatic detectability. The former kind of detection occurs through screening, the latter through the onset of symptoms. We develop a cure rate parametric specification that allows for dependence between the times from birth to the two events, and present the results of the analysis of data collected as part of a motivating study from Milan. Participants in the study had a varying degree of compliance with a regional breast cancer screening program. The subjects' ten-year trajectories were obtained from administrative data collected by the Italian national health care system. We first present a tractable model for which we derive the likelihood contributions of the possible observed trajectories, and perform maximum likelihood inference on the latent process. Likelihood-based inference is not feasible for more flexible models, for which we rely on a likelihood-free method, Approximate Bayesian Computation (ABC). Issues that arise from the use of ABC for model choice and parameter estimation are discussed, with a focus on the problem of choosing appropriate summary statistics. The estimated parameters of the underlying disease process allow us to study the effect of different examination schedules (ages and frequencies of screening examinations) and different adherence patterns on a population of asymptomatic subjects.
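The basic ABC rejection scheme has a simple generic shape. In the sketch below, the prior, simulator, summary statistics and tolerance are all placeholders, not those of the study:

```python
import random

def abc_rejection(observed, prior_sample, simulate, summary, distance,
                  eps, n_draws, rng=None):
    """Plain ABC rejection sampling: draw parameters from the prior, simulate
    data, and keep the draws whose summary statistics lie within eps of the
    observed summary. The accepted draws approximate the posterior."""
    rng = rng or random.Random(0)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(summary(simulate(theta, rng)), observed) <= eps:
            accepted.append(theta)
    return accepted
```

The choice of `summary` is exactly the issue the abstract highlights: with poorly chosen summary statistics the accepted draws can concentrate in the wrong region, for parameter estimation and even more so for model choice.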

Reevaluating dementia incidence trends: The critical role of adequate design and methodology

ABSTRACT. An apparent decline in dementia incidence in Western nations has been a topic of continuous debate, resulting in a recently published analysis of data from seven population-based cohort studies (Wolters et al., 2020). By constructing several non-overlapping 5-year epochs, the corresponding design and analysis closely follows a framework previously used within the Framingham Heart Study (FHS) cohort. However, we challenged the finding of the FHS cohort on the basis that bias may have resulted from the failure to adequately account for potential disease onset in the period between last observation and death. Re-analyzing the FHS data using spline-based analytic methods, we did not find convincing evidence for a decline in dementia incidence over the epochs (Binder et al., 2019). Yet there is further room for improvement. First, the classification of calendar time into 5-year epochs is both unnecessary and arbitrary; the conclusion of a linear decline in dementia incidence in the FHS data would not have held had, e.g., a 4-year follow-up period been used (Binder et al., 2019). Second, two separate cohorts (the 'original' FHS cohort and a cohort of their offspring) were combined for analysis, which may be inappropriate if they differ markedly. A more suitable approach for analyzing how dementia incidence has evolved over time in the FHS is to consider the two cohorts separately, and to dispense with the epoch structure by using age as the time scale. This results in the analysis of separate generations which age over time and are subject to death in greater numbers over time, without replenishment from younger participants. If dementia cases missed due to death lead to bias, its effect would therefore be to underestimate the incidence of dementia to an increasing extent over time.
This problem requires the use of statistical methods based on the illness-death multi-state model, such as the spline-based penalized likelihood employed in our earlier study. We will present the findings of the proposed design and analysis strategy, aiming for a realistic quantification of the dementia incidence trend in the Framingham Heart Study cohort.

Statistical inference for transition probabilities in non-Markov multi-state models subject to both random left-truncation and right-censoring

ABSTRACT. The Aalen-Johansen estimator generalizes the Kaplan-Meier estimator for independently left-truncated and right-censored survival data to estimate the transition probability matrix of a time-inhomogeneous Markov model with finite state space. Such multi-state models have a wide range of applications for modelling complex courses of a disease over time, but the Markov assumption may often be in doubt. If censoring is entirely unrelated to the multi-state data, it has been suggested that the Aalen-Johansen estimator still consistently estimates the state occupation probabilities. This approach has been extended to transition probabilities using landmarking, which is, inter alia, useful for dynamic prediction. We complement these findings in three ways. Firstly, we provide a rigorous proof of consistency of the Aalen-Johansen estimator for state occupation probabilities, correcting and simplifying the earlier result. Secondly, delayed study entry is a common phenomenon in observational studies, and we extend the earlier results to multi-state model data also subject to left-truncation. Thirdly, our proof is suggestive of wild bootstrap resampling. Studying the wild bootstrap is motivated by the fact that it is desirable to have a technique that works both with non-Markov models subject to random left-truncation and right-censoring, and with Markov models where left-truncation and right-censoring need not be entirely random. In our motivating data example, the occurrence and impact of methicillin-resistant Staphylococcus aureus (MRSA) infection in hospital, compared with patients only colonized with MRSA, are investigated using an illness-death multi-state model. Violations of the Markov assumption arise if the time of MRSA infection affects the hazard of end of hospital stay. Patients may have a delayed study entry if a positive laboratory result only becomes available some time after admission.
We use landmarking to compare the residual length of stay of those in the infectious state with those still in the initial state of colonization. We present both the results of the real data example and the results of simulation studies, showing that the landmark Aalen-Johansen estimator performs well and the wild bootstrap provides confidence intervals close to the nominal level.
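The landmarking idea reduces, in its simplest form, to conditioning on the state occupied at the landmark time s and then estimating residual stay within each subgroup. A minimal sketch with a hypothetical record layout (not the authors' code, and without the left-truncation handling of the full method):

```python
def km_curve(times, events):
    """Kaplan-Meier survival estimates at each distinct event time."""
    S, out = 1.0, []
    for t in sorted(set(x for x, d in zip(times, events) if d)):
        at_risk = sum(1 for x in times if x >= t)
        d = sum(1 for x, dd in zip(times, events) if dd and x == t)
        S *= 1 - d / at_risk
        out.append((t, S))
    return out

def landmark_residual_stay(records, s):
    """records: (state_at_s, end_of_stay_time, observed) tuples; the field
    layout is invented for illustration. Among patients still in hospital at
    landmark time s, return a Kaplan-Meier curve of residual length of stay
    for each state occupied at s (e.g. 'colonized' vs 'infected')."""
    in_hospital = [r for r in records if r[1] > s]
    curves = {}
    for state in {r[0] for r in in_hospital}:
        sub = [(t - s, d) for st, t, d in in_hospital if st == state]
        curves[state] = km_curve([t for t, _ in sub], [d for _, d in sub])
    return curves
```

Comparing the curves across landmark subgroups is valid without the Markov assumption, which is precisely why landmarking is attractive here.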

Harmonization of endpoints in ICU trials using multi-state modelling

ABSTRACT. Several randomized trials in ICUs assess the effect and/or the ideal timing of organ support therapy (OST, e.g. ventilatory support, renal replacement therapy, or extracorporeal membrane oxygenation). In many cases, the aim is not only to reduce mortality, but also to reduce the duration of OST to prevent iatrogenic consequences. OST can also be part of the outcome of ICU trials assessing the effect of different drugs. For example, most of the randomized trials evaluating Covid-19 treatments have based their primary outcome on the WHO clinical progression scale, an ordinal scale for the different levels of care (including different levels of oxygenation support). These outcomes are evaluated with various methods, such as Kaplan-Meier estimation or Cox models for time-to-event endpoints (e.g. time to improvement), logistic or proportional odds models for the proportion of patients in a given state, competing risk models for length of stay, or calculation of OST-“free days” for the duration without OST. Recently, multi-state modelling has been proposed as a simple and direct way to analyze such data, by defining each category as a distinct state. It allows a wide range of estimands that are important from a patient perspective and also from a hospital administration perspective, for the planning of resources. The course of disease can be presented by a stacked probability plot, which illustrates the probability of being in each specific state over time. Multi-state modelling accommodates competing endpoints, non-monotonic patient trajectories (e.g. multiple intermittent episodes of OST) and censoring. Based on the state occupation probabilities, the mean length of stay in each state, or in relevant combinations of states, can be calculated and compared between arms. Our objective is to highlight how multi-state modelling can complement traditional analyses, provide additional insights, and unify heterogeneous analysis strategies between trials.
This will be illustrated using data from two published randomized controlled trials in ICU settings: a trial comparing early and delayed renal replacement therapy in patients with severe acute kidney injury, and a trial comparing early non-invasive ventilation with oxygen alone in patients with non-hypercapnic acute hypoxemic respiratory failure.
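The stacked probability plot is driven by state occupation probabilities over time. Ignoring censoring, a simplified empirical sketch (hypothetical trajectory layout, not the analysis code of the trials above):

```python
def state_at(path, t):
    """path: chronologically ordered (time, state) transition records; the
    state at t is the last state entered at or before t."""
    current = path[0][1]
    for time, state in path:
        if time <= t:
            current = state
    return current

def occupation_probabilities(paths, grid, states):
    """Empirical state occupation probabilities over a time grid, assuming
    complete (uncensored) trajectories. Each inner dict sums to 1 and forms
    one vertical slice of a stacked probability plot."""
    n = len(paths)
    return {t: {s: sum(state_at(p, t) == s for p in paths) / n for s in states}
            for t in grid}
```

With censoring, the empirical proportions are replaced by Aalen-Johansen estimates, but the stacked-plot construction is the same.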

13:00-14:30 Session OC3F: Neural network and machine learning
Neural networks for survival prediction in medicine: a review and critical appraisal

ABSTRACT. Context: Prediction models with machine learning are becoming ubiquitous in the medical field. Over the years, an increasing number of algorithms have been developed and adapted to deal with censored data. Here, we consider publications that predicted survival with artificial neural networks (ANN), one of the most widely used machine learning techniques in healthcare applications.

Objective(s): A structured overview is presented, which provides a comprehensive understanding of the current literature. We discuss how researchers have used ANN to fit survival data for prediction in the medical field, and critically appraise which aspects of the models should be designed and reported more carefully.

Methods: We performed a global search in PubMed, considering articles published in the period 1990-2019. Additional studies were identified using a “local search” to follow citations of citations. Relevant manuscripts were classified as methodological/technical (novel methodology or new theoretical model) or applications. We identified key characteristics of prediction models (i.e. number of patients/predictors, evaluation measures, validation, calibration), and compared ANN’s predictive performance to that of the Cox proportional hazards model.

Results: Our search yielded 217 studies. Of these, 13 methodological studies and 10 practical applications (7 real-world data, 3 simulations) were considered relevant. We identified two methodological trends: either time was added to the input features of the ANN and a single output node was specified, or multiple output nodes were defined, one for each time interval. The median sample size was 920 patients, and the median number of predictors was 7. Major findings across the 23 studies included poor reporting (e.g., regarding missing data and hyperparameters), use of improper performance measures, and inaccurate model development/validation. Calibration was neglected in more than half of the studies. Cox models were not developed to their full potential, and claims about the performance of ANNs were exaggerated.

Conclusions: Light is shed on the current state of the art of survival neural networks in medicine. Recommendations are made for the correct application of clinical prediction models with ANNs. Limitations are discussed, and future directions are proposed for researchers who seek to further develop this methodology.

Comparison of imputation methods to solve the granularity problem resulting from the integration of structured healthcare data

ABSTRACT. Background: Disparate collections of data are inevitably heterogeneous, which makes aggregation a difficult challenge. Here, we focus on content heterogeneity in data integration due to granularity, i.e. when some datasets and/or variables include more categories/levels and subsets than others. Traditional approaches map all source datasets to a common data model that includes only low-level items, and thus omit all items that vary between datasets. Objectives: Our focus is on the integration of structured data, solving the granularity problem by keeping the highest-level items and therefore using all the available information. We assume that each of the datasets to be integrated consists of a single table, and that each dataset describes a disjoint set of entities; record linkage is therefore not needed. Methods: From a probabilistic perspective, imperfect alignment of different data sources is not problematic as long as we can derive what information each source provides for answering our research question. In our case, we would like to use all the available information across the datasets being integrated. The general idea behind our integration method is that the problem of content heterogeneity, presented as a granularity problem, can be translated into a missing value problem and then solved using well-established methods (imputation). Results: We performed a simulation study designed to investigate our probabilistic methods in a simple and general setting. We also illustrate and solve the granularity problem with the proposed probabilistic data integration approaches on example datasets provided by MASTERplans. MASTERplans aims to improve care for Systemic Lupus Erythematosus patients by taking a precision medicine approach to identifying groups of patients that respond to particular biologic therapies.
Conclusions: Our approach anticipates that health data heterogeneity will persist. Our probabilistic data integration approaches are pragmatic because they always provide an answer. The evaluation and application results show that the suggested probabilistic approaches outperform the traditional data integration approach and provide results similar to the true models.
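To make the granularity-as-missingness idea concrete: a record from a coarse-only source lacks the fine-grained category, which can be treated as a missing value and imputed from a source that records both levels. The toy hot-deck sketch below uses invented category names and is only a crude stand-in for the probabilistic imputation described above:

```python
import random

def impute_fine(fine_source, coarse_only, rng=None):
    """Hot-deck imputation for the granularity problem: for each coarse-only
    record, draw a fine-grained category from the fine categories observed
    with the same coarse category in the finer-grained source."""
    rng = rng or random.Random(1)
    donors = {}
    for coarse, fine in fine_source:
        donors.setdefault(coarse, []).append(fine)
    return [(c, rng.choice(donors[c])) for c in coarse_only]
```

Multiple imputation would repeat such draws several times and combine the analyses, propagating the uncertainty about the unobserved fine categories.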

Survival Predictions and Uncertainty Measures with Censored Data

ABSTRACT. Background. Neural networks have been increasingly used for prediction problems. As they are flexible non-linear models, they may be relevant when many candidate covariates and complex interactions are to be evaluated. Objective(s). This work aims to study neural network models with time-to-event data, using specific ways to handle censoring, to study their operating characteristics in a simulation study, to introduce auto-encoders into the models, and to evaluate uncertainty in model predictions. Method(s). We compared survival models based on neural networks with different loss functions: Cox-MLP (Kvamme et al., 2019) uses a loss based on a case-control approximation. DeepHit (Lee et al., 2018) combines a log-likelihood with a ranking loss. DNNSurv (Zhao et al., 2019) uses pseudo-observations. We also proposed other ways of computing pseudo-observations. We used random survival forests by Ishwaran et al. (2008) and lasso penalization as benchmarks. We simulated data from the AFT model proposed by Friedman et al. (2001), with 3 different censoring rates (20%, 40%, and 60%). We simulated 100 datasets of 1,000 samples and 20 variables each, with pairwise interactions and non-linear effects of random subsets of these. We built an oracle model for comparison purposes. We further applied the methods to 2 real datasets: the METABRIC breast cancer dataset, including 1,960 patients, 6 clinicopathological covariates, and the expression of 1,000 genes, and a lung cancer dataset consisting of 4,120 patients, 3 clinical variables, and 1,000 genes. We investigated the effect of pre-training using variational auto-encoders (Simidjievski et al., 2019) on the models' predictions. We also studied the predictive uncertainty of our models using MC Dropout (Gal et al., 2016) and ensembling methods (Dietterich et al., 2000). Results.
In the simulation study, we obtained the highest c-indices and lowest integrated Brier scores with CoxTime for low censoring and with the pseudo-discrete approach for high censoring. On the METABRIC data, the different neural network models obtained comparable 5-year and 10-year discrimination performances, but with slightly lower values than random survival forests and the penalized Cox model. Detailed results from the lung cancer data will be shown at the conference.
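The pseudo-observation idea used by DNNSurv can be sketched with a jackknife on the Kaplan-Meier estimator: the i-th pseudo-observation of S(t) is n·Ŝ(t) minus (n-1) times the leave-one-out estimate. This is a minimal sketch under assumed continuous event times (ties ignored), not the authors' proposed variants.

```python
import numpy as np

def km_surv(time, event, t):
    """Kaplan-Meier estimate of S(t) for event indicator `event` (1=event)."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    s = 1.0
    n = len(time)
    for i, (ti, di) in enumerate(zip(time, event)):
        if ti > t:
            break
        if di:                       # one factor per observed event
            s *= 1.0 - 1.0 / (n - i)
    return s

def pseudo_obs(time, event, t):
    """Jackknife pseudo-observations for S(t):
    pseudo_i = n * S_hat(t) - (n - 1) * S_hat_{-i}(t)."""
    n = len(time)
    s_full = km_surv(time, event, t)
    pseudo = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        pseudo[i] = n * s_full - (n - 1) * km_surv(time[mask], event[mask], t)
    return pseudo
```

Without censoring, the pseudo-observations reduce to the indicators 1{T_i > t}; with censoring they serve as an uncensored outcome that a neural network can be trained on with a standard loss.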

Parametric and non-parametric variable selection methods for predictive modeling with binary response

ABSTRACT. The demand for statistical models to support medical decision making is increasing, especially when aiming to predict an outcome of interest for an individual patient. Clinical outcomes can be of diagnostic or prognostic nature and might be affected by various patient and disease characteristics. A key concern in building predictive models is to identify influential variables so as to predict the outcome as accurately as possible. Many modeling techniques can deal with large numbers of variables; however, the inclusion of irrelevant variables can cause overfitting, introduce noise, and might lead to a decrease in prediction performance. To address this issue, a broad range of variable selection methods has been developed. Widely applied parametric methods are, e.g., stepwise regression and regularised regression. These methods assume a linear relationship between the outcome and the variables, hence their performance might decrease in the case of non-linear associations. Non-parametric tree-based methods provide variable selection based on variable importance measures and are able to deal with non-linear relations as well as with highly correlated and interacting variables. While random forests theoretically have a low risk of overfitting, the number of iterations and the learning rate of gradient boosting algorithms play a critical role regarding potential overfitting. In general, the choice of a method is not trivial and depends on the between-variable relations in the data. The optimal number of boosting iterations can be estimated by computing the cross-validation prediction error in each iteration. This yields a model with optimal performance and prevents overfitting but does not necessarily perform variable selection. To tackle this issue, we developed a step-by-step approach to perform variable selection using gradient boosting trees.
We performed simulation studies to evaluate the mentioned methods regarding the selection of the truly influential variables and the performance of their final models in clinically relevant scenarios. Because the set of selected variables depends on the chosen method, the main focus was to investigate strengths and weaknesses in order to provide recommendations for the choice of method in common clinical data scenarios. Moreover, the methods were compared using a real dataset for predicting neurological improvement as a binary response [1].
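The interplay of stopping iteration and variable selection in boosting can be illustrated with componentwise L2-boosting, where each iteration fits one univariate base learner and the selected variables are those ever chosen before the stopping iteration. This is a linear stand-in for the authors' tree-based approach, shown only as a sketch: the data, the validation split (in place of cross-validation), and all tuning values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 10
X = rng.standard_normal((n, p))
logit = 2.0 * X[:, 0] - 1.5 * X[:, 1]          # only x0 and x1 are truly influential
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X_tr, X_va = X[:300], X[300:]
y_tr, y_va = y[:300], y[300:]

def boost(X, y, n_iter, nu=0.1):
    """Componentwise L2-boosting: per iteration, fit each single column to the
    residuals by least squares and keep the best-fitting one, shrunk by nu."""
    f = np.full(len(y), y.mean())
    coefs = np.zeros(X.shape[1])
    path = []
    for _ in range(n_iter):
        r = y - f                                   # current residuals
        b = X.T @ r / (X ** 2).sum(axis=0)          # univariate LS slopes
        sse = ((r[:, None] - X * b) ** 2).sum(axis=0)
        j = int(np.argmin(sse))                     # best single base learner
        coefs[j] += nu * b[j]
        f += nu * b[j] * X[:, j]
        path.append(coefs.copy())
    return y.mean(), path

intercept, path = boost(X_tr, y_tr, n_iter=200)

# choose the stopping iteration by validation error (stand-in for CV),
# then read off the selected variables as those with non-zero coefficients
val_err = [((y_va - (intercept + X_va @ c)) ** 2).mean() for c in path]
best = int(np.argmin(val_err))
selected = np.flatnonzero(path[best])
```

Early stopping limits how many noise variables enter, which is exactly why the stopping iteration and the selected set must be considered together.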

Performance measures for assessing machine learning algorithms in clinical trials

ABSTRACT. Machine learning (ML) methods have great promise but have been little used in clinical trials. They may be of particular benefit in assessing treatment effect heterogeneity, where they aim to find unexpected patient subgroups that may benefit even from treatments that are of little benefit overall. ML methods have previously been applied to identify treatment effect modifiers in clinical trials, but translation of findings into clinical practice has been poor. This lack of clinical impact may be because ML outputs are difficult to interpret by non-ML specialists; it may also be because clinical trials are usually powered for main effects, and thus ML methods can lack power and may find false positives. Simulation studies are needed to evaluate the methods, but it is not clear how to do these. This work explores what the estimands and performance measures should be for future simulation studies.

We first consider simple data generating mechanisms where the treatment effect is determined by a single variable. Here, type 1 error rate and power are suitable performance measures, but this setting is too simple to demonstrate the potential of ML methods. For realistically complex data generating mechanisms, one possible estimand is the set of individual treatment effects, and performance can be measured by the mean squared error, averaged over individuals. However, if clinical treatment choices are to be driven by the trial, then some errors are more important than others. We therefore propose as an estimand the subgroup who would benefit from treatment. Performance measures here could be as simple as the sensitivity and specificity of the estimated benefiting subgroup. However, a better performance measure quantifies the clinical benefit of treating according to the ML results, compared with treating according to a simpler rule. We call this clinical utility and illustrate its properties, compared with other performance measures, in a simple simulation study based on the TRACT trial comparing blood transfusion volumes in children with and without fever being treated for severe anaemia in Africa. Initial results show that models with smaller mean squared error tend to have larger clinical utility, but marked reversals can occur.
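The proposed clinical utility measure can be sketched as the mean gain from treating according to the ML rule minus the gain under a simpler rule. Everything below is hypothetical: the benefiting-subgroup proportions, effect sizes, and the noise level of the ML estimate are invented for illustration and are not based on TRACT.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# hypothetical true individual treatment effects: a 30% subgroup benefits
# strongly, the remaining 70% are slightly harmed
benefit = np.where(rng.random(n) < 0.3,
                   rng.normal(2.0, 0.5, n),
                   rng.normal(-0.5, 0.5, n))
ml_estimate = benefit + rng.normal(0, 1.0, n)    # noisy ML prediction

treat_ml = ml_estimate > 0                       # ML-guided treatment rule
treat_all = np.ones(n, bool)                     # simple rule: treat everyone

def mean_gain(rule):
    # expected gain per patient when treating according to `rule`
    return (benefit * rule).mean()

clinical_utility = mean_gain(treat_ml) - mean_gain(treat_all)

# sensitivity/specificity of the estimated benefiting subgroup
sens = (treat_ml & (benefit > 0)).sum() / (benefit > 0).sum()
spec = (~treat_ml & (benefit <= 0)).sum() / (benefit <= 0).sum()
```

Here the ML rule gains utility mainly by withholding treatment from the harmed majority, which sensitivity and specificity alone do not weigh by the size of each error.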

13:00-14:30 Session OC3G: Study designs
Using Historical Data to Predict Health Outcomes – The Prediction Design

ABSTRACT. The gold standard for investigating the efficacy of a new therapy – the randomized controlled trial (RCT) – is costly, time-consuming, and not always feasible in a reasonable time frame. At the same time, huge amounts of available control-condition data from previous RCTs or real-world data (RWD) in analyzable format are neglected, if not completely ignored [1]. To overcome this shortcoming, alternative study designs with more efficient data use would be desirable. Assuming that the standard therapy and its mode of functioning are well known and large amounts of patient data exist, it is possible to set up a prediction model to determine the treatment effect of this standard therapy for future patients. If a new therapy is to be tested against the standard therapy, the vision would be to conduct a single-arm study and use the prediction model to determine the effect of the standard therapy on the outcome of interest for patients receiving only the test treatment, rather than setting up a two-arm study for this comparison. While the advantages of using historical data to estimate the counterfactual are obvious, bias could be caused by confounding or a number of other data issues that could compromise the validity of the nonrandomized comparison [2]. To investigate whether and how such a design – the prediction design – could be used to provide information on treatment effects using existing infrastructure and data sources (historical data from RCTs and/or RWD), we explored the assumptions under which a linear regression model could be used to predict the counterfactual of patients accurately enough to construct a test to assess the treatment effect for normally distributed outcomes. To overcome the implications of violating the model assumptions, the use of robust methods (e.g., robust linear regression, LASSO) was explored. This was applied to a dataset comparing liraglutide and sitagliptin in type II diabetes.
Simulations were used to examine the amount of historical data needed as well as the sample size required for the single-arm study. Depending on the amount of available historical data, the sample size could be reduced compared to a conventional RCT.
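The basic mechanics of the prediction design can be sketched as: fit a linear model on historical control data, predict each single-arm patient's counterfactual, and test the observed-minus-predicted differences. This is a simplified illustration with invented data and effect sizes; it ignores the uncertainty of the prediction model, which a real analysis would have to account for.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_hist, n_trial, true_effect = 2000, 100, 1.0

# hypothetical historical control data: outcome linear in two covariates
Xh = rng.standard_normal((n_hist, 2))
yh = 5 + 1.2 * Xh[:, 0] - 0.8 * Xh[:, 1] + rng.normal(0, 1, n_hist)

# fit the prediction model for the standard therapy
A = np.column_stack([np.ones(n_hist), Xh])
beta, *_ = np.linalg.lstsq(A, yh, rcond=None)

# single-arm study: all patients receive the test treatment
Xt = rng.standard_normal((n_trial, 2))
yt = (5 + 1.2 * Xt[:, 0] - 0.8 * Xt[:, 1]
      + true_effect + rng.normal(0, 1, n_trial))

# predicted counterfactual under the standard therapy, then a one-sample
# t-test on the observed-minus-predicted differences
counterfactual = np.column_stack([np.ones(n_trial), Xt]) @ beta
diff = yt - counterfactual
t_stat, p_value = stats.ttest_1samp(diff, 0.0)
```

The mean of `diff` estimates the treatment effect; any systematic shift between historical and current patients would bias it, which is the confounding risk the abstract highlights.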

Bayes Factors for Equivalence, Non-inferiority, and Superiority Designs Using baymedr

ABSTRACT. Biomedical research often seeks to determine the equivalence, non-inferiority, or superiority of an experimental condition (e.g., a new drug) compared to a control condition (e.g., a placebo). The use of frequentist statistical methods, in the form of null hypothesis significance testing (NHST), to analyze data for these types of designs is ubiquitous. However, frequentist inference has several limitations. Among the most critical limitations of NHST are the inability to quantify evidence in favor of the null hypothesis and the necessity of inflexible adherence to a predetermined sampling plan. Bayesian inference remedies these shortcomings and allows for intuitive interpretations. We present the R package baymedr (available at; Linde & van Ravenzwaaij, 2019) and an associated Shiny web application for the computation of Bayes factors for equivalence, non-inferiority, and superiority designs (see also van Ravenzwaaij, Monden, Tendeiro, & Ioannidis, 2019). Both allow the use of either raw data or summary statistics. The R package baymedr and the web application focus on user-friendliness and are especially intended for researchers who are not statistical experts, but they can also be used by (clinical) statisticians. The web application can be utilized without any programming experience. We explain and compare the frequentist and Bayesian conceptualizations of equivalence, non-inferiority, and superiority designs and showcase baymedr and the associated web application by analyzing raw data and by reanalyzing existing empirical studies using the published summary statistics.
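The key Bayesian idea – quantifying evidence in favor of the null – can be illustrated with the rough BIC approximation BF01 ≈ exp((BIC1 − BIC0)/2) for a two-sample mean comparison. This is emphatically not baymedr's computation (baymedr uses proper default Bayes factors); it is only a sketch of the concept, with invented data.

```python
import numpy as np

def bic_gauss(y, fitted, k):
    """Gaussian BIC up to a constant: n*log(RSS/n) + k*log(n)."""
    n = len(y)
    rss = ((y - fitted) ** 2).sum()
    return n * np.log(rss / n) + k * np.log(n)

def bf01_two_sample(x, y):
    """Approximate Bayes factor in favour of 'no group difference'
    via BF01 = exp((BIC1 - BIC0) / 2)."""
    pooled = np.concatenate([x, y])
    bic0 = bic_gauss(pooled, pooled.mean(), k=2)            # common mean + sigma
    fitted1 = np.concatenate([np.full(len(x), x.mean()),
                              np.full(len(y), y.mean())])
    bic1 = bic_gauss(pooled, fitted1, k=3)                  # two means + sigma
    return np.exp((bic1 - bic0) / 2)

rng = np.random.default_rng(3)
drug = rng.normal(0.0, 1.0, 100)       # hypothetical outcomes, no true difference
placebo = rng.normal(0.0, 1.0, 100)
```

Unlike a p-value, BF01 > 1 expresses positive evidence for the null, which is exactly what equivalence designs need.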


ABSTRACT. Context: Clinical trials focusing on neurodegenerative diseases face the difficulty of subtle, long-term, and individual-specific worsening of the endpoints. Eventually, they end up recruiting heterogeneous patients, which prevents them from showing any drug effect, especially if the latter is believed to be more effective at a precise disease stage.

Objective: Targeting the right patients during trial screening is a way to reduce the needed sample size or, conversely, to improve the demonstrated effect size. Methods: From Alzheimer's disease (AD) observational cohorts, we selected longitudinal data that matched AD trials (inclusion and exclusion criteria, trial duration, and primary endpoint). We modelled EMERGE, a phase 3 trial in pre-clinical AD, and a mild AD trial, using 4 research cohorts (ADNI, Memento, PharmaCog, AIBL) totalling more than 5,500 individuals. For each patient, we simulated their treated counterpart by applying an individual treatment effect. It consisted of a linear improvement of the outcome for actual decliners, calibrated to match the expected trial effect size. Next, we built a multimodal AD course map that captures long-term disease progression in a mixed-effects fashion [1] with Leaspy (an open-source Python package). We used it to forecast unseen individuals' outcomes from their screening biomarkers. Based on these individual predictions, for each trial we selected a clinically relevant [2] sub-group of screened patients. Finally, we compared the effective sample size that would have been needed for the trial with and without our selection. The dispersion was evaluated using a bootstrap procedure.

Results: For all investigated setups and cohorts, our selection enabled a decrease in the needed sample size. In particular, in the EMERGE (resp. mild AD) trial, selecting patients with a predicted CDR-SoB change between 0.5 and 1.5 points per year (resp. MMSE change between 1 and 2 points per year) reduced the sample size by 38.2 ± 3.3% (resp. by 38.9 ± 2.2%).

Conclusions: In AD clinical trials, using our forecasts of individual outcomes from multimodal screening assessments as an extra inclusion criterion allows better control of the trial population and thus reduces the needed sample size for a given treatment effect.
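The link between enrichment and sample size can be sketched with the standard normal-approximation formula for a two-arm comparison of means. The effect sizes below are hypothetical round numbers, not those of EMERGE; the point is only that enriching for decliners raises the expected effect delta and shrinks n by the factor (delta_all/delta_selected)^2.

```python
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Sample size per arm for a two-arm mean comparison:
    n = 2 * sigma^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (sigma * z / delta) ** 2

# hypothetical numbers: selecting predicted decliners increases the
# expected treatment effect from 0.3 to 0.4 points (same outcome SD)
n_all = n_per_arm(delta=0.3, sigma=1.0)
n_selected = n_per_arm(delta=0.4, sigma=1.0)
reduction = 1 - n_selected / n_all    # = 1 - (0.3/0.4)**2
```

With these illustrative values, enrichment cuts the required sample size by about 44%, of the same order as the reductions reported in the abstract.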

Design effects and analysis considerations for the split-mouth design with unequal numbers of sites per patient

ABSTRACT. Background Split-mouth studies are a common variant of within-person randomised trials, often conducted in dentistry. In a split-mouth design, sites (usually teeth) within the same person are randomised to intervention or control. A split-mouth design has many similarities to a cluster crossover design.

Current sample size formulae for split-mouth designs assume all patients provide an equal number of sites. Having a varying number of sites per patient is similar to having varying cluster sizes in a cluster trial, and sample size methods for cluster trials can accommodate varying cluster sizes. Analysis of split-mouth studies, prima facie, seems similar to that of cluster trials, for example, by including random effects for patients. However, in contrast to cluster trials, split-mouth studies have very small cluster sizes (number of sites per patient), many recruit only a small number of patients, and much larger correlation coefficients are often observed. For these reasons, the performance of both sample size and analysis methods for split-mouth studies is unknown.

Objectives To determine an appropriate design effect for split-mouth studies with an unequal number of sites per patient and evaluate the performance of common methods for the analysis of split-mouth studies, when the number of patients and sites per patient are small.

Methods Data were simulated from a linear mixed-effects model for split-mouth studies varying in size (overall and average sites per patient), variability in sites per patient, and intra-patient correlation coefficient. The sample size was estimated using design effects adapted from those used in cluster trials. Analyses were conducted using linear mixed-effects regression, generalised estimating equations, and cluster-level summaries, with the inclusion of small-sample corrections. For each scenario, the power and coverage were assessed.

Findings We present, using a motivating example, a simple design effect that can account for an unequal number of sites per patient in split-mouth studies and make recommendations for the analysis of split-mouth studies with a small number of patients and sites per patient.
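As a point of reference, the cluster-trial design effect that such adaptations start from can be written with a coefficient-of-variation adjustment for unequal cluster sizes. This is the standard parallel cluster-trial formula, shown only to illustrate the kind of adaptation involved; the design effect the authors derive for the split-mouth setting may differ, and all numbers below are hypothetical.

```python
def design_effect(m_bar, cv, icc):
    """Design effect for unequal cluster sizes, as used in cluster
    trials (e.g. Eldridge et al.):
    DE = 1 + ((cv**2 + 1) * m_bar - 1) * icc,
    where m_bar is the mean cluster size, cv its coefficient of
    variation, and icc the intra-cluster correlation."""
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc

# hypothetical split-mouth numbers: 4 sites per patient on average,
# coefficient of variation 0.4, intra-patient correlation 0.5
de_equal = design_effect(m_bar=4, cv=0.0, icc=0.5)    # equal sizes
de_unequal = design_effect(m_bar=4, cv=0.4, icc=0.5)  # unequal sizes
```

With cv = 0 the familiar 1 + (m − 1)·icc is recovered; the cv term shows how size variability inflates the required sample size.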

Challenges of using big health data to identify patterns of anxiety and depression in multimorbid population

ABSTRACT. Clinical research problem: Patients with multimorbidity tend to have less continuity of care than patients with a single condition. Unfortunately, multimorbidity is highly prevalent, and one in three patients with multimorbidity has a coexisting mental health condition. Existing disease surveillance systems have not been used optimally to understand multimorbidity and its effects, or to guide effective action [1]. Therefore, investigation of mental health among multimorbid patients is very important.

Statistical challenges: We used administrative health data that were collected for administrative purposes; their use for health research is thus a secondary use.

The objective and statistical methods: The aim of this work was to use a big health-administrative database to assess the frequency of anxiety and depression in patients with multimorbidity. For this purpose, administrative health data from the Lithuanian National Health Insurance Fund under the Ministry of Health, covering 1,254,167 subjects with multimorbidity over the period from 2014 to 2019, were analyzed. We used hierarchical clustering and exploratory factor analysis for cross-sectional phenotype identification.

Results and Conclusions: Patterns of anxiety and depression were identified, and unexpectedly small percentages of anxiety (3.9%) and depression (8.1%) were found among the analyzed multimorbid patients. These findings may be related to mental health stigma and may be associated with the unwanted disease diagnostic code and other related causes. Even if general trends can be seen, conclusions need to be drawn carefully. Researchers planning to use data beyond their primary collection purpose need to understand the limitations of the database and the data-generating mechanism and collaborate with biostatisticians, clinicians, and other experts. In addition, other study designs may be useful in further analyzing the clinical research problem. References: [1] Pearson-Stuttard, J., Ezzati, M., & Gregg, E. W. (2019). Multimorbidity—a defining challenge for health systems. The Lancet Public Health, 4(12), e599-e600.
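The phenotype-identification step can be sketched as hierarchical clustering of conditions by the correlation of their co-occurrence. The data below are entirely synthetic (two planted patterns, loosely labelled "cardiometabolic" and "mental health"), and the choices of linkage method and cluster number are illustrative, not those of the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
n = 300

# synthetic binary diagnosis indicators (patients x conditions), with a
# latent "cardiometabolic" pattern (3 conditions) and a latent
# "mental health" pattern (2 conditions)
cardio = rng.random((n, 3)) < np.where(rng.random(n) < 0.5, 0.8, 0.1)[:, None]
mental = rng.random((n, 2)) < np.where(rng.random(n) < 0.3, 0.7, 0.05)[:, None]
X = np.hstack([cardio, mental]).astype(float)

# cluster the CONDITIONS (columns) using 1 - correlation as the distance
corr = np.corrcoef(X.T)
dist = 1 - corr[np.triu_indices(5, 1)]   # condensed distance vector
tree = linkage(dist, method="average")
groups = fcluster(tree, t=2, criterion="maxclust")
```

Conditions that tend to co-occur in the same patients end up in the same group, which is how cross-sectional phenotypes emerge from administrative diagnosis codes.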

The impact of left truncation of exposure in environmental case-control studies: evidence from breast cancer risk associated with airborne dioxin

ABSTRACT. BACKGROUND: In epidemiology, left-truncated data may bias exposure effect estimates. We analyzed the bias induced by left truncation in estimating breast cancer risk associated with exposure to airborne dioxins. METHODS: Simulations were run with exposure estimates from a geographic information system-based metric and considered two hypotheses for historical exposure, three scenarios for intra-individual correlation of annual exposures, and three exposure-effect models. For each correlation/model combination, 500 nested matched case-control studies were simulated and the data fitted using a conditional logistic regression model. Bias magnitude was assessed by comparing estimated odds ratios (ORs) with theoretical relative risks (TRRs). RESULTS: With strong intra-individual correlation and continuous exposure, left truncation overestimated the Beta parameter associated with cumulative dioxin exposure. Versus a theoretical Beta of 4.17, the estimated mean Beta (5%; 95%) was 73.2 (67.7; 78.8) with left-truncated exposure and 4.37 (4.05; 4.66) with lifetime exposure. With exposure categorized in quintiles, the TRR was 2.0, and the estimated OR (Q5 vs. Q1) was 2.19 (2.04; 2.33) with truncated exposure vs. 2.17 (2.02; 2.32) with lifetime exposure. However, the difference in exposure between Q5 and Q1 was 18 times smaller with truncated data, indicating an important overestimation of the dose effect. No intra-individual correlation resulted in effect dilution and statistical power loss. CONCLUSIONS: Left truncation induced substantial bias in estimating breast cancer risk associated with exposure in both continuous and categorical models. With strong intra-individual exposure correlation, both models detected associations, but categorical models provided better estimates of effect trends. This calls for careful consideration of left-truncation-induced bias in interpreting environmental epidemiological data.
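The mechanism behind the overestimated Beta can be sketched in a few lines: when annual exposures are strongly correlated within individuals, the truncated cumulative exposure is roughly a fixed fraction of the lifetime one, so a slope fitted on the truncated exposure is inflated by roughly the inverse of that fraction. This toy sketch (invented exposure distribution, linear predictor instead of the full case-control simulation) illustrates only that mechanism.

```python
import numpy as np

rng = np.random.default_rng(5)
n, years = 5000, 40
true_beta = 4.17

# strongly intra-individually correlated annual exposures: a stable
# individual level plus small year-to-year noise (hypothetical values)
level = rng.lognormal(0.0, 0.5, n)
annual = level[:, None] * np.exp(rng.normal(0, 0.05, (n, years)))

lifetime = annual.sum(axis=1)            # full exposure history
truncated = annual[:, -10:].sum(axis=1)  # only the last 10 years observed

# risk score generated from LIFETIME exposure; regressing it on the
# truncated cumulative exposure inflates the slope by roughly years/10
risk_score = true_beta * lifetime
slope_trunc = np.polyfit(truncated, risk_score, 1)[0]
slope_life = np.polyfit(lifetime, risk_score, 1)[0]
```

With near-perfect correlation the ranking of subjects is preserved (hence the almost unchanged quintile OR), while the per-unit slope is grossly inflated, mirroring the Beta of 73.2 vs. 4.17 pattern in the abstract.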

Lower Limit of Quantification in various distributed data: examining confidence interval variations

ABSTRACT. Single or multiple lower limits of quantification (LLOQs) appear in concentration measurement data when one or more laboratories are involved in the quantification of observations and some observations are too low to be quantified with the required precision. As the missing data mechanism is not random, most statistical methodology for handling missing data is not applicable. In clinical practice, simple imputation methods are often used to obtain substitution values for the missing observations. Nevertheless, they lead to severe bias in estimating parameters such as the mean and variance. Even procedures relying on the assumption of normally distributed concentration data show little robustness against distributional model misspecification [1]. Interpretation of confidence intervals (CIs) rather than only point estimates would lead to an assessment of the precision of the respective point estimates.

The objective is to investigate the robustness and precision of different types of CIs applied to newly developed parametric point estimation methods under different distributional assumptions. With this we aim to show the advantage of interpreting CIs rather than point estimates in this missing data situation.

We transfer existing maximum-likelihood-based approaches relying on the normal distribution assumption for LLOQs to other distributional assumptions. With suitable approaches at hand for specific distributions, we not only investigate the robustness of the point estimates for mean and variance, but also compare bootstrap CIs with parametric CIs to evaluate the characteristics of the CI types with respect to coverage probability and precision. The performance of the most popular simple imputation method is compared to our approach. The proposed procedure will be demonstrated using data from a cohort study [2], in which the underlying distribution varies depending on the chosen clinical parameter.

The variety of distributional assumptions for which the methods are applicable gives the analyst a broadly usable tool to handle LLOQ-affected data with appropriate approaches. Before choosing the appropriate method, the distribution of the data at hand should be examined. When the underlying distribution is known with certainty, interpretation of CIs will broaden the possible conclusions. Under uncertainty, CIs prove to deliver more robust interpretation possibilities for the parameters of interest than point estimates.
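The normal-distribution maximum-likelihood approach that serves as the starting point can be sketched as a censored (Tobit-style) likelihood: observed values contribute the density, values below the LLOQ contribute the CDF at the LLOQ. The data, LLOQ, and the LLOQ/2 comparator below are hypothetical illustrations, not the study's data or full method.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(11)
true_mu, true_sigma, lloq = 10.0, 2.0, 9.0
x = rng.normal(true_mu, true_sigma, 500)
observed = np.where(x >= lloq, x, np.nan)     # values below LLOQ unquantified
detected = observed[~np.isnan(observed)]
n_censored = int(np.isnan(observed).sum())

def neg_loglik(par):
    """Censored normal log-likelihood: density for detected values,
    CDF at the LLOQ for each censored value."""
    mu, log_sigma = par
    sigma = np.exp(log_sigma)                 # keep sigma positive
    ll = stats.norm.logpdf(detected, mu, sigma).sum()
    ll += n_censored * stats.norm.logcdf(lloq, mu, sigma)
    return -ll

res = optimize.minimize(neg_loglik, x0=[detected.mean(), 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# popular simple imputation with LLOQ/2, for comparison
naive = np.where(np.isnan(observed), lloq / 2, observed)
mu_naive = naive.mean()
```

Parametric CIs would follow from the observed information at the optimum, and bootstrap CIs from refitting on resampled data, which is the comparison the abstract describes.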

Diversity indices and statistical methods used in studies addressing dysbiosis applied to composition data of the gut microbiota

ABSTRACT. Background: Technological innovations, such as high-throughput sequencing, have enabled the direct determination of the taxonomic composition of the gut microbiota in samples, boosting research on the role of the gut microbiota in human health. We focus on diseases thought to result from dysbiosis (imbalances in the microbial community). Objectives: To illustrate the diversity measures and statistical methods used in articles addressing dysbiosis through a literature review. Methods: We searched PubMed using the search terms "microbiome, gut, 16s, taxonom*, diversity, richness" on December 1, 2020. All abstracts identified were reviewed to include studies in humans or animal models addressing the research question of the association between dysbiosis and disease. The diversity measures and statistical methods reported were identified. Results: The initial searches yielded 144 articles. After screening titles/abstracts and selecting studies, 33 articles were analyzed. The measures used by the authors of those articles for within-group diversity (alpha diversity) were Chao1 or ACE for richness and Shannon or Simpson for diversity (or evenness). The measures used for between-group diversity were "distances" such as UniFrac or Bray-Curtis. Parametric or non-parametric methods for comparing continuous variables between groups were used for the alpha diversity measures. The methods used for "distances" between groups were visual illustration by principal coordinates analysis and testing by PERMANOVA. Discussion: We found the following methods used in most studies: comparing alpha diversity between groups, and comparing groups based on the "distance" between samples. There seems to be a need for other approaches adequate to the research questions, considering the following aspects. Microbiome compositional data are essentially relative abundances of each microbe in the sample and are multivariate data structured in a multilevel hierarchy.
Each microbe influences the host in multiple ways that are partially similar to and partially different from each other. Conclusion: A proposal and adoption of new methods taking the following points into account are awaited: not all microbes need to be considered; microbes may be mutually exchangeable when focusing on a specific biochemical reaction; information other than the relative abundance of each microbe should be utilized; and the multilevel hierarchical inter-variable structure should be taken into account.
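The two workhorse measures found in the review can be computed in a few lines. The taxon counts below are hypothetical; Shannon diversity summarizes within-sample evenness, Bray-Curtis the between-sample dissimilarity.

```python
import numpy as np

def shannon(counts):
    """Shannon diversity index H = -sum(p_i * log(p_i))."""
    p = np.asarray(counts, float)
    p = p[p > 0] / p.sum()
    return -(p * np.log(p)).sum()

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples of taxon counts:
    sum(|a_i - b_i|) / sum(a_i + b_i), ranging from 0 to 1."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.abs(a - b).sum() / (a + b).sum()

sample1 = [30, 30, 30, 10]   # hypothetical taxon counts, fairly even
sample2 = [90, 5, 3, 2]      # dominated by one taxon

h1, h2 = shannon(sample1), shannon(sample2)
bc = bray_curtis(sample1, sample2)
```

Note that both operate on relative abundances only, which is precisely the limitation the conclusion points at.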

Subsequent primary neoplasms in bladder cancer patients

ABSTRACT. Background: The number of patients who develop subsequent primary neoplasms has markedly increased recently. The development of subsequent primary neoplasms nowadays presents a major clinical problem, because subsequent primary tumours are the main cause of morbidity in a large proportion of long-term cancer survivors [1]. This makes subsequent primary neoplasms a challenge and an opportunity for future oncology research. Objective: This study aimed to perform a comprehensive analysis documenting the risk of subsequent primary neoplasms in patients with bladder cancer. Methods: The Czech National Cancer Registry was the main data source, containing records of all cancers diagnosed in the Czech Republic since 1977. The risk of development of a subsequent primary neoplasm after bladder cancer was evaluated by the standardised incidence ratio (SIR) with the corresponding 95% confidence interval (CI) [2]. Results: A total of 71,982 patients with bladder cancer were diagnosed in 1977–2018, of whom 12,375 (17.2%) developed a subsequent primary neoplasm. Bladder cancer patients of younger age, early clinical stage, and male sex were shown to be at higher risk of developing a subsequent primary neoplasm. The risk of development of any malignant neoplasm (C00–C97) was approximately 1.7 times higher in persons with bladder cancer than in the general Czech population (SIR: 1.66; CI: 1.64–1.69). The highest-risk diagnoses that occurred after bladder cancer were lung cancer (SIR: 2.79; CI: 2.67–2.91) and laryngeal cancer (SIR: 2.52; CI: 2.08–3.03). The median time to the development of a subsequent neoplasm (for all malignant neoplasms combined) was 5.6 years. The shortest times were recorded for oesophageal cancer (4.5 years) and laryngeal cancer (4.7 years). In contrast, the longest times were reported for thyroid cancer (7.1 years) and chronic lymphocytic leukaemia (6.6 years).
Conclusion: To our knowledge, this is the first population-based study documenting the risk of incidence of subsequent primary neoplasms in bladder cancer patients that takes into account such a long time period. Conclusions from the performed analysis might be useful for correctly setting up follow-up procedures for bladder cancer patients in specialised centres and in GP surgeries. A correctly adjusted follow-up might improve the prognosis of patients.
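An SIR with an exact Poisson (Garwood) confidence interval can be computed from observed and expected counts via the chi-square distribution. The counts below are hypothetical (chosen so the SIR matches the order of magnitude reported for lung cancer); they are not the study's actual numbers.

```python
from scipy.stats import chi2

def sir_ci(observed, expected, alpha=0.05):
    """Standardised incidence ratio O/E with the exact Poisson
    (Garwood) CI for the observed count:
    lower = chi2(alpha/2, 2O) / 2E,  upper = chi2(1-alpha/2, 2O+2) / 2E."""
    sir = observed / expected
    lo = chi2.ppf(alpha / 2, 2 * observed) / (2 * expected)
    hi = chi2.ppf(1 - alpha / 2, 2 * (observed + 1)) / (2 * expected)
    return sir, lo, hi

# hypothetical counts: 500 subsequent lung cancers observed vs 179.2
# expected from general-population rates
sir, lo, hi = sir_ci(500, 179.2)
```

The expected count comes from applying age-, sex-, and period-specific population rates to the cohort's person-years, which is the standardisation step the registry analysis relies on.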

On heuristic detection of maternal-age-related increase of birth defect risk: Experience, issues, alternatives

ABSTRACT. One of the focuses of our research is the detection of increased congenital anomaly (birth defect) risk related to high or low maternal age. The size and onset of the increase depend on the anomaly type. When events (anomalies) are frequent and the risk increase is big and takes place far from the ends of the age scale, joinpoint Poisson regression, for instance, may be the right choice. It fits well, for example, Czech 2013–2017 Down syndrome data (on both born children and terminated pregnancies), where it shows a consistent risk increase along the entire age scale, first slow, and accelerating from the age of 32. Such methods may, nevertheless, fail for rare anomalies with a moderate risk increase at ages close to the extremes. For such situations, we designed, and presented at ISCB 2020, a heuristic method. Each year on the age scale splits the scale into two opposite tails. Risks in the two tails are compared by relative risk (RR) and Fisher's test. The attribute of suspect risk increase belongs to a tail if it yields an RR over 2 and a significant unadjusted Fisher test, or if it is nested within such a tail. A stronger attribute of verified risk increase is given to tails with the former attribute that are nested within a tail with a significant Bonferroni-adjusted Fisher test. A new alternative method variant compares all tails with a common reference age interval from the lower to the upper quartile. (Only tails disjoint with the interval are considered.) Otherwise, the definitions of suspect and verified risk increase remain the same. Numerical differences between the two method variants on real data are minor. For example, the former variant finds a verified risk increase at 18 or fewer years, and the new one at 19 or fewer, in the 1992–2016 anencephaly incidence. Both variants equally assess the risk increase from 42 years as suspect.
The new variant is, however, more logically consistent, as, unlike the former one, it always transforms monotone risk-by-age curves into monotone RR-by-age curves. Supported by Czech Health Research Council grant No. 17-19622A.
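The first variant's lower-tail scan can be sketched as follows: each cut on the age scale defines a lower tail, compared with the remainder by RR and Fisher's exact test. This is a simplified sketch (lower tails only, suspect attribute only, no nesting or Bonferroni step), and the age range, birth counts, and case counts are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact

def scan_lower_tails(ages, births, cases, alpha=0.05):
    """For each cut, compare anomaly risk in the lower tail with the
    rest via RR and Fisher's exact test; flag a tail as 'suspect'
    when RR > 2 with a significant unadjusted test."""
    flags = []
    for k in range(1, len(ages)):
        c_in, b_in = cases[:k].sum(), births[:k].sum()
        c_out, b_out = cases[k:].sum(), births[k:].sum()
        if c_out == 0:
            continue
        rr = (c_in / b_in) / (c_out / b_out)
        _, p = fisher_exact([[c_in, b_in - c_in],
                             [c_out, b_out - c_out]])
        if rr > 2 and p < alpha:
            flags.append((int(ages[k - 1]), rr, p))   # tail = ages up to this year
    return flags

# hypothetical data: elevated anomaly risk below maternal age 20
ages = np.arange(15, 45)
births = np.full(len(ages), 2000)
cases = np.where(ages < 20, 12, 2)
flags = scan_lower_tails(ages, births, cases)
```

A symmetric scan over upper tails, plus the nesting rule and the Bonferroni-adjusted test, would complete the method as described.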

Linkage of national clinical datasets without patient identifiers using probabilistic methods

ABSTRACT. Linkage of electronic health records from different sources is increasingly used to address important clinical and public health questions. However, obtaining linked data can be a complex and lengthy process, often involving transfer of sensitive patient information to a trusted third party. Alternative methods that do not require the use of patient identifiers would accelerate the use of linked datasets.

We developed a step-by-step process for probabilistic linkage of national clinical and administrative datasets without patient identifiers and validated it against deterministic linkage using patient identifiers. Probabilistic linkage was carried out using seven indirect identifiers: small area of residence (Lower Super Output Area); hospital trust; date of surgery; responsible surgeon; age; sex; and surgical procedure. We used electronic health records from the National Bowel Cancer Audit (NBOCA) and Hospital Episode Statistics (HES) databases for 10,566 bowel cancer patients undergoing emergency surgery in the English National Health Service.

Probabilistic linkage without patient identifiers linked 81.4% of NBOCA records to HES, compared to 82.8% using deterministic linkage. The approach had over 96% sensitivity and 90% specificity compared to deterministic linkage using patient identifiers. Of the 176 records that linked probabilistically but not deterministically, 143 (81%) agreed on small area of residence (Lower Super Output Area) and ≥4 other indirect identifiers, suggesting that most are true links and the specificity of the probabilistic linkage is therefore likely underestimated. No systematic differences were seen between patients that were and were not linked, and regression models for mortality and length of hospital stay according to patient and tumour characteristics were not sensitive to the linkage approach.

Probabilistic linkage without patient identifiers was successful in linking national clinical and administrative datasets for patients undergoing a major surgical procedure. The approach can be used as an alternative to deterministic linkage using patient identifiers, or as a method for enhancing deterministic linkage. It has important implications as it allows analysts outside highly secure data environments to carry out the linkage process while protecting data security and maintaining – and potentially improving – linkage quality.
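The scoring step of probabilistic linkage can be sketched in Fellegi-Sunter style: each indirect identifier contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m and u are the agreement probabilities among true matches and non-matches. The m/u values below are illustrative guesses, not those estimated in the NBOCA-HES study.

```python
import math

# seven indirect identifiers with assumed (hypothetical) m/u probabilities
fields = {
    "lsoa":      {"m": 0.95, "u": 0.001},  # small area of residence
    "trust":     {"m": 0.98, "u": 0.01},
    "op_date":   {"m": 0.90, "u": 0.003},  # date of surgery
    "surgeon":   {"m": 0.90, "u": 0.01},
    "age":       {"m": 0.97, "u": 0.02},
    "sex":       {"m": 0.99, "u": 0.5},
    "procedure": {"m": 0.95, "u": 0.05},
}

def match_weight(agreement):
    """Total match weight for a candidate record pair: sum of
    log2(m/u) for agreeing fields, log2((1-m)/(1-u)) for disagreeing."""
    w = 0.0
    for field, agree in agreement.items():
        m, u = fields[field]["m"], fields[field]["u"]
        w += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return w

full_agreement = match_weight({f: True for f in fields})
poor_agreement = match_weight({f: f == "sex" for f in fields})
```

Pairs with weights above a chosen threshold are accepted as links; note how little a common field like sex contributes compared with the small area of residence.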

Chronic exposure to multiple air pollutants and risk of breast cancer: A nested case-control study within the E3N cohort

ABSTRACT. Background: Studies have suggested that exposure to environmental pollutants, particularly those with endocrine-disrupting properties, has a role in breast cancer (BC) development. Exposure to pollutant mixtures reflects the real-life experience of populations; however, the effects of mixtures are seldom analyzed.

Objectives: In this study, we applied a new statistical method to assess the complex effect of exposure to a mixture of four xenoestrogenic air pollutants (benzo[a]pyrene (BaP), cadmium, dioxins, and polychlorinated biphenyl 153 (PCB153)) on the risk of BC.

Methods: The study was conducted on 5,222 cases and 5,222 matched controls nested within the French E3N cohort between 1990 and 2011. Annual air concentrations of the pollutants were simulated with the CHIMERE chemistry-transport model and assigned to subjects using their geocoded residential history. Mean exposures were calculated for each subject from cohort inclusion to the index date. We employed a new statistical approach, Bayesian Kernel Machine Regression (BKMR), to investigate the relative risk associated with the joint effect of co-exposure to the four xenoestrogenic pollutants on the risk of BC. Because of the high correlation between pollutants, a hierarchical variable selection was performed using an MCMC algorithm with 5,000 iterations; this quantifies the relative importance of each pollutant within its group via the posterior inclusion probability. To account for the matched design, we used a probit model saturated on the matching variables, which gave results close to those of conditional logistic regression models. The estimated exposure-response function is visualized by examining the dose-response relationship for each exposure, the statistical interactions between pollutants, and the joint association between the mixture and BC risk.

Expected results/conclusions: The analyses are ongoing. This study is the largest to date to evaluate the impact of multi-pollutant mixtures on the risk of BC, the most common cancer in women. The results will advance our understanding of mixture effects in the context of highly nonlinear dose-response relationships, and will allow estimation of overall, single-exposure, and interactive health effects.
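BKMR centres on estimating a flexible, possibly nonlinear exposure-response surface h(·) with a Gaussian kernel. As a rough, non-Bayesian flavour of that machinery (a sketch only, not the hierarchical selection procedure of the abstract), here is plain Gaussian kernel ridge regression on a one-dimensional exposure, using only the standard library.

```python
# Gaussian kernel ridge regression: a simplified, non-Bayesian cousin of the
# surface-estimation step inside BKMR. Data, kernel scale rho, and ridge
# penalty lam are all illustrative.
import math
import random

random.seed(2)

def solve(Amat, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(Amat, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

n, lam, rho = 60, 0.1, 1.0
X = [random.uniform(-2, 2) for _ in range(n)]
y = [math.sin(x) + random.gauss(0, 0.1) for x in X]          # nonlinear truth
K = [[math.exp(-rho * (xi - xj) ** 2) for xj in X] for xi in X]
alpha = solve([[K[i][j] + (lam if i == j else 0.0) for j in range(n)]
               for i in range(n)], y)

def h(x):
    """Fitted exposure-response surface."""
    return sum(a * math.exp(-rho * (x - xi) ** 2) for a, xi in zip(alpha, X))

print(f"h(1.0) = {h(1.0):.2f}  (sin(1.0) = {math.sin(1.0):.2f})")
```

The fitted h(·) recovers the nonlinear shape without any parametric dose-response assumption, which is the feature that motivates kernel machine approaches for mixtures.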

Use of innovative methods to estimate a reliable French pathological complete response rate on real world data

ABSTRACT. OBJECTIVES. We aim to obtain the most reliable estimate of the pathological complete response (pCR) rate after neoadjuvant treatment, by exploring a new approach and using several established methods, from a retrospective national observational study including HER2+ early breast cancer patients receiving trastuzumab-based neoadjuvant and adjuvant therapy in France.

METHODS. As patient data (n=301) were only available for the sample of centers included in the registry (n=48), the exhaustive hospitalization database of center characteristics (PMSI, n=460) was used to estimate a representative French pCR rate. The estimation followed a 5-step approach based on the generation of a modified sample of centers. Firstly, clinical relevance and correlation with pCR guided the selection of center characteristics. Secondly, a propensity score (PS), estimated using either logistic regression or the Covariate Balancing Propensity Score (CBPS) [1], combined the selected characteristics. Thirdly, the PS was used to create the modified sample, exploring two methods: Inverse Probability of Treatment Weighting (IPTW) and an innovative oversampling-based matching method [2]. Next, balance was assessed by the standardized mean differences (SMD) of each center characteristic between the modified sample and the PMSI. Finally, the modified samples were used to estimate the pCR rate at the patient level.

RESULTS. Two characteristics, center type and region, were retained to estimate the PS. A satisfactory overlap between the estimated PS distributions was observed (AUC logistic regression: 0.63; AUC CBPS: 0.62). Each combination of PS estimation method and sample correction method improved balance with the PMSI compared to the original cohort. A negligible imbalance (SMD < 10%) on all characteristics was only observed for the logistic regression + IPTW and CBPS + matching combinations. The pCR rates estimated on these samples (42.7% and 42.3%, respectively) were close to that observed in the initial cohort (42.9%).

CONCLUSIONS. Data from a dedicated registry were combined with the PMSI to reliably estimate the pCR rate in the French population, using propensity score matching and innovative methods. Such a methodology could be useful for extrapolating results from a single study to a more general population.
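The balance diagnostic used above, the standardized mean difference before and after weighting, is easy to sketch. The simulation below is purely illustrative (the selection model and its coefficient are made up, and the true propensity score is used rather than a fitted one): it shows inverse probability weights shrinking the SMD of a covariate that drives sample selection.

```python
# Illustrative sketch: inverse probability weights from a (true, assumed-known)
# logistic propensity model, and standardized mean differences (SMD) before vs
# after weighting. In practice the score would be estimated, e.g. by logistic
# regression or CBPS.
import math
import random

random.seed(1)

def smd(x, g, w=None):
    """Weighted standardized mean difference of covariate x between g=1 and g=0."""
    if w is None:
        w = [1.0] * len(x)
    def stats(grp):
        ws = [wi for wi, gi in zip(w, g) if gi == grp]
        xs = [xi for xi, gi in zip(x, g) if gi == grp]
        m = sum(wi * xi for wi, xi in zip(ws, xs)) / sum(ws)
        v = sum(wi * (xi - m) ** 2 for wi, xi in zip(ws, xs)) / sum(ws)
        return m, v
    m1, v1 = stats(1)
    m0, v0 = stats(0)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)

# Covariate x drives membership of the "sampled" group g=1.
x = [random.gauss(0, 1) for _ in range(4000)]
ps = [1 / (1 + math.exp(-0.8 * xi)) for xi in x]                 # true propensity
g = [1 if random.random() < p else 0 for p in ps]
w = [1 / p if gi == 1 else 1 / (1 - p) for gi, p in zip(g, ps)]  # IPW

print(f"SMD before weighting: {abs(smd(x, g)):.3f}")
print(f"SMD after weighting:  {abs(smd(x, g, w)):.3f}")
```

An SMD below 10% after weighting, as in the abstract's retained combinations, is a common rule of thumb for negligible imbalance.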

Sensitivity analyses for measurement error using regression calibration or simulation-extrapolation

ABSTRACT. Measurement error is common in many domains of epidemiology. With random measurement error, measurements fluctuate randomly around their true value; random measurement error in an exposure introduces bias in the exposure-outcome association. Regression calibration and simulation-extrapolation are methods to correct for this bias, and both can be applied in the absence of validation data on the true measurements, although both then require assumptions about the variance of the random measurement error. We conducted a simulation study comparing the performance of simulation-extrapolation and regression calibration for correcting measurement error in the exposure variable. Performance was evaluated assuming the absence of validation data but correct assumptions about the measurement error variance. The scenarios studied differed in sample size, reliability of the measurements, and precision of the estimated measurement error variance. Simulation-extrapolation and regression calibration were evaluated in terms of bias, mean squared error, and 95% confidence interval coverage. Across the evaluated scenarios, regression calibration generally resulted in less bias than simulation-extrapolation (median percentage bias 2% for regression calibration (interquartile range 2%) vs -9% for simulation-extrapolation (interquartile range 21%)). Simulation-extrapolation, however, was generally more efficient in terms of mean squared error (median percentage decrease in mean squared error of 16% for simulation-extrapolation vs regression calibration (interquartile range 14%)). The quantification of the performance of the two methods across this broad range of settings was used as input for a framework guiding sensitivity analyses for random exposure measurement error in the absence of validation data.
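The regression calibration correction under an assumed, known error variance can be illustrated in a few lines: the naive slope is attenuated by the reliability ratio and is corrected by dividing by it. All data below are simulated for illustration.

```python
# Sketch of regression calibration with a known measurement error variance:
# the naive slope on the error-prone exposure W is attenuated by the
# reliability ratio var(X)/(var(X)+var(U)) and is corrected by dividing by it.
import random

random.seed(7)
n, beta = 20000, 0.5
sigma2_x, sigma2_u = 1.0, 0.5                              # exposure and error variances
x = [random.gauss(0, sigma2_x ** 0.5) for _ in range(n)]   # true exposure
w = [xi + random.gauss(0, sigma2_u ** 0.5) for xi in x]    # error-prone measurement
y = [beta * xi + random.gauss(0, 1) for xi in x]

def ols_slope(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return (sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) /
            sum((ui - mu) ** 2 for ui in u))

naive = ols_slope(w, y)
reliability = sigma2_x / (sigma2_x + sigma2_u)   # assumed known (no validation data)
corrected = naive / reliability

print(f"naive: {naive:.3f}, corrected: {corrected:.3f} (true {beta})")
```

In a sensitivity analysis without validation data, the assumed error variance (and hence the reliability) would be varied over a plausible range rather than fixed.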

15:15-16:45 Session IS4: INVITED : Challenges and opportunities for learning from long term disease registers
Data linkage for creating electronic birth cohorts: handling bias due to linkage error

ABSTRACT. Data linkage is a valuable tool for creating datasets with which to understand long term trajectories of health and disease. Linkage can provide a low-cost, efficient means of collecting extensive and detailed data on interactions with health and other services. These data can be used to create population-level electronic cohorts that offer the ability to answer questions that require large sample sizes or detailed data on hard to reach populations, and to generate evidence with a high level of external validity and applicability for policy-making.

Lack of access to unique or accurate identifiers means that linkage of the same individual across different data sources or time can be challenging. Errors occurring during linkage (false-matches and missed-matches) disproportionally affect particular subgroups of individuals and can lead to substantial bias in results based on linked data.

This talk will first describe methods for creating electronic birth cohorts using data linkage. We will then explore the impact of linkage error, drawing on examples from the literature. We will demonstrate a range of methods for evaluating linkage quality and discuss how bias due to linkage error can be handled within analyses.

Challenges and opportunities for learning from long term disease registers: Causal inference

ABSTRACT. Disease incidence registers are longitudinal databases giving opportunities to study the development of chronic diseases in response to various treatments. For time-to-event outcomes, emerging long term disease registers have at least two important advantages: they let many events accumulate over time, generating a rich information basis, and they uniquely allow the study of long term effects of treatment choices. When populations and available treatments change over calendar time, however, unavoidable administrative censoring patterns bring new estimation challenges as well as new questions about transportability of results. Here, we describe two Swedish examples: the Swedish Renal Registry, recording incident cases of end stage renal disease since 1991, and the Swedish Childhood Diabetes Register, a population-based incidence register recording incident cases of childhood onset type 1 diabetes mellitus (T1DM) since 1977. We describe challenges and opportunities when controlling for baseline confounding for a point treatment (immediate kidney transplant vs. starting with dialysis) when the causal estimand is the difference in average potential survival curves. For the Swedish Childhood Diabetes Register, we describe selection bias and bounds when applying multiple inclusion criteria, using extensions of previous results of Smith and VanderWeele (2019) and Sjölander (2020).

Methods for combining experimental and population data to estimate population average treatment effects

ABSTRACT. With increasing attention being paid to the relevance of studies for real-world practice (such as in education, international development, and comparative effectiveness research), there is also growing interest in external validity and assessing whether the results seen in randomized trials would hold in target populations. While randomized trials yield unbiased estimates of the effects of interventions in the sample of individuals (or physician practices, or hospitals) in the trial, they do not necessarily inform us about what the effects would be in some other, potentially somewhat different, population. While there has been increasing discussion of this limitation of traditional trials, relatively little statistical work has been done to develop methods to assess or enhance the external validity of randomized trial results. This talk will discuss design and analysis methods for combining experimental and population data to assess and increase external validity, as well as general issues that need to be considered when thinking about external validity. Implications for how future studies should be designed in order to enhance the ability to estimate population effects will also be discussed.

15:15-16:45 Session IS5: INVITED : The best of both worlds: combining deep learning and modeling
Pharmacometrics-Informed Deep Learning with DeepNLME

ABSTRACT. Nonlinear mixed effects modeling (NLME) is commonly employed throughout the clinical pharmacometrics community in order to uncover covariate relationships and understand the personalized effects involved in drug kinetics. However, in many cases a full model of drug dynamics is unknown. Even further, common models used throughout clinical trials ignore many potentially predictive covariates as their connection to drug effects is unknown. Given the rise of machine learning, there have been calls to utilize deep learning techniques to potentially uncover these unknown relationships, but common deep learning techniques are unable to incorporate the prior information captured in known predictive models and thus are not predictive with the minimal data available. Thus the question: is it possible to bridge the gap between deep learning and nonlinear mixed effects modeling?

In this talk we will describe the DeepNLME method for performing automatic discovery of dynamical models in NLME along with discovery of covariate relationships. We will showcase how this extension of the universal differential equation framework is able to generate suggested models in a way that hypothesizes testable mechanisms, predicts the covariates of interest, and allows incorporating data in the form of images and sequences into the personalized precision dosing framework. This framework and the automated model discovery process will be showcased in the Pumas pharmaceutical modeling and simulation environment. We will end by describing how this is being combined with recent techniques from Bayesian Neural Ordinary Differential Equations in order to give probabilistic estimates to the discovered models and allow for direct uncertainty quantification. Together this demonstrates a viable path for incorporating all of the knowledge of pharmacometricians into the data-driven future.

Temporal and relational machine learning for biostatistical and other scientific applications

ABSTRACT. Machine learning models have made it easier to reason about not only big, but also complex, data that is pervasive throughout science and engineering. In this talk, I will discuss some of our recent research on designing machine learning models for making predictions about data with complex structure, specifically temporal and relational data, for various applications, including biostatistics. A theme of the talk will be the advantages and disadvantages of deep neural network approaches to modeling such data. As part of this, I will highlight applications in time series forecasting where deep models have been particularly useful, as well as applications where deep models are unnecessary and computationally expensive. In the latter case, domain knowledge has enabled us to design algorithms that are faster and scale to larger datasets, without sacrificing accuracy. The talk will also cover some reasons why biostatistics has unique opportunities compared to other common application domains, such as social and information network analysis.

Individualizing deep dynamic models for psychological resilience data

ABSTRACT. Deep learning approaches can uncover complex patterns in data. In particular, variational autoencoders (VAEs) achieve this by a non-linear mapping of data into a low-dimensional latent space. Motivated by an application to psychological resilience in the Mainz Resilience Project (MARP), which features intermittent longitudinal measurements of stressors and mental health, we propose an approach for individualized, dynamic modeling in this latent space. Specifically, we utilize ordinary differential equations (ODEs) and develop a novel technique for obtaining person-specific ODE parameters even in settings with a rather small number of individuals and observations, incomplete data, and a differing number of observations per individual. This technique allows us to subsequently investigate individual reactions to stimuli, such as the mental health impact of stressors. A potentially large number of baseline characteristics can then be linked to this individual response by regularized regression, e.g., for identifying resilience factors. Thus, our new method provides a way of connecting different kinds of complex longitudinal and baseline measures via individualized, dynamic models. The promising results obtained in the exemplary resilience application indicate that our proposal for dynamic deep learning might also be more generally useful for other application domains.

15:15-16:45 Session OC4A: Missing data in causal studies
Sensitivity to MNAR dropout in clinical trials: use and interpretation of the Trimmed Means Estimator

ABSTRACT. Missing data is a common feature in randomized controlled trials (RCTs) and may result in biased inference. The impact of missing data depends on the missingness mechanism and the analysis model. When data are missing completely at random (MCAR) or missing at random (MAR), estimates from a complete case analysis (CCA) or multiple imputation (MI) will generally be unbiased. Outcome values may be missing not at random (MNAR), if patients with extreme outcome values are more likely to drop out (e.g., due to perceived ineffectiveness of treatment, or adverse effects). In such scenarios, CCA and MI estimates will be biased.

It is impossible to statistically verify whether data are MAR or MNAR. To increase confidence in the primary results, current practice recommends testing the robustness of the model assumptions by performing sensitivity analyses under plausible alternative assumptions. We propose using the trimmed means (TM) estimator as a sensitivity analysis for clinical trial data with outcome value dropout, when there is cause to suspect an MNAR dropout mechanism.

The TM estimator operates by setting missing values to the most extreme value, and then “trimming” away equal fractions of both treatment groups, estimating the treatment effect using the remaining data. The TM estimator relies on two assumptions, which we term the “strong MNAR” and “location shift” assumptions. We derive formulae for the bias resulting from the violation of these assumptions for normally distributed outcomes, and demonstrate how these formulae can be used to inform sensitivity analyses.

We applied our method in a sensitivity analysis of the CoBalT RCT, which compares the effectiveness of cognitive behavioural therapy (CBT) as an adjunct to pharmacotherapy versus usual care in 469 patients with treatment resistant depression. Results were consistent with a beneficial CBT treatment effect. The MI estimates were closer to the null than the CCA estimate, whereas the TM estimate was further from the null.
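A toy sketch of the trimmed means estimator described above: dropouts are imputed as the worst possible value, the same fraction (at least the larger arm-specific dropout fraction) is trimmed from the bottom of both arms, and the arm means of the remaining data are compared. The data here are made up, and higher outcome values are taken to be better.

```python
# Toy trimmed means (TM) estimator: missing outcomes (None) are set to the
# worst possible value, an equal fraction is trimmed from the bottom of both
# arms (enough to remove all imputed values), and arm means are compared.
import math

def trimmed_mean(values, trim_frac):
    """Mean after imputing missing (None) as worst and trimming the lowest
    trim_frac of observations."""
    filled = sorted(float("-inf") if v is None else v for v in values)
    k = math.ceil(trim_frac * len(filled))
    kept = filled[k:]
    return sum(kept) / len(kept)

treated = [None, 5.0, 6.0, 7.0, 8.0, 9.0]    # 1/6 dropout
control = [None, None, 3.0, 4.0, 5.0, 6.0]   # 2/6 dropout
trim = 2 / 6                                  # >= larger dropout fraction
effect = trimmed_mean(treated, trim) - trimmed_mean(control, trim)
print(f"TM treatment effect estimate: {effect:.1f}")
```

Because the imputed values are guaranteed to fall in the trimmed region, the estimate depends only on the upper parts of the two outcome distributions, which is what the "strong MNAR" and "location shift" assumptions refer to.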

Comparison of two causal inference methods for multiple treatments in clinical research on observational data

ABSTRACT. In medical research, a common and reliable way to establish a causal link between an exposure and an outcome is the randomized clinical trial. However, randomized trials are not always feasible, and a suitable analysis of observational data can also lead to the estimation of causal effects. An increasingly popular method is the inverse probability of treatment weighting (IPTW) estimator, which is largely applied to binary exposures. While there is a rich literature on IPTW for binary exposures, its extensions to multiple exposures are scarce. IPTW relies on estimating a weight for each individual, and there are multiple methods to estimate these weights. The aim of this study is therefore to compare two of them, simple multinomial logistic regression and generalized boosted models (GBM), to assess the advantages and drawbacks of GBM for causal inference with multiple treatments. GBM has been shown in simulation studies to provide more stable weight estimates than parametric models, and does not require a linearity assumption, unlike logistic regression. In this work, we assess the weight distribution by treatment group and covariate balance after weighting. We also evaluate whether the weighting method affects treatment effect estimates: we use weighted linear regression to estimate the causal treatment effect and compare the point estimates and their variances between the two weighting methods. These results are also contrasted with crude and adjusted (multivariable regression) analyses. We compare both methods on a national prospective sleep cohort. Obstructive sleep apnoea syndrome (OSAS) is a major health concern with multiple consequences, especially for patients' quality of life. Continuous positive airway pressure (CPAP), the first-line therapy for OSAS, is highly effective in terms of symptom improvement but depends on the level of adherence. Little is known about the impact of CPAP adherence on OSAS subjective sleepiness assessed using the Epworth scale. We show that the estimated weights were similar with both methods, but the computational time was longer for GBM, which gives a substantial advantage to the multinomial regression weight estimator.
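The weighting step for a multi-level exposure can be sketched as follows: each subject receives the inverse of the estimated probability of the treatment group they actually received. The softmax model and its coefficients below are hypothetical stand-ins for a fitted multinomial logistic regression (the GBM alternative is not sketched here).

```python
# Sketch of IPTW with a three-level exposure (e.g. CPAP adherence groups):
# weight = 1 / P(Z = observed group | X). The group-membership model is an
# assumed multinomial (softmax) model, not a fitted one.
import math
import random

random.seed(3)

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    t = sum(e)
    return [ei / t for ei in e]

def group_probs(x):
    # linear predictors for groups 0, 1, 2 (illustrative coefficients)
    return softmax([0.0, 0.5 * x, 1.0 * x - 0.3])

subjects = [random.gauss(0, 1) for _ in range(5)]
weights = []
for x in subjects:
    p = group_probs(x)
    z = random.choices([0, 1, 2], weights=p)[0]   # observed group
    w = 1 / p[z]                                  # unstabilized IPTW weight
    weights.append(w)
    print(f"x={x:+.2f}  group={z}  weight={w:.2f}")
```

Stabilized versions multiply each weight by the marginal group probability, which typically reduces weight variability, one of the diagnostics the abstract compares between the two estimation methods.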

Handling missing data for causal effect estimation in longitudinal cohort studies using Targeted Maximum Likelihood Estimation: a simulation study

ABSTRACT. Causal inference from longitudinal cohort studies plays a pivotal role in epidemiologic research. One of the available methods for estimating causal effects is Targeted Maximum Likelihood Estimation (TMLE), a doubly robust method combining a model for the outcome with a model for the exposure; only one of the two needs to be correctly specified to obtain unbiased estimates. TMLE also offers asymptotically valid confidence intervals even when these models are fitted using machine learning approaches, which allow the relaxation of parametric assumptions. However, it is unclear how missing data should be handled when using TMLE with machine learning, which is problematic given that missing data are ubiquitous in longitudinal cohort studies and can result in biased estimates and loss of precision if not handled appropriately. We sought to evaluate the performance of currently available approaches for dealing with missing data when using TMLE. These included complete case analysis; an extended TMLE method in which a model for the outcome missingness mechanism is incorporated in the procedure; the missing indicator method for missing covariate data; and multiple imputation (MI) using standard parametric approaches or machine learning algorithms to concurrently handle missing outcome, exposure and covariate data. Based on motivating data from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate the performance (bias and precision) of these approaches for estimating the average causal effect. We considered a simple setting, where the exposure and outcome were generated from main-effects regression models, and a complex setting, where the models also included two-way and higher order interactions. Our results aim to provide guidance for handling missing data in a range of missingness scenarios depicted using causal diagrams. We illustrate the practical value of these findings in an example examining the effect of adolescent cannabis use on mental health in young adulthood.
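A minimal sketch of the TMLE targeting step for the average causal effect with a binary outcome is below. For brevity the initial outcome and exposure models are taken to be the truth from the simulation, where in practice they would be fitted (e.g. with machine learning), and missing data, the subject of the abstract, are not simulated.

```python
# Minimal TMLE for the ATE with a binary outcome: initial estimates of
# Q = E[Y|A,X] and g = P(A=1|X) are fluctuated along the "clever covariate"
# H = A/g - (1-A)/(1-g) by solving the score equation for epsilon.
import math
import random

random.seed(11)
expit = lambda t: 1 / (1 + math.exp(-t))
logit = lambda p: math.log(p / (1 - p))

# --- simulate data (illustrative model) ---
n = 5000
X = [random.gauss(0, 1) for _ in range(n)]
A = [1 if random.random() < expit(0.4 * x) else 0 for x in X]
Y = [1 if random.random() < expit(0.5 * a + 0.6 * x) else 0 for x, a in zip(X, A)]

# --- initial estimates: here the true functions, for brevity ---
g = [expit(0.4 * x) for x in X]                 # P(A=1 | X)
Q1 = [expit(0.5 + 0.6 * x) for x in X]          # E[Y | A=1, X]
Q0 = [expit(0.6 * x) for x in X]                # E[Y | A=0, X]
QA = [q1 if a else q0 for a, q1, q0 in zip(A, Q1, Q0)]

# --- targeting: one-parameter logistic fluctuation, solved by Newton-Raphson ---
H = [a / gi - (1 - a) / (1 - gi) for a, gi in zip(A, g)]
eps = 0.0
for _ in range(20):
    p = [expit(logit(q) + eps * h) for q, h in zip(QA, H)]
    score = sum(h * (y - pi) for h, y, pi in zip(H, Y, p))
    info = sum(h * h * pi * (1 - pi) for h, pi in zip(H, p))
    eps += score / info

Q1s = [expit(logit(q) + eps / gi) for q, gi in zip(Q1, g)]
Q0s = [expit(logit(q) - eps / (1 - gi)) for q, gi in zip(Q0, g)]
ate = sum(q1 - q0 for q1, q0 in zip(Q1s, Q0s)) / n
print(f"TMLE ATE estimate: {ate:.3f}")
```

The extended TMLE mentioned in the abstract additionally includes a model for outcome missingness inside the weighting, which this sketch omits.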

Estimands in clinical trials: making the hypothetical strategy concrete

ABSTRACT. The ICH E9 guidelines addendum introduces the term intercurrent event to refer to events that happen after randomisation and that can either preclude the outcome of interest or affect the interpretation of the treatment effect. The addendum proposes five strategies for defining sensible targets of inference (i.e. five estimands) in the presence of intercurrent events, but does not suggest statistical methods for their estimation. In this talk, we focus on estimands defined using the hypothetical strategy, where the treatment effect is estimated under the hypothetical scenario in which we (somehow) intervene to prevent the intercurrent event from occurring. To estimate a hypothetical estimand, we consider methods from causal inference (G-computation and inverse probability of treatment weighting) and from missing data (multiple imputation and mixed models). We establish that certain 'causal inference estimators' are identical to certain 'missing data estimators'. These links may help those familiar with one set of methods but not the other. Moreover, they allow us to show transparently, using potential outcomes language, the assumptions that missing data methods rely on to estimate hypothetical estimands. We also present Monte Carlo simulations that provide evidence on the performance of the methods in different settings, including varying rates of occurrence of the intercurrent event, intercurrent events happening at different time points during follow-up, and the intercurrent event affecting the outcome.
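A toy G-computation for a hypothetical estimand: outcomes recorded after the intercurrent event are discarded, an outcome model is fitted among event-free patients given a baseline covariate, and predictions are averaged over all randomised patients in each arm. Everything below is simulated, and it assumes the intercurrent event is explained by the measured covariate.

```python
# Toy G-computation for a hypothetical estimand: discard post-event outcomes,
# model Y | X among event-free patients, then standardise the predictions
# over all randomised patients in each arm.
import math
import random

random.seed(5)
expit = lambda t: 1 / (1 + math.exp(-t))

def fit_ols(xs, ys):
    """Simple y = a + b*x least squares."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def arm_mean(arm, n=3000):
    X = [random.gauss(0, 1) for _ in range(n)]
    ice = [random.random() < expit(-1 - 0.8 * x) for x in X]   # sicker -> event
    # outcomes observed only for event-free patients (hypothetical-world model)
    obs = [(x, 1.0 * arm + 0.7 * x + random.gauss(0, 1))
           for x, e in zip(X, ice) if not e]
    a, b = fit_ols([x for x, _ in obs], [y for _, y in obs])
    return sum(a + b * x for x in X) / n     # standardise over all patients

effect = arm_mean(1) - arm_mean(0)
print(f"hypothetical-estimand treatment effect: {effect:.2f} (true 1.0)")
```

Because the event depends only on X here, the regression among event-free patients is unbiased for the hypothetical-world outcome model, which is the "missing at random given X" assumption the corresponding multiple imputation estimator would rely on.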

Incorporating baseline covariates to validate surrogate endpoints with a constant biomarker under control arm

ABSTRACT. A surrogate endpoint S in a clinical trial is an outcome that may be measured earlier or more easily than the true outcome of interest T. In this work, we extend causal inference approaches to validate such a candidate surrogate using potential outcomes. The causal association paradigm assesses the relationship between the treatment effect on the surrogate and the treatment effect on the true endpoint. Using the principal surrogacy criteria, we utilize the joint conditional distribution of the potential outcomes for T given the potential outcomes for S. Let S(z) and T(z) refer to the endpoint values had the treatment, possibly counterfactually, been assigned to level z. We build upon previous models for the joint distribution of potential outcomes that assume multivariate normality among the endpoints S(0), S(1), T(0), T(1) under a binary treatment. In particular, our setting of interest allows us to assume the surrogate under placebo, S(0), is zero-valued, and we incorporate baseline covariates. Rich baseline patient characteristic data may improve the quality of the surrogacy assessment: first, conditioning on covariates may improve the plausibility of conditional independence assumptions, and second, it allows us to make inferences about whether there are subgroups of the population for whom the quality of the surrogate varies. We develop Bayesian methods to incorporate conditional independence and other modeling assumptions and explore their impact on the assessment of surrogacy. We demonstrate our approach via simulation and via data that mimic an ongoing study of a muscular dystrophy gene therapy in which the primary outcome is a continuous functional score. Since muscular growth and deterioration from disease have a major impact on mobility, both baseline ambulatory ability (measured pre-treatment) and age are important to take into consideration when evaluating surrogacy. Based on our simulations, our validation method suggests that the proposed surrogate, micro-dystrophin expression, will only be valid for a subgroup of younger patients (four years of age). The trial of interest will also include a cross-over portion where placebo subjects receive the experimental treatment mid-trial, and we consider modeling these additional endpoints in the validation framework.

15:15-16:45 Session OC4B: Cluster randomized trials
Sample size calculation for stepped wedge cluster randomized trials with multiple levels of clustering

ABSTRACT. The stepped wedge cluster randomized trial is an attractive design for evaluating health services delivery or policy interventions. In this design, clusters start in the control condition and gradually cross over to the treatment based on a schedule dictated by random assignment. Outcomes may be assessed on the same individuals over time (i.e., a cohort design) or different individuals (i.e., a cross-sectional design). A key consideration in this design is that sample size calculation and analysis must account for within-period as well as between-period intracluster correlations; cohort designs have additional correlations due to repeated measures on the same individuals. While numerous methods have been developed to account for within- and between-period intracluster correlations with a single level of clustering during each time period, few methods are available to accommodate multiple levels of clustering. Our objectives were to develop computationally-efficient sample size procedures that recognize within-period and between-period intracluster correlations in stepped wedge trials with more than two levels of clustering. Focusing on three levels of clustering and assuming equal cluster-period sizes, we consider three variants, depending on whether each level is treated as a cross-sectional or closed-cohort design. We introduce an extended block exchangeable matrix to characterize the correlation structures both within- and between-clusters in each cluster-period and develop convenient sample size expressions that depend on this correlation structure. With a continuous outcome, we show the sample size expression depends on the correlation structure only through two eigenvalues of the extended block exchangeable matrix. For binary outcomes under a mixed effects framework, we develop a sample size expression based on a first-order Taylor approximation. 
We conduct simulation studies to examine the finite-sample properties of the proposed sample size algorithms and demonstrate the application of the proposed methods using the Washington State Expedited Partner Therapy trial: a multilevel stepped wedge trial that randomized local health jurisdictions (level 4) consisting of clinics (level 3) and observed patients (level 2) with respect to their Chlamydia infection status (level 1).
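The talk's multilevel sample size expressions are not reproduced here, but the flavour of such calculations can be seen in the classic single-level case: the Hussey and Hughes (2007) variance formula for a cross-sectional stepped wedge, with power from a normal approximation. The design parameters below are illustrative.

```python
# Hussey & Hughes (2007) variance of the treatment effect estimator for a
# cross-sectional stepped wedge with cluster-period means; individual-level
# variance is folded in as sigma2 = sigma_e^2 / m. Design values illustrative.
import math

def hh_variance(X, sigma2, tau2):
    """X: I x T matrix of 0/1 treatment indicators; tau2: between-cluster var."""
    I, T = len(X), len(X[0])
    U = sum(sum(row) for row in X)
    W = sum(sum(X[i][t] for i in range(I)) ** 2 for t in range(T))
    V = sum(sum(row) ** 2 for row in X)
    num = I * sigma2 * (sigma2 + T * tau2)
    den = (I * U - W) * sigma2 + (U * U + I * T * U - T * W - I * V) * tau2
    return num / den

def sw_design(clusters_per_step, steps):
    """Standard stepped wedge: steps+1 periods, one wave crossing per step."""
    T = steps + 1
    return [[1 if t > s else 0 for t in range(T)]
            for s in range(steps) for _ in range(clusters_per_step)]

# Example: 12 clusters, 4 steps (5 periods), m = 20 per cluster-period,
# total outcome variance 1 with ICC 0.05.
m, icc = 20, 0.05
tau2, sig_e2 = icc, 1 - icc
X = sw_design(3, 4)
var = hh_variance(X, sig_e2 / m, tau2)
effect = 0.3
power = 0.5 * (1 + math.erf((effect / math.sqrt(var) - 1.96) / math.sqrt(2)))
print(f"variance: {var:.4f}, power for effect 0.3: {power:.2f}")
```

In the multilevel extensions described above, the scalar variances here are replaced by the eigenvalues of the extended block exchangeable correlation matrix.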

Stepped-wedge cluster randomised trials with binary outcomes and small numbers of clusters: a case study

ABSTRACT. The stepped-wedge cluster randomised trial remains a novel study design yet is becoming a more popular design choice. Randomisation is at the level of the clusters, with clusters randomly transitioning from control to treatment at a set number of 'steps' and then remaining exposed to the treatment for the duration of the study. The average number of clusters randomised is 17 [inter-quartile range, 8-38], just over 50% of trials have a binary primary outcome, often with low prevalence, and more than 50% are analysed using generalised linear mixed models (GLMMs) (Martin, 2018). CONSORT reporting guidelines recommend that both absolute (e.g. risk difference) and relative (e.g. risk ratio) measures of treatment effects are reported. Methods of analysis therefore need to allow estimation of relative and absolute measures of effect for binary outcomes, possibly with low prevalence, and with a small number of clusters. In linear mixed models, both maximum likelihood and restricted maximum likelihood estimation produce a downward bias in estimated variance parameters, and standard Wald tests do not provide nominal levels of coverage when there are a small number of clusters. Small sample corrections, including the Satterthwaite and Kenward-Roger corrections, are therefore recommended in the setting of parallel cluster trials. These corrections are sometimes used with GLMMs despite their performance being less well documented. In the setting of logistic regression, alternative simple corrections to the degrees of freedom might be sufficient (Li & Redden, 2015). To our knowledge there has been no evaluation of small sample corrections for GLMMs with a binomial distribution and log or identity link (to report relative risks and risk differences), a setting where model convergence is often problematic with low prevalence. In this talk, we illustrate the choice of methods available for the analysis of a stepped-wedge trial conducted in 18 intensive care units (the clusters) with a binary outcome with low prevalence, considering the choice of small sample corrections, degrees of freedom corrections, and availability in standard statistical software. We illustrate that whilst it is desirable to report risk differences, models often fail to converge. This illustrative case study will form the prelude to a simulation study investigating these properties more widely.

Inference for the treatment effect in longitudinal cluster randomized trials when treatment effect heterogeneity is ignored

ABSTRACT. Longitudinal cluster randomized trials, such as the stepped wedge, can sometimes have a treatment whose effect varies between clusters, often known as treatment effect heterogeneity. Treatment effect heterogeneity is not usually accounted for in outcome regression models, perhaps due to the additional complexity of doing so. Until now, the effect of failing to account for treatment effect heterogeneity when it is present has only been studied in a limited set of scenarios, via simulation.

In this work, we provide an analytical approximation for the impact of failing to include treatment effect heterogeneity, in particular on the variance of the treatment effect estimator when outcomes are continuous. We use this to highlight and explain the influence that the design and design parameters, such as the number of clusters, number of time periods and number of observations, have on the error introduced by this form of model misspecification.

Cluster randomised trials and a small number of clusters: Analysis method for a binary outcome

ABSTRACT. Cluster randomised trials (CRTs) are often designed with a small number of clusters, but it is not clear which analysis methods are optimal when the outcome is binary.

There are three types of analysis: cluster-level analysis (CL), generalised linear mixed models (GLMM), and generalised estimating equations with sandwich variance (GEE). We conducted a broad simulation study to determine (i) whether these approaches maintain acceptable type-one error and, if so, (ii) which methods are most efficient, and (iii) the impact of non-normality of cluster means on these approaches.

We simulated CRTs with 8-30 clusters in total, mean cluster size from 10-1000, varying and common cluster sizes, control-arm prevalence of 10% or 30%, intracluster correlation coefficient from 0.001-0.1, and cluster means following a normal, gamma, or uniform distribution. We ran 1000 repetitions of each scenario. We analysed each dataset with weighted and unweighted CL; GLMM with adaptive Gauss-Hermite quadrature and restricted pseudolikelihood; and GEE with Kauermann-and-Carroll and Fay-and-Graubard sandwich variance using independent and exchangeable working correlation matrices. All methods compared test statistics to a t-distribution with degrees of freedom (DoF) equal to clusters minus cluster-level parameters. For GLMM pseudolikelihood, we also calculated Satterthwaite and Kenward-Roger DoF.

Unweighted CL maintained type-one error <6.4% in 854/864 (99%) scenarios. GLMM pseudolikelihood with clusters-minus-parameters DoF controlled type-one error in 853/864 (99%) scenarios. Other DoF were more conservative. Fay-and-Graubard GEE with an independent working correlation matrix controlled type-one error in 808/864 (94%) scenarios. Exchangeable correlation results were similar. Other methods had poorer type-one error control. Cluster-mean distribution did not affect analysis method performance.

GEE had the least power. Compared to CL, with 20 or more clusters, GLMM tended to have greater power with varying cluster-size but similar power with common cluster-size. With fewer clusters, GLMM had less power with common cluster-size, similar power with medium variation in cluster-size, and greater power with large variation in cluster-size.

We recommend that CRTs with ≤30 clusters and a binary outcome use unweighted CL analysis or restricted pseudolikelihood GLMM, both with DoF equal to clusters minus cluster-level parameters. The methods and findings are illustrated by application to a CRT of an intervention to increase adherence to tuberculosis medication.
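As a concrete illustration of the recommended unweighted cluster-level analysis, the sketch below (with invented cluster-level event proportions, not data from the tuberculosis CRT) compares the two arms with a t-test on the cluster proportions, using DoF equal to clusters minus cluster-level parameters:

```python
from statistics import mean, stdev

def cluster_level_t(control_props, treat_props):
    """Unweighted cluster-level analysis of a binary outcome:
    a two-sample t-test on the cluster-level event proportions,
    with DoF = total clusters minus cluster-level parameters
    (here 2, one arm mean each)."""
    n0, n1 = len(control_props), len(treat_props)
    m0, m1 = mean(control_props), mean(treat_props)
    # pooled variance of the cluster-level proportions
    sp2 = ((n0 - 1) * stdev(control_props) ** 2 +
           (n1 - 1) * stdev(treat_props) ** 2) / (n0 + n1 - 2)
    se = (sp2 * (1 / n0 + 1 / n1)) ** 0.5
    t = (m1 - m0) / se
    dof = n0 + n1 - 2          # clusters minus cluster-level parameters
    return m1 - m0, t, dof

# hypothetical cluster-level event proportions from a small CRT
control = [0.10, 0.12, 0.08, 0.15]
treated = [0.20, 0.25, 0.18, 0.22]
diff, t, dof = cluster_level_t(control, treated)
```

The t statistic would then be compared with a t-distribution on `dof` degrees of freedom, the comparison shown in the simulations above to control type-one error well.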

Under what conditions do open-cohort cluster RCTs provide improvements over conventional designs? A simulation study.

ABSTRACT. Background

DCM-EPIC [1], a care home cluster-randomised trial (CRT), had ~45% unavoidable drop-out of residents after 16 months. Institutions such as care homes, schools and hospitals can be viewed as ‘open cohorts’, because individuals move in and out of them over time. There are currently two established designs for parallel-group CRTs in open cohorts where outcomes are measured repeatedly over time. Closed-cohort (CC) designs recruit individuals at baseline who are followed over time. (Repeated) cross-sectional (R-CS) designs allow recruitment post-randomisation, sampling one or more cross-sections of individuals at different time points. CC designs can assess individual change over time but are limited by the drop-out of individuals, which introduces missing data and bias, and affects generalizability. R-CS designs are more robust to drop-out but can only provide population-wide inference at specific time points. Although DCM-EPIC was designed as CC, high levels of drop-out warranted a design change during the trial.

We propose the open-cohort (OC) CRT design as a potential solution. In this hybrid of the existing designs, a sample of individuals in each cluster is followed over time, with further recruitment replacing individuals who drop out. The OC-CRT design could be attractive to trialists as change at both the individual and population level can be estimated.


Objective

To determine whether the OC-CRT design provides improvements in precision and bias over the existing designs across a range of study parameters and realistic complications.


Methods

Open cohort data will be simulated using various longitudinal multilevel models. Study parameters to be varied include the design, ICC, number/size of clusters and number of follow-ups, amongst others. Complications include the level of selection bias from post-randomisation recruitment, drop-out mechanism, turnover rate of individuals and more.

Datasets will be analysed using two models: Kasza’s single-timescale model [2] and a new extension, which includes an additional timescale.


Results

Simulation results are under review and will be presented.


Conclusion

Open-cohort designs have the potential to be superior to existing CRT designs when clusters have a moderate to high turnover of individuals, as in care homes, and can address a wider range of research questions.

15:15-16:45 Session OC4C: Meta-analysis
Comparison of frequentist and Bayesian methods for two-arm borrowing of historical data

ABSTRACT. The slow progress of drug development and the high costs associated with clinical trials urgently call for more innovative clinical trial design and analysis methods to reduce development costs and patient burden. To address this problem, a potential strategy could be to supplement data from a current clinical trial with existing data from relevant historical studies. This so-called extrapolation or borrowing is particularly valuable when the recruitment of patients is difficult due to ethical, logistical or financial reasons, for example in trials in paediatric or rare diseases.

The main issue associated with the use of historical data is the potential for inflation of the type I error rate. This means that it is important to choose the right extrapolation method, ensuring that the amount of strength borrowed from the historical study is appropriate and is adjusted to the agreement between the two trials with the aim of increasing the power of the current trial whilst at the same time controlling the type I error rate.

A number of frequentist and Bayesian statistical methods have been proposed for borrowing historical control-arm data. However, there is relatively little research on borrowing information from both the control and treatment arms of a single historical two-arm trial. In this work, we extend static and dynamic borrowing methods proposed for the control-arm borrowing, including the test-then-pool, Bayesian power prior, commensurate prior and meta-analytic-predictive prior methods, to the setting of two-arm borrowing. These methods are then evaluated in simulation studies investigating a two-arm trial with a binary outcome to find appropriate borrowing parameters whilst optimising the trade-off between type I error and power.

Our simulation studies show that the degree of type I error inflation is mainly affected by the historical rate difference. Dynamic borrowing approaches are shown to offer better control of the type I error inflation over a wide range of scenarios, with the choice of borrowing parameters playing an important role.

Implications of Analysing Time-to-Event Outcomes as Binary in Meta-analysis

ABSTRACT. Background: Systematic reviews and meta-analyses of time-to-event outcomes are frequently published within the Cochrane Database of Systematic Reviews (CDSR); however, these outcomes are handled differently across meta-analyses. They can be analysed on the hazard ratio (HR) scale or can be dichotomized and analysed as binary outcomes using effect measures such as odds ratios (OR). We investigated the impact of reanalysing meta-analyses from the CDSR that used these different scales and using individual participant data (IPD).

Methods: We extracted two types of meta-analysis data from the CDSR, recorded either in binary form (A) or in binary form together with observed-minus-expected (“O-E”) and variance (“V”) statistics (B). We explored how results for time-to-event outcomes originally analysed as binary on an OR scale (A) change when analysed using the complementary log-log (clog-log) link on a HR scale. For the data originally analysed as HRs (B), we compared these results to analysing them as binary on a HR scale using the clog-log link or using a logit link on an OR scale. Additionally, using IPD meta-analyses, we compared results from analysing time-to-event outcomes as binary on an OR scale to analysing on the HR scale using the clog-log link, the log-rank approach or a Cox proportional hazards model.
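The clog-log link underlies a simple conversion between the two scales: assuming proportional hazards and a common follow-up time, the event probability p satisfies p = 1 - exp(-H), so the log HR is the difference of clog-log transformed proportions. A minimal sketch (the proportions are purely illustrative):

```python
from math import log, exp

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return log(-log(1.0 - p))

def log_hr_from_proportions(p_treat, p_ctrl):
    """Approximate log hazard ratio from two arm-level event
    proportions, assuming proportional hazards and a common
    follow-up time: HR = log(1 - p_treat) / log(1 - p_ctrl),
    i.e. the difference of the clog-log transformed proportions."""
    return cloglog(p_treat) - cloglog(p_ctrl)

# hypothetical arm-level event proportions
log_hr = log_hr_from_proportions(0.30, 0.40)
hr = exp(log_hr)
```

The approximation degrades exactly where the abstract's results indicate: high event probabilities, heavy censoring and long, unequal follow-up.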

Results: For both data types within the CDSR, approximately 19% of meta-analyses provided significant results under one scale and non-significant results under the other. Results from the log-rank approach and Cox proportional hazards model were almost identical; situations under which the clog-log link performed better than logit link and vice versa were apparent, indicating that the correct choice of the method does matter. Differences between scales arise mainly from the following reasons: (1) high event probability, (2) differences in between-study heterogeneity, (3) increased within-study standard error in the OR relative to the HR analyses, (4) percentage censoring, and (5) follow-up time.

Conclusions: We identified that dichotomising time-to-event outcomes may be adequate for low event probabilities and short-term outcomes but not for high event probabilities; these findings provide guidance on the appropriate methodology that should be used when conducting such meta-analyses.

Exploring non-linear treatment-covariate interactions at multiple time points using multivariate IPD meta-analysis

ABSTRACT. Background: Personalised medicine refers to how we tailor treatment decisions to each patient conditional on their characteristics. This requires research to identify interactions between treatment effect and patient-level covariates. An individual participant data (IPD) meta-analysis allows us to better explore such interactions, but challenges arise when included trials have multiple and missing follow-up time points and we aim to examine continuous covariates with potentially non-linear associations.

Objectives: To develop and apply a two-stage multivariate IPD meta-analysis model to estimate non-linear treatment-covariate interactions across multiple time points using IPD from multiple randomised trials with a continuous outcome.

Method: In the first stage, in each study separately, we model non-linear interactions by restricted cubic spline functions across multiple time points jointly, using longitudinal linear models to account for participant-level correlation in each trial. Knots are forced to be in the same location in each trial. In the second stage, we pool the study-specific spline function parameter estimates from all time points simultaneously, using a multivariate meta-analysis that accounts for their within-study and between-study correlations. We apply the model to IPD from a large dataset of 31 trials that investigated covariates that interact with the effect of exercise interventions for the treatment of knee and/or hip osteoarthritis (STEER OA).

Results: The proposed method allows borrowing of strength across multiple time points, and can handle participants and trials that are missing information at some time points. The results allow graphical displays of study-specific and summary non-linear interactions to help disseminate findings to clinicians and patients. In our application, baseline pain and baseline functional activity are found to have a non-linear interaction with the treatment effect on pain and function at 3 months and 12 months. This was masked when only linear trends were considered.

Conclusion: Given IPD from multiple randomised trials, we recommend exploring non-linear interactions across multiple time points using a two-stage multivariate IPD meta-analysis to account for correlations both at an individual level and across multiple time points.
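The restricted cubic spline functions used in the first stage can be sketched with the standard truncated-power basis (Harrell's unnormalised form); the knot locations below are arbitrary illustrations, not those used in STEER OA:

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form,
    unnormalised): a linear term plus K-2 nonlinear terms that
    are constrained to be linear beyond the boundary knots."""
    pos = lambda u: max(u, 0.0)      # truncation (u)+ = max(u, 0)
    k = knots
    basis = [x]                      # linear term
    for j in range(len(k) - 2):
        c1 = (k[-1] - k[j]) / (k[-1] - k[-2])
        c2 = (k[-2] - k[j]) / (k[-1] - k[-2])
        basis.append(pos(x - k[j]) ** 3
                     - c1 * pos(x - k[-2]) ** 3
                     + c2 * pos(x - k[-1]) ** 3)
    return basis

# with knots forced to the same locations in every trial, each study
# contributes comparable spline coefficients to the second-stage pooling
b = rcs_basis(2.5, [1.0, 2.0, 3.0])
```

The c1 and c2 weights are what make the cubic and quadratic terms cancel beyond the last knot, giving the linear tails that keep extrapolation stable at the extremes of a covariate.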

A comprehensive framework for ‘deft’ (within-trials) interactions in meta-analysis

ABSTRACT. A key question for meta-analyses is to reliably assess whether treatment effects vary across different patient groups. Traditionally, these interactions have been estimated using approaches known to induce aggregation bias, so we previously recommended a ‘deft’ (within-trials) approach to provide unbiased estimates for binary or ordered-categorical patient-level treatment-covariate interactions [1]. However, patients, clinicians and policy-makers also need to know the relative and absolute size of the overall treatment effect within each covariate subgroup, to target treatments appropriately.

In this presentation, we extend the ‘deft’ methodology to a fully flexible framework to (1) estimate ‘deft’ interactions for covariates with multiple levels; (2) estimate a set of subgroup-specific treatment effects consistent with the ‘deft’ interactions; and (3) incorporate heterogeneity into the estimation of both interactions and subgroup effects, considering four distinct heterogeneity structures. These methods require relatively little information and can be applied to aggregate (or “published”) source data, as well as individual patient data (IPD); and as such have wide practical application. We demonstrate a straightforward implementation in Stata with the existing user-written package “mvmeta”.

In a recent aggregate data meta-analysis investigating the effect of corticosteroids on mortality among critically-ill COVID-19 patients [2], we applied our methodology to a binary covariate: whether patients received invasive mechanical ventilation (IMV) at randomisation. Although a ‘deft’ interaction test was reported (p=0.0084), the published subgroup-specific effect sizes (IMV: OR=0.69, 95% CI 0.55 to 0.86; No IMV: OR=0.41, 95% CI 0.19 to 0.88) were at risk of aggregation bias. Using our methodology, we estimated subgroup effects, compatible with the ‘deft’ interaction, under a common-effect model (IMV: OR=0.73, 95% CI 0.58 to 0.92; No IMV: OR=0.19, 95% CI 0.07 to 0.50). In a further IPD example in lung cancer, we apply our methodology to a covariate with three levels, and derive absolute differences in survival. We compare the results and interpretations of our four different approaches to modelling heterogeneity, discuss the impact of trials which only contribute to a single subgroup, and propose recommendations for best practice.
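The core of a 'deft' analysis, estimating the interaction within each trial before pooling across trials, can be sketched as follows. The 2x2 tables are toy numbers and the pooling is plain fixed-effect inverse-variance weighting; the full framework described above additionally estimates subgroup-specific effects and models four heterogeneity structures:

```python
from math import log

def log_or(a, b, c, d):
    """Log odds ratio and its variance from a 2x2 table
    (events/non-events in treatment, then control)."""
    return log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def deft_interaction(trials):
    """'Deft' (within-trials) interaction: estimate the subgroup
    interaction inside each trial, then pool the within-trial
    estimates by fixed-effect inverse-variance weighting, so no
    across-trial (aggregation-bias-prone) comparison is used."""
    ests, weights = [], []
    for sub1, sub2 in trials:           # two 2x2 tables per trial
        lor1, v1 = log_or(*sub1)
        lor2, v2 = log_or(*sub2)
        ests.append(lor1 - lor2)        # within-trial interaction
        weights.append(1.0 / (v1 + v2))
    return sum(w * e for w, e in zip(weights, ests)) / sum(weights)

# hypothetical trials, each with (events, non-events) for treatment
# and control, within each of two subgroups
trials = [((10, 90, 20, 80), (5, 45, 12, 38)),
          ((8, 72, 15, 65), (4, 36, 10, 30))]
interaction = deft_interaction(trials)
```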

Inclusion of real world data in surrogate endpoint evaluation: a Bayesian meta-analytic approach

ABSTRACT. Surrogate endpoints play an important role in drug development when they can be used to measure treatment effect early compared to the final clinical outcome and to predict clinical benefit or harm. Meta-analysis provides a useful framework for combining evidence from multiple studies and can be used to evaluate a relationship between treatment effects on a surrogate endpoint and a final outcome. Traditionally, data from randomised controlled trials (RCTs) have been used to evaluate surrogate relationships as they achieve high internal validity. However, when few RCTs are available, meta-analysing sparse RCT data may affect the evaluation of surrogate endpoints, as the estimates of the parameters describing a surrogate relationship can be obtained with considerable uncertainty and poor accuracy. In such circumstances, the inclusion of observational cohort studies (OBCs) can help to obtain more precise estimates of the parameters describing surrogate relationships as well as more precise predictions of clinical benefit. This can be crucial when policy decisions need to be made based on a surrogate endpoint and further experimentation may be lengthy or unfeasible due to budget constraints. In this paper, a new method for combining evidence from different sources is proposed to improve the evaluation of surrogate endpoints in circumstances where RCTs offer limited evidence. The method extends a model proposed by Begg and Pilote to the bivariate case and allows for adjusting for systematic biases across different types of designs. This is important as the limited internal validity of OBCs can introduce bias to the estimates of the parameters describing surrogate relationships and potentially affect the evaluation of surrogate endpoints. A simulation study was carried out to assess the proposed method in various scenarios.
We also applied the method to a data example in advanced colorectal cancer investigating the impact of combining RCTs with OBCs on the evaluation of progression-free survival (PFS) as a surrogate endpoint of overall survival (OS). The inclusion of OBCs in the meta-analysis improved the evaluation of PFS as a surrogate endpoint of OS, resulting in reduced uncertainty around the estimates of the parameters describing the surrogate relationships and around the predicted effects on OS.

15:15-16:45 Session OC4D: Prediction model for omics data
Tailored Bayesian variable selection for risk prediction modelling under unequal misclassification costs

ABSTRACT. Background: Risk prediction models are a crucial tool in healthcare. They are often constructed using methodology which assumes the costs of different classification errors are equal. However, in many healthcare applications, this assumption is not valid, and the differences between misclassification costs can be quite large. For instance, in a diagnostic setting, the cost of misdiagnosing a person with a life-threatening disease as healthy may be larger than the cost of misdiagnosing a healthy person as a patient. As a result, Tailored Bayes (TB) was proposed as a principled, simple and widely applicable umbrella framework to incorporate misclassification costs into Bayesian modelling [1]. Using both simulations and real data the authors showed that the TB approach allows us to “tailor” model development with the aim of improving performance in the presence of unequal misclassification costs.

Objective: To extend the TB framework by incorporating a variable selection procedure, a ubiquitous challenge in statistical modelling, especially with the rise of high-dimensional data.

Method: We incorporate the TB approach into a hierarchical sparse regression framework and apply it to the METABRIC cohort (n = 1787). We investigate the clinical utility of already identified genes when their effects are modelled jointly, alongside routinely used clinicopathological covariates to predict 5-year risk of relapse in breast cancer. In total, we search over 1501 covariates. We compare the results between the TB and standard Bayesian (SB) modelling.

Results and Conclusions: We show that TB favours smaller models (with fewer covariates) compared to SB, whilst performing better or no worse than SB. This pattern was seen both in simulated and real data. This allows more parsimonious explanations for the data at hand. In addition, we show the ranking of covariates changes when we take misclassification costs into consideration. This has implications for risk prediction models since smaller models may result in lower data collection costs and different covariates used in further downstream analysis, for instance in genetic fine-mapping and related applications.

[1] Solon Karapanagiotis, Umberto Benedetto, Sach Mukherjee, Paul DW Kirk, and Paul J Newcombe. Tailored Bayes: a risk modelling framework under unequal misclassification costs. Under review, 2021.
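The decision-theoretic idea that tailored modelling builds on can be sketched briefly: under unequal misclassification costs, the optimal risk threshold moves away from 0.5 to c_FP / (c_FP + c_FN). The sketch below shows this standard result only, not the TB machinery itself, and all cost values are invented for illustration:

```python
def cost_optimal_threshold(cost_fp, cost_fn):
    """Decision-theoretic risk threshold under unequal
    misclassification costs: call a case 'positive' when the
    predicted risk exceeds c_FP / (c_FP + c_FN).  With equal
    costs this reduces to the familiar 0.5 cut-off."""
    return cost_fp / (cost_fp + cost_fn)

def expected_cost(risk, threshold, cost_fp, cost_fn):
    """Expected misclassification cost of labelling a patient
    with predicted risk `risk` according to `threshold`."""
    if risk >= threshold:                 # predicted positive
        return (1 - risk) * cost_fp      # paid if truly negative
    return risk * cost_fn                # paid if truly positive

# missing a relapse is (hypothetically) nine times worse than a
# false alarm, so patients are flagged above 10% predicted risk
t = cost_optimal_threshold(1.0, 9.0)
```

Because the threshold, and hence the region of the predictor space that matters, shifts with the costs, it is unsurprising that the covariates selected and their ranking change when costs are taken into account, as the abstract reports.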

Feature selection in multivariate varying-coefficient mixed models for drug response prediction

ABSTRACT. Large-scale pharmacogenomic datasets often include multiple anti-cancer drugs, different cancer tissue types and heterogeneous multi-omics data. There are several challenges in modelling these data, such as correlated responses between multiple drugs and heterogeneity both between multiple tissues and between multi-omics data sources. We propose a multivariate varying-coefficient mixed model which uses our IPF-tree-lasso method (Zhao and Zucknick, 2020) to take into account drug-drug similarities and heterogeneity between multi-omics data. Importantly, the novel model employs random effects and varying coefficients to capture the underlying heterogeneity between multiple tissue samples. Simulation studies show that our proposed model improves the accuracy of drug response predictions and feature selection compared with existing lasso-type methods. We demonstrate the practical performance of our approach on a large preclinical pharmacogenomic study, the Cancer Therapeutics Response Portal (CTRP), where the model predicted the sensitivity of ca. 500 cancer cell line samples to ca. 200 drugs using ca. 10000 genomic features of the cell lines, including gene expression, copy number variation and point mutations.

Variational Bayes for Model Averaging for Multivariate models using Compositional Microbiome predictors
PRESENTER: Darren Scott

ABSTRACT. High-throughput technology for molecular biomarkers produces multivariate data exhibiting strong correlation structures, and thus should be analysed in an integrated manner. Bayesian models are strongly suited to this aim. A particular case of interest is microbiome data, which is inherently compositional, and thus imposes a constraint on model space.

A Bayesian model is presented for multivariate analysis of high-dimensional outcomes and high-dimensional predictors, including compositional microbiome predictors. The model includes sparsity in feature selection for predictors and covariance selection. A model averaging approach is taken to ensure robust selection of predictors. A hybrid Variational Bayes - Monte Carlo computational approach (following Ye et al. 2020) is used for the compositional data updates.

Fast marginal likelihood estimation of penalties for group-adaptive elastic net

ABSTRACT. Nowadays, clinical research routinely uses omics, such as gene expression, for predicting clinical outcomes or selecting markers. Additionally, so-called co-data are often available, providing complementary information on the covariates, like groups of genes corresponding to pathways. Elastic net is widely used for prediction and covariate selection. Group-adaptive elastic net learns from co-data to improve prediction and selection, by penalising important groups of covariates less than other groups. Existing methods are, however, computationally expensive. Here we present a fast method for marginal likelihood estimation of group-adaptive elastic net penalties for generalised linear models. The method uses a low-dimensional representation of the Taylor approximation of the marginal likelihood and its first derivative for group-adaptive ridge penalties, to efficiently estimate these penalties. Then we show by using asymptotic normality of the linear predictors that the marginal likelihood for elastic net models may be approximated well by the marginal likelihood for ridge models. The ridge group penalties are then transformed to elastic net group penalties by using the variance function. The method allows for overlapping groups and unpenalised variables. We demonstrate the method in a cancer genomics application. The method substantially decreases computation time while outperforming or matching other methods by learning from co-data.

Improving model performance estimation in high-dimensional data settings by using learning curves

ABSTRACT. In high-dimensional prediction settings, i.e. when p > n, it remains challenging to estimate the test performance (e.g. AUC). Especially for medical applications, e.g. predicting whether a certain therapy will be successful, this should be done reliably. Arguably the most widely used method is conventional K-fold cross-validation, which aims to balance having enough samples to learn the model against having enough to estimate its performance. We show that combining estimates from a trajectory of subsample sizes, rendering a learning curve [1], leads to several benefits. Firstly, use of a smoothed learning curve can improve the performance estimate compared to 10-fold cross-validation. Secondly, a still-growing or saturating learning curve indicates whether or not additional samples will boost the prediction accuracy. Thirdly, comparing the trajectories of different learners gives a more complete picture than doing so at one sample size only, which we demonstrate by evaluating lasso, ridge, and random forest models. Fourthly, the learning curve allows computation of a lower confidence bound for the performance. Standard cross-validation produces very wide confidence bounds due to the small number of test samples and the correlation structure between different training and test splits. The learning curve finds a better trade-off between training and test sample sizes, which leads to sharper bounds. This confidence bound is proven to be valid. We show coverage results from a simulation, and compare those to a state-of-the-art technique based on asymptotics [2]. Finally, we demonstrate the benefits of our approach by applying it to several classifiers of tumor location from blood platelet RNAseq data.
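A toy sketch of the learning-curve idea, using a deliberately simple nearest-centroid classifier on synthetic one-dimensional data (everything below is invented for illustration): the model is refitted at a trajectory of training subsample sizes and its accuracy recorded at each size.

```python
import random

random.seed(1)

def make_data(n, shift=1.5):
    """Balanced two-class 1-D Gaussian toy data; class 1 is shifted."""
    return [(random.gauss(shift * (i % 2), 1.0), i % 2) for i in range(n)]

def nearest_centroid_accuracy(train, test):
    """Fit a nearest-centroid classifier and return test accuracy."""
    centroid = lambda c: (sum(x for x, y in train if y == c) /
                          sum(1 for _, y in train if y == c))
    m0, m1 = centroid(0), centroid(1)
    correct = sum((abs(x - m1) < abs(x - m0)) == (y == 1)
                  for x, y in test)
    return correct / len(test)

train, test = make_data(200), make_data(500)
# learning curve: accuracy as a function of the training subsample size
sizes = [10, 25, 50, 100, 200]
curve = [nearest_centroid_accuracy(train[:n], test) for n in sizes]
```

Inspecting whether `curve` is still rising or has flattened at the largest size is exactly the second benefit above; in a serious application the test-set evaluation would of course be replaced by the repeated-split scheme and confidence bound the abstract describes.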

15:15-16:45 Session OC4E: competing risks and multi-state models
Bayesian inference for the direct approach for competing risk modeling with Gompertz distribution

ABSTRACT. The direct approach to competing risk modeling of survival data was proposed by Jeong and Fine (Jeong and Fine, 2006) to simultaneously model the cumulative incidences of the several events that individuals in a cohort are at risk of. This is a good alternative to the Fine and Gray method, which models one specific event in the presence of competing events that can prevent its realization. However, there are problems with maximum likelihood estimation of this model. The form of the likelihood does not allow most of the optimisation algorithms available in standard software to reach the global optimum. In this article, we therefore propose a Bayesian inference approach, which allows better estimation of the parameters. Three different prior specifications were considered to evaluate our approach. We first used the Jeffreys non-informative prior. The second prior distribution was Zellner’s maximal data information prior (MDIP). Finally, we used independent gamma priors. The three models were applied to simulated data and compared with maximum likelihood estimation. Maximum likelihood inference for the Gompertz competing risks model offers no guarantee of reaching the global optimum, and hence of providing the right parameter estimates. When the right prior is chosen, the Bayesian approach provides more accurate results. This version of the model can now be disseminated, and even implemented in the more advanced statistical packages, to permit good estimation of long-term survival models with multiple events.
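The role of the Gompertz distribution in the direct approach is that, with a negative shape parameter, its distribution function is improper and plateaus below one, so it can serve directly as a cumulative incidence function for one of several competing events. A minimal sketch of that distribution function (the parameter values are arbitrary):

```python
from math import exp

def gompertz_cif(t, lam, gamma):
    """Cumulative incidence under a Gompertz (sub)hazard
    h(t) = lam * exp(gamma * t):
        F(t) = 1 - exp((lam / gamma) * (1 - exp(gamma * t))).
    For gamma < 0 the distribution is improper, plateauing at
    1 - exp(lam / gamma), which is what lets it model a
    cumulative incidence that levels off below 1."""
    return 1.0 - exp((lam / gamma) * (1.0 - exp(gamma * t)))

# hypothetical parameters with a negative shape: the plateau is
# the ultimate incidence of this event
lam, gamma = 0.2, -0.5
plateau = 1.0 - exp(lam / gamma)       # = F(infinity)
```

The multimodal likelihood difficulties discussed above arise when the rate and shape parameters of several such functions are estimated jointly; the distribution function itself is straightforward.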

Analysis of competing risks data using restricted mean time lost

ABSTRACT. In clinical and epidemiological studies, hazard ratios are often used to compare treatment effects between two groups for survival data. For competing risks data, the corresponding quantity of interest is the subdistribution hazard ratio (SHR). However, the clinical application and practice of the SHR still have some limitations related to model assumptions and clinical interpretation. Therefore, an alternative statistic, the restricted mean time lost (RMTL) [1-3], has been recommended for its intuitive and simple interpretation. However, published research on the RMTL seems to lack robustness and completeness in statistical inference and practical application. Thus, we propose a new estimator, hypothesis test and sample size formula based on the difference in RMTL (RMTLd). The simulation results show that the RMTLd test has robust statistical performance (both type I error and power). Meanwhile, the RMTLd-based sample size can approximately achieve the predefined power level. The results of the example analyses also verify the performance and acceptability of the RMTLd test. From the perspectives of clinical interpretation, application conditions and statistical performance, we recommend that the RMTLd be reported alongside the SHR when analysing competing risks data, and that the RMTLd even be regarded as the primary outcome when the proportional hazards assumption fails.

References [1] Andersen PK. Decomposition of number of life years lost according to causes of death. Stat Med. 2013;32(30):5278-85. [2] Zhao L, Tian L, Claggett B, et al. Estimating Treatment Effect With Clinical Interpretation From a Comparative Clinical Trial With an End Point Subject to Competing Risks. JAMA Cardiol. 2018;3(4):357-358. [3] Lyu J, Hou Y, Chen Z. The use of restricted mean time lost under competing risks data. BMC Med Res Methodol. 2020;20(1):197.
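The RMTL itself is simply the area under the cumulative incidence curve up to a time horizon tau. A minimal sketch for a step-function CIF, with toy jump times and values (estimation of the CIF and the inference proposed in the abstract are a separate matter):

```python
def rmtl(jump_times, cif_values, tau):
    """Restricted mean time lost up to tau: the area under a
    step-function cumulative incidence curve that jumps to
    cif_values[i] at jump_times[i] (CIF = 0 before the first jump)."""
    area, prev_t, prev_f = 0.0, 0.0, 0.0
    for t, f in zip(jump_times, cif_values):
        if t >= tau:
            break
        area += prev_f * (t - prev_t)
        prev_t, prev_f = t, f
    area += prev_f * (tau - prev_t)
    return area

# toy CIF: jumps to 0.2 at t=1 and to 0.5 at t=3; RMTL up to tau=5
value = rmtl([1.0, 3.0], [0.2, 0.5], 5.0)
```

The interpretation is direct: of the tau = 5 time units of potential follow-up, an average of `value` units are lost to the event, and the RMTLd is just the difference of two such areas between arms.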

Impact of competing event in COVID-19 clinical data analysis

ABSTRACT. Objective: Researchers in many coronavirus disease 2019 (COVID-19) trials have calculated sample sizes and tested hypotheses based on single time-to-event methods, selecting clinical improvement or recovery as the event of interest while treating death as right censoring. However, the resulting sample sizes and conclusions may be misleading. Statistical methods: To compare competing risks methods with single time-to-event methods in calculating sample sizes and testing hypotheses at different competing event rates, we calculated sample sizes and tested hypotheses on eight reconstructed clinical trial datasets using competing risks methods (subdistribution hazard ratio, SHR, and restricted mean time lost difference, RMTLd) and single time-to-event methods (hazard ratio, HR, and restricted mean survival time difference, RMSTd). Monte Carlo simulations were conducted to compare differences in sample sizes and powers between competing risks methods and single time-to-event methods under different competing event rates. Results: In four COVID-19 trials, the sample sizes based on competing risks methods were all higher than those based on single time-to-event methods. In the trials of Sharples and Imazio, the conclusions drawn from competing risks methods and from single time-to-event methods may be opposite. The simulation results show that the powers based on competing risks methods increase rapidly as the competing event rate increases. If powers were calculated based on the sizes of hazard ratios and competing risks methods, they might not reach the target, and they decrease as the competing event rate increases. In studies similar to COVID-19 trials, competing risks methods are recommended for calculating the sample size and testing hypotheses when the event of interest and the competing event cannot be treated as a composite event, nor the competing event treated as right censoring.


Parametric Landmark estimation of the transition probabilities in survival data with multiple events

ABSTRACT. The estimation of transition probabilities is of major importance in the analysis of survival data with multiple events. These quantities play an important role in inference for multi-state models, providing simple, summarised long-term predictions of the process. Recently, de Uña-Álvarez and Meira-Machado (2015) proposed nonparametric estimators based on subsampling, also known as landmarking, which have already proved to be more efficient than other nonparametric estimators in case of strong violation of the Markov condition. However, as the idea behind landmarking is to use only specific portions of the data, reduced subsample sizes or heavily censored data may lead to higher variability of the estimates.

To avoid the high variability of the nonparametric landmark estimator proposed by de Uña-Álvarez and Meira-Machado (2015), we introduce parametric estimators for the transition probabilities that are also based on subsampling. We considered several flexible distributions to handle this issue appropriately; one of the proposed approaches, based on the generalized gamma distribution, provides good results with high flexibility.

Results of simulation studies confirm the good behavior of the proposed methods. We also illustrate and compare the new methods to the nonparametric landmark estimator through a real data set on colon cancer.
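The landmark idea itself is simple to sketch: to estimate a transition probability from time s, restrict attention to the subjects occupying the relevant state at s. For a Markov illness-death model the healthy-to-healthy probability P11(s, t) has a closed form, which makes a minimal check possible; the exponential rates, sample size, and absence of censoring below are illustrative assumptions, not the authors' setup.

```python
import math
import random

def landmark_p11(n=50000, l12=0.2, l13=0.1, s=1.0, t=3.0, seed=7):
    """Landmark estimate of the healthy->healthy transition probability
    P11(s, t) in an illness-death model: restrict to the subsample of
    subjects still in state 1 at time s, then count who remains in
    state 1 at time t.  No censoring is simulated, to keep it minimal."""
    random.seed(seed)
    leave1 = l12 + l13                       # total hazard of leaving "healthy"
    at_risk = still_healthy = 0
    for _ in range(n):
        t1 = random.expovariate(leave1)      # sojourn time in state 1
        if t1 > s:                           # the landmark subsample at s
            at_risk += 1
            if t1 > t:
                still_healthy += 1
    return still_healthy / at_risk
```

Under the Markov assumption the estimate should be close to exp(−(λ12+λ13)(t−s)); the point of the abstract is that the landmark estimator stays valid when that assumption fails, at the price of using a smaller subsample.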

Phase I/II dose-finding design for right censored toxicity endpoints with competing disease progression

ABSTRACT. Background: The growing interest in new classes of anti-cancer agents, such as molecularly targeted therapies (MTAs) and immunotherapies, whose modes of action differ from those of cytotoxic chemotherapies, has changed the dose-finding paradigm. In particular, dose-finding designs should be able to handle the frequent late-onset toxicities by defining prolonged observation windows. In this setting, the observation of late-onset toxicity endpoints may be precluded by trial discontinuation due to disease progression, which defines a competing event to toxicity. Specific trial designs with prolonged observation windows, where dose-finding is modelled using survival models to handle right-censored endpoints in a competing risks framework, appear particularly suited.

Objectives: To propose a phase I/II dose-finding design using survival models for censored endpoints allowing the outcomes to be delayed and handling possible informative censoring by considering a competing-risks framework.

Methods: In this competing risks framework, we defined the cause-specific hazards for dose-limiting toxicity (DLT) and progression, both assumed to be exponentially distributed, and estimated model parameters using Bayesian inference. For dose-finding, we targeted the cumulative incidences, which are sub-distribution functions of time-to-DLT and time-to-progression. Given an observation window, the objective is to recommend the dose that minimizes the cumulative incidence of progression among an acceptable set of doses whose DLT cumulative incidence is below a target threshold. In addition, we propose a nonparametric benchmark approach for the evaluation of dose-finding designs with right-censored time-to-event endpoints. Design operating characteristics were evaluated in a simulation study, notably in terms of correct dose selection and safety, including sensitivity analyses to time-varying hazards of events and to different patient accrual schemes.

Results: The performance of the proposed methods was consistent with the complexity of the scenarios as assessed by the nonparametric benchmark. The proposed design presents desirable operating characteristics compared with other existing phase I/II designs, in particular when a non-negligible hazard of progression competes with DLT.

Conclusion: We propose a framework for seamless phase I/II trials targeting the subdistribution cumulative incidences of toxicity and progression for dose-finding, using working models for censored data. It allows prolonged observation windows resulting in administrative censoring and competing risks.
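With constant cause-specific hazards, the sub-distribution cumulative incidence has a closed form, F_k(t) = (λ_k/λ_tot)(1 − exp(−λ_tot·t)), so the selection rule described above (minimise progression incidence among doses whose DLT incidence stays below a threshold) can be sketched directly. The dose-specific rates, window, and threshold below are hypothetical, not values from the paper.

```python
import math

def cif(lam_event, lam_total, t):
    """Cumulative incidence of one cause when all cause-specific hazards
    are constant: F_k(t) = (lam_event/lam_total) * (1 - exp(-lam_total*t))."""
    return lam_event / lam_total * (1.0 - math.exp(-lam_total * t))

def recommend(doses, window, tox_target):
    """Index of the dose minimising the progression cumulative incidence
    among doses with DLT cumulative incidence <= tox_target over the
    observation window, or None if no dose is acceptable."""
    best, best_prog = None, float("inf")
    for i, (lam_tox, lam_prog) in enumerate(doses):
        lam_tot = lam_tox + lam_prog
        if cif(lam_tox, lam_tot, window) > tox_target:
            continue                       # dose too toxic: excluded
        prog = cif(lam_prog, lam_tot, window)
        if prog < best_prog:
            best, best_prog = i, prog
    return best

# hypothetical (DLT rate, progression rate) per dose level, per month
DOSES = [(0.02, 0.30), (0.05, 0.18), (0.08, 0.10), (0.30, 0.05)]
```

With a 6-month window and a 33% DLT target, the highest dose is excluded as too toxic and the third dose level is recommended, since higher doses here trade progression risk against toxicity.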

Modeling Non-Proportional Hazards for Overall Survival Time for Cancer Treatments

ABSTRACT. Background: Cox proportional hazards regression is a common and useful model in medical research. Its key assumption is that the ratio of the hazards for any two individuals is constant over time. This proportional hazards (PH) assumption is, however, potentially a major constraint: when it is violated, the hazard ratio (HR) must be expressed as a function of time.

Objective: In this study, we confirmed that the overall survival (OS) time for cancer treatments does not satisfy the PH assumption in many cases, and then we proposed a way of modeling such cases where the PH assumption is violated.

Method: Among the cancer drugs used in Japan, we chose 50 drugs whose package inserts include a Kaplan-Meier curve, and we assessed whether proportionality was satisfied. We also developed a non-proportional hazards model in which the HR is treated as a time-dependent function.

Results and Conclusion: Among the 50 OS figures, 35 (70%) significantly violated the PH assumption. We confirmed that the HR as a time-dependent function can be successfully modeled in all 50 cases by expressing it with a few parameters, such as a convergent value and a measure.
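One simple parametric family with the "convergent value" behaviour the abstract describes is HR(t) = hr_inf + (hr0 − hr_inf)·exp(−t/τ): the ratio starts at hr0 and converges to hr_inf. With a constant control hazard, the treatment survival curve then has a closed form. This particular functional form and the numbers below are illustrative assumptions, not the model fitted to the 50 drugs.

```python
import math

def hr_t(t, hr0, hr_inf, tau):
    """Time-dependent hazard ratio decaying from hr0 towards the
    convergent value hr_inf with time scale tau (assumed form)."""
    return hr_inf + (hr0 - hr_inf) * math.exp(-t / tau)

def surv_control(t, h0):
    """Control-arm survival under a constant hazard h0."""
    return math.exp(-h0 * t)

def surv_treatment(t, h0, hr0, hr_inf, tau):
    """Treatment-arm survival: the cumulative hazard
    integral of h0 * HR(u) over (0, t) is available in closed form."""
    cum = h0 * (hr_inf * t + (hr0 - hr_inf) * tau * (1.0 - math.exp(-t / tau)))
    return math.exp(-cum)
```

With hr0 = 1 and hr_inf = 0.5, this reproduces the delayed-separation pattern often seen for immunotherapies: no early benefit, curves diverging later.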

Sensitivity of results to missing data for clinical trials with discrete, longitudinal outcome measurements

ABSTRACT. Background

Multi-state models (MSM) are structures that can represent transition of patients through different disease categories (states). Analysis of discrete, interval censored longitudinal data using MSM is established and can lead to increased power for clinical trials, compared to analysis of aggregated data using binary or time to event methods. However, longitudinal data often suffers from intermittent missing measurements, which may depend on the true disease state. If the state is derived from composite data, consideration of the quantity and reason for missing components, and the potential association with the latent (missing) state is required.


Objective

To investigate methods for handling non-ignorable missing data in a MSM framework.


Methods

We investigate joint (selection) models for the multi-state process and the probability of missing data. Such selection models are equivalent to hidden Markov models, where an additional ‘state’ is used to represent a misclassification of the underlying latent state, whilst the observed data are assumed to be accurate. For interval-censored data, the misclassification probabilities and transition intensities of the MSM may be estimated simultaneously using the ‘msm’ package in R. The model was applied to a dataset from a pressure ulcer prevention trial, where different assumptions for the missing state were considered.


Results

Exploration of the motivating trial dataset identified ‘key’ components in the definition of the disease that would be associated with the true disease state and would be informative if missing. Based on these key components, three candidate definitions of the missing-state mechanism were identified for our motivating example.

Applying the misclassification models demonstrated important variation in the point estimates and precision (and therefore in hypothesis tests) of treatment effects between the different missing data assumptions. Further, the probability of a missing state was higher for non-healthy true (latent) states than for the healthy state under all definitions.


Conclusion

Sensitivity to non-ignorable missing data can be accommodated in MSM using hidden Markov models. The definition of missing data for an endpoint derived from composite data needs careful consideration in the context of the research setting.
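The selection-model idea — a latent multi-state process with a state-dependent probability of observing "missing" — is, as noted above, a hidden Markov model, and its likelihood is computed with the standard forward algorithm. The two latent states, transition matrix, and emission probabilities below are hypothetical, chosen only so that missingness is more likely in the non-healthy state.

```python
from itertools import product

# latent states: 0 = healthy, 1 = ulcer (hypothetical two-state example)
PI = [0.95, 0.05]                     # initial state distribution
P = [[0.90, 0.10],                    # latent transition probabilities
     [0.20, 0.80]]
# observation categories: 0 = healthy, 1 = ulcer, 2 = missing;
# missingness is more likely when the latent state is non-healthy
E = [[0.85, 0.05, 0.10],
     [0.05, 0.70, 0.25]]

def forward_likelihood(obs):
    """Likelihood of an observation sequence under the hidden Markov
    model, with 'missing' treated as an informative observation category."""
    alpha = [PI[s] * E[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * P[r][s] for r in range(2)) * E[s][o]
                 for s in range(2)]
    return sum(alpha)
```

Because the emission rows sum to one over the three observation categories, the likelihoods of all possible observation sequences of a given length sum to one, which gives a quick sanity check on the recursion.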

New Application of competing risks model in IgA nephropathy to explore the severity-dependent urinary remission

ABSTRACT. Outline of the clinical research and its problem: A nationwide prospective cohort study of immunoglobulin A nephropathy (IgAN) has been conducted throughout Japan to confirm the risk classification ability of severity grading systems (HG, CG, RG). Patients with IgAN were registered between April 1, 2005 and August 31, 2015. The primary outcome (PO) was a 50% increase in serum creatinine from baseline or dialysis induction, whichever occurred earlier. The secondary outcomes (SOs) were proteinuria remission and hematuria remission; the SOs were favourable events. Follow-up data were collected every 6 months, with a final observation date of January 31, 2018. The time-to-event data were treated as censored at the latest respective examination date if the respective events were not confirmed. With a conventional statistical method (the log-rank test), the association of the severity grading systems with the PO was clear, but their association with hematuria remission was unclear.

The objective: To clarify the association between the severity grading systems and hematuria remission. Statistical methods: The proposed statistical method is a new application of a competing risks model with elaborate data handling. In the database, the latest examination dates for the PO and the SOs sometimes differed within the same patient, and some patients experienced urinary remission after a 50% increase in serum creatinine. While patients transferred to dialysis can never achieve urinary remission, some patients can experience the PO after urinary remission in the real world. Since a 50% increase in serum creatinine was a surrogate for dialysis, we treated the PO and the respective SOs in a (semi-)competing risks framework. If the lengths of the follow-up records for the PO and an SO differed within the same patient, we truncated the longer time-to-event record at the shorter one and used the event status at the truncated time in the competing risks analyses.

Results: The association of HG with the SOs was confirmed by the Fine-Gray model (1) with contrasts or Gray’s test, and the cumulative incidence function was shown by severity level. The influence of the SOs on the PO was confirmed by multivariable Cox regression with time-dependent remission status, and cumulative joint incidence functions (2) after the respective SOs will be plotted.
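The truncation rule described above — cut the longer of a patient's two time-to-event records back to the shorter follow-up time and keep the event status as of that time — is mechanical enough to sketch in a few lines. The function and record layout are an illustration of the rule, not the authors' code.

```python
def truncate_pair(po, so):
    """po, so: (time, status) records for the primary outcome and one
    secondary outcome in the same patient.  If the follow-up lengths
    differ, truncate the longer record at the shorter follow-up time
    and use the event status as of that time (status 0 if the event
    had not yet occurred by then)."""
    tau = min(po[0], so[0])
    def at_tau(rec):
        t, status = rec
        return (tau, status if t <= tau else 0)
    return at_tau(po), at_tau(so)
```

For example, a PO event at 5 years paired with an SO event recorded at 8 years becomes a PO event and an SO non-event, both at 5 years, so the two processes share a common follow-up window for the competing risks analysis.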

Frailty Multi-state Model with Time-dependent Covariate for Prediction of Colorectal Cancer Progression

ABSTRACT. Background: Colorectal cancer (CRC) patients usually have a complex medical history. They often experience recurrence (loco-regional or metastases) as an intermediate event, which increases the risk of mortality. Although disease progression is highly dependent on the clinical or pathological stage, the disease course of patients at the same stage can differ considerably. Multi-state models are used to describe the progression of a complex disease that occupies several states over time. Predictions in survival analysis become adjustable when intermediate events are considered, so the results of multi-state survival models are more accurate than those considering just one event [1]. Homogeneous Markov models are usually applied to analyze multi-state data, but the homogeneity assumption is unrealistic when time-dependent covariates are included in modeling a disease course. Objective: We evaluated the disease course of colorectal cancer using a parametric multi-state model with time-dependent covariates, with and without a frailty term in the model. Methods: We obtained the data of newly diagnosed CRC patients who had undergone curative surgery and been admitted to the Clinical Oncology Department at Imam-Hossein Hospital, Tehran, Iran, between 2002 and 2013. The last date of follow-up was May 2018. Demographic characteristics and clinical data of all patients were obtained through their medical records and follow-ups. A non-recursive illness-death model was considered for modeling CRC evolution, in which the initial state (1) was alive without recurrence, the transient state (2) was alive with recurrence, and the only absorbing state (3) was death from any cause. We used a piecewise-constant approximation for the Weibull transition-specific model and compared its results with those obtained when incorporating a log-normal frailty into the model [2]. Results: A total of 339 CRC patients with a mean age of 53.32 ± 11.44 years were included. The median follow-up was 6.2 years.
Overall, 40.12% of patients experienced recurrence, of whom 80.1% died, and 10.9% of patients died without recurrence. The AICs were 1702.893 for model I and 1527.75 for model II. Conclusion: Incorporating frailty into the parametric multi-state model resulted in a better-fitting model for the prediction of CRC progression.

Time to readmission among newborns: time for a reappraisal?

ABSTRACT. Traditional analyses of hospital readmissions calculate time to readmission relative to discharge from the index visit. In the context of newborns, this classic readmission definition can be problematic, particularly when comparing groups with disparate birth lengths of stay, as is often the case when studying neonates with conditions requiring longer post-natal hospitalization. For this study population, age from birth and age at discharge may differ by weeks or months. We compare two methods of examining readmissions within the first year for infants diagnosed with neonatal opioid withdrawal syndrome (NOWS) and for normal newborns (average LOS: 17 days vs. 2 days). First, we applied the traditional definition to examine readmission timing from birth discharge, using crude estimates of proportions and a Cox regression model. Second, we defined readmission timing by day of life, compared the corresponding proportions, and fit a Cox model with left truncation to allow delayed entry of hospitalized neonates into the at-risk period at the time of discharge. Results using the traditional definition indicated that normal newborns were at highest risk of readmission within the first few days after discharge while infants with NOWS were at higher risk later into infancy, resulting in a violation of proportional hazards, an assumption the Cox model requires for validity. We examined the hazard function and constructed a piecewise model predicting early readmissions (<25 days) and late readmissions (≥25 days). In adjusted models, NOWS infants had 1.5 times the hazard of late readmission, with no difference in early readmissions. Models predicting readmission by day of life indicated no violation of proportional hazards, and the overall estimates of one-year hazard ratios were similar [1.76 (95% CI: 4.40-2.22) vs. 1.55 (1.09-2.22)]. Crude estimates differed substantially between the methods, particularly within the first 30 days, but converged at later time points through one year.
These methods indicate similar overall findings between the two approaches, though indexing readmissions to day of life offers a more intuitive interpretation with no issues of non-proportionality. Advances in time-to-event modeling available in most statistical packages allow easy incorporation of left truncation, which is particularly useful in the context of readmissions for newborns.
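Delayed entry works by letting each infant join the risk set only at its discharge day of life. A minimal Nelson–Aalen-style sketch with left truncation makes the mechanics concrete; the four records below are toy data, not the study cohort.

```python
def nelson_aalen_left_truncated(records):
    """records: (entry, exit, event) triples on the day-of-life timescale,
    where entry is the discharge day (start of the at-risk period), exit
    is the readmission or censoring day, and event is 1 for readmission.
    Returns [(time, risk_set_size, cumulative_hazard)] at event times."""
    event_times = sorted({exit for entry, exit, event in records if event})
    out, cum = [], 0.0
    for t in event_times:
        # at risk at t: already entered, not yet readmitted or censored
        at_risk = sum(1 for entry, exit, _ in records if entry < t <= exit)
        events = sum(1 for entry, exit, ev in records if ev and exit == t)
        cum += events / at_risk
        out.append((t, at_risk, cum))
    return out

# toy data: two normal newborns discharged on day 2 of life,
# two NOWS infants discharged on day 17
RECORDS = [(2, 10, 1), (2, 30, 0), (17, 25, 1), (17, 40, 0)]
```

At the first readmission (day of life 10) the NOWS infants are still hospitalized and so are correctly excluded from the risk set; by day 25 all remaining infants have entered, which is exactly what naive indexing from discharge gets wrong.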

Studying the longitudinal trajectory of potassium in heart failure patients through dynamic survival models

ABSTRACT. Background: Potassium plays a fundamental role in heart functioning. In patients affected by heart failure (HF), the disease itself, together with the pharmacological treatment, can alter potassium values. In clinical practice, dangerous changes are identified from a single measurement and a cutoff that recent studies have called into question. This would be a trivial matter if it did not lead cardiologists to discontinue life-saving treatments. Clinical research badly needs new methods to better explore the dynamic impact of potassium on survival for the personalized optimization of treatment in HF. Objectives: The aim of this study is to propose a dynamic survival model for the association between individual potassium trajectories and survival, which could provide an alternative way to identify patterns associated with a lower survival probability. Methods: The data come from the administrative regional data of the Friuli Venezia Giulia Region, integrated with the Outpatient and Inpatient Clinic E-Chart. We exploited the continuous, longitudinal nature of potassium, representing it as a functional datum in order to go beyond the cut-off paradigm. The two main approaches to dynamic survival modelling considered to study the association between potassium and the outcome are joint modelling and landmarking. Results: The study included 3678 patients affected by HF who were observed for a median time of 45 months (IQR: 25-68). Over this period, the median number of potassium measurements per subject was 16 (IQR: 7-31), and the survival probability after 4 years of follow-up was 0.65 (95% CI: 0.64-0.67). The analyses highlighted some novel insights into the relationship between potassium and survival, and confirmed the need to use the longitudinal trend of potassium to identify when a patient shows a trajectory that increases the risk of events.
Conclusions: This work suggests promising new directions for the treatment of HF patients and the development of personalized treatment tools. Future research should further investigate the estimation of personalized treatment schedules based on the potassium trajectory and the risk of adverse outcomes, to avoid premature discontinuation of life-saving treatments in patients affected by HF.

Use of electronic health records to enhance data from a single clinical trial evaluating maintenance therapy in non-small cell lung cancer patients

ABSTRACT. Randomised controlled trials (RCTs) are considered the gold standard for evidencing treatment efficacy and for subsequent decision making in health care research. However, the focus has been shifting towards the prospect of real-world evidence (RWE) complementing RCTs to support decision making and enhance the estimation of treatment effects.

We aim to develop and compare methods for combining registry data with existing trial evidence to improve inference, using simulated Systemic Anti-Cancer Therapy (SACT) data available from the Simulacrum database. We explore the potential of simulated registry data to emulate the control arm of the completed PARAMOUNT trial, which investigated the effect of pemetrexed maintenance therapy on overall survival in non-small cell lung cancer (NSCLC) patients. We intend to evaluate the effects of combining RWE from SACT with PARAMOUNT trial data on the results that were presented to NICE. Methods to adjust for selection bias between the two populations will then be used to enhance the comparison and analysis of the survival curve estimates.

Synthetic patient data for non-squamous NSCLC were obtained from the Simulacrum database. Since no patients in the Simulacrum database received pemetrexed as a maintenance therapy after cisplatin-pemetrexed induction therapy, it was not possible to estimate comparative effectiveness. Therefore, we selected patients who received either a standard treatment or no treatment following initial cisplatin-pemetrexed therapy for a duration approximately equivalent to that in the PARAMOUNT trial, creating a comparable synthetic control arm for a single-arm approach. We further evaluated adjusting the synthetic control arm data to reflect the control arm of the PARAMOUNT trial using a variety of methods, including matching, re-weighting, and regression-based adjustment.

There were 973 synthetic patients in Simulacrum and 939 patients in the trial who received cisplatin-pemetrexed induction therapy. We present the results of time-to-event analysis evaluating overall survival by amalgamating the synthetic data with reconstructed data from the PARAMOUNT trial.

The results presented demonstrate the potential effects of combining RWE with existing trial data and highlight the need to develop more sophisticated methods which may guide decision making in settings with scarce experimental data in the future.
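Of the adjustment methods mentioned, re-weighting is the simplest to sketch: weight the synthetic control arm so that its covariate (stratum) distribution matches that of the trial control arm. The stratum labels and counts below are invented for illustration; a real analysis would typically use propensity-score-based weights over many covariates.

```python
from collections import Counter

def reweight_to_target(synthetic, target):
    """synthetic, target: lists of stratum labels (e.g. stage/performance-
    status combinations) for the synthetic and trial control arms.
    Returns {stratum: weight} such that the weighted synthetic arm
    reproduces the target arm's stratum distribution."""
    syn, tgt = Counter(synthetic), Counter(target)
    n_syn, n_tgt = len(synthetic), len(target)
    return {s: (tgt[s] / n_tgt) / (syn[s] / n_syn)
            for s in syn if s in tgt}
```

After weighting, each stratum contributes to the synthetic arm in the same proportion as in the trial arm, which is the balancing property any subsequent weighted survival analysis relies on.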

Modelling the length of stay of COVID-19 patients using a multistate approach

ABSTRACT. The aim was to predict the length of stay of COVID-19 patients in different units of the hospital: general ward, intermediate care (IMC) and intensive care unit (ICU). A semi-Markov multistate model with the states “general ward”, “IMC”, “ICU”, “dead” and “recovered” was estimated. Patients could move repeatedly between the transient states “general ward”, “IMC” and “ICU”, while “recovered” and “dead” were absorbing states. The patients’ sex, age, type of hospital admission and time period (wave) were considered as covariates. Transition hazards for the twelve possible transitions were estimated using a parametric approach with different distributions for the transitions. The approach is compared with a Cox model approach with regard to the prediction of deaths and sojourn times.

The study population consisted of a sample of all patients who received inpatient COVID-19 treatment at the university hospital. Estimated sojourn times in the different states were obtained by simulating the model 10000 times and averaging over the simulations. With increasing age, patients stayed longer in both the regular ward and the ICU, up to an age of about 80 years; for older patients, the sojourn times decreased due to increased mortality. Males had slightly longer sojourn times than females in all states except IMC. At present, the parametric model and the Cox model yield similar results. Combined with a model for hospital admissions, this model can be used to estimate hospital occupancy. A possible limitation is that general circumstances could hardly be modelled; for example, the capacity limit of the hospital could also influence the length of stay. A future extension of the model should include SARS-CoV-2 mutations.
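The "simulate the model many times and average" step can be sketched with a toy semi-Markov chain. The states, mean sojourn times, and transition probabilities below are invented for illustration (a two-state ward/ICU simplification), not the fitted hospital model with its covariates.

```python
import random

# hypothetical semi-Markov model: mean sojourn time (days) per transient
# state, and next-state distributions; "recovered"/"dead" are absorbing
MEAN_SOJOURN = {"ward": 5.0, "icu": 8.0}
NEXT = {"ward": [("icu", 0.2), ("recovered", 0.7), ("dead", 0.1)],
        "icu": [("ward", 0.5), ("dead", 0.5)]}

def simulate_los(n=20000, seed=3):
    """Average total length of stay for a patient starting on the general
    ward, estimated by simulating the semi-Markov model n times."""
    random.seed(seed)
    total = 0.0
    for _ in range(n):
        state, los = "ward", 0.0
        while state in MEAN_SOJOURN:           # loop until absorbed
            los += random.expovariate(1.0 / MEAN_SOJOURN[state])
            u, acc = random.random(), 0.0
            for nxt, p in NEXT[state]:
                acc += p
                if u < acc:
                    break
            state = nxt                        # falls back to last option
        total += los
    return total / n
```

For this toy chain the expected total stay solves E_ward = 5 + 0.2·E_icu and E_icu = 8 + 0.5·E_ward, giving 6.6/0.9 ≈ 7.33 days, so the simulation average can be checked against the analytic value.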

Landmarking: An R package for analysis using landmark models

ABSTRACT. The landmarking approach allows survival predictions to be updated dynamically as new measurements from an individual are recorded. It was first described by Van Houwelingen (2007). The idea is to set predefined time points, known as ‘landmark times’, and to fit a model at each landmark time using only the individuals in the risk set. Here I present ‘landmarking’, an R package which allows the user to perform analysis using the landmarking approach, offering benefits over the existing package ‘dynpred’ (Van Houwelingen and Putter, 2011). The main benefit of the ‘landmarking’ package is that it allows mixed-effects modelling of the repeat measurements, in addition to the option of using the last observation carried forward (LOCF). Mixed-effects modelling has the following advantages over LOCF: it allows for missing data in the repeat measurements, it provides improved precision when measurements are infrequent, and it reduces measurement error. Moreover, the package allows the user to model competing risks in the survival data using either cause-specific Cox regression or Fine-Gray regression.

References Ellie Paige, Jessica Barrett, David Stevens, Ruth H Keogh, Michael J Sweeting, Irwin Nazareth, Irene Petersen, and Angela M Wood. Landmark models for optimizing the use of repeated measurements of risk factors in electronic health records to predict future disease risk. American Journal of Epidemiology, 187(7):1530–1538, 2018.

Hans van Houwelingen and Hein Putter. Dynamic prediction in clinical survival analysis, 2011.

Hans C Van Houwelingen. Dynamic prediction by landmarking in event history analysis. Scandinavian Journal of Statistics, 34(1):70–85, 2007.
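The core data step behind any landmark analysis — restrict to the risk set at landmark time s and carry the last covariate value forward — can be sketched in a few lines. The sketch below shows only the LOCF option, rendered in Python with toy data; the R package additionally offers the mixed-model alternative described above.

```python
def landmark_dataset(measurements, survival, s, horizon):
    """Build a landmark dataset at time s with prediction window `horizon`.
    measurements: {id: [(time, value), ...]} longitudinal covariate;
    survival: {id: (event_time, status)}.  Subjects must be at risk at s;
    the covariate is the last observation at or before s (LOCF); the
    outcome is an event within (s, s + horizon]."""
    rows = []
    for pid, (event_time, status) in survival.items():
        if event_time <= s:
            continue                             # not in the risk set at s
        past = [v for (t, v) in sorted(measurements.get(pid, [])) if t <= s]
        if not past:
            continue                             # no LOCF value available
        outcome = int(status == 1 and event_time <= s + horizon)
        rows.append((pid, past[-1], outcome))
    return rows

# toy longitudinal and survival data (hypothetical)
MEASUREMENTS = {1: [(0, 5.0), (2, 6.0)], 2: [(0, 4.0)], 3: [(0, 7.0), (4, 8.0)]}
SURVIVAL = {1: (7.0, 1), 2: (2.5, 1), 3: (9.0, 0)}
```

Subject 2 is excluded (event before the landmark), and subject 3's measurement at time 4 is correctly ignored when landmarking at s = 3, since only information available at the landmark time may be used.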

The risk of valvular heart disease after childhood cancer: contribution of dose-volume histogram parameters

ABSTRACT. Background: Childhood cancer survivors are at increased risk of developing valvular heart disease (VHD). Despite the large volume of voxelized dosimetric data currently provided by individual dose estimates in radiotherapy, most studies are limited to single-variable approaches, such as the mean dose to the heart (MHD), to study its relationship with the risk of VHD. We therefore used the French Childhood Cancer Survivor Study (FCCSS) cohort, including 7670 five-year survivors, to investigate the potential predictive capability of dose-volume histogram parameters for the risk of VHD after childhood cancer. Methods: Individual dose-volume histograms for the whole heart were obtained, and the MHD was calculated, as well as the doses (Gy) delivered to v% of the heart volume (Dv, in Gy) and the volume percentages of the heart receiving ≥d Gy (V≥d). Their role in the occurrence of VHD was investigated using the Cox proportional hazards regression model and penalized Cox regression (LASSO, Ridge, Elastic Net) when faced with multicollinearity. Models were compared with each other via classic information criteria, and their efficacy was evaluated through performance indices. Results: 82 patients developed a severe VHD (grade ≥ 3). Overall, patients treated with radiotherapy had an approximately 2-fold (95% CI: 1.16, 3.42) risk increase after adjustment for chemotherapy exposure. The MHD was 23.7 Gy for patients who developed a VHD, compared with almost 7 Gy for the entire cohort. The risk of VHD increased 12-fold (95% CI: 7.02, 21.83) when the MHD was over 20 Gy, and 30-fold (95% CI: 16.07, 58.8) as the volume having received ≥ 30 Gy increased. Exposure to chemotherapy increased the risk of VHD by almost 2-fold in most of the alternative adjustments.
Multivariable approaches seem to provide better predictions than the binomial model, but overall the model studying the irradiation dose-effect relationship adjusted on the MHD appears to be the closest to the true model according to its AIC, and a combination of decorrelated volume indicators seems to provide the best prediction (C-index: 0.783). Conclusions: These findings may be useful for patients and doctors both before treatment and during long-term follow-up for VHD in survivors of childhood cancer.

The estimation of adjustment factors for expected mortality rates with application in comorbidity-adjusted lifetables

ABSTRACT. Published life tables can be used with relative survival (RS) techniques in the study of excess deaths in a disease-specific population. Using population-based mortality rates as expected rates circumvents obtaining large control samples as comparators when case-only data are available (e.g. from disease registries). For RS methods to be unbiased, the expected mortality rates should represent the rates the exposed population would experience if unexposed. However, published life tables are usually stratified by only a small number of factors, such as age, sex and calendar year. Patients with cardiovascular disease (CVD), the leading cause of mortality worldwide, have an increased burden of comorbidities (such as diabetes) compared with those without CVD [1]. Hence RS methods applied to CVD require further adjustment of the published rates for comorbidity.

We extend previously developed methods for estimating adjustment factors for expected mortality rates [2] to incorporate time-varying adjustment factors. We describe how the analyses are undertaken using both a Poisson and a flexible parametric survival model (FPSM) approach. Poisson models, in a generalized linear model framework, require splitting the data by the relevant timescales (age and calendar year); attained age and year are included in the model as restricted cubic splines to allow flexibility in the hazard functions, and person-years and expected mortality are both included as offsets. FPSMs need no data splitting, making them an appealing method for large datasets: they incorporate smoothed expected rates and, by constraining coefficients, allow us to estimate deviations from the smoothed expected rates in the cohort.

Using a cohort of 1.8 million patients from primary care data, a baseline Charlson Comorbidity Index (CCI) is derived from linked secondary care data, with patients tracked for mortality. CCI score-group adjustment of published rates is performed with initial baseline CCI score group analysis revealing that those with no comorbidity have lower absolute rates than the general population, while those with CCI scores greater than 0 have adjustment factors greater than one. Comorbidity adjustment factors vary by age, and calendar year. Extensions to the research will investigate the effect of accounting for lagged CCI measures and the inclusion of updated CCI scores over time.
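In its simplest, constant-factor form, the quantity being estimated is the multiplicative factor relating each comorbidity group's observed mortality to the expected mortality from the published life table, i.e. observed over expected deaths. The sketch below shows that mechanic with fabricated data and a single population rate; the methods described above additionally smooth and let the factors vary over age and calendar time.

```python
def adjustment_factors(cohort, population_rate):
    """cohort: list of (cci_group, person_years, deaths) records.
    Returns {cci_group: observed/expected}, the constant factor by
    which the published expected rate must be multiplied for each
    comorbidity group (a time-varying version would stratify further
    by age and calendar year)."""
    obs, exp_ = {}, {}
    for group, py, deaths in cohort:
        obs[group] = obs.get(group, 0) + deaths
        exp_[group] = exp_.get(group, 0.0) + population_rate * py
    return {g: obs[g] / exp_[g] for g in obs}

# fabricated data: CCI 0 patients die at 75% of the population rate,
# CCI 1+ patients at 180% of it (population rate 0.02 per person-year)
COHORT = [("CCI 0", 10000.0, 150),
          ("CCI 1+", 5000.0, 180)]
```

A factor below one for the no-comorbidity group reproduces the pattern reported above: patients with no recorded comorbidity have lower absolute mortality than the general population.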

Multiple Cox regression analysis to investigate a biomarker in IgA nephropathy disease: different approaches

ABSTRACT. IgA nephropathy (IgAN) is a common glomerulonephritis worldwide, and biomarkers are an important vehicle for identifying subgroups of patients with IgAN. Glomerular C4d (C4dG) is a robust marker of IgAN patients with poor prognosis. In a retrospective cohort study, our aim was to investigate the significance of arteriolar C4d (C4dA) in a group of 126 IgAN patients and to compare it with clinical and histological markers of disease progression, particularly C4dG. The effect of C4dA on survival was evaluated using two approaches. In the first, predictors were selected with a stepwise forward procedure from the set of variables previously identified (through simple Cox regression analysis) as being related to disease progression. The second approach was motivated by the reduced sample size and number of events, which limits the number of predictor variables that can be included in the multivariable model. Two models were therefore obtained first: one selecting variables from the set of significant clinical variables and another choosing among the significant histological variables. Finally, a multivariable model was constructed with the histological and clinical variables of the two previous models. C4dA and C4dG were added separately to these models to evaluate and compare the impact of the two markers on survival. Both C4dA and C4dG were shown to be associated with renal survival, but in the final model C4dA remained significant while C4dG did not (p=0.054), and the Akaike information criterion was slightly lower for the C4dA model. Harrell's C indexes were calculated for both final models and validated using bootstrapping; slightly higher values were obtained for the C4dA model than for the C4dG model, but without reaching statistical significance.
Likelihood ratio tests were used to compare the crude models with the model having both C4dA and C4dG as predictors: the inclusion of C4dA affords a statistically significant improvement in prediction over the survival model with C4dG alone (p=0.012), but the reverse is not true (p=0.280). These findings show that C4dA is a robust biomarker predicting the progression of kidney disease in IgAN, and it appears superior to C4dG.
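The likelihood ratio test used above for adding one marker to a nested Cox model reduces to comparing 2·(ll_full − ll_reduced) with a chi-squared distribution on 1 degree of freedom, for which the tail probability has the closed form P(χ²₁ > x) = erfc(√(x/2)). The sketch below implements only that df = 1 case; the log-likelihood values in the test are made up, not the study's.

```python
import math

def lrt_pvalue(loglik_reduced, loglik_full):
    """Likelihood ratio test p-value for one added predictor (df = 1)
    in nested models: 2*(ll_full - ll_reduced) ~ chi-squared(1) under
    the null, and P(chi2_1 > x) = erfc(sqrt(x/2))."""
    x = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(x / 2.0))
```

Identical log-likelihoods give p = 1, and a log-likelihood gain of about 1.92 (test statistic 3.84, the 95th percentile of χ²₁) gives p ≈ 0.05, which is a convenient way to sanity-check the identity.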

Analysis of Results of Total Knee Replacement Failure Using Cox Proportional Hazard Model with Time-Dependent Covariates

ABSTRACT. Total knee replacement (TKR) surgery is the most common treatment for osteoarthritis of the knee. Good health, along with other factors, influences a successful and prompt recovery from this surgery. Physicians are interested in quantifying the effect of a patient's well-being on any failure of the TKR during follow-up, and therefore aim to monitor the patient's health before and after TKR surgery. Two suitable tools for assessing a patient's health state have been proposed: the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Knee Society Scoring System (KSS). Both assess the subjective health state of a patient, while the KSS also aspires to evaluate the objective health state.

Our study includes data on 2295 patients who underwent primary TKR surgery between January 1st, 2006 and December 31st, 2019 at the Orthopaedic Clinic of Martin University Hospital. The data were recorded in the Slovak Arthroplasty Register (SAR). WOMAC and KSS questionnaires were recorded for each patient at four time points during planned examinations: before TKR surgery and three, six, and 12 months after TKR surgery. The aim of this registry-based study is to relate primary TKR failure to WOMAC and KSS scores, age, sex, and diagnosis by means of a Cox proportional hazards model with time-dependent covariates, stratified by type of implant (cruciate retaining, posterior stabilized, condylar constrained, and hinge knee implants). Statistical analyses were carried out in the R software environment.

Acknowledgment: The work was supported (partly) by the long-term strategic development financing of the Institute of Computer Science (RVO:67985807) and specific research of Masaryk University as support for student projects (MUNI/A/1615/2020).

References: Klein, J.P., Moeschberger, M.L., 2003: Survival Analysis: Techniques for Censored and Truncated Data. Springer-Verlag, New York. Nečas, L., Katina, S., Uhlárová, J., 2013: Survival analysis of total hip and knee replacement in Slovakia 2003–2011. Acta Chirurgiae Orthopaedicae et Traumatologiae Cechoslovaca 80(1), 1–85.

Joint contribution of positive and total lymph nodes number in predicting overall survival of esophageal cancer

ABSTRACT. The number of positive lymph nodes (PLN) serves as the current criterion in the TNM staging of esophageal cancer, but whether to incorporate the prognostic influence of the total number of lymph nodes resected (TLN) is still controversial. This study compared the AIC and Harrell's C-index of various parametric models under three modeling strategies: (1) categorization of PLN and TLN; (2) a penalized natural spline of the lymph node ratio (LNR, the ratio of PLN to TLN); (3) penalized natural splines of PLN and TLN. The interaction between covariates and the proportionality hypothesis were tested. Based on an original cohort of population-based data from the USA and external validation on hospital-based data from China, the better strategy for analyzing the joint contribution of PLN and TLN was a proportional hazards model with penalized natural splines of both covariates and no interaction. A more aggressive adjuvant therapy could be proposed to patients with low TLN in the context of a randomized clinical trial.
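Harrell's C-index, used above to compare the modeling strategies, can be computed directly from its definition: among comparable pairs, count how often the subject with the shorter event time also has the higher predicted risk. A minimal Python sketch on toy data (not from the study):

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the shorter observed time ends in an
    event; it is concordant when that subject also has the higher risk
    score. Ties in the risk score count as 1/2.
    """
    num = den = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                den += 1
                if risk_scores[i] > risk_scores[j]:
                    num += 1
                elif risk_scores[i] == risk_scores[j]:
                    num += 0.5
    return num / den

# Toy data: subject 2 (time 4) has a lower risk score than later survivors,
# so two of the five comparable pairs are discordant.
c = harrell_c(times=[2, 4, 5, 7], events=[1, 1, 0, 1],
              risk_scores=[0.9, 0.2, 0.6, 0.7])
```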

Modelling the duration of recurrent events from interval-censored data

ABSTRACT. Background: Recurrent event data arise frequently in longitudinal studies when subjects are monitored discretely. Symptomatic episodes in survivors of epidemics such as Ebola are a good illustration of such "recurrent episodes". Several statistical methods have been proposed to analyze the time to first event or the risk of recurrent events, but analyses of event durations, especially from interval-censored data, remain scarce. Objective: We present a new approach to estimate the duration of recurrent events from interval-censored data and to assess predictors of this duration. Methods: We divided each patient's visit history into segments composed of two consecutive visit dates. Four situations were observed: a) the symptom was present throughout the segment; b) the symptom was absent; c) the symptom started during the segment; and d) the symptom stopped during the segment. Missing start/end dates were imputed either deterministically (mid-point) or stochastically, drawing either uniformly over the interval or according to the maximum likelihood (Turnbull) estimate. In the latter case, we created five imputed datasets. A simulation study was then performed to assess the properties of the estimators. All durations and their 95% CIs were calculated using Rubin's rules, incorporating within- and between-imputation variability. The predictive value of several factors for symptom duration was then estimated by mixed-effect regression. We applied this method to a prospective cohort study that followed 802 Ebola virus disease survivors over 48 months in Guinea. Patients were assessed at inclusion and every 6 months up to 48 months, with clinical symptoms recorded at each visit.

Results: Our simulation study shows that this approach is a good strategy for estimating the duration of events and assessing the impact of predictors on it. We reconstructed our database and estimated the duration of symptoms day by day, obtaining the number of days each symptom was present or absent for each patient. Conclusion: The method is a promising and useful tool for estimating the duration of recurrent events from interval-censored data in longitudinal studies.
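The pooling step described in the Methods can be sketched with Rubin's rules: the pooled estimate is the mean over imputations, and the total variance combines within- and between-imputation components. The duration estimates below are hypothetical, not taken from the study.

```python
import math

def rubin_pool(estimates, variances):
    """Pool m imputed-data estimates with Rubin's rules.

    Total variance = within-imputation variance
                     + (1 + 1/m) * between-imputation variance;
    returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    qbar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return qbar, math.sqrt(total)

# Hypothetical symptom-duration estimates (days) from five imputed datasets,
# with their estimated variances.
est, se = rubin_pool([14.2, 13.8, 14.9, 14.4, 14.0],
                     [1.1, 1.0, 1.2, 1.1, 1.0])
```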

Relation between women empowerment and birth interval: A survival analysis approach

ABSTRACT. Birth intervals and birth spacing patterns provide important information about women's reproductive behavior and the dynamics of the fertility process. Birth intervals can be treated as time-to-event or survival data, where the events are childbirths, i.e. first birth, second birth, and so on. Women's empowerment may have an important influence on birth intervals and birth spacing patterns. This study investigates the effects of women's empowerment on birth intervals in Bangladesh. Four indicators were considered, measuring four dimensions of empowerment: level of education, participation in household (HH) decisions, freedom of movement, and employment status. Several socioeconomic and demographic variables were used as covariates. The Cox proportional hazards model was used to analyze the Bangladesh Demographic and Health Survey (BDHS, 2014) data. Kaplan-Meier estimates and log-rank tests show that women's empowerment is one of the key factors in lengthening the birth interval, which subsequently contributes to improved health status and development. Socioeconomic and demographic variables also affect birth intervals: increases in mother's age at birth shorten the birth interval, whereas women from richer socio-economic groups have longer birth intervals. Birth order plays a vital role in determining the birth interval; our model shows that higher birth order shortens it. Birth intervals are also longer when the previous birth was a boy, and urbanization lengthens the birth interval. Efforts should be made to increase women's empowerment, particularly girls' education, participation in HH decisions, freedom of movement, and women's employment in economic activity.
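The Kaplan-Meier estimator mentioned above can be sketched in a few lines: survival drops by a factor (1 - d/n) at each event time, where d events occur among n subjects still at risk. The birth-interval data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each observed event time.

    Returns (time, S(t)) pairs; S(t) drops by the factor (1 - d/n) at each
    event time, where d events occur among the n subjects still at risk.
    """
    s = 1.0
    curve = []
    for t in sorted(set(t for t, e in zip(times, events) if e == 1)):
        n_at_risk = sum(1 for ti in times if ti >= t)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        s *= 1 - d / n_at_risk
        curve.append((t, s))
    return curve

# Hypothetical birth intervals (months); event=1 marks an observed next birth,
# event=0 a censored interval.
curve = kaplan_meier(times=[24, 30, 30, 36, 40], events=[1, 1, 0, 1, 0])
```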

Impact of model choice when studying the relationship between blood pressure variability and risk of stroke

ABSTRACT. Long-term blood pressure variability (BPV) is an increasingly recognized vascular risk factor. However, quantifying its impact on risk is challenging. Most previous epidemiological studies used a Cox model with BPV derived as a fixed-in-time covariate, thereby conditioning on the future. Our objective was to compare the results of commonly used models with a time-dependent Cox model and joint shared random effect models, using data from a large randomized clinical trial. We used data from a secondary stroke prevention trial, PROGRESS, which included 6105 subjects. A total of 727 patients experienced a first stroke recurrence. The mean follow-up was 4.3 years and the median number of blood pressure (BP) measurements per patient was 12. Hazard ratios (HRs) for BPV were estimated from six models. Commonly used Models 1 and 2 first derived the standard deviation of BP (SDBP) over the whole follow-up (including or excluding BP measures observed after the first stroke recurrence, respectively), and then included SDBP as a fixed-in-time covariate in a Cox model estimated on the whole follow-up. Model 3 derived SDBP from the first year of follow-up and used it as the baseline value in a Cox model estimated on the remaining follow-up. In Model 4, SDBP was included as a time-dependent covariate in a Cox model. Models 5 and 6 were shared random-effect models. In Model 5, the longitudinal marker was time-dependent SDBP, and its current true value was the main covariate in the survival part. In Model 6, the longitudinal marker was time-dependent BP, modeled with a subject-specific error variance which was the main covariate in the survival part. While Models 1-3 produced opposite results (for a 5mmHg increase in BPV, HR=0.75, 95% confidence interval (CI) [0.68, 0.82], HR=0.99 [0.91, 1.08], and HR=1.19 [1.10, 1.30], respectively), Models 4-6 resulted in a similarly moderate positive association (e.g. HR=1.08 [0.99, 1.17] for Model 5).
The modelling of BP variability strongly affects its estimated effect on the risk of stroke. Further methodological developments are needed to account for the dynamics of both BP and BPV over time and to clarify the specific role of BPV.

A jack-knifed version of the log-rank test in small samples: when bias meets variance to increase test power

ABSTRACT. Comparing two time-to-event survival curves is a common problem in biostatistics, often solved by applying the log-rank test. It becomes complicated, however, when the two groups of individuals on which the survival curves are based are small, which pulls the power of the log-rank test down. There are some workarounds by which the small-sample "curse" can be overcome, but these usually suffer from low statistical power, too.

This preliminary study addresses the small-sample issue by comparing survival curves using various jack-knifed samples. We adopt the paradigm that jack-knifed tuples remain valid for statistical inference since, unlike bootstrapped or otherwise resampled samples, they contain no new data. If we build all possible jack-knifed samples by leaving each individual out, then every combination of two individuals, every three individuals, etc., we always obtain a superset as the union of all the jack-knifed samples. The higher the jack-knife degree, i.e. the level of left-out combinations of individuals, the larger the union of jack-knifed samples (and the lower the estimates' variance). However, the higher the degree, the more distorted the original data may become (and the higher the estimates' bias). The bias-variance trade-off arising from the growing jack-knife degree was measured by the simple sum of the variance and bias deviations from the original data. The type I error rate of the jack-knifed log-rank test was also considered.

Using multiple simulations and real-world COVID-19 data, we investigated the optimal degree of left-out combinations that minimizes the sum of the variance and bias deviations. We also examined the resulting type I error rate of the jack-knifed log-rank test and its statistical power.

The jack-knifed version of the log-rank test appears to be a viable alternative for comparing survival curves based on small samples. Analytical derivations are required to investigate its type I error rate, and development of an R package implementing the proposed methods is planned.
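The enumeration of leave-d-out jack-knife samples described above can be sketched with `itertools.combinations`; unlike bootstrap resampling, every generated sample is a subset of the original data.

```python
from itertools import combinations

def jackknife_samples(data, degree):
    """Yield all leave-d-out jack-knife samples of the data (degree = d).

    Each sample drops one combination of d subjects; no sample contains
    any data point absent from the original set.
    """
    n = len(data)
    for left_out in combinations(range(n), degree):
        yield [x for i, x in enumerate(data) if i not in left_out]

# Degree 2 on five subjects: C(5, 2) = 10 samples, each of size 3.
subjects = ["s1", "s2", "s3", "s4", "s5"]
deg2 = list(jackknife_samples(subjects, degree=2))
```

In the study's setting, the log-rank test would be applied within each such sample and the results aggregated across the union.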

17:30-19:00 Session IS6: INVITED : Causal inference in continuous time for dense longitudinal data from wearable devices
TMLE for Causal Effects based on continuous time longitudinal data structures

ABSTRACT. In many applications one is concerned with estimating the causal impact of a multiple time point intervention on a final outcome based on an i.i.d. sample of longitudinal data structures. We consider the case in which subjects are monitored at a random finite set of time points on a continuous time scale, and at these monitoring times treatment actions and/or time-dependent covariates and outcomes are collected. Current methods based on sequential regression break down in this setting. We develop a new targeted maximum likelihood estimator that still avoids estimation of the conditional densities of the outcome and covariates in the likelihood, and instead estimates a conditional mean function. The TMLE uses maximum likelihood based estimation of the monitoring, treatment, censoring, and survival process intensities. We also consider a TMLE that involves estimation of all the conditional densities, including the time-dependent outcome and covariate mechanisms. We develop highly adaptive lasso estimators of the nuisance functions and establish asymptotic efficiency of the TMLE under minimal conditions. In particular, we demonstrate these new TMLEs for estimation of treatment-specific survival functions for single time-point interventions on competing survival times. Advantages relative to first discretizing the time scale and using the currently available corresponding TMLE are discussed. Various applications are presented, and simulation results are used to demonstrate the theoretical properties.

Assessing Time-Varying Causal Effect Moderation in the Presence of Cluster-Level Treatment Effect Heterogeneity

ABSTRACT. The micro-randomized trial (MRT) is a sequential randomized experimental design to empirically evaluate the effectiveness of mobile health (mHealth) intervention components that may be delivered at hundreds or thousands of decision points. The MRT context has motivated a new class of causal estimands, termed "causal excursion effects", for which inference can be made by a weighted, centered least squares approach (Boruvka et al., 2017). Existing methods assume between-subject independence and non-interference. Deviations from these assumptions often occur and, if unaccounted for, may result in bias and overconfident variance estimates. In this talk, causal excursion effects are considered under potential cluster-level correlation and interference, and when the treatment effect of interest depends on cluster-level moderators. The utility of the proposed methods is shown by analyzing data from a multi-institution cohort of first-year medical residents in the United States. The approach paves the way for constructing mHealth interventions that account for observed social network information.

Missing data imputation for non-stationary time series in mHealth data

ABSTRACT. Missing data is a ubiquitous problem in psychiatry, epidemiology, the social and political sciences, and many other biomedical and social science disciplines, where large numbers of variables are collected (especially with repeated measurements over time) and complete data are therefore rarely available. As mobile devices (e.g., cell phones and fitness activity tracker bracelets) are more widely adopted, a new way of collecting personal health data densely, or even in real time, has emerged and revolutionized data collection for personalized health outcomes. Multivariate time series of outcomes, exposures, and covariates pose new challenges in handling missing data to obtain unbiased estimates of causal quantities of interest, and call for more efficient imputation approaches.

We conducted a comprehensive comparison of the performance of complete-case analysis with the most commonly used imputation methods, including mean imputation, last-observation-carried-forward imputation, multiple imputation, multiple imputation with significantly longer history information, and a state-space model, in estimating causal quantities in mHealth data. The validity of most imputation methods relies on the stationarity of the time series, in the sense that both the variance and the treatment effect do not change over time, failing to reflect that intervention effects in psychiatry may change with disease severity as well as fluctuations in the patient's mood. We propose a new imputation method derived from a state-space model that accommodates non-stationary time series, in which the treatment effect and variance may change over time, in order to learn the causal effects of interventions. We consider missing data in the outcome variable, the exposure variable, or both, under different missing rates and missing data mechanisms.
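As a point of reference for the comparison above, last-observation-carried-forward (LOCF) imputation is simple to state in code. A minimal Python sketch on a hypothetical mood series:

```python
def locf_impute(series):
    """Last-observation-carried-forward imputation for one time series.

    None marks a missing value; each gap is filled with the most recent
    observed value. Leading missing values are left as None, since there
    is nothing to carry forward.
    """
    filled, last = [], None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled

# Hypothetical daily mood ratings with missing entries.
mood = [None, 3, None, None, 5, None, 4]
imputed = locf_impute(mood)
```

LOCF implicitly assumes the series is flat between observations, which is exactly the kind of stationarity assumption the abstract's proposed state-space method relaxes.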

17:30-19:00 Session OC5A: Bayesian clinical trial analysis
Bayesian joint modeling of a bivariate toxicity for dose-regimens in early phase oncology

ABSTRACT. Context: Most phase I trials in oncology aim to find the maximum tolerated dose (MTD) based on the occurrence of dose-limiting toxicities (DLTs). A DLT is a binary toxicity outcome defined from multiple toxicity types and grades. Varying the dose-regimen, defined as the combination of the dose and the administration schedule, may reduce the risk of some types of DLT but have a different effect on other toxicities. In a motivating trial, dose-regimens with intra-patient dose-escalation were administered to reduce the risk of cytokine release syndrome (CRS), while the effect on other toxicities (DLTo) was unclear (NCT03594955).

Objective: The aim of this work was to propose a Bayesian method to determine the maximum tolerated dose-regimen (MTD-regimen) by modeling the DLT as a bivariate binary outcome, differentiating CRS from DLTo. The method was developed to be applied at the end of the trial, once all data have been collected. Methods: A Bayesian dose-regimen assessment method was used to model the CRS with the entire dose-regimen by incorporating pharmacokinetic and pharmacodynamic (PK/PD) modeling, as proposed by Gerard et al. (2020). A Bayesian cumulative model was then developed to relate the DLTo to the dose-regimen, as no pharmacodynamic assumption was available. Finally, we considered three approaches to model the joint distribution of CRS and DLTo: assuming independence between toxicities, adding a correlation parameter via copula modeling, and conditional modeling.

Results: Through an extensive simulation study, we observed that our joint approaches improved the proportion of correct selections of the true MTD-regimen in most scenarios compared to the recommendation of the implemented dose-allocation method (the modified continual reassessment method), with differences ranging from 6.9% to 11.5%. Our joint approaches could also predict the DLT probabilities of new dose-regimens that were not tested in the study and that could be investigated in further stages of the trial.

Conclusion: We proposed a joint modeling approach to evaluate the effect of dose-regimens on two types of toxicity, where one type of toxicity could be related to a PD biomarker while no such assumption could be made for the other.

Interim analysis of a clinical trial using the predictive probability of success based on a surrogate endpoint

ABSTRACT. The Predictive Probability of Success (PPoS) of a future clinical trial is a key quantitative tool for decision-making in drug development. It is generally derived from prior knowledge and evidence on the primary endpoint collected from previous clinical trials. A methodology to calculate the PPoS of a future trial based on historical data on surrogate endpoints was recently proposed. Because pharmaceutical companies seek to speed up drug development and to decide whether to continue or stop clinical trials as early as possible, interim analyses based on surrogate endpoints are attracting interest in clinical trial design. In this context, we extended the recently proposed methodology to designs that include an interim analysis based on a surrogate endpoint. An informative prior, called the surrogate prior, is derived from (1) the information on the surrogate endpoint observed at the interim analysis and (2) the joint distribution of the surrogate and primary endpoints, estimated using a meta-analytic approach on past clinical trials. If available, the information on the primary endpoint at the interim analysis can be combined with the surrogate prior to generate the PPoS. At the interim analysis, the futility rule is then based on a pre-defined level of PPoS. This methodology was investigated in a phase III oncology study in colorectal cancer, where Overall Survival is the primary endpoint and Progression-Free Survival data might be used at an interim futility analysis. We present the operating characteristics of the design in different settings, considering the amount of information available at the time of the interim analysis and potential prior-data conflicts between the surrogate prior and the available evidence on the primary endpoint.
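The PPoS logic can be illustrated on a deliberately simplified binary-endpoint analogue (the abstract's actual setting uses survival endpoints and a surrogate prior): draw the response rate from the interim posterior, simulate the remainder of the trial, and count how often the final success rule is met. All numbers below are hypothetical.

```python
import random

def ppos_binary(x_obs, n_obs, n_total, success_threshold,
                n_sims=20000, seed=1):
    """Monte Carlo predictive probability of success for a one-arm binary
    endpoint under a flat Beta(1, 1) prior: draw the response rate from
    the interim posterior, complete the trial, apply the final rule.
    """
    rng = random.Random(seed)
    n_rest = n_total - n_obs
    hits = 0
    for _ in range(n_sims):
        p = rng.betavariate(1 + x_obs, 1 + n_obs - x_obs)  # posterior draw
        x_future = sum(rng.random() < p for _ in range(n_rest))
        if (x_obs + x_future) / n_total >= success_threshold:
            hits += 1
    return hits / n_sims

# Hypothetical interim look: 28/50 responders; success = final rate >= 0.5.
ppos = ppos_binary(x_obs=28, n_obs=50, n_total=100, success_threshold=0.5)
```

A futility rule then stops the trial when this probability falls below a pre-defined level.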

Trials of Vaccine Efficacy for COVID-19: Inferential and Practical Issues

ABSTRACT. Several trials of vaccine efficacy for COVID-19 have now reported and attracted much interest in the media. Analysis of these trials raises a number of inferential issues. These will be discussed using the protocols, results and publications of these trials, paying particular attention to those by Moderna, Pfizer/BioNTech and AstraZeneca/Oxford. For example, a number of claims regarding efficacy according to dose, dose interval and virus strain have been based on uncontrolled comparisons. It can be shown that if concurrent control is respected, strata and trials are fitted, and the comparisons are made 'honestly', the uncertainty is much greater than might naively be supposed. Other inferential issues are to what extent statistical information is carried by the cases only, what practical difference, if any, a Bayesian approach (such as that adopted by Pfizer/BioNTech) makes to interpreting effects, and what a useful scale is for reporting results. Practical matters include whether 2:1 randomisation, such as was employed by AstraZeneca/Oxford, is a good idea, and whether and under what conditions it is logical from a public health point of view to stretch dosing intervals in order to allow more subjects to receive a first dose, given that some infections may occur between the first and second dose. Various graphical approaches to understanding these issues will be presented.
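One inferential point above, that much of the statistical information is carried by the case split alone, can be illustrated with a toy calculation: under 1:1 randomisation and equal follow-up (assumed here), the vaccine-arm share of cases determines the incidence rate ratio, and hence the efficacy. The counts below are hypothetical.

```python
def ve_from_case_split(cases_vaccine, cases_placebo):
    """Vaccine efficacy implied by the case split alone, assuming 1:1
    randomisation and equal follow-up in both arms: the vaccine-arm
    share of cases theta gives IRR = theta / (1 - theta), VE = 1 - IRR.
    """
    theta = cases_vaccine / (cases_vaccine + cases_placebo)
    return 1 - theta / (1 - theta)

# Hypothetical counts, not the published data of any trial:
ve = ve_from_case_split(cases_vaccine=10, cases_placebo=90)
# 10 of 100 cases in the vaccine arm gives VE = 1 - 10/90, about 0.889
```

Unequal randomisation (such as 2:1) or unequal follow-up would require rescaling theta before the same identity applies.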

Incorporating multiple parameters from historical controls using the meta-analytic-predictive (MAP) prior

ABSTRACT. Background: The meta-analytic-predictive (MAP) prior was proposed to incorporate information from comparable historical controls in the design and analysis of a new trial, by assuming exchangeability of the new and historical trials. Analysis of covariance (ANCOVA) is often used to analyze data from clinical trials with a pretest-posttest design. In ANCOVA, both the intercept and the baseline effect influence the treatment effect estimate. However, previous implementations of the MAP have mainly focused on the between-study variation in a single parameter, often the intercept or the mean outcome. Objective: To extend the MAP prior to account for the between-study variation in multiple parameters, and to illustrate this approach in clinical trials with an ANCOVA model. Method: The MAP prior was extended to allow for between-study variation in the intercept and the baseline effect in ANCOVA, as well as the correlation between these parameters. To quantify the amount of information, prior effective sample sizes (ESS) were calculated using the variance ratio method. Different priors for the between-study variation were compared in terms of the prior ESS and the estimated treatment effect. The method was illustrated using data from six clinical trials conducted by the UC San Diego Alzheimer's Disease Cooperative Study (ADCS). Results: The MAP prior based on the historical controls yielded approximately normal informative priors for the intercept and the baseline effect, with prior ESS of 17 and 43, respectively. The MAP only slightly improved the precision of the estimated treatment effect in the ANCOVA model (posterior standard deviations reduced by 1.5%), but larger improvements were observed for the estimated intercept and baseline effect. The results were robust to different priors for the between-study variation. Conclusions: The MAP can be extended to an ANCOVA model in which more than one model parameter may vary across trials.
Estimation of the between-study variation in multiple model parameters appears feasible even with a limited number of historical trials. However, the gains from using historical data, in terms of the required sample size and the precision of the estimated treatment effect, may be limited.
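The variance ratio method for the prior effective sample size can be sketched as follows. This is a common heuristic (the ESS is the number of observations whose combined information matches the prior's precision); exact definitions vary with the outcome model, and the numbers below are hypothetical, not the study's.

```python
def ess_variance_ratio(sigma2_one_obs, prior_variance):
    """Prior effective sample size by the variance ratio method:
    ESS = sigma^2 / Var(prior), where sigma^2 is the sampling variance
    of a single observation. The ESS is the number of observations whose
    combined information matches the precision of the prior.
    """
    return sigma2_one_obs / prior_variance

# Hypothetical values: outcome SD of 9 points, MAP prior SD of 1.4 points.
ess = ess_variance_ratio(sigma2_one_obs=9.0 ** 2, prior_variance=1.4 ** 2)
# about 41 patients' worth of prior information
```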

Bayesian hierarchical modeling for MedDRA coded adverse events in RCTs

ABSTRACT. Patients participating in randomized controlled trials (RCTs) often report a wide range of different adverse events (AEs) during the trial. MedDRA is a standardized hierarchical terminology for structuring the reporting of AEs. The lowest level, Preferred Terms (PT), represents single types of medical events, and higher levels aggregate specific lower levels: High Level Terms (HLT), High Level Group Terms (HLGT), and System Organ Classes (SOC). MedDRA has a multiaxial structure in which a single lower-level term can be aggregated into multiple higher levels.

Most of these AEs are uncommon and may occur only once or twice in a patient. In general, power calculations of RCTs are not focused on AEs, and very low incidence rates are observed. As a result, there is limited statistical power to detect rare AEs, leading to a high rate of false negatives. Therefore, AE data are often reported at higher levels of the MedDRA structure; these have higher incidence but are less informative, since they cover a spectrum of AEs.

We propose hierarchical Bayesian models for identifying MedDRA-coded AE relative risks (RRs) and odds ratios (ORs). Our model allows estimation of ORs and RRs at all levels and can deal with the multiaxial structure of AEs. Following other authors, we started by specifying a hierarchical binomial model. To account for multiple occurrences of specific AEs, we also specified a hierarchical Poisson model. To incorporate the hierarchical and multiaxial structure, the parameters of the Poisson or binomial distributions were sampled from weighted normal distributions at the HLT level; these parameters were in turn sampled from distributions at higher levels, up to the SOC level. A full Bayesian model was specified using noninformative prior distributions for the means and variances of the AE log-ORs and log-RRs. Models were implemented in the Stan language and run through the rstan package in R. When PTs occurred in only a few patients, our models did not converge; such PTs were aggregated and modeled using binomial/Poisson distributions at the HLT level, with suitable adjustment of the mean log-OR/log-RR.

We illustrate our model with AE-data from a large RCT (n=2658) and we compare results with other methods for analyzing AEs.

17:30-19:00 Session OC5B: Prediction by Machine learning
Individual risk prediction: comparing Random Forests with Cox proportional-hazards model by a simulation study

ABSTRACT. With big data becoming more widely available in healthcare, machine learning algorithms such as Random Forest (RF), which ignores time-to-event information, and its extension Random Survival Forest (RSF) are used for individual risk prediction as alternatives to the Cox proportional-hazards (Cox-PH) regression model. Our objective was to systematically evaluate and compare the predictive performance of RF and Cox-PH models. Cox-PH, RSF with two split criteria [log-rank (RSF-LR), log-rank score (RSF-LRS)], and RF were applied and evaluated in an extensive simulation study based on data from the Athens Multicenter AIDS Cohort Study (AMACS). Simulation scenarios assumed different associations between the predictors and the outcome [linear (L), linear with interactions (LI), non-linear (NL), non-linear with interactions (NLI)], different sample sizes (500, 1000, 5000), censoring rates (50%, 75%, 93%), hazard functions (increasing, decreasing, constant), and numbers of predictors (7 or 15), leading to 216 scenarios in total. To evaluate the performance of the methods, equal-sized training and testing datasets were independently generated, and the time-dependent area under the curve (AUC), C-index, and integrated Brier score (IBS) were calculated. To reduce the variability of the performance estimates, testing datasets of 10000 observations were generated as a sensitivity analysis. In all scenarios, RF had by far the worst performance among the models considered. In scenarios with a low number of events (NE<250), as well as under the linearity assumption, Cox-PH outperformed the RSF models by 3% on average. Both methods performed similarly in LI scenarios when NE increased above 500. In NL and NLI scenarios, RSF-LR performed better than Cox-PH regression when NE≥250, with up to approximately a 2% improvement in performance. No notable differences in model performance were observed among the different hazard functions.
The sensitivity analysis confirmed our results, lowering variability especially in scenarios with small training datasets (≤1000). When applied to real data, models that incorporated survival time performed better. Although RSF models are a promising alternative to conventional regression methods as data complexity increases, they require much larger datasets for training. In time-to-event analysis, it is important to use forest models that take the survival time into account.

Predicting patient mortality from large sets of diagnosis codes using logistic regression and machine learning

ABSTRACT. Background: Electronic healthcare records are increasing in volume and scope, presenting growing opportunities to use large sets of predictors and model their relationships with more flexible methods. Machine learning approaches have been used to model interactions between many diagnosis codes in large datasets of electronic healthcare records. No previous studies have directly compared regression and machine learning approaches for predicting patient outcomes from large sets of individual International Classification of Diseases (ICD) codes.

Objective: To compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records.

Study Design and Setting: We analysed national hospital records and official death records for patients with myocardial infarction (n=200,119), hip fracture (n=169,646), or colorectal cancer surgery (n=56,515) in England in 2015-17. One-year mortality was predicted from patient age, sex, and socioeconomic status, together with 202 to 257 International Classification of Diseases 10th Revision codes, each coded as a binary predictor indicating whether it was recorded in the preceding year. Performance measures included the c-statistic, scaled Brier score, and several measures of calibration.

Results: One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% CI 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall.

Conclusion: In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably. Our results suggest that there is little or no advantage to using machine learning rather than regression approaches in this particular study context.
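The scaled Brier score used above can be computed as 1 minus the ratio of the model's Brier score to that of a null model predicting the overall event rate for everyone. A minimal Python sketch on toy data (not from the study):

```python
def scaled_brier(y_true, y_prob):
    """Scaled Brier score: 1 - Brier / Brier_null, where the null model
    predicts the overall event rate for every subject. A value of 1 is
    perfect; 0 means no better than the null model.
    """
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    rate = sum(y_true) / n
    brier_null = sum((rate - y) ** 2 for y in y_true) / n
    return 1 - brier / brier_null

# Toy outcomes and predicted one-year mortality probabilities.
sb = scaled_brier(y_true=[0, 0, 1, 0, 1, 0],
                  y_prob=[0.1, 0.2, 0.8, 0.1, 0.7, 0.3])
```

Unlike the raw Brier score, this rescaling makes values comparable across cohorts with different event rates.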

Predicting individual life years lost due to cancer using pseudo-observations with random forest, in the absence of cause of death information

ABSTRACT. Competing risks analyses are essential for making inferences about a specific disease, and methods that account for more than one event type should be applied. Application to population-based registry data involves additional methodological challenges due to the absence of reliable information on the cause of death. Excess hazard methodology is widely applied in such settings, allowing cause-specific inference in terms of net survival, the crude probability of death, and life-years lost (LYL). LYL are of particular interest because they are easy to interpret and communicate to non-statistical audiences. Here, we show how to estimate the LYL due to a specific cause using the pseudo-observation approach combined with excess hazard methodology. Jack-knife pseudo-observations for LYL are computed at one time point for each individual, regardless of their censoring status. The complete set of (continuous) pseudo-observations can subsequently be modelled with conventional generalized models or with a variety of machine learning tools, ranging from random forests to support vector machines and neural networks. We illustrate this method (pseudo-observations with random forest) on English lung cancer data, with the ultimate aim of predicting individual LYL due to cancer from numerous variables, including comorbidities and clinical characteristics.
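The jack-knife pseudo-observation construction at the heart of this approach is generic: for subject i, pseudo_i = n * theta_hat - (n - 1) * theta_hat(-i), where theta_hat(-i) leaves subject i out. A minimal Python sketch, using the mean as a stand-in estimator (for which the pseudo-observations simply recover the data; in the study the estimator would be the LYL estimate):

```python
def pseudo_observations(data, estimator):
    """Jack-knife pseudo-observations for any statistic:
    pseudo_i = n * theta(all) - (n - 1) * theta(all without i).

    Every subject, censored or not, gets one continuous value that can be
    modelled with standard regression or machine-learning tools.
    """
    n = len(data)
    theta_full = estimator(data)
    return [n * theta_full - (n - 1) * estimator(data[:i] + data[i + 1:])
            for i in range(n)]

# Sanity check with the mean: pseudo-observations equal the raw data.
pseudo = pseudo_observations([2.0, 4.0, 9.0],
                             estimator=lambda d: sum(d) / len(d))
```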

A comprehensive comparison of approaches for the calibration of probability machines

ABSTRACT. Statistical prediction models have gained popularity in applied research. One challenge is the transfer of a prediction model to a population that may be structurally different from the one in which it was developed. An adaptation to the new population can be achieved by calibrating the model to the characteristics of the target population, for which numerous calibration techniques exist. In view of this diversity, we performed a systematic evaluation of various popular calibration approaches used by the statistical and the machine learning communities. Focusing on models for two-class probability estimation, we provide a review of the existing literature and present the results of a comprehensive analysis using both simulated and real data. The calibration approaches are compared with respect to their empirical properties and relationships, their ability to generalize precise probability estimates to external populations, and their availability in terms of easy-to-use software implementations. Calibration methods that estimated one or two slope parameters in addition to an intercept consistently showed the best results in the simulation studies. Calibration on logit-transformed probability estimates (i.e., the linear predictor) or on log-transformed probability estimates generally outperformed calibration methods applied to non-transformed estimates. In case of structural differences between training and validation data, the benefit of re-estimating the entire prediction model should be weighed against the sample size of the validation data. We recommend regression-based approaches using transformed probability estimates, where at least one slope is estimated in addition to an intercept, for updating probability estimates in validation studies.
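The recommended "intercept plus slope" recalibration on the logit scale (logistic calibration) can be sketched as follows; this is an illustrative numpy implementation via Newton-Raphson, not the software evaluated in the study:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def recalibrate_logistic(p_hat, y, n_iter=50):
    """Fit y ~ intercept + slope * logit(p_hat) by Newton-Raphson.

    A fitted slope != 1 or intercept != 0 indicates miscalibration of
    the original probability estimates; applying expit(a + b*logit(p))
    yields the recalibrated probabilities."""
    x = logit(np.clip(p_hat, 1e-12, 1 - 1e-12))
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = expit(X @ beta)
        W = mu * (1 - mu)               # IRLS weights
        grad = X.T @ (y - mu)           # score vector
        H = X.T @ (X * W[:, None])      # observed information
        beta = beta + np.linalg.solve(H, grad)
    return beta  # [intercept, slope]
```

For example, predictions whose logits were shrunk by half relative to the truth should be recovered with a calibration slope near 2 and intercept near 0.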

Methodological conduct of clinical prediction models using machine learning methods in oncology needs to be improved

ABSTRACT. Context Clinical prediction models are widely used in oncology for medical decision making. Using modern modelling methods, such as machine learning (ML), to improve prediction is a rapidly growing area of research. ML is often portrayed as offering many advantages, such as flexible modelling and ability to analyse ‘big’, non-linear and high dimensional data. These promises are yet to be realised and there is concern that prediction models developed using ML use poor and inefficient methodology and are at high risk of bias.

Objective To assess methodological conduct and the risk of bias of studies that develop clinical prediction models using ML (as defined by primary study authors) in the field of oncology.

Methods We conducted a systematic review of prognostic clinical prediction models developed using ML, published during 2019. We extracted data on study design, sample size, data pre-processing, hyperparameter tuning and other analysis methods, and items for risk of bias assessment. The primary outcome was risk of bias, assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) for each model that was developed and validated.

Results We identified 2,922 publications and excluded 2,860 based on study design and publication type. We reviewed the full text of 62 publications: 48 development-only studies and 14 development-with-validation studies. 152 models were developed overall, with a median of 2 models (range: 1 to 6) developed per publication. The most prevalent ML models were classification trees (18%), logistic regression (18%), random forests (15%) and neural networks (12%).

Development of 84% of models and validation of 51% of developed models were found to be at high risk of bias. Bias in the analysis domain was the largest contributor to the overall high risk of bias during model development. Sample size was justified in only 5 publications; a median of 16 predictors (range: 4-33,788), 647 participants (range: 20-582,398) and 195 events (range: 7-45,797) were used to develop the models. 45% of studies used a split sample to internally validate their models.

Conclusion Most prediction models developed using ML in oncology were at high risk of bias, largely due to analysis methods. Urgent methodological guidance is needed to improve the quality of these models.

17:30-19:00 Session OC5C: Screening and Diagnostic studies
The Natural History of Invasive Breast Cancers Detected in the Scandinavian Mammography Screening Programs: A Cohort Study

ABSTRACT. Background: The prevailing theory is that a large proportion of mammography-detectable tumors have very long lead-times and that tumors never regress. Two staggered cohort studies have previously suggested that almost all of the incidence increase when screening with mammography is due to detection of small tumors whose natural fate is to regress before they become clinical disease (1,2). Here we study alternative methods to estimate lead-time and cancer regression. Materials and methods: Prospective cohort study of 375,064 Swedish, 127,064 Norwegian and 149,266 Danish women invited to a first mammography in 1986-89, 1996-97 and 2008-09, respectively. We estimate the proportions of tumors with lead-time over 1 year (women under age 50) and over 2 years (women aged 50-69), and study whether tumors accumulate in the breast in the absence of screening. Regression is also studied by comparing the prevalence increase to incidence increases in succeeding screening rounds. Results: The RR after the start of annual screening was 1.33 (95% CI: 1.21-1.47; P<0.0001) for women aged 40-49 years in Sweden, with no prevalence peak; thus, the maximum lead-time is 1 year for these tumors. About 20% (95% CI: 0.11-0.29; P<0.0001) and 29% (95% CI: 0.26-0.33; P<0.0001) of tumors in the prevalence screening of women aged 50-69 years in Norway and Denmark, respectively, had lead-times over 2 years. There is no evidence that slow-growing tumors accumulate in the absence of screening mammography under age 65 when comparing age-specific prevalence peaks to succeeding screening rates. A staggered cohort analysis of the introduction of public screening in Denmark yielded RR=1.10 (95% CI: 1.05-1.17; P=0.0004), also suggesting that tumors do not accumulate in the absence of screening. Conclusion: There is no evidence of any large reservoir of breast tumors with long lead-times. The incidence increase when screening is introduced is too large to be explained solely by early diagnosis; many mammography-detected tumors must regress.
Adjustment for long lead-time is not justified when calculating overdiagnosis, and lead-time bias is an exaggerated problem in cancer epidemiology. References: 1. Zahl, Mæhlen, Welch. The natural history of invasive breast cancers detected by screening mammography. Arch Intern Med 2008;168:2311-6. 2. Zahl, Gøtzsche, Mæhlen. Natural history of breast cancers detected in the Swedish mammography screening program: a cohort study. Lancet Oncology 2011;12:1118-24.

Meta-analysis of dichotomous and polytomous diagnostic tests without a gold standard

ABSTRACT. Standard methods for the meta-analysis of diagnostic tests without a gold standard are limited to the analysis of dichotomous tests. Multivariate probit models are used to analyze correlated binary data, and can be extended to multivariate ordered probit models to model polytomous (i.e. non-binary) data. Within the context of an imperfect gold standard, they have previously been used for the analysis of dichotomous and polytomous diagnostic tests in a single study and for the meta-analysis of dichotomous tests. We developed a hierarchical, semi-ordered latent class multivariate probit model for the meta-analysis of polytomous and dichotomous diagnostic tests without a gold standard. Our model enables the synthesis of data from studies reporting accuracy at varying thresholds, and can accommodate a hierarchical partial pooling model on the conditional within-study correlations, allowing us to obtain summary estimates of joint test accuracy. Dichotomous tests use binary probit likelihoods and polytomous tests use ordered probit likelihoods. We fitted the models using Stan, which implements a state-of-the-art Hamiltonian Monte Carlo algorithm. In the first case study, we applied the models to a dataset in which studies evaluated the accuracy of tests, and combinations of tests, for deep vein thrombosis. We also compared our results to the original study, which assumed a perfect reference test. We found that modelling the polytomous test (the Wells score) as dichotomous and conducting stratified analyses at each threshold resulted in substantial bias in the accuracy estimates for the reference test (ultrasonography) and the other index test under evaluation (the D-Dimer). Furthermore, we found substantial imperfect gold standard bias (over 10% difference in the joint test accuracy of the Wells score and D-Dimer) in the original analysis, which assumed a perfect gold standard.
In the second case study, we applied the methods to a dataset for clinical dementia, where studies reported accuracy at between one and 16 distinct thresholds, and compared our results to stratified analyses. Our results suggest that the original analysis underestimated the sensitivity of the Mini-Mental State Examination (MMSE) by over 10% and overestimated its specificity. We discuss limitations and possible ways to improve scalability by making use of recently proposed algorithms.

Comparison of methods for the linear combination of biomarkers under Youden Index optimisation criterion

ABSTRACT. In clinical practice, it is common to have information on multiple biomarkers for disease diagnosis. Combining them all into a single marker is a common and widespread practice and often provides a better diagnostic yield. The formulation of algorithms for the estimation of binary classification models that maximise the AUC has been a widely explored line of research. The Youden index is a statistical metric also widely and successfully used in several clinical studies, and serves as a summary for diagnosis. However, unlike the AUC, methods that optimise the Youden index have not received sufficient attention in the literature. The aim of our study was to propose a new step-by-step algorithm to combine continuous biomarkers that maximises the Youden index, and additionally, to explore, evaluate and compare it with other methods. Three methods are based on Pepe and Thompson's empirical search [1] (our proposed stepwise approach, Yin and Tian's stepwise approach [2] (SWD), and the min-max approach [3] (MM)) and three are numerical search methods based on derivatives (logistic regression, a parametric approach under multivariate normality, and a non-parametric kernel smoothing approach (KS)). To compare the performance of these methods, a comprehensive simulation study was performed, and the methods were also applied to two real data sets (the Duchenne Muscular Dystrophy Dataset and the Prostate Cancer Dataset). The simulated data cover a wide range of scenarios regarding the probability distribution of the biomarkers (normal, non-normal), the discrimination ability between biomarkers (similar or different) and the correlation between them, considering sample sizes from small to large. The results show that our proposed stepwise approach outperforms all other algorithms in most scenarios. In general, it is followed by KS and SWD.
The MM algorithm is the worst performer in most scenarios, except in normal biomarker scenarios with the same means and different covariance matrices for the diseased and non-diseased populations, where it outperforms the other algorithms.
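The empirical-search idea behind the Pepe-Thompson-style approaches can be illustrated for two biomarkers: scan a grid of combination coefficients and keep the one whose linear score maximises the empirical Youden index. A simplified numpy sketch (the stepwise algorithms in the abstract generalise this to more markers; `combine_two_markers` is a hypothetical name):

```python
import numpy as np

def youden(scores, labels):
    """Maximum empirical Youden index J = sens + spec - 1 over all thresholds."""
    order = np.argsort(-scores)  # descending: classify the top-k as positive
    y = np.asarray(labels)[order]
    tpr = np.cumsum(y == 1) / np.sum(labels == 1)
    fpr = np.cumsum(y == 0) / np.sum(labels == 0)
    return float(np.max(tpr - fpr))

def combine_two_markers(x1, x2, labels, grid=np.linspace(-5.0, 5.0, 101)):
    """Empirical grid search for the coefficient c in the linear score
    x1 + c*x2 maximising the Youden index (Pepe-Thompson-style)."""
    best_c, best_j = 0.0, -1.0
    for c in grid:
        j = youden(x1 + c * x2, labels)
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

Since the grid contains c = 0, the combined score can never do worse (empirically) than the first marker alone.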

Single and multiple imputation combined with missing indicators in clinical prediction models: a simulation study

ABSTRACT. Background & Aims Clinical prediction models (CPMs) allow the communication of risk between patients and caregivers, based on current patient and clinical characteristics. The development and validation of CPMs is a complex process, especially in the presence of missing data, which are common in clinical data. Multiple imputation (MI) is often heralded as the gold standard for imputing missing data in both causal inference work and prediction modelling studies, but due to practical limitations can be difficult to implement in clinical practice, where CPMs are applied. Key limitations include the requirement for access to the development data and for computational power at prediction time, which are often not available. We therefore consider whether regression imputation (RI) could offer a promising alternative to MI in the context of prediction, since it is a deterministic process requiring only access to the imputation model to impute missing data at the point of care. Both MI and RI rely on the assumption that data are missing at random (MAR), which is often a dubious assumption in clinical research data, particularly within electronic health records where data collection is opportunistic. We therefore also consider whether the inclusion of missing indicators in combination with MI and RI can improve the predictive performance of models developed under informative missingness. We set up a simulation study to explore these ideas in more detail.

Results We assessed ideal and pragmatic performance (Wood et al., 2015) of models developed and validated using both MI and RI, with and without missing indicators included as predictors. We found that under both MAR and missing not at random (MNAR) structures, the inclusion of a missing indicator can indeed improve the calibration and discrimination of models developed and applied in the presence of missing data. We also found that regression imputation could offer a practical alternative to MI where it is not possible to apply MI at the point of prediction.

Conclusion We advocate the careful use of regression imputation and missing indicators in the development and validation of clinical prediction models, where the missing data are assumed to be informative with respect to patient condition.
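The practical appeal of regression imputation with a missing indicator — deterministic imputation from a stored model, plus an indicator column carrying the missingness signal — can be sketched as follows. This is an illustrative numpy version for a single incomplete covariate (function and variable names are hypothetical, not from the study):

```python
import numpy as np

def regression_impute_with_indicator(X, j):
    """Deterministically impute column j of X from the complete columns
    by least squares, and append a 0/1 missing indicator column.

    X: 2-d array with np.nan marking missing values in column j only.
    Returns (X_augmented, imputation_coefficients); the coefficients
    are all a deployed model needs to impute at the point of care."""
    X = np.asarray(X, float)
    miss = np.isnan(X[:, j])
    other = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), other])
    # Fit the imputation model on complete cases only
    coef, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
    X_imp = X.copy()
    X_imp[miss, j] = A[miss] @ coef
    return np.column_stack([X_imp, miss.astype(float)]), coef
```

Because the imputation is a fixed linear rule, only `coef` (not the development data) needs to travel with the prediction model, which is the deployment advantage the abstract highlights over MI.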

Diagnosing Latent Class Analysis for Analyzing Diagnostic Tests in the Absence of a Gold Standard

ABSTRACT. Diagnostic tests play a key role in disease control. The absence of a gold standard, however, hampers estimation of disease prevalence and of the misclassification errors (ME) of the available imperfect tests. With a set of such test results jointly available for a sample of patients, latent class analysis (LCA) allows for correct estimates of these parameters under restrictions. For unknown (latent) true disease status, LCA assumes test ME are independent within each class and constant across subpopulations. These assumptions are violated when serious comorbidity affects the targeted disease risk and/or the ME rates. We examine the implications of the envisaged model violations on the working likelihood estimators, analytically and through simulation, focusing on population prevalence, sensitivity and specificity as target estimands. We derive if and when results are still reliable, and consider how a simple and a more comprehensive adapted conditional version of LCA may alleviate the problems. We support our results with finite-sample simulations mimicking a setting of passive case finding among presumptive pulmonary tuberculosis (PTB) patients with or without HIV comorbidity, before applying the methods to a case study. Based on realistic sensitivities and specificities of five commonly used diagnostic tests for PTB (any TB symptom, digital chest X-ray, CRP, Xpert MTB/RIF and culture), we simulated test results in samples of various sizes with different PTB prevalence across HIV subgroups. We thus generated independent test results within the true PTB latent classes conditional on HIV. For different numbers of tests (5 versus 3), we performed Bayesian LCA for working models with or without constant PTB prevalence and ME across the HIV subgroups. The working model with different PTB prevalence but constant ME across HIV subgroups generated substantially more biased PTB prevalence and ME estimates than when ignoring comorbidity altogether. With three tests, all models produced poor coverage.
Reassuringly, models with five tests allowing for different ME yielded largely unbiased total population prevalence estimates with acceptable coverage. Standard LCA is not robust to model violation through heterogeneous ME across subpopulations, especially with fewer tests. Well-chosen covariate-specific adaptations can alleviate the problem.

This project is part of the EDCTP2 programme supported by the European Union (RIA2018D-2489 TB TRIAGE+)
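The standard latent class model whose assumptions are examined above — two classes with conditionally independent binary tests and constant error rates — can be fitted with a simple EM algorithm. A minimal numpy sketch (illustrative only; the study itself uses Bayesian LCA with covariate-specific adaptations):

```python
import numpy as np

def lca_em(results, n_iter=500, seed=0):
    """EM for a two-class latent class model with conditionally
    independent binary test results (rows: subjects, cols: tests).

    Returns (prevalence, sensitivities, specificities) under the
    standard LCA assumptions: within-class independence and error
    rates constant across subpopulations."""
    rng = np.random.default_rng(seed)
    n, k = results.shape
    prev = 0.5
    sens = rng.uniform(0.6, 0.9, k)  # init in the "informative test" region
    spec = rng.uniform(0.6, 0.9, k)
    for _ in range(n_iter):
        # E-step: posterior probability of disease for each subject
        l1 = prev * np.prod(sens**results * (1 - sens)**(1 - results), axis=1)
        l0 = (1 - prev) * np.prod((1 - spec)**results * spec**(1 - results), axis=1)
        w = l1 / (l1 + l0)
        # M-step: weighted prevalence and error rates
        prev = w.mean()
        sens = (w[:, None] * results).sum(0) / w.sum()
        spec = ((1 - w)[:, None] * (1 - results)).sum(0) / (1 - w).sum()
    return prev, sens, spec
```

When the generating mechanism matches these assumptions, the EM recovers the true parameters; the abstract's point is precisely that bias appears once error rates differ across (e.g. HIV-defined) subgroups.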

17:30-19:00 Session OC5D: Bayesian Joint models for longitudinal data and time-to-event
Bayesian multilevel nonlinear joint model to characterize the variability in the response to immunotherapy

ABSTRACT. Background: The association between survival and tumor dynamics, assessed by the Sum of the Longest Diameters (SLD) of the target lesions, has received a lot of attention from statisticians, with the goal of anticipating the outcome of clinical trials and/or identifying the patients most at risk [1]. However, SLD is an aggregate measure, summing up several lesions that can have different dynamics. Moreover, these lesions can be located in different organs, and hence may be differently associated with survival. It has been suggested that this heterogeneity is exacerbated by immunotherapy and could be associated with survival. Objectives: Here we aimed to quantify, in a large population of individuals with advanced urothelial cancer, i) the impact of tumor dynamics in different organs on survival, and ii) the intra-patient variability and its association with treatment response. Methods: We analyzed the tumor dynamics from a phase 3 clinical trial (IMvigor211) of 900 patients randomized between immunotherapy (atezolizumab) and chemotherapy. We developed nonlinear parametric joint models to describe the SLD dynamics in 5 different locations of the body (lymph, lung, liver, bladder, other) and quantify their marginal impact on survival. We then developed a Bayesian multilevel joint model, in which the individual lesion dynamics (up to 5 per individual) were modeled using a nonlinear mixed effects model with an additional level of random effects. Inference was done using the HMC algorithm in Stan [2]. Results: We observed great variability in tumor dynamics across organs, with different impacts on survival. In particular, liver tumor dynamics were strongly associated with survival compared to other locations. Modelling separate associations between organ-specific tumor kinetics and survival, instead of a single association for all organs, significantly improved the fit to the survival data.
With a large amount of data (2,133 target lesions), we expect to demonstrate larger intra-patient variability under immunotherapy than under chemotherapy. Conclusion: This approach allowed us to characterize inter- and intra-patient variability in response to immunotherapy, which might help to identify the patients most at risk early, in a perspective of personalized medicine.

Multistate inference based on longitudinal/competing risks joint modeling under misclassified cause of failure

ABSTRACT. Thomadakis et al. (2019) proposed joint modeling of a disease marker through a linear mixed model (LMM) and competing risks using cumulative incidence functions (CIFs), with the CIFs dependent on the “true” marker values to remove measurement error. The generalized odds rate transformation was adopted, with the proportional subdistribution hazards model being a special case. In HIV studies, patients receiving antiretroviral therapy may die or disengage from care (competing risks). The CD4 count, a longitudinally measured marker, is a predictor of clinical outcomes, which suggests joint analysis. However, biases can occur as many deaths may be incorrectly classified as disengagements from care, especially in studies from resource-constrained countries.

We extend [1] to account for failure cause misclassification through double sampling, where the true failure cause is ascertained in a small random sample of individuals initially classified as disengaged from care, using a Bayesian MCMC procedure. We also estimate multistate probabilities defined jointly by marker and competing-risk data. Based on the assumed joint model, we derive posterior samples for (a) probabilities of being event-free and “true” marker values being in predefined intervals (states) and (b) population-averaged CIFs. Both (a) and (b) are re-estimated by baseline marker state.

A simulation study is performed assuming the true failure causes are available in 20% of patients. Marker data are generated by an LMM, with two scenarios for the CIFs: (i) a proportional odds rate model and (ii) a proportional subdistribution hazard model. Under each scenario, (i) and (ii) are fitted yielding estimates with small biases and good coverage rates (93-97%). Multistate/transition probability estimates are nearly unbiased even under misspecified survival submodels. The proposed models are applied to data from the East Africa IeDEA cohort study. It is estimated that only 29.2% of deaths are correctly classified, leading to significant adjustments in the mortality estimates. Mortality rates are substantially higher at lower initial CD4 counts.

We have extended a flexible CIF-based joint modeling approach to account for potential failure misclassification and derive multistate/transition probabilities. Our approach is particularly useful when the effect of the marker on the failure probabilities is of primary interest.

Bayesian Predictive Model Averaging for Joint Model of Survival and Longitudinal Data: Application to an Immunotherapy Trial

ABSTRACT. In many clinical studies, multiple biomarkers are repeatedly measured over time so that physicians are aware of patients' conditions during follow-up. Predicting a patient's future survival status based on such recorded longitudinal information, known as dynamic individualized prediction, is of great interest and is achieved by jointly modeling longitudinal and survival data. From a predictive viewpoint, better results can generally be obtained by averaging over candidate models to account for model uncertainty, rather than selecting the single best model. In this study we apply a Bayesian predictive model averaging approach to dynamic prediction in the context of joint model analysis, evaluating fitted models by their estimated out-of-sample prediction accuracy. In extensive simulation studies across a broad range of situations, we examine the operating characteristics of the proposed method in terms of predictive performance, measuring the calibration and discrimination abilities of the dynamic predictions. We discuss the strengths and limitations of the proposed model averaging approach in comparison with the single-model based method through an application to an ovarian cancer immunotherapy clinical trial as well as the simulation studies. The results suggest that the proposed modeling framework can provide generally more precise predictions of survival probabilities, which could aid the subsequent medical decision-making process.

A Joint Model for Multiple Longitudinal Outcomes, Recurrent and Terminal Events using CF Patient Registry Data

ABSTRACT. Cystic fibrosis (CF) is an inherited disease primarily affecting the lungs and gastrointestinal tract. Thick, infected mucus in the patient's airways leads to recurrent acute respiratory events known as pulmonary exacerbations (PE), thereby worsening lung function. It is of clinical interest to investigate the association between the risk of PE and lung function and nutritional decline, as direct positive associations between lung function and nutritional status have been reported. Previous work has been limited to continuous longitudinal markers and the time to first PE, thereby neglecting subsequent occurrences and other survival outcomes; this was mainly due to the unavailability of appropriate and robust statistical software.

Our primary goal is to simultaneously investigate the association between the risk of PE, lung function decline (FEV1), evolution of the patient’s growth and nutritional status (e.g., BMI), and the risk of lung transplant or death using all available U.S. CF Foundation (CFF) patient registry data. We intend to explore different forms of association between the longitudinal markers and the events of interest.

We propose a joint modeling framework accommodating multiple longitudinal markers, a recurrent event process, and a terminal event. The terminal outcome accounts for informative censoring due to lung transplantation or death from respiratory failure. Novel elements of our approach, compared to previously proposed joint models for recurrent events, are: (i) allowance for multiple longitudinal markers with different distributions, (ii) specification of various functional forms to link these markers with the risk of a recurrent event and the risk of the terminating event, and (iii) accommodation of discontinuous intervals of risk, with time defined on either the gap or calendar timescale. The developed model will be available in the R package JMbayes2.

Analysis of all recurrent events with multiple biomarkers enhances our understanding of risks posed by PEs. Full MCMC algorithm implementation in C++ enables model fit in a timely fashion, despite its complexity. The proposed multivariate joint model affords the opportunity to make more efficient use of all available CFF registry data. It thereby brings new insights into CF disease progression and contributes to monitoring and treatment strategies.

Joint Modeling of Incomplete Longitudinal Data and Time-to-Event Data

ABSTRACT. Clinical studies often collect longitudinal and time-to-event data for each subject. Joint modeling is a powerful methodology for evaluating the association between these data. The existing models, however, have not sufficiently addressed the problem of missing data, which are commonly encountered in longitudinal studies. In most cases, analysis methods are based on the assumption of missing not at random (MNAR) or missing at random (MAR), and sensitivity analyses are performed to assess the robustness of findings to plausible alternative assumptions about the missing data. When we cannot determine from the collected data whether missingness is MNAR or MAR, a model that is robust to both missing mechanism assumptions is needed. The shared parameter model is one model-based approach to dealing with missing data in longitudinal studies; it assumes that the outcome and missingness models are connected by means of common random effects. In this presentation, we introduce a novel joint model with shared random effects for incomplete longitudinal data and time-to-event data. Our proposed joint model consists of three submodels: a linear mixed model for the longitudinal data, a Cox proportional hazards model for the time-to-event data, and a Cox proportional hazards model for the time to drop-out from the study. By simultaneously estimating the parameters included in these submodels, the biases of the estimators are expected to decrease under both missing mechanisms, MAR and MNAR. The proposed model is estimated by a Bayesian approach, and we evaluate its performance through Monte Carlo simulation studies. The results indicate that our proposed model provides less biased estimates of the association parameter between the longitudinal and time-to-event data compared with the existing joint model.

17:30-19:00 Session OC5E: cure and mixture models
RECeUS: Ratio Estimation of Censored Uncured Subjects for Studying Sufficient Follow-Up in Studies of Long-Term Survivors

ABSTRACT. The need to model a cure fraction, the proportion of a cohort not susceptible to the event of interest, arises in a variety of contexts including tumor relapse in oncology. Existing methodology assumes that follow-up is long enough for all uncured subjects to have experienced the event of interest at the time of analysis, and researchers have demonstrated that fitting cure models without sufficient follow-up leads to bias. However, this assumption is rarely testable in practice; for example, no diagnostic tests exist to determine if a cancer patient is cured of the disease. Limited statistical methods exist to evaluate sufficient follow-up, and they can exhibit poor performance and lead users to falsely conclude sufficient follow-up, leading to bias, or to falsely claim insufficient follow-up, possibly leading to additional, costly data collection. The goal of this project is to develop a new quantitative statistic to evaluate whether cure models may be appropriate to apply to censored data. Specifically, we propose that the proportion of censored uncured subjects in a study can be used to evaluate cure model appropriateness. We implement this via maximum likelihood estimation. Asymptotic and simulation results demonstrate that the statistic can be estimated in finite samples with desirable statistical properties. We also apply the method to two oncology examples to compare the approach with existing methods.
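The mixture cure likelihood underlying this setting — censored subjects are a mix of cured patients and uncured patients yet to fail — can be sketched with an exponential latency distribution and a crude grid-search MLE. This is illustrative only (the proposed RECeUS statistic is not implemented here; `fit_mixture_cure_exp` is a hypothetical name):

```python
import numpy as np

def fit_mixture_cure_exp(t, event, pis=np.linspace(0.02, 0.98, 49),
                         rates=np.linspace(0.1, 3.0, 59)):
    """Grid-search maximum likelihood for a mixture cure model with
    exponential latency: a fraction pi is cured (never fails) and the
    uncured fail at constant rate lam.

    Events contribute (1 - pi) * f(t); censored subjects contribute
    pi + (1 - pi) * S(t). Returns (pi_hat, lam_hat)."""
    t, event = np.asarray(t, float), np.asarray(event, float)
    best_pi, best_lam, best_ll = pis[0], rates[0], -np.inf
    for lam in rates:
        surv = np.exp(-lam * t)          # S(t) for the uncured
        log_f = np.log(lam) - lam * t    # log f(t) for the uncured
        for pi in pis:
            ll = np.sum(event * (np.log(1 - pi) + log_f)
                        + (1 - event) * np.log(pi + (1 - pi) * surv))
            if ll > best_ll:
                best_pi, best_lam, best_ll = pi, lam, ll
    return best_pi, best_lam
```

With long follow-up (few uncured subjects still censored) the cure fraction is well identified; with short follow-up the censored-uncured proportion grows and the fit degrades, which is exactly the sufficiency question the abstract addresses.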

Multiple imputation for survival analysis with missing data and a cure fraction: a study of osteosarcoma

ABSTRACT. Background: Nowadays, many cancer patients are cured after treatment. Hence, distinguishing between prognostic factors with a curing effect and those with a life-prolonging effect is of great interest in clinical research. However, some of these covariates are often not fully observed. In childhood osteosarcoma, histologic response to pre-operative treatment is known to have a strong effect on survival, while the benefits of intensified chemotherapy remain unclear. Previous osteosarcoma studies have treated all patients as uncured and have excluded those with missing histologic response, an approach that might affect the results.

Objectives: The study aims at developing and analyzing innovative methods for incorporating observations with missing covariates into the analysis of survival data with cured subjects. This is statistically challenging because of the complex model structure and the latent cure status.

Methods: A mixture cure model is considered, assuming that patients are either cured or uncured. In particular, the cure status and the failure times are modeled through a logistic and a Cox regression model, respectively. Several methods based on multiple imputation are explored for handling missing covariates, incorporating outcome information in the imputation model. Compared to the method discussed by Beesley et al., the proposed approach is more general because it allows for the inclusion of different covariates in the two model components. Existing procedures are used to estimate the logistic-Cox model. The proposed methods are evaluated through an extensive simulation study and are used to analyze osteosarcoma data from the MRC BO06/EORTC 80931 clinical trial.

Results: The developed methodology allows cured patients and partially missing covariates to be considered simultaneously. It leads to smaller variance of the estimates and larger power to detect significant effects. This study shows for the first time that histologic response has strong prognostic value for the cure status and that the effect of intensified chemotherapy is strongly related to histologic response. Conclusions: The complex nature of the disease and the limitations present in the data call for more advanced statistical techniques. This study shows that imputation of missing covariates, while accounting for cured patients, provides more accurate interpretative and forecasting tools in osteosarcoma research and other oncological studies.

A simulation analysis of reliability and robustness of a cancer cure model accounting for extra non-cancer mortality

ABSTRACT. Introduction: The proportion of cancer patients cured of the disease is estimated with standard cure models assuming they have the same risk of death as the general population [1]. Cured patients, however often maintain an extra risk of dying compared to the overall population due to other causes than cancer [2]. The aim of this work was to develop and validate an extended cure model incorporating a patients’ relative risk (α) of death from other causes compared to that observed in the general population. Methods: We extended mixture cure models considering Weibull relative survival of the uncured by adding a relative risk α multiplying the observed mortality in the general population. The parameters were estimated using maximum likelihood method for individual data and unweighted least square for grouped data. We evaluated the standard and the extended cure models through a simulation study, assessing their performances when all the assumptions are valid and their robustness when survival of uncured patients did not follow a Weibull distribution or extra non-cancer death risk was dependent of age at diagnosis or it randomly varied across patients. Two scenarios, simulating lung and breast cancer survival patterns, were varied by different relative risks α and lengths of follow-up. 1000 samples of different sizes (500- 20,000 cases each) were generated. The models were also applied to colon cancer FRANCIM real data. Results: When the assumptions were satisfied, both the extended cure models estimated correctly the parameters and their standard errors providing excellent coverage in all scenarios, although maximum likelihood coverage outperformed unweighted least square. The standard model underestimated π by 7% when α= 1.2, and by 40% when α=2.0. Age effect on the cure fraction was heavily overestimated. 
For reasonable deviations from the assumptions, parameter estimates appeared fairly robust, with relative differences from the true values within ±10%. Applied to real male colon cancer data, the extended models estimated α at around 1.2 and a cure fraction π of around 56%, higher than the conventional model's 52%. Conclusions: The present analysis suggests that conventional indicators overestimate cancer-specific death and underestimate the cure fraction and net survival of cancer patients.
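
The structure of the extended model can be sketched numerically (a minimal illustration, with invented parameter values and an exponential population survival standing in for life-table rates): overall survival multiplies the α-scaled population survival by the cure mixture π + (1 − π)S_W(t).

```python
import math

def weibull_surv(t, shape, scale):
    """Weibull net survival of the uncured: S_W(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def extended_cure_surv(t, pi, alpha, pop_rate, shape, scale):
    """Overall survival under the extended mixture cure model:
    S(t) = S_pop(t)**alpha * (pi + (1 - pi) * S_W(t)),
    with alpha scaling general-population (other-cause) mortality.
    An exponential population survival is used purely for illustration."""
    s_pop = math.exp(-pop_rate * t)
    s_net = pi + (1 - pi) * weibull_surv(t, shape, scale)
    return (s_pop ** alpha) * s_net

# Once all uncured events have occurred, survival relative to the
# alpha-scaled population survival settles at the cure fraction pi
pi, alpha = 0.55, 1.2
s60 = extended_cure_surv(60, pi, alpha, pop_rate=0.02, shape=1.3, scale=3.0)
s_pop60 = math.exp(-0.02 * 60) ** alpha
print(round(s60 / s_pop60, 2))   # → 0.55
```

Setting α = 1 recovers the standard assumption that cured patients die at the general-population rate; with α > 1 the standard model absorbs the extra background mortality into an understated cure fraction, as the abstract reports.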

An extension of Fellegi-Sunter record linkage model for mixed-type data with application to SNDS

ABSTRACT. Probabilistic record linkage is the process of combining data from different sources that refer to common entities when unique identifying information is not available. Fellegi and Sunter proposed a probabilistic record linkage framework that takes multiple pieces of non-identifying information into account but is limited to simple binary comparisons between matching variables. In our work, we propose an extension of this model for matching data that contain different types of variables (binary, categorical and continuous). We developed a mixture of discrete distributions for handling low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling continuous matching variables. The maximum likelihood estimates of the model parameters are obtained by means of the Expectation Conditional Maximization (ECM) algorithm. Through a Monte Carlo simulation study, we evaluated both the estimation of the posterior probability that a record pair is a match and the prediction of matched record pairs. The first simulation results indicate that the proposed methods perform well compared with existing methods. The next step will be to apply the proposed method to real datasets, with the aim of finding corresponding patients in the SNDS (Système National des Données de Santé) and the GETBO (Groupe d'étude de la Thrombose de Bretagne Occidentale) register data.
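
The baseline that this work extends, the classical Fellegi-Sunter model over binary comparison vectors, can be sketched as a two-class Bernoulli mixture fitted by EM (all data and parameter values below are invented for illustration; the abstract's ECM algorithm for mixed-type variables is not reproduced here):

```python
import random

random.seed(1)
K = 3
# Illustrative "true" parameters: field-agreement probabilities among
# matches (m), among non-matches (u), and the match prevalence (lambda)
m_true, u_true, lam_true = [0.95, 0.90, 0.85], [0.20, 0.10, 0.05], 0.1

def draw(probs):
    return [1 if random.random() < p else 0 for p in probs]

# Simulated binary comparison vectors for record pairs
pairs = [draw(m_true) if random.random() < lam_true else draw(u_true)
         for _ in range(3000)]

# EM for the two-class Bernoulli mixture
lam, m, u = 0.5, [0.8] * K, [0.3] * K
for _ in range(100):
    # E-step: posterior probability that each pair is a match
    w = []
    for g in pairs:
        pm, pu = lam, 1 - lam
        for k in range(K):
            pm *= m[k] if g[k] else 1 - m[k]
            pu *= u[k] if g[k] else 1 - u[k]
        w.append(pm / (pm + pu))
    # M-step: update prevalence and the m- and u-probabilities
    s = sum(w)
    lam = s / len(w)
    for k in range(K):
        m[k] = sum(wi * g[k] for wi, g in zip(w, pairs)) / s
        u[k] = sum((1 - wi) * g[k] for wi, g in zip(w, pairs)) / (len(w) - s)

print(round(lam, 2), [round(v, 2) for v in m], [round(v, 2) for v in u])
```

The posterior weights w are exactly the match probabilities evaluated in the simulation study; the proposed extension replaces the Bernoulli field likelihoods with discrete and hurdle gamma mixtures.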

17:30-19:00 Session OC5F: Longitudinal data analysis
Multiple imputation approaches for handling incomplete three-level data with time varying cluster memberships

ABSTRACT. Three-level data structures arising from repeated measures on individuals who are clustered within larger units are common in clinical and population health studies. An additional complexity arises when individuals move between clusters over the course of the study resulting in a cross-classified data structure, which needs to be accounted for in the analysis. In these studies, missing data are also a common occurrence. Multiple imputation (MI) is a popular approach for handling missing data, but its validity depends on appropriate tailoring to the target analysis. In the context of cross-classified data, this means that the three-level structure and the time-varying cluster membership should be appropriately accommodated in the imputation model. While three-level data can be handled by either adapting single- and two-level MI approaches using dummy indicators and/or imputing repeated measures in wide format, or using a three-level MI approach, the implementability and comparability of these approaches in the context of cross-classified structures remain unclear. We conducted a simulation study to evaluate MI approaches handling incomplete cross-classified data when the substantive analysis uses a cross-classified random effects model. The simulation design was based on a longitudinal cohort study estimating the effect of depressive symptoms on the academic performance of students over time, clustered by time-varying school. The simulations were conducted under various missing data mechanisms and strengths of cluster correlation. The approaches evaluated included ad-hoc methods, ignoring the time-varying nature of cluster membership by taking the first or the most common cluster; pragmatic extensions of single-level and two-level MI approaches within the joint modelling (JM) and the fully conditional specification (FCS) framework; and a three-level FCS MI approach specifically developed for handling cross-classified data. 
We also compared the approaches in the longitudinal cohort case study. Simulation results indicated that the FCS implementations performed well in terms of bias and precision, while the JM approaches performed poorly. The results in the case study were in line with the simulations, with the JM approaches producing noticeably different estimates of the variance components compared with the FCS approaches.
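
The FCS idea of cycling univariate imputation models can be illustrated in a single-level toy setting (this is not the three-level cross-classified implementation evaluated in the abstract; data and parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)

# Impose roughly 30% MCAR missingness on each variable
x_mis = rng.random(n) < 0.3
y_mis = rng.random(n) < 0.3
x_imp = np.where(x_mis, np.nanmean(np.where(x_mis, np.nan, x)), x)
y_imp = np.where(y_mis, np.nanmean(np.where(y_mis, np.nan, y)), y)

# FCS: cycle univariate regression imputations, drawing each incomplete
# variable from its conditional distribution given the other
for _ in range(10):
    for tgt, mis, other in ((x_imp, x_mis, y_imp), (y_imp, y_mis, x_imp)):
        A = np.column_stack([np.ones(n), other])
        beta, *_ = np.linalg.lstsq(A[~mis], tgt[~mis], rcond=None)
        sd = np.std(tgt[~mis] - A[~mis] @ beta)
        tgt[mis] = A[mis] @ beta + rng.normal(scale=sd, size=mis.sum())

print(round(float(np.corrcoef(x_imp, y_imp)[0, 1]), 2))
```

In the cross-classified setting each conditional model would additionally carry random effects for clusters and for the time-varying cluster membership, which is what distinguishes the evaluated three-level FCS approach from this sketch.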

Separation in Marginal Logistic Regression Models

ABSTRACT. Clustered or longitudinal data are frequently encountered in clinical research, e.g. when multiple study centers collect data or when measurements are performed multiple times per individual. With a binary outcome of interest, extensions of the logistic regression model to the context of correlated data are commonly applied. One popular option is marginal logistic regression models, which relate the population-averaged log-odds to a linear combination of explanatory variables. Using generalized estimating equations (GEE) for fitting the marginal model results in consistent coefficient estimates even if the within-unit associations are not specified correctly. If the binary outcome can be perfectly predicted by a linear combination of the explanatory variables, i.e. the data are 'separated', then the marginal model fitted by GEE will not give finite coefficient estimates. Similarly, with independent data, logistic regression fails to converge if the data are separated. A popular solution with independent data is to resort to Firth's penalized logistic regression (FL), which was originally proposed to reduce the bias in maximum likelihood coefficient estimates. We found that the stabilizing property of FL can be transferred to the analysis of correlated data in a pragmatic way: noting that FL is equivalent to maximum likelihood estimation on an appropriately augmented data set, we suggest first performing logistic regression ignoring the correlation structure of the data in order to create the corresponding augmented data set. The marginal model can then be fitted by GEE on this augmented data set, stabilizing the coefficient estimates. We illustrate the performance of our proposed method by analysing clustered data from a study on the occurrence of hematological complications in implant dentistry, where most patients underwent multiple implant procedures. The data were separated as there were no complications for one level of a risk factor. 
Furthermore, we present a simulation study comparing our method to a recently published approach integrating Firth’s penalty in GEE in a more rigorous way. Interestingly, this latter method suffered from severe non-convergence problems for some data structures. Our approach performed better with respect to convergence and has the advantage of being easily implementable in any software where FL is available.
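
The stabilizing effect of Firth's penalty under separation can be illustrated with an ordinary (independent-data) Firth logistic fit computed via the standard hat-value-modified score; this sketch omits the GEE step on the augmented data and uses an invented separated toy data set, so it is not the authors' implementation:

```python
import numpy as np

# Toy separated data: y equals 1 exactly when x equals 1, so plain
# maximum likelihood has no finite estimate.
x = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
X = np.column_stack([np.ones_like(x), x])

beta = np.zeros(2)
for _ in range(100):
    p = 1 / (1 + np.exp(-(X @ beta)))
    W = p * (1 - p)
    XtWX = X.T @ (W[:, None] * X)
    Minv = np.linalg.inv(XtWX)
    h = np.sum((X @ Minv) * X, axis=1) * W      # hat values
    score = X.T @ (y - p + h * (0.5 - p))       # Firth-modified score
    beta = beta + Minv @ score

print(beta)   # finite coefficients despite complete separation
```

For this saturated 2x2 example the Firth fit coincides with adding 0.5 to each cell of the table, giving a slope of log(81) ≈ 4.39 instead of the infinite maximum likelihood estimate.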

A geometric Brownian motion model with non-normal random effect for the prediction of the growth of abdominal aortic aneurysms

ABSTRACT. Patients with abdominal aortic aneurysms (AAA) require regular monitoring of aneurysm size, and, once the AAA has exceeded a critical diameter of 50-55 mm, a surgical intervention is indicated to avoid rupture. We propose a statistical model for the growth of AAAs to assess the risk that the AAA exceeds the critical size in a given time period and to find determinants of fast versus slow growth.

The growth of AAAs is characterised by (1) growth rates that increase with increasing aneurysm size, (2) right-skewed heterogeneity in growth rates between patients and (3) heterogeneity in growth rates within a patient across time. Further, (4) patients present at different stages of their disease at initial diagnosis, and a prediction model should be applicable regardless of disease stage.

We regarded the growth of AAAs as a stochastic process and developed a parametric prediction model based on geometric Brownian motions with log-normally distributed growth rates, which are modelled as a random effect. The model parameters include the mean growth rate (which may depend on covariates), a scale factor quantifying within-patient variability in growth and the random-effect variance quantifying between-patient variability. In contrast to models for AAA growth proposed in the current literature, the stochastic growth model accounts for all of the characteristics (1)-(4) and in addition has a self-consistency property. A model fitting routine using maximum likelihood with a Laplace approximation was implemented in R. Methods to calculate the model-based distribution function and the quantile function of AAA size at a given time-point, depending on the current AAA size, were also implemented.

The model was fit to a longitudinal data set of 87 patients with a median follow-up time of 1.8 years and a total of 522 AAA diameter measurements. The comparison of prediction intervals across time with observed growth curves showed high agreement, and leave-one-out cross-validation revealed that the distribution of diameters at the last visit was accurately predicted from the individual initial diameters. An online calculator for predictions based on the fitted model was made available.

The model may also be applied to other clinically relevant growth or deterioration processes.
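
A Monte Carlo sketch of the model's building blocks, a geometric Brownian motion with a log-normal random growth rate per patient, shows how the exceedance risk over the 55 mm threshold could be computed; all parameter values here are invented, not the fitted ones:

```python
import numpy as np

rng = np.random.default_rng(42)

def prob_exceed(d0, years, n=20000, mu_med=0.04, mu_sd=0.3, scale=0.05, dt=1/12):
    """Monte Carlo probability that AAA diameter exceeds 55 mm within `years`,
    under a geometric Brownian motion with log-normally distributed
    patient-specific growth rates (illustrative parameters)."""
    steps = int(years / dt)
    rates = np.exp(rng.normal(np.log(mu_med), mu_sd, size=n))  # random effect
    log_d = np.full(n, np.log(d0))
    exceeded = np.zeros(n, dtype=bool)
    for _ in range(steps):
        log_d += (rates - 0.5 * scale**2) * dt \
                 + scale * np.sqrt(dt) * rng.normal(size=n)
        exceeded |= log_d > np.log(55.0)
    return exceeded.mean()

p_small, p_large = prob_exceed(40, 2), prob_exceed(50, 2)
print(p_small, p_large)
```

The fitted model replaces this simulation with closed-form distribution and quantile functions, but the Monte Carlo version makes the role of the two variance components (the diffusion scale within patients, the random-effect variance between patients) explicit.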

Predicting Patient Risk for Adverse Drug Events in Health Care Claims Data using Functional Targets

ABSTRACT. Adverse drug events (ADEs) represent a burden on health care systems. Detecting ADE signals in pharmacovigilance is mainly achieved using spontaneous reporting systems. Longitudinal data offer a promising alternative due to their high-volume and high-resolution nature. Particularly, health care claims databases contain information on prescriptions, diagnoses and demographic risk factors.

We present a novel strategy for predicting ADEs based on health care claims data, in which we group drug and disease predictors according to their biological functional targets (FTs), i.e., pathways of molecular targets (e.g., receptors). We hypothesize that drugs and diseases involved with an FT are more likely to lead to the ADEs associated with that FT. Exploiting such knowledge may better explain the relationships between predictors, allow for utilizing the full spectrum of longitudinal data, and increase methods' predictive power.

We compared the predictive performance of four methods in three settings: 1) FT-grouping, 2) grouping according to the WHO drug/disease classification, and 3) no grouping (ng). The methods comprised two machine learning methods, random forests (RF; ng) and block forests [BF; grouping (g)], and two regression methods, the LASSO (g + ng) and an extension of the adaptive rank truncated product (ARTP; g), a method from genetic applications extended here to enable outcome prediction.

We applied the strategy to the German Pharmacoepidemiological Research Database, which contains claims data from ~24 million insurants, to predict gastrointestinal bleeding (GIB) and intracranial bleeding (ICB), two known ADEs of direct oral anticoagulants (DOACs). We analyzed data from the years 2015-2016, and created two matched subcohorts (1:4) of adult insurants diagnosed with GIB (N=64,720) or ICB (N=34,600). We controlled for age, sex, region of residence and time-to-event. Performance evaluation measures included the area under the precision-recall curve.

For both subcohorts, the LASSO, RF and BF outperformed the ARTP to a comparable degree. For GIB, the BF with FT-grouping ranked the DOACs group higher than WHO-grouping did. For ICB, the BF with FT-grouping ranked etiology pathways highest. FT-based grouping may better describe ADE risk profiles; however, dataset size as well as FT-group sizes and overlap affect prediction. Regression-based methods, e.g. the group LASSO, are challenged by the data dimensions and require scalable implementations.

17:30-19:00 Session OC5G: Missing data & measurement error
Multiple imputation for missing data in case-cohort studies: simulation and case study

ABSTRACT. Background: Case-cohort studies are useful when exposure data are expensive to collect on the full cohort. In case-cohort studies, a random subcohort is selected with probability < 1 and exposure data are collected only for the subcohort and all cases. The unequal sampling probabilities in case-cohort studies are accounted for during analysis through inverse probability weighting. Missing data are commonly addressed by multiple imputation (MI), but its valid use requires compatibility between the imputation and analysis models. When unequal sampling probabilities are incorporated into the analysis, compatibility requires that the probabilities also be accounted for during imputation. It is unclear how best to apply MI in case-cohort studies to address missing covariates in order to achieve compatibility. This study assessed the performance of various approaches to implementing MI in the context of a case-cohort study in which the target analysis was a weighted model estimating either a risk ratio or an odds ratio. Methods: A simulation study was conducted with missingness in two covariates, motivated by a case-cohort investigation within the Barwon Infant Study (BIS). The MI methods considered were ignoring the weights, including an interaction between the outcome (as a proxy for weight groupings) and all other analysis variables, and imputing separately for cases and controls. Factors such as the proportion of incomplete observations, the missing data mechanism and the subcohort selection probability were varied to assess the performance of the MI methods. A weighted complete case analysis (CCA) was performed on the subcohort and cases with complete covariate information for comparison. The MI methods were also applied to a subset of the BIS data. Results: There was similar performance in terms of bias and efficiency of both estimates across all MI methods, with the expected improvements compared with the CCA. 
For all methods, the expected increase in precision as the subcohort selection probability increased was observed. These results were consistent with the case study. Conclusions: Our results suggest that using MI to handle missing data is more efficient than a CCA. How weighting is included in the imputation model makes little difference in the analysis of case-cohort studies, potentially because there are only two weight classes in this setting.
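
The inverse-probability-weighting step that the imputation models must stay compatible with can be sketched on invented data: all cases are kept with weight 1, non-case subcohort members are weighted by the inverse of the subcohort sampling probability, and a weighted summary then recovers full-cohort quantities.

```python
import random

random.seed(7)
N, q = 20000, 0.15          # full cohort size; subcohort sampling probability

# Full cohort with a binary exposure that raises the (rare) event risk
cohort = []
for _ in range(N):
    exposed = random.random() < 0.3
    case = random.random() < (0.03 if exposed else 0.01)
    cohort.append((exposed, case))

# Case-cohort sample: every case, plus a randomly selected subcohort
sample = [(e, c) for e, c in cohort if c or random.random() < q]

# Inverse probability weights: cases weight 1, non-case subcohort members 1/q
def weight(case):
    return 1.0 if case else 1.0 / q

w_total = sum(weight(c) for e, c in sample)
w_exposed = sum(weight(c) for e, c in sample if e)
est = w_exposed / w_total
print(round(est, 3))   # close to the full-cohort exposure prevalence of 0.3
```

In the target analyses of the abstract the same weights enter a weighted regression model rather than a simple prevalence, which is why the imputation model has to reflect the weight classes.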

Application of three level multiple imputation to national surveys

ABSTRACT. The theory of multiple imputation requires that the sampling design be incorporated in the imputation process. Not accounting for complex sample design features, such as stratification and clustering, during imputation can yield biased estimates from a design-based perspective. Most datasets in public health research show some form of natural clustering (individuals within households, households within the same district, patients within wards, etc.), and cluster effects are often of interest in health research. Missing values can occur at any level in multilevel data, but there is limited guidance on extending multiple imputation to variables captured at three or more levels.

This paper implements and extends the Gelman and Hill approach for imputation of missing data at multiple levels by including aggregate forms of individual-level measurements to impute for missing values at higher levels. In our study, we use the fourth National Family Health Survey (NFHS-4) data of India, to implement our extensions of Multiple Imputation using Chained Equations to impute for missing values in three level data structures.

The dataset is naturally hierarchical, with children nested within mothers who are further nested within households; the study aims to identify maternal- and household-level predictors of anaemia in children in India. This study is novel in its approach to imputing missing data at the third level (households) in the NFHS survey for India.
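
The central idea, using aggregates of lower-level measurements to impute a missing higher-level variable, can be sketched on a two-level toy (invented variables and effect sizes, not the NFHS-4 data or the full chained-equations procedure):

```python
import numpy as np

rng = np.random.default_rng(3)
n_house, kids = 300, 4

# Household-level variable (a hypothetical "wealth" score) and a
# child-level measurement correlated with it
wealth = rng.normal(size=n_house)
child_meas = 0.6 * wealth[:, None] + rng.normal(size=(n_house, kids))

# Aggregate the child-level measurement to household level
agg = child_meas.mean(axis=1)

# 30% of households have missing wealth; impute it from the aggregate
miss = rng.random(n_house) < 0.3
A = np.column_stack([np.ones(n_house), agg])
beta, *_ = np.linalg.lstsq(A[~miss], wealth[~miss], rcond=None)
wealth_imp = wealth.copy()
wealth_imp[miss] = A[miss] @ beta

print(round(float(np.corrcoef(wealth_imp, wealth)[0, 1]), 2))
```

With three levels, the same device is applied twice: child-level variables are aggregated to mothers and mother-level variables to households, and the aggregates enter the chained-equations models for the incomplete higher-level variables.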

Profiles of COVID-19 hematological patients: a Franco-Brazilian observational cohort study

ABSTRACT. Background - The coronavirus disease 2019 (COVID-19) pandemic, which began in China, has rapidly spread to the rest of the world. The range of disease presentation is large, from asymptomatic and low-severity cases up to severe life-threatening forms. Gaining further insight into patient profiles of COVID-19 in immunodeficient patients is of particular interest. A bi-national cohort of 263 patients from France and Brazil affected by both COVID-19 and a hematological disease was analyzed. Objectives - We first aimed to identify COVID-19 hematological patient profiles using data-driven, “unsupervised” methods. A secondary objective was to assess whether a semi-supervised procedure yields a partition more strongly associated with patient survival than the unsupervised procedure. Methods - Patient profiles were obtained from continuous variables (age, number of comorbidities, biological measurements) and an archetype variable derived from symptoms and underlying hematological disease characteristics using a generalized low-rank model. Learning methods adapted to the presence of missing data were used for unsupervised and semi-supervised learning with a survival endpoint. Results - Both methods identified two clusters with a few unclassified patients. While the unsupervised clusters differed notably in terms of comorbidities and age, the semi-supervised partition differed from the former (ARI of 0.29), additionally distinguishing patients on biological variables such as creatinine level, and with increased prognostic value. The 30-day survival rate was 77.1% in the cluster of young patients with low C-reactive protein, D-dimer, LDH and creatinine levels, compared with 46.7% in the second cluster. The partition and patient age provided additive prognostic information. 
Conclusion - The value of the semi-supervised method was demonstrated: it identified two highly different prognostic profiles of patients with hematological disease and COVID-19, based mostly on age, comorbidities, and biological evidence of inflammation.

(1) Faucheux, L., Resche-Rigon, M., Curis, E., Soumelis, V. & Chevret, S. Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures. Biometrical Journal (2021). (2) Faucheux, L., Soumelis, V. & Chevret, S. Multiobjective semisupervised learning with a right-censored endpoint adapted to the multiple imputation framework (Submitted).

Prediction of cancer incidence in areas without registries using proxy and registry data

ABSTRACT. Background In France, cancer registries cover only about 20% of the population. To predict incidence at the local level, one may use imperfect proxy data correlated with incidence, such as health care databases or mortality data. To this end, we developed a calibration model based on modelling the ratio between proxy (P) and incidence (I) counts observed in the registry areas. The aims of this study are to 1) present a global methodology for predicting incidence at the local level, involving three steps: a calibration model, assessment of prediction quality, and spatial smoothing for disease mapping; 2) show the properties of the estimators derived from the calibration model; and 3) illustrate the application of this methodology to predict cancer incidence at the district level in France (départements).

Material and methods In the registry areas, the ratio between the number of patients from the proxy source and the number of incident cases is modelled by age using a Poisson mixed model. This model provides i) a smooth P/I age ratio f(a) and ii) an estimate of the between-district variability of the ratio. For a new district, predictions are derived by age as the proxy counts P divided by f(a). Predictions follow a lognormal distribution, with variances depending on the variability of the ratio. The properties of the predictions were evaluated through realistic simulations. The whole methodology was applied to predict cancer incidence in France over the 2007–15 period for 24 cancer sites, using several health care indicators and mortality.

Results The calibration model provided unbiased estimates of the number of incident cases; coverage rates of the 95% prediction intervals ranged from 91 to 96%. Incidence predictions were of sufficient quality for 27/34 sex-site combinations for solid tumours but for only 2/8 combinations for haematological malignancies. Mapping of the smoothed predicted incidence provided a clear picture of the main contrasts in incidence.

Conclusion Our calibration approach is an adequate tool for predicting local cancer incidence from proxy measures and registry data when registries cover only part of the territory. Future developments are oriented toward joint modelling of the incidence and proxy processes. The approach offers an interesting perspective for providing predictions at smaller geographic scales.
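
The core calibration step can be sketched without the mixed-model machinery: in registry areas the age-specific ratio f(a) is estimated from proxy and incidence counts, and a new district's incidence is predicted as its proxy counts divided by f(a). Everything below (ratios, counts, district sizes) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_ages, n_reg = 8, 15
true_ratio = 1.1 + 0.05 * np.arange(n_ages)   # P/I age ratio (illustrative)

# Registry districts: incident cases observed alongside proxy counts
inc = rng.poisson(100, size=(n_reg, n_ages))
proxy = rng.poisson(true_ratio * inc)

# Poisson MLE of the age-specific ratio f(a): total proxy over total incidence
f_hat = proxy.sum(axis=0) / inc.sum(axis=0)

# A district without a registry: only its proxy counts are observed
inc_new = rng.poisson(120, size=n_ages)       # unobserved truth, for checking
proxy_new = rng.poisson(true_ratio * inc_new)
inc_pred = proxy_new / f_hat

print(int(inc_pred.sum()), int(inc_new.sum()))
```

The Poisson mixed model of the abstract additionally smooths f(a) across age and carries a district-level random effect, which is what feeds the lognormal prediction intervals.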

What is the real prevalence of hypertension in France?

ABSTRACT. Background Hypertension (HT), defined as a permanently high blood pressure (BP) level, is a leading modifiable risk factor for cardiovascular and renal diseases. In practice, HT is diagnosed when BP (systolic/diastolic) exceeds a threshold level (140/90 mmHg). Because of the within-individual variability of BP measures, it is recommended to use BP measurements from several visits for the diagnosis of HT. However, in epidemiological studies, BP is frequently measured during a single visit. In such designs, a direct count of patients with hypertension neglects within-person variability and biases estimates of HT prevalence. The aim of our study was to provide factors taking into account the different components of BP measurement variability (between individuals, between visits and between measures) to correct this bias in HT prevalence estimation in epidemiological studies. The method was applied to estimate HT prevalence in France in 2015. Methods We used data from the NHANES III study, in which patients' BP was measured at three visits with three measures per visit. For each gender and type of BP, the components of BP variance were modelled with a random-effects model. The variances of the random effects were allowed to vary with age and were modelled using penalized splines. Models were estimated in a Bayesian framework using Hamiltonian Monte Carlo (Stan software). The variance components allowed the calculation of factors to correct estimates of HT prevalence in epidemiological studies. The method was applied to data from the Esteban study, where three standardized BP measurements were performed during a single clinical exam.

Results The shape of the components of BP variability varied greatly with age, with different patterns for systolic and diastolic BP. Variability of BP was driven by the between-visit and between-individual variances, between-measure variability being much lower. The raw prevalence of HT in the Esteban study reached 31.4%. After correction for BP variability, the estimate decreased to 28.0%. Applying these corrected proportions to the French adult population gives about 15,300,000 hypertensive patients in 2015.

Conclusion Correcting for within-individual BP variability when estimating HT prevalence from a single measure could avoid a substantial over-estimation of the prevalence of hypertension in the population.
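
Under a simple normal variance-components view (a sketch with invented values, not the fitted age-varying spline model), the direction and size of the correction follow directly: a single noisy visit classifies against the total standard deviation, while true hypertension status depends only on the between-individual component.

```python
from math import erf, sqrt

def norm_sf(z):
    """Upper-tail probability of a standard normal variable."""
    return 0.5 * (1 - erf(z / sqrt(2)))

# Illustrative systolic BP values: population mean, between-individual SD,
# within-individual (visit plus measurement) SD, and the diagnostic cut-off
mu, sigma_b, sigma_w, cut = 125.0, 12.0, 8.0, 140.0

# Raw prevalence from one noisy visit vs the prevalence of truly high
# usual BP, which involves only the between-individual variability
raw = norm_sf((cut - mu) / sqrt(sigma_b**2 + sigma_w**2))
corrected = norm_sf((cut - mu) / sigma_b)
print(round(100 * raw, 1), round(100 * corrected, 1))   # 14.9 10.6
```

When the population mean sits below the threshold, ignoring the within-individual component inflates the tail area, the same direction as the 31.4% versus 28.0% figures reported for the Esteban study.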

Exploring the Sensitivity of Extended SIR Models Through Randomized Simulations and Multiple Factor Analysis

ABSTRACT. COVID-19 has been modelled since its emergence in several ways, most prominently using SIR (Susceptible-Infected-Removed) models. This paper aims to evaluate the robustness of extended SIR models by using multivariate factorial analysis on 13 parametric assumptions, to determine which measures are most influential in overcoming the COVID-19 pandemic. Overall, we show that COVID-19 epidemic projections are very sensitive to minor changes in assumptions, even when the parametric assumptions lie within ranges given by the CDC. The spread and disease burden depended upon very distinct parameters according to the primary response measured, and key parameters impacted different facets of disease burden.

Specifically, we find that testing, as well as isolation and quarantine measures, are most effective in containing the spread and alleviating the disease burden. Similarly, the accuracy of diagnostic tests carries great importance. Since each of the parameters used in COVID-19 model projections is an estimated value, better care should be taken to understand the variability of these parameter estimates when models are shared with the public.
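
The kind of sensitivity described here is easy to reproduce on a plain SIR model (a sketch, not the paper's 13-parameter extended model; all rates are invented): small shifts in the transmission rate move the epidemic peak by a large factor.

```python
def sir_peak(beta, gamma=0.1, n_days=300, dt=0.5, i0=1e-4):
    """Euler-integrated SIR model; returns the peak infectious fraction."""
    s, i, peak = 1.0 - i0, i0, i0
    for _ in range(int(n_days / dt)):
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s -= new_inf
        i += new_inf - new_rec
        peak = max(peak, i)
    return peak

# Modest changes in the transmission rate shift the peak substantially
for beta in (0.15, 0.20, 0.25):
    print(beta, round(sir_peak(beta), 3))
```

Here a two-thirds increase in beta (R0 from 1.5 to 2.5) roughly quadruples the peak infectious fraction, illustrating why projections are so sensitive to parameters that are themselves only estimates.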

Monte Carlo simulation of the COVID-19 spread using an agent-based modelling in Russian regions

ABSTRACT. Introduction Prediction of COVID-19 outbreaks and the timely introduction of preventive measures require reliable tools for simulating epidemic spread. The aim of this study was to simulate the spread of COVID-19 in its early and peak stages in Russian regions using a Monte Carlo agent-based model. Methods We fitted a general pooled model in which all individuals (agents) may interact with each other. Conditional on a simulated scenario, each agent has appropriate binary states, which are governed by Monte Carlo random draws. The model accounts for the population age structure. The infection transmission coefficient was directly related to the average number of individuals to whom an infected agent may transmit the infection within one week, given that no restriction measures are applied. The model also accounts for the efficiency of isolation, measured by the 'self-isolation index'. For prognosis, we considered positive and negative scenarios. The testing model also includes the number of daily PCR tests for COVID-19 and the increase in their accuracy. Results To provide a best-fit scenario, we varied several key parameters of the model: the number of initially infected agents, the percentage of deaths among agents in the critical state, and the virus transmission coefficient. Our agent-based model of COVID-19 epidemic spread closely follows the rigorous methodology of Monte Carlo simulation. The model was validated on statistical data for daily new cases and deaths in representative regions of Russia, such as Moscow and Novosibirskaya oblast'. Conclusions We suggest that agent-based modelling can be successfully used for elucidating COVID-19 trends. Traditional simulation approaches based on derivatives of the SIR model, although quite efficient, suffer from not accounting for random factors. 
Agent-based models provide a convenient solution that allows accurate accounting for age distribution, variations in self-isolation strategies and testing protocols, super-spreaders, etc. Despite the local features of individual territories, epidemic curves can be predicted correctly for different regions using the same model parameters, except for the initial number of infected, which serves as a tuning parameter of the model. Acknowledgments The study is supported by RFBR, CNPq, and NSFC (project no. 20-51-80004).
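
A minimal agent-based Monte Carlo loop of the kind described, agents with discrete states, random contacts, and an isolation factor scaling transmission, can be sketched as follows (a toy with invented parameters and none of the age structure, testing or critical-state logic of the actual model):

```python
import random

random.seed(2)

def simulate(n=2000, trans_per_week=2.5, weeks=30, isolation=0.0, p_death=0.01):
    """Minimal agent-based Monte Carlo epidemic: each infectious agent
    contacts others at random; isolation scales down effective transmission."""
    state = ['S'] * n
    for k in random.sample(range(n), 10):    # initially infected agents
        state[k] = 'I'
    for _ in range(weeks):
        infectious = [k for k in range(n) if state[k] == 'I']
        for k in infectious:
            contacts = trans_per_week * (1 - isolation)
            n_contacts = int(contacts) + (random.random() < contacts % 1)
            for j in random.choices(range(n), k=n_contacts):
                if state[j] == 'S':
                    state[j] = 'I'
            # after one infectious week the agent recovers or dies
            state[k] = 'D' if random.random() < p_death else 'R'
    return {s: state.count(s) for s in 'SIRD'}

free = simulate(isolation=0.0)
locked = simulate(isolation=0.6)
print(free, locked)
```

Even this toy reproduces the qualitative role of the self-isolation index: raising it from 0 to 0.6 pushes the effective reproduction number to the critical threshold and leaves most agents uninfected.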

Similarities between the COVID-19 spread in Romanian counties identified through data clustering

ABSTRACT. The purpose of our study was to analyze the dynamics of COVID-19 spread in Romanian counties over a period of 299 days (April 2nd 2020 - January 25th 2021), in order to identify possible similarities that may contribute to a better understanding of the mechanisms of disease spread. The data used in our study are the numbers of active cases for each county in Romania, as well as for Bucharest and the whole country, reported daily by the Romanian Ministry of Health; based on these values, we calculated the daily number of new cases and normalized them by the total number of inhabitants per county, as published by the INS (Romanian National Institute of Statistics). We expressed these values as rates per 100,000 inhabitants to gain consistency and relevance (otherwise, the raw values were small in magnitude and difficult to interpret). To identify similarities between counties, we used data clustering techniques. We performed hierarchical clustering of the counties, recorded as variables, using the between-groups linkage method and testing different distances: Euclidean, squared Euclidean, Chebyshev, block and Minkowski. With this approach we identified rather the "outliers" among the counties: no matter which distance we used, we found one big cluster including most counties in Romania; a small cluster of 5 counties located in approximately the same geographic area (Transylvania - Alba, Brasov, Cluj, Sibiu and Timis); and a few counties with a specific evolution: Bucuresti (the biggest city in Romania), Constanta (located in the south-east of Romania, along the seaside), Ilfov (the rural area around Bucuresti) and Salaj (also located in Transylvania, neighbouring Cluj, which belongs to the first small cluster). 
Since we are dealing with time series, this method of data analysis is not actually the first choice (the calculation of cross-correlations is the classical approach), but it led us to an interesting conclusion: there are no significant differences between most counties in Romania as regards the disease's spread pattern, with only a few clearly identified exceptions.
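
The between-groups (average) linkage procedure used above can be sketched on invented per-100,000 incidence series; the outlier series separates from the rest exactly as the atypical counties did in the study:

```python
import math

# Synthetic daily new-case rates per 100,000 for five "counties" (invented)
series = {
    'A': [1, 2, 4, 7, 6, 3],
    'B': [1, 2, 5, 7, 5, 3],
    'C': [2, 3, 4, 6, 6, 4],
    'D': [9, 12, 15, 18, 14, 10],   # a clear outlier, like Bucuresti
    'E': [1, 3, 4, 6, 5, 3],
}

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def avg_link(c1, c2):
    """Between-groups linkage: mean pairwise distance between clusters."""
    return sum(dist(series[i], series[j])
               for i in c1 for j in c2) / (len(c1) * len(c2))

# Agglomerate until two clusters remain
clusters = [[k] for k in series]
while len(clusters) > 2:
    pairs = [(avg_link(c1, c2), a, b) for a, c1 in enumerate(clusters)
             for b, c2 in enumerate(clusters) if a < b]
    _, a, b = min(pairs)
    clusters[a] = clusters[a] + clusters[b]
    del clusters[b]

print(sorted(map(sorted, clusters)))   # → [['A', 'B', 'C', 'E'], ['D']]
```

Swapping `dist` for Chebyshev, block (Manhattan) or Minkowski distances reproduces the study's robustness check that the outliers emerge regardless of the metric.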

A new epidemic model for the Covid-19 pandemic

ABSTRACT. We present a new model to describe the Covid-19 pandemic that takes into account both the possibility of re-infection of recovered subjects and the differentiation between symptomatic and asymptomatic infected subjects. The model, denoted -S(I)RD, is a 6-compartment model described by as many ordinary differential equations. The six compartments are Susceptible (S), Symptomatic Infected (Is), Asymptomatic Infected (Ia), Recovered from the Asymptomatic fraction (Ra), Recovered from the Symptomatic fraction (Rs), and Deceased (D). The biological assumptions are as follows: (i) no entry into or exit from the territory (closed territory); (ii) the contagiousness of the infected is immediate (therefore no Exposed compartment is considered); (iii) a loss of immunity is considered (at a constant rate in this first version of the model); (iv) mortality and birth rates affect, as a first approximation, only the Susceptible compartment; (v) the Asymptomatic Infected compartment includes both the fraction identified by diagnostic evaluation and the unidentified one; (vi) there is no lethality in the Asymptomatic Infected fraction. Some numerical simulations have been performed but, at present, the validity of the model has not been verified by fitting to real data, partly owing to poor data quality. However, we will perform the fitting on the basis of data for the Deceased compartment (at the moment, the only compartment characterized by hard data).
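
One plausible reading of the six compartments and assumptions (iii) and (vi) can be integrated by a simple Euler scheme (a sketch with invented rates; demographic turnover from assumption (iv) is omitted, and the exact equations of the authors' model may differ):

```python
def sird6(beta=0.35, gamma=0.1, mu=0.002, omega=0.005, p_sym=0.4,
          days=400, dt=0.25):
    """Euler integration of a six-compartment S(I)RD-type model:
    susceptible, symptomatic/asymptomatic infected, recovered from each
    fraction, deceased; immunity wanes at constant rate omega and only
    symptomatic infections are lethal (all rates illustrative)."""
    S, Is, Ia, Rs, Ra, D = 0.999, 0.0005, 0.0005, 0.0, 0.0, 0.0
    for _ in range(int(days / dt)):
        force = beta * S * (Is + Ia)
        dS = -force + omega * (Rs + Ra)       # re-infection via waning immunity
        dIs = p_sym * force - (gamma + mu) * Is
        dIa = (1 - p_sym) * force - gamma * Ia  # no lethality if asymptomatic
        dRs = gamma * Is - omega * Rs
        dRa = gamma * Ia - omega * Ra
        dD = mu * Is
        S += dS * dt; Is += dIs * dt; Ia += dIa * dt
        Rs += dRs * dt; Ra += dRa * dt; D += dD * dt
    return S, Is, Ia, Rs, Ra, D

state = sird6()
print(round(state[-1], 4))   # cumulative deceased fraction
```

The closed-territory assumption (i) shows up as exact mass conservation of the six compartments, which is a useful check on any implementation; fitting to the Deceased series then amounts to matching the D trajectory.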

Spatial analyses of the first wave of COVID-19 cases in Hong Kong using Poisson intrinsic and Besag York Mollié conditional autoregressive models under a Bayesian paradigm

ABSTRACT. Background: As the COVID-19 pandemic continues to evolve, identifying high risk areas and populations becomes increasingly important. Spatial analyses are limited, yet such insight can inform targeted responses.

Methods: We conducted exploratory analyses of the spatial clustering of COVID-19 cases by Hong Kong district and investigated dependence on socioeconomic and demographic factors. These include population density, age and sex composition, working population, healthcare capacity, and post-secondary education attainment, among others. Spatial trends were tested using Moran’s I statistic with contiguity- and distance-based neighbourhood structures. Cases were modelled using Poisson intrinsic and Besag York Mollié (BYM) conditional autoregressive (CAR) Bayesian models.

Results: Moran’s I test showed evidence of positive spatial autocorrelation. According to the WAIC, the distance-based, binary-weighted BYM CAR Ridge model with five nearest neighbours performed best. Districts with the highest standardized incidence ratios were clustered in southern Hong Kong, consistent with positive spatial clustering of cases.

Conclusion: These results can facilitate the identification of COVID-19 clusters and their determinants, to better anticipate the course of the pandemic and design focused interventions that effectively control the spread of COVID-19. Nonetheless, as the pandemic continues to evolve, trends may change.
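The global Moran's I statistic used in the Methods above can be sketched in a few lines. The four-district contiguity structure and case counts below are toy values, not the Hong Kong data:

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I for values x under an n x n spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = x - x.mean()                       # deviations from the mean
    num = n * (W * np.outer(z, z)).sum()   # weighted cross-products of neighbours
    den = W.sum() * (z ** 2).sum()
    return num / den

# Toy chain of 4 districts with a binary contiguity structure (hypothetical data)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
cases = [10, 12, 30, 34]   # similar values adjacent -> positive autocorrelation
I = morans_i(cases, W)
```

A clustered pattern (low counts next to low, high next to high) yields a positive I, while alternating values yield a negative I.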

Indices of inequality to monitor temporal and geographic trends in COVID-19 incidence and death data

ABSTRACT. There is a clear need to monitor pandemic activity at different levels. Useful metrics and their follow-up over time have moral as well as policy implications. For example, differences in observed incidence and death counts are explained by differences in the infection fatality rate (IFR). It is of interest to explore inequality in the IFR across geographic areas and over time. Inequalities in the IFR can be argued to be of moral significance (unfair), and such inequalities may also have policy implications, as countries/regions should aim for the lowest possible IFR. Among the most common metrics for measuring inequality are the Gini index, the Theil index, and the Hoover index [1]. They share four relevant properties: anonymity (symmetry), scale independence (homogeneity), population independence, and the transfer principle. Inequality is different from variability: the Gini indices of two populations may differ while the variability within each population is the same. We also study the more general concept of entropy [1]. The approach is further inspired by the discussion of heavy-tail properties of distributions related to global incidence or mortality data [2]. We observe that classic inequality measures require a new interpretation when applied to global infection and mortality data. In the analysis of income or wealth within a society, low values of the Gini index (≤ 0.4) carry positive connotations such as fairness and justice. In the epidemiological setting, on the other hand, high values of the Gini index (> 0.8) reflect the goal striven for: the epidemic is confined to one (or a few) region(s) while the surrounding area is not affected, i.e., the epidemic has not spread. The analysis of inequality in the IFR, however, follows the classical interpretation: single very high values must be avoided.
We explore these concepts using 2020 COVID-19 pandemic data on infection incidence, mortality, and IFR at multiple scales: global, continental, national, and federal regions.

References: 1. Atkinson A (1970). "On the Measurement of Inequality" Journal of Economic Theory. 2 (3):244–63. 2. Cirillo P, Taleb NN (2020) Tail risk of contagious diseases. Nat. Phys. 16, 606–613.
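The Gini index and its scale-independence property, central to the interpretation discussed above, can be sketched as follows (regional case counts are hypothetical):

```python
import numpy as np

def gini(x):
    """Gini index of non-negative values x (0 = perfect equality, ->1 = maximal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if x.sum() == 0:
        return 0.0
    # Standard rank-based formula: sum_i (2i - n - 1) x_(i) / (n * sum x)
    return float((2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum()))

# Confined epidemic (one region carries all cases) -> high Gini, the goal
concentrated = [0, 0, 0, 0, 1000]
# Epidemic spread evenly across regions -> Gini of zero
uniform = [200, 200, 200, 200, 200]
```

Scale independence means multiplying all counts by a constant leaves the index unchanged, which is what makes regions of different sizes comparable.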

Bayesian disease mapping of standardized infection fatality rate using the example of COVID-19 in Bavaria

ABSTRACT. During the ongoing COVID-19 pandemic, a major effort has been made to provide up-to-date (daily/weekly) snapshots of current infection incidence and mortality. This information is provided as listings or on maps. However, such maps are limited to showing either infection incidence or, in rare cases, mortality, and lack simultaneous visualization of both quantities. Further, such combined presentation of pandemic data across regions needs to be standardized for comparison purposes. We introduce the regional standardized infection fatality rate (sIFR) as the ratio of observed to expected infection mortality. It turns out that the sIFR can be calculated as the ratio of two standardized measures: the standardized mortality rate (SMR) and the standardized incidence rate (SIR). The sIFR describes the ratio of the regional deviation in the mortality process to the regional deviation in the infection process. Providing the sIFR on maps is of interest for comparing regions and identifying regional hotspots or areas of concern. Both the SMR and the SIR are relative risks (RR); thus, the sIFR can be understood as a ratio of relative risks (RRR). Bayesian disease mapping is well suited to the analysis of relative disease risk measures. We simultaneously estimate the SMR and the SIR within the Bayesian framework of autoregressive convolution models [1]. This also yields the sIFR, taking into account the neighborhood structure and correcting for statistical artefacts introduced by regions with small numbers of observations. To demonstrate the application of the method, we estimate the sIFR using COVID-19 pandemic data from a large German federal state (Bavaria). We use aggregated data from four periods of three months each between February 2020 and January 2021. The sIFR and its components SMR and SIR are shown on maps. The naive IFR decreases during the first three periods and then increases again. Regional sIFRs change over time.
Identifying major deviations in sIFR can help inform decision making between emphasizing measures for infection control and mortality reduction.

References: 1. Besag, J., York, J. and Mollie, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43, 1-59.
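The identity sIFR = SMR / SIR described in the abstract can be illustrated with a minimal sketch; the observed and expected counts below are hypothetical, and the Bayesian smoothing step is not reproduced here:

```python
def standardized_ratio(observed, expected):
    """Observed over expected count: the standardized mortality/incidence ratio."""
    return observed / expected

def sifr(obs_deaths, exp_deaths, obs_cases, exp_cases):
    """Regional standardized IFR: ratio of SMR to SIR (a ratio of relative risks)."""
    smr = standardized_ratio(obs_deaths, exp_deaths)   # mortality deviation
    sir = standardized_ratio(obs_cases, exp_cases)     # infection deviation
    return smr / sir

# Hypothetical region: 20% excess mortality but 50% excess incidence,
# so deaths are lower than the infection level alone would predict (sIFR < 1)
r = sifr(obs_deaths=60, exp_deaths=50, obs_cases=1500, exp_cases=1000)
```

A region with sIFR above 1 has a mortality excess beyond what its infection excess explains, pointing toward mortality-reduction measures rather than infection control.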

Mathematical Modelling of COVID-19 Epidemics in Tokyo Metropolitan and New York City

ABSTRACT. The comparative study of the COVID-19 epidemics in the Tokyo Metropolitan area and New York City (NYC) is an intriguing research subject for two reasons. First, the large contrast between the epidemic sizes in the two cities attracts great attention. Second, both cities have been publishing high-quality datasets on their COVID-19 epidemics, so epidemiological studies should make use of them to obtain useful information. The datasets used are the daily case reports published by the Tokyo and NYC governments. The timings of infections and the effective reproduction (Re) numbers were determined from February 2020 through January 2021 using the back-calculation method and an exponential growth model. The determined Re numbers were substituted into a modified SEIR compartment model, and theoretical epidemic curves were derived. The evolution of Re over time was irregular in Tokyo, while NYC showed a single large peak in Re at the early stage of the epidemic followed by small fluctuations. The greatest Re numbers since March 2020 were 4.19 (95% CI: 3.63-4.75) in Tokyo and 11.7 (95% CI: 10.5-12.8) in NYC, and the duration for which Re exceeded 2.0 in March was less than 9 days in Tokyo and more than 10 days in NYC. These results suggest that differences in the amplitude and duration of large Re numbers at the early stage determined the contrasting outcomes in the two cities. With respect to our analyses using the compartment model, we show the possibility that medical treatments have improved and that older individuals have become more cautious about COVID-19 in Tokyo since July 2020, whereas such changes arose in NYC only after November. While various factors can account for the epidemic contrast between the two cities, we propose a difference in medical intervention, namely whether countermeasures are individual-oriented, as an additional factor.
Although the time lags between infection and detection compel us to accept a few days of large Re numbers, durations of more than 10 days must be avoided. Careful, individual-oriented medical interventions must also be implemented irrespective of the Re numbers.
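The exponential-growth route to Re mentioned in the abstract can be sketched as follows. This uses the simplest variant of the growth-rate-to-Re relation, Re = 1 + r·Tg, which holds for an exponentially distributed generation time; the mean generation time of 4.8 days and the case series are assumed values for illustration, not the Tokyo or NYC data:

```python
import numpy as np

def growth_rate(cases):
    """Epidemic growth rate r from a log-linear fit to daily incident cases."""
    t = np.arange(len(cases))
    slope, _intercept = np.polyfit(t, np.log(np.asarray(cases, float)), 1)
    return float(slope)

def re_from_growth(r, mean_gen_time=4.8):
    """Re from growth rate, assuming an exponentially distributed generation time."""
    return 1.0 + r * mean_gen_time

# Hypothetical early-epidemic series, doubling roughly every 3 days
cases = [10, 13, 16, 20, 25, 32, 40, 50]
r = growth_rate(cases)       # approx. ln(2)/3 per day
Re = re_from_growth(r)
```

With gamma-distributed generation times the relation becomes Re = (1 + r·Tg/k)^k, so the exponential case above is a lower-variance-free special case.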

Application of a spatio-temporal SVEIRD model to COVID-19 epidemic in the Czech Republic

ABSTRACT. Data assimilation is a general Bayesian technique for repeatedly and optimally updating an estimate of the current state of a dynamic model. We apply advanced methods of Bayesian data assimilation, specifically Optimal Statistical Interpolation (Cobb et al., 2014), to epidemiology, in this case to capture the transmission dynamics and spatial spread of the ongoing COVID-19 epidemic in the Czech Republic. The machinery of data assimilation integrates daily incidence, recovery and death data (Komenda et al., 2020), as made available by the Ministry of Health of the Czech Republic, into a fully spatial Susceptible-Vaccinated-Exposed-Infectious-Recovered-Dead (SVEIRD) compartmental model for the tracking process. Rather than representing the population as a linked set of regions or districts, we represent it as a gridded map. Each grid cell has a population count, which is divided into disease compartments. Each grid cell can transmit disease to its neighbors, with probabilities that decline exponentially with Euclidean distance. We use the proposed spatial SVEIRD model to estimate and project the numbers of newly infected individuals and deaths up to August 1, 2021. We use mathematical modelling to provide insights that support public health agencies in informed, data-driven decision making.
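The exponentially decaying cell-to-cell weights described above can be sketched as follows. The 3x3 grid and the decay scale are hypothetical, and the actual model's conversion of these weights into transmission probabilities is not reproduced here:

```python
import numpy as np

def distance_kernel(coords, scale=1.0):
    """Cell-to-cell weights decaying exponentially with Euclidean distance."""
    # Pairwise Euclidean distances between all grid-cell centres
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K = np.exp(-d / scale)
    np.fill_diagonal(K, 0.0)   # within-cell transmission handled separately
    return K

# Hypothetical 3x3 gridded map; each row of coords is one cell centre
xs, ys = np.meshgrid(np.arange(3), np.arange(3))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
K = distance_kernel(coords, scale=2.0)
```

The kernel is symmetric and assigns adjacent cells a larger weight than cells two steps away, which is the qualitative behaviour the abstract describes.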

Impact of STI screening intensity on antibiotic exposure: A modelling study among men who have sex with men in Belgium

ABSTRACT. Background: Neisseria gonorrhoeae (NG) could become untreatable in the near future, as it has developed resistance to every class of antibiotics it has been exposed to. While treatment of symptomatic NG in core groups, such as men who have sex with men (MSM), is crucial, screening programs that target asymptomatic NG cases may expose the population excessively to antibiotics and thus contribute to the emergence of antibiotic resistance in NG. It is important to ensure that the benefits of screening outweigh the risks of increased antibiotic resistance. Methods: We used a network-based mathematical model of NG transmission dynamics among MSM in Belgium to estimate the prevalence of NG in the population and the amount of antibiotic uptake. The model simulates the daily transmission of NG among three anatomical sites (pharynx, urethra, rectum). Low- and high-risk behaviours are modelled for a more realistic approach. The effects of different screening intensities on NG prevalence and antibiotic exposure were explored. Results: The model was simulated in a population of 10,000 Belgian MSM over a period of 10 years. Different combinations of screening intensity (annually, biannually or every 3 months) and coverage (5%-50% of the population) were compared to no screening. Annual screening of 50% of the population resulted in a prevalence of 7.6% in the pharynx, 3.8% in the urethra and 9.4% in the rectum, compared to 9.5% (pharynx), 4.8% (urethra) and 11.9% (rectum) in the no-screening scenario. In the most intensive scenario (3-monthly, 20% coverage), prevalence was reduced to 3.8% (pharynx), 2.1% (urethra) and 5.9% (rectum). The proportion of the asymptomatic population exposed to antibiotic treatment increased from 9.6% with annual screening of 50% of the population to 26.4% with 3-monthly screening of 30% of the population. Discussion: All scenarios reduced the prevalence of NG at all anatomical sites compared to no screening. However, the most screening-intensive scenarios exposed a large part of the population to antibiotics, which could result in the emergence of antibiotic-resistant NG and other organisms.

Epilocal: A real-time tool for local epidemic monitoring

ABSTRACT. BACKGROUND The novel coronavirus (SARS-CoV-2) emerged as a global threat at the beginning of 2020, spreading around the globe at different times and rates. Within a country, such differences provide the opportunity for strategic allocations of health care resources. OBJECTIVE We aim to provide a tool to estimate and visualize differences in the spread of the pandemic at the subnational level. Specifically, we focus on the case of Italy, a country that has been harshly hit by the virus. METHODS We model the number of SARS-CoV-2 reported cases and deaths as well as the number of hospital admissions at the Italian subnational level with Poisson regression. We employ parametric and nonparametric functional forms for the hazard function. In the parametric approach, model selection is performed using an automatic criterion based on the statistical significance of the estimated parameters and on goodness-of-fit assessment. In the nonparametric approach, we employ out-of-sample forecasting error minimization. RESULTS For each province and region, fitted models are plotted against observed data, demonstrating the appropriateness of the modeling approach. Moreover, estimated counts and rates of change for each outcome variable are plotted on maps of the country. This provides a direct visual assessment of the geographic distribution of risk areas as well as insights on the evolution of the pandemic over time. CONTRIBUTION The proposed Epilocal software provides researchers and policymakers with an open-access real-time tool to monitor the most recent trends of the COVID-19 pandemic in Italian regions and provinces with informative graphical outputs. The software is freely available and can be easily modified to fit other countries as well as future pandemics.
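The nonparametric branch of Epilocal selects models by out-of-sample forecasting error minimization. A minimal sketch of that selection idea follows, using a log-linear least-squares trend as a stand-in for the actual Poisson regression, and entirely hypothetical daily counts:

```python
import numpy as np

def holdout_error(counts, degree, n_test=3):
    """Out-of-sample forecast MSE for a degree-d polynomial trend in log counts."""
    y = np.log(np.asarray(counts, dtype=float))
    t = np.arange(len(y))
    train = slice(0, len(y) - n_test)       # fit on the early part of the series
    test = slice(len(y) - n_test, len(y))   # forecast the held-out tail
    coefs = np.polyfit(t[train], y[train], degree)
    pred = np.polyval(coefs, t[test])
    return float(np.mean((pred - y[test]) ** 2))

# Hypothetical daily case counts for one province
counts = [12, 15, 19, 24, 30, 38, 48, 60, 76, 95]
errors = {d: holdout_error(counts, d) for d in (1, 2, 3)}
best = min(errors, key=errors.get)          # functional form with smallest error
```

The same holdout comparison applies unchanged when the candidate fits are Poisson regressions rather than least-squares trends; only the fitting step differs.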