
13:00-14:00 Session Plenary 1: PRESIDENT'S INVITED SPEAKER

ABSTRACT. Survival analysis is characterized by the need to deal with incomplete observation of the outcome variable, most frequently caused by right-censoring, and several – now standard – inference procedures have been developed to deal with this. Examples include the Kaplan-Meier estimator for the survival function and partial likelihood for estimating regression coefficients in the proportional hazards (Cox) model. During the last decades, methods based on pseudo-values have been studied. Here, the idea is to apply a transformation of the incompletely observed survival data and, thereby, to create a simpler data set to which ‘standard’ techniques (i.e., for complete data) may be applied, e.g., methods using generalized estimating equations. An advantage of this approach is that it applies quite generally to (marginal) parameters for which no or few other regression methods are directly available (including the average time spent in a state of a multi-state model). Another advantage is that it allows the use of a number of graphical techniques otherwise unavailable in survival analysis. Disadvantages include that the method is not fully efficient and that, in its simplest form, it assumes covariate-independent censoring (though generalizations to deal with this have been developed). We will review the development of the field since the idea was put forward by Andersen, Klein and Rosthøj in a 2003 Biometrika paper. Focus will be on graphical methods, but the theoretical properties of the approach will also be touched upon.
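As a minimal illustration of the pseudo-value idea, the sketch below computes jackknife pseudo-observations for the survival probability at a fixed time point from a hand-rolled Kaplan-Meier estimator; it assumes distinct event times, and the data are invented:

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) from event/censoring times (distinct times assumed)."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    s, at_risk = 1.0, len(time)
    for i in range(len(time)):
        if time[i] > t:
            break
        if event[i]:
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return s

def pseudo_values(time, event, t):
    """Jackknife pseudo-observations for S(t): n * S_hat - (n - 1) * S_hat(-i)."""
    n = len(time)
    s_full = km_survival(time, event, t)
    mask = np.ones(n, dtype=bool)
    pv = np.empty(n)
    for i in range(n):
        mask[i] = False
        pv[i] = n * s_full - (n - 1) * km_survival(time[mask], event[mask], t)
        mask[i] = True
    return pv
```

With no censoring, each pseudo-value reduces to the indicator I(T_i > t), which is why ‘standard’ complete-data regression tools can then be applied to the transformed data.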

An early-phase clinical trial design in oncology with generalization ability

ABSTRACT. Recent years have seen progress in precision medicine, which uses biomarkers to predict the actions and effects of pharmaceuticals. Finding investigational new drugs using biomarkers has become an important part of drug development. Research into using image data as a biomarker of therapeutic effect has accelerated the progress of precision medicine. A diverse array of biomarkers is used in clinical trials, and strong correlations between biomarkers are believed to exist. Clinical trials are often conducted on a limited number of cases. Therefore, it becomes difficult to improve the generalization ability of conventional statistical models expressing the relationship between dose and outcome if patients' biomarkers are included among the explanatory variables. In this presentation, we propose an approach that applies a Gaussian process, a form of machine learning, to an early-phase clinical trial in oncology. The treatment (dose), biomarker status, and the interaction between biomarker status and treatment (dose) are the input values; the presence or absence of efficacy and toxicity are the output values. Using previously accumulated patient data, efficacy and toxicity are predicted for each treatment (dose) based on subsequent patients' biomarker status. From the predicted values, the treatment (dose) believed to be optimal for the patient is adaptively determined. We examined the operating characteristics of the proposed approach by simulating several scenarios. The results of the simulation study suggest that the proposed approach obtains a desirable selection percentage of the optimal dose compared with a conventional approach.
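A minimal sketch of the idea, using Gaussian-process regression with an RBF kernel on invented inputs (dose, biomarker status, and their interaction); the abstract's actual method models binary efficacy and toxicity outcomes, which would typically call for GP classification rather than this simplified regression:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, var=1.0):
    """Squared-exponential kernel evaluated between all rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior_mean(X_train, y_train, X_new, noise=0.1):
    """Posterior mean of a zero-mean GP regression fit."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf_kernel(X_new, X_train) @ np.linalg.solve(K, y_train)

# Invented data: columns are dose, biomarker status, and their interaction.
doses = np.array([1., 2., 3., 1., 2., 3.])
marker = np.array([0., 0., 0., 1., 1., 1.])
X = np.column_stack([doses, marker, doses * marker])
y = np.array([0., 0., 1., 0., 1., 1.])          # observed efficacy (0/1)

# Predict efficacy for a new biomarker-positive patient at each candidate dose,
# then adaptively pick the dose with the highest predicted efficacy.
cand = np.array([1., 2., 3.])
X_new = np.column_stack([cand, np.ones(3), cand])
pred = gp_posterior_mean(X, y, X_new)
best_dose = cand[int(np.argmax(pred))]
```

In the actual design, an analogous prediction for toxicity would constrain which doses are admissible before the efficacy-based choice.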

A flipped classroom approach for teaching medical statistics and statistical software training

ABSTRACT. Teaching biometry often relies on statistical software courses. The aim is to give students hands-on experience alongside the theoretical content of medical biometry, enabling them to handle their own research projects in the future with some autonomy. Our experience from such courses is that the technical realisation of the practical components is time-consuming, leaving only a small amount of time for exchange, questions and interpretation of statistical methods. The time until everyone feels sufficiently confident with the knowledge imparted is heterogeneous. We will upend this scenario using the didactic method of a Flipped Classroom: before the course, students are expected to familiarize themselves with the statistical software and to work on exercises. Thus, they can adjust for both individual learning pace and experience. At the beginning of the course, technical problems can be solved and – ideally – more time is left to discuss biometrical topics. The advantage is that the lecturers accompany and supervise this phase. Prerequisites for implementing such a concept in statistical software courses are that the students have unlimited access to the software (anytime, anywhere), so that they can individually schedule the required preparation time, and that a bundle of self-explanatory materials (script, learning videos, exercises, data, Moodle page) is available for autonomous training. This approach will be implemented in our teaching of medical students at Ulm University. SAS Studio / SAS OnDemand for Academics [1] offers a statistical software platform that students can access without restriction, free of charge. The only prerequisite is an internet connection and a browser, which may be assumed to be standard nowadays.
A script [2] and other required learning materials are already available in Moodle (in German, free for our students). Altogether, this seems a reasonable basis for running a Flipped Classroom. We will present the idea and first evaluation data. In a second step, the program will be evaluated in a cluster-randomised study.

A comparison of dual biomarker threshold identification procedures within a confirmatory clinical trial

ABSTRACT. Background: Often, targeted therapies only show benefit in a ‘sensitive’ subgroup of the patient population. This can cause such treatments to be overlooked in broad ‘all-comers’ trials, due to dilution of the observed treatment effect. Patient subgroups can be defined by continuous biomarker values, in conjunction with a threshold value dichotomising the population into sensitive and non-sensitive. Such biomarker-based subgroups are frequently identified retrospectively or in an exploratory manner within trials, which can lead to inefficiencies and delays in patient care. Identifying and validating biomarker-based subgroups within a single confirmatory trial is therefore key. Moreover, there is increasing evidence to suggest that multiple biomarkers are needed to sufficiently identify sensitive patients for some drugs or drug combinations. In this work, a variety of dual biomarker threshold identification procedures are applied in a phase III trial setting and their performance contrasted. Methods: It was of interest to identify thresholds for two continuous biomarkers simultaneously, which dichotomise the respective biomarkers into sensitive and non-sensitive patients, thus defining a two-dimensional patient subgroup, i.e., patients defined as sensitive for both biomarkers. Four methods were implemented within Freidlin and Simon's Adaptive Signature Design (ASD) framework: a grid search, a modelling-based method, recursive partitioning and prognostic peeling. Methods were contrasted on their ability to accurately identify biomarker threshold locations and on the proportion of trials achieving significant efficacy, both overall and subgroup-specific. This work was carried out using a simulation study. Results: In the simulation study, recursive partitioning methods showed the best overall performance, with respect to both threshold identification accuracy and trial operating characteristics.
All methods suffered when the expected proportion of sensitive patients was low and when the magnitude of the treatment effect was modest. Conclusions: Dual biomarker threshold identification can be successfully incorporated into a confirmatory phase III setting without jeopardising the ability to detect an overall treatment effect. In such low-dimensional settings, recursive partitioning methods should be taken into consideration.
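To make the grid-search variant concrete, here is a hedged toy sketch on simulated data (all names, the data-generating mechanism, and the minimum-subgroup-size rule are illustrative, not the authors' implementation): scan candidate threshold pairs exhaustively and keep the pair that maximises the observed treatment effect in the dual-positive subgroup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented trial data: two continuous biomarkers, a treatment arm, a binary response.
n = 400
b1, b2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
treat = rng.integers(0, 2, n)
sensitive = (b1 > 0.5) & (b2 > 0.4)          # true sensitive subgroup
y = rng.binomial(1, 0.2 + 0.4 * (treat * sensitive))   # drug works only there

def grid_search(b1, b2, treat, y, grid, min_size=40):
    """Threshold pair maximising the observed response-rate difference
    (treated minus control) inside the dual-positive subgroup."""
    best, best_cut = -np.inf, None
    for c1 in grid:
        for c2 in grid:
            sub = (b1 > c1) & (b2 > c2)
            if sub.sum() < min_size:          # require a minimum subgroup size
                continue
            t, c = y[sub & (treat == 1)], y[sub & (treat == 0)]
            if len(t) == 0 or len(c) == 0:
                continue
            effect = t.mean() - c.mean()
            if effect > best:
                best, best_cut = effect, (c1, c2)
    return best_cut, best

cuts, effect = grid_search(b1, b2, treat, y, np.linspace(0.1, 0.9, 9))
```

Within the ASD framework, such a search would run on a training portion of the trial, with the selected thresholds then tested on the held-out validation portion.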

Multifactor intervention efficacy on MACE and mortality in diabetic kidney disease: a cluster-randomized controlled trial

ABSTRACT. Diabetic kidney disease (DKD) is associated with very high cardiovascular risk [1], calling for the implementation of intensive, multifactorial risk-factor therapy. NID2 aimed to assess the efficacy of a multifactorial intervention, versus standard of care (SoC), on major fatal/non-fatal cardiovascular events (MACEs) in DKD patients with albuminuria and diabetic retinopathy. NID2 is a multicentre, cluster-randomized, open-label clinical trial of 395 DKD patients from 14 Italian diabetology clinics, aged >40 years, with no history of CV events. Centres were randomly assigned to multifactorial intensive therapy (MT, n=207) of the main cardiovascular risk factors or to SoC (n=188). The primary endpoint was MACE occurrence by the end of the follow-up phase. Secondary endpoints included the single components and all-cause death. Standardized-difference (SDiff) cut-off criteria by Leyrat et al. were used to establish baseline covariate imbalance under cluster randomization [2]. To evaluate global imbalance, the c-statistic was further calculated. P-values accounting for clustering were computed by generalized estimating equations (GEE) models with cluster as the grouping variable; the distribution of the dependent variable and the link function were chosen as appropriate (Gaussian/identity for continuous variables, binomial/logit for dichotomous variables). Group comparisons at the end of intervention were also performed by GEE, further adjusting for baseline values. Median follow-up was calculated by the inverse Kaplan-Meier procedure, and the primary endpoint was analysed following the intention-to-treat principle, with event curves based on Kaplan–Meier analysis. Due to the cluster randomization, a Cox shared-frailty model was fitted to calculate the HR and 95% confidence interval; across centres, frailties are assumed to be gamma-distributed latent random effects affecting the hazard multiplicatively. Intervention lasted a median of 3.84 and 3.40 years in MT and SoC, respectively.
At the end of intervention, target achievement was significantly higher in MT. 74 MACEs were recorded (50 in SoC vs. 24 in MT), with an unadjusted HR of 0.28 (95%CI 0.13-0.63; p=0.002). Over the full 13 years of follow-up, 262 MACEs were recorded (116 in MT vs. 146 in SoC). The adjusted Cox shared-frailty model demonstrated a 52% lower risk of MACEs in the MT arm (aHR 0.478, 95%CI 0.30-0.74, p=0.001). Similarly, all-cause death risk was 47% lower (aHR 0.53, 95%CI 0.29-0.93, p=0.027). In conclusion, MT reduces long-term MACE and mortality risk in high-risk DKD patients, and shows an early benefit on MACEs.

Challenges in Factorial Design Randomized Control Trials

ABSTRACT. Description: Randomised controlled trials (RCTs) using a factorial design enable the assessment of two or more interventions within a single trial. Compared to multi-arm trials, factorial RCTs are more efficient, as they require fewer participants under the assumption that the interventions act independently of each other (i.e. no interaction effect is present). This supposition creates specific challenges in the design, analysis and reporting of a factorial RCT, which if ignored can lead to biased results. Objective: To evaluate current methodology and reporting in published reports of 2x2 factorial design RCTs. Additionally, to assess how frequently trial design methods reported in the results differ from those pre-specified in the protocol/statistical analysis plan (SAP). Methods: We searched PubMed to identify primary reports of 2x2 factorial design RCTs published between 01 January 2018 and 04 March 2020. The corresponding trial protocol and/or SAP were collected, where available. Data from both primary reports and protocols/SAPs were extracted and compared on trial characteristics (disease, sample size, funding, etc.) and on the approach to factorial design-specific methodology, such as the design rationale or consideration of a treatment interaction in the sample size and analysis, as indicators of potential challenges. (Preliminary) Results: The review included a purposeful sample of 100 factorial RCTs. The most common specialty was cardiology (23%, n=23/100); the median sample size was 258 (interquartile range 120 to 693); 44% (n=44/100) were multicentre; 61% (n=61/100) were funded by non-industry sources. The rationale for a factorial design was most often efficiency in assessing multiple treatments in one RCT (44%, n=44/100). 12% (n=12/100) explicitly assumed no treatment interaction at the outset, and 4% (n=4/100) reported a sample size powered to detect an interaction.
The primary outcome analysis was conducted for the main effects in 43% (n=43/100), as a four-arm comparison in 25% (n=25/100), and both in 32% (n=32/100). Of 60 articles that reported testing for an interaction, 83% (n=50/60) reported non-significant interactions. Protocols/SAPs were available for 37% (n=37/100) of the published primary reports. 65% (n=24/37) intended to assess for an interaction in the analysis (as reported in the protocol/SAP), and 17% (n=4/24) of these did not report this in the final report.

Non-inferiority trials with indirect evidence of assay sensitivity using network meta-analysis

ABSTRACT. Background: The choice of a non-inferiority (NI) margin and the assurance of assay sensitivity are well-known issues in 2-arm NI trials. The conclusion that a NI trial has assay sensitivity is based on the following three considerations: (i) historical evidence on the efficacy of the treatment effect, (ii) the constancy assumption, and (iii) the quality of the NI trial (ICH-E10, FDA NI guidance). A 3-arm NI trial including both a placebo and a reference treatment, called the gold standard design, is strongly recommended to assess assay sensitivity. However, there are concerns about the ethics and feasibility of including a placebo; consequently, practical applications of the 3-arm NI trial have not progressed. Therefore, there is a need for a practical method to assess assay sensitivity in 2-arm NI trials. Objective: We propose a new practical approach to confirm assay sensitivity in a 2-arm NI trial. This method assesses assay sensitivity using the indirect effect of the reference treatment versus placebo, obtained by integrating data from previous trials and the 2-arm NI trial using network meta-analysis. Method: To assess assay sensitivity, it is necessary to demonstrate that the acceptable minimum effective value of a test treatment in the 2-arm NI trial is superior to placebo (Hida & Tango, 2018). Since the 2-arm NI trial does not include a placebo, we are forced to use historical trial results as external information. In other words, the proposed method uses network meta-analysis to obtain indirect evidence on the substantial superiority of the reference treatment over placebo. The performance of this method is investigated in terms of the actual type I error rate, joint power, and calculated sample size using simulations of several scenarios based on data from clinical trials.
Results and Conclusions: The level of evidence for the proposed method may be lower than that for the gold standard design, owing to the use of external information. However, the performance of this method as per the results of various simulations suggests that it will be useful as one of the methods to assess assay sensitivity of the 2-arm NI trial.
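The core indirect-comparison step can be sketched as follows (all numbers are invented; a full network meta-analysis combines many trials, but a single-path indirect estimate reduces to adding effects along the path and summing their variances under the consistency assumption):

```python
import math

def indirect_effect(d_TR, se_TR, d_RP, se_RP):
    """Indirect estimate of test vs. placebo via the reference treatment:
    point estimates add along the path, variances add (consistency assumption)."""
    d = d_TR + d_RP
    se = math.sqrt(se_TR ** 2 + se_RP ** 2)
    return d, se

# Invented numbers: the NI trial gives test vs. reference = -0.05 (SE 0.10),
# and the historical network gives reference vs. placebo = -0.40 (SE 0.08),
# with negative differences favouring the first-named treatment.
d, se = indirect_effect(-0.05, 0.10, -0.40, 0.08)
z = d / se    # z-statistic for (indirect) superiority of the test drug over placebo
```

The price of the indirect path is the inflated standard error, which is one reason the evidence level is lower than in the gold standard design.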

Biomarker-based Bayesian randomized clinical trial for population finding with efficient reduction of sample size

ABSTRACT. The benefits and challenges of incorporating biomarkers into the clinical development of new agents have been increasingly discussed. In many cases, identifying a sensitive subpopulation of patients more accurately may require a larger sample size and, thereby, a higher development cost and a longer study period. Thus, we consider designing interim analyses with decision criteria to improve the efficiency of the study design, aiming to reduce the expected number of patients enrolled onto a clinical trial. The clinical trial analyzes a time-to-event endpoint, such as progression-free survival time, and the decision criteria account for the amount of accumulated information from observed events. We discuss a Bayesian randomized clinical trial design incorporating a predictive biomarker measured on a graded scale for the development of a new molecularly targeted treatment. Extensive simulation studies evaluate the operating characteristics of the proposed method, including the probability of correctly identifying the desired subpopulation, under a wide range of clinical scenarios.

A methodological review of phase I designs with late-onset toxicities and incomplete follow-up

ABSTRACT. Background: Conventional phase I designs, such as the 3+3 and CRM designs, were developed for cytotoxic agents, whose acute toxicity is likely to occur within the first cycle of treatment. These designs require previous participants to be fully followed up before further dose assignment. However, some therapies, such as immunotherapies and molecularly targeted agents, may have late-onset toxicities. Suspending new recruitment and waiting for the full observation of DLT outcomes may prolong the trial and increase trial costs. A similar difficulty also arises when accrual is fast, as many participants may have incomplete follow-up [1, 2]. Several designs have been proposed to address this issue, but their uptake has been slow. We conducted a methodological review to provide a comprehensive overview of these designs and their characteristics.

Methods: We performed searches in PubMed in November 2020. Phase I designs that clearly stated late-onset/pending toxicity consideration were included. We also checked the references of these identified papers to ensure no designs are missed. Key characteristics such as the trial design, methodology, advantages and limitations, and how the designs have been implemented in published trials, are extracted.

Results: Our search yielded 23 designs, of which 11 (47.8%) are parametric (e.g. TITE-CRM) and 12 (52.2%) are non- or semi-parametric (e.g. TITE-BOIN and Rolling 6). Only 5 (21.7%) designs have been implemented in published clinical trials. We analyzed the time from publication of a novel design to its first published trial application using Kaplan-Meier estimates; the probability of implementation at 5 years is 0.11, 95% CI [0, 0.24]. A time-related weight function is one typical way to deal with late-onset toxicity: the weight function is applied either to the dose-toxicity model (e.g. TITE-CRM) or to the toxicity outcome (e.g. TITE-PIPE). TITE-CRM has been implemented most often; amongst the published trials, fifteen used the uniform weight function and five assigned different weights to different DLT follow-up periods.
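As an illustration of the weight-function idea, here is a hedged sketch of the weighted likelihood behind the empiric (power-model) TITE-CRM, in which a patient still under observation contributes with weight u/window; the skeleton, parametrization and data are invented for illustration:

```python
import math

def tite_crm_loglik(a, skeleton, assignments, dlt, followup, window):
    """Weighted log-likelihood of the empiric TITE-CRM model
    p_k = skeleton[k] ** exp(a): a patient with a DLT contributes log(p),
    an event-free patient contributes log(1 - w * p) with w = followup / window."""
    ll = 0.0
    for k, y, u in zip(assignments, dlt, followup):
        p = skeleton[k] ** math.exp(a)
        w = min(u / window, 1.0)
        ll += math.log(p) if y else math.log(1.0 - w * p)
    return ll

# Invented trial state: 4 patients, a dose skeleton, partial follow-up (weeks).
skeleton = [0.05, 0.10, 0.20, 0.30]
assignments, dlt, followup = [0, 1, 1, 2], [0, 0, 1, 0], [3, 2, 6, 1]
# crude grid-based point estimate of the model parameter
a_hat = max((i / 10.0 - 2.0 for i in range(41)),
            key=lambda a: tite_crm_loglik(a, skeleton, assignments, dlt, followup, 6))
```

Setting every weight to 1 recovers the complete-follow-up CRM likelihood, which shows how the weighting lets accrual continue while toxicity data mature.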

Conclusions: Intelligent trial designs that allow for more rapid trial completion and achieve high accuracy in determining the right dose are much needed in practice. This comprehensive review enhances knowledge and provides guidance for investigators to choose among such study designs.

Patient-specific dose finding in seamless phase I/II clinical trials

ABSTRACT. This paper incorporates a covariate to determine the optimum dose in a seamless phase I/II clinical trial. A binary covariate and its interaction effect are assumed, to keep the method simple. Each patient's outcome is assumed to be trinomial, and the continuation ratio model is used to model the dose-response data. A Bayesian approach estimates the parameters of the dose-response model. At each stage of the trial, we allocate to a patient the dose for which the estimated probability of efficacy is maximal, subject to the constraint that the estimated probability of toxicity is no more than a target value. We also allow the design to stop early for futility and/or toxicity. Eight plausible dose-response scenarios are investigated to assess the proposed methodology. A simulation study shows that considering the covariate can enhance identification of the optimum dose when it is appropriate to do so.
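One common parametrization of a continuation-ratio model for a trinomial outcome, together with the dose-selection rule described (maximise estimated efficacy subject to a toxicity cap), can be sketched as follows; all parameter values are invented, and the sketch omits the covariate and the Bayesian estimation step:

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def cr_probs(dose, a1, b1, a2, b2):
    """Continuation-ratio probabilities for a trinomial outcome
    (0 = no response, 1 = efficacy, 2 = toxicity): q1 models P(Y >= 1)
    and q2 models P(Y = 2 | Y >= 1), each on the logit scale in dose."""
    q1 = expit(a1 + b1 * dose)
    q2 = expit(a2 + b2 * dose)
    return (1.0 - q1, q1 * (1.0 - q2), q1 * q2)

def best_dose(doses, params, tox_cap=0.3):
    """Highest-efficacy dose among those whose toxicity stays below the cap."""
    ok = [d for d in doses if cr_probs(d, *params)[2] <= tox_cap]
    return max(ok, key=lambda d: cr_probs(d, *params)[1]) if ok else None

dose = best_dose([1, 2, 3, 4], (-2.0, 1.0, -3.0, 0.8))
```

In the actual design, the parameters would be posterior estimates updated after each cohort, and `best_dose` returning `None` would correspond to stopping for toxicity/futility.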

The true power of clinical trials in pediatric cancers and other rare diseases

ABSTRACT. Background: Clinical trials are challenging in rare diseases like pediatric cancers, where accrual is limited. In these trials, the inference assumptions are the same as in common diseases, i.e., that the sample comes from a quasi-infinite population. This leads to overestimating the variance of the treatment effect. The finite-population correction factor (FPCF) is often used in surveys, but not in clinical trials. Under few assumptions, the use of the FPCF can improve trial efficiency, showing that the power of these trials is higher than it appears. Methods: First, a simulation study assessed the standard error of the mean (sem) treatment effect and the coverage of the 95% confidence interval with and without the FPCF. Second, a corrected power of the z-test was derived. Finally, the impact on sample size calculation was investigated. The impact of using the FPCF versus the naive approach was assessed for varying treatment effects, sample sizes and population sizes. Results: The simulation results confirmed the overestimation of the sem with the naive estimator. Depending on the scenario, the gain in power reached 10.0%, 14.1% and 12.9% to detect a difference in treatment effect of 10%, 15% and 20%, respectively. The gain increased with the sample size; it was negligible for n=30 and in scenarios with high power (>95%). This gain in power translated into a decrease in sample size: if the naive calculation leads to a sample size of 10% of the population size, the sample size can be divided by 1.1; if it leads to a sample size of 50% of the population size, the sample size can be divided by 3, while preserving the planned type-I error and power. A Desmoplastic Small Round Cell Tumor trial is presented in which the sample size is decreased from 32 to 27 patients. Conclusion: When dealing with rare diseases like pediatric cancers, the power of clinical trials is higher than it appears.
The gain in efficiency was seen for reasonable sample sizes and treatment differences, showing that the approach can be useful in pediatric cancer clinical research when the population size is approximately known.
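A hedged sketch of the correction for a two-sided two-sample z-test with known variance (the per-arm population size `N_per_arm` and all numbers are illustrative, not taken from the trial in the abstract):

```python
import math
from statistics import NormalDist

def fpc(n, N):
    """Finite-population correction factor applied to the standard error."""
    return math.sqrt((N - n) / (N - 1))

def power_z_test(delta, sd, n_per_arm, N_per_arm=None, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test; if the finite
    per-arm population size N_per_arm is given, the SE shrinks by the FPCF."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    if N_per_arm is not None:
        se *= fpc(n_per_arm, N_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_alpha - abs(delta) / se)

# Invented example: half of a 100-patient-per-arm population is enrolled.
naive = power_z_test(delta=0.15, sd=0.5, n_per_arm=50)
corrected = power_z_test(delta=0.15, sd=0.5, n_per_arm=50, N_per_arm=100)
```

The closer the sample comes to exhausting the population, the stronger the correction, which is exactly the rare-disease setting described above.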

Outcomes reported in randomized clinical trials of depression in geriatric patients: a methodological review

ABSTRACT. Background and statistical challenges: Major depressive disorder (MDD, or depression) is prevalent among older adults aged 65 and older. The effectiveness and safety of interventions used to treat MDD are often assessed through randomized controlled trials (RCTs). However, heterogeneity in the selection, measurement, and reporting of outcomes in RCTs creates challenges for the comparison and interpretation of results, and limits their utility in clinical decision-making. Core outcome sets (COS), developed through systematic scans of the literature, have been proposed as a viable solution to address the heterogeneity of outcome selection in RCTs. A COS represents a minimum set of outcomes that must be measured and reported in trials pertaining to a particular illness. There is presently no COS for use in RCTs that evaluate interventions for geriatric populations with MDD.

Objectives: We will conduct a methodological review of the literature for outcomes reported in geriatric depression trials to assess the heterogeneity of outcome measures.

Methods: RCTs evaluating pharmacotherapy, psychotherapy, or any other intervention for older adults with depression that have been published in the last 10 years will be located using electronic database searches (MEDLINE, EMBASE, PsycINFO, and CINAHL). Reviewers will conduct title and abstract screening, full-text screening, and data extraction of trials eligible for inclusion independently and in duplicate.

Analysis: Outcomes will be synthesized and mapped to a core-outcome domain framework commonly used in biomedical research comprising five areas: physiological/clinical, life impact, resource use, adverse events, and death. We will also summarize characteristics associated with studies (e.g., the number of single-arm, parallel, multi-arm, and crossover trials) and outcomes (e.g., total number of outcomes per trial, number of trials with discernable primary outcomes).

Expected results: ‘Depression severity’ is expected to be the most commonly used outcome measure in trials of older adults. We anticipate inconsistency in the definition and measurement of outcomes across RCTs, and few trials that specify a single, discernable primary outcome.

Conclusions: The findings from our methodological review will inform the development of a COS for geriatric MDD, with the eventual aim of reducing variability in outcome selection, measurement, and reporting for this clinical population.

A small clinical trial design to provide additional evidence of treatment effects compared with single-arm trials

ABSTRACT. Background: There are cases where traditional randomized controlled trial designs are difficult to conduct in small populations, such as in rare diseases and pediatric diseases. In such small clinical trials, many single-arm trials assessing within-patient comparisons are conducted for feasibility reasons. The efficacy of a test drug is then evaluated against a pre-specified threshold derived from natural history evidence or external information. However, even in well-controlled single-arm trials, the observed treatment effects are subject to bias. Therefore, a new clinical trial design is needed that provides an adequate estimate of the therapeutic effect of the test drug in addition to the conventional threshold-based efficacy evaluation.

Objective: We propose a new trial design that strengthens the level of evidence without increasing the required sample size compared with single-arm trials.

Method: The method estimates the effect size of a test drug in addition to a threshold-based efficacy assessment, using a design similar to the delayed-start design (D’Agostino RB, 2009), in which subjects are allocated to treatment at different time points (e.g., Group 1 (G1): period 1 = placebo, period 2 = test drug; Group 2 (G2): periods 1 and 2 = test drug). In other words, by integrating the data from period 2 in G1 and period 1 in G2, we can assess therapeutic efficacy against the same threshold as in single-arm trials. Additionally, comparing the period-1 data of the two groups allows us to estimate the effect size of the test drug with a certain accuracy. We also take period 2 in G2 into account to estimate efficacy more accurately, and assume a correlation between period 1 and period 2 to derive a less biased and more interpretable estimate. For practical application, we perform simulations over various scenarios, with parameters varied to reflect actual clinical settings.

Result and Conclusion: We have confirmed that the assessment based on the estimate derived from integrating period 2 in G1 with period 1 in G2 achieves power similar to that of single-arm trials. In addition, the effect size of the test drug can be estimated with a certain degree of accuracy by comparing period 1 in both groups.
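A hedged simulation sketch of the two assessments described (all numbers are invented, and this toy version ignores the period-1/period-2 correlation and the G2 period-2 data that the actual proposal also exploits):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n = 30                                     # subjects per group (invented)
effect, sd, threshold = 0.5, 1.0, 0.2      # invented true values

g1_p1 = rng.normal(0.0, sd, n)             # G1, period 1: placebo
g1_p2 = rng.normal(effect, sd, n)          # G1, period 2: test drug
g2_p1 = rng.normal(effect, sd, n)          # G2, period 1: test drug

# (1) threshold-based efficacy assessment, pooling all on-drug data,
#     mirroring the single-arm comparison against a pre-specified threshold
on_drug = np.concatenate([g1_p2, g2_p1])
z_thr = (on_drug.mean() - threshold) / (on_drug.std(ddof=1) / np.sqrt(on_drug.size))
p_thr = 1 - NormalDist().cdf(float(z_thr))

# (2) randomized estimate of the effect size, from period 1 only
est_effect = g2_p1.mean() - g1_p1.mean()
```

Point (1) uses the same number of on-drug observations as a single-arm trial of size 2n, which is why power is preserved, while point (2) adds the randomized effect-size estimate the single-arm design cannot provide.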

SteppedPower - an R Package for Power Calculation in Stepped Wedge Cluster Randomised Designs

ABSTRACT. Stepped wedge cluster randomised trials (SWCRT) are a versatile alternative to parallel cluster randomised designs. They are increasingly popular in health services research for the evaluation of complex interventions. Various approaches to power calculation have been presented. Hussey and Hughes [1] introduced an approach based on generalised least squares, which has since been refined; additionally, design formulae have been proposed [2]. Recently, a maximum-likelihood approach to power calculation for binary outcomes was presented. Power calculation for SWCRT is still a matter of ongoing research, and no ready-made software is at hand for the variety of situations, designs, and statistical models.

We have therefore set up an R package for power calculation that addresses current deficits and offers more flexibility concerning designs, underlying statistical models and methods of analysis.

We extend the generalised least squares method first proposed in [1]. The design matrix and covariance matrix are constructed explicitly, which makes this approach very flexible. The use of sparse matrix algorithms and aggregation on the cluster level makes computation efficient, so that the calculation of (very) large designs becomes feasible. In particular, cluster-level aggregation facilitates the calculation for closed cohort designs. For binary and count outcomes to be analysed with generalised mixed models with non-linear link functions, tools for translating between conditional and marginal means and effects are provided. The package further offers out-of-the-box tools to visualise cluster and period importance. Settings with few clusters can lead to anti-conservative analyses; we therefore offer some common design-driven degrees-of-freedom adjustments.
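To illustrate the explicit design-and-covariance-matrix approach, here is a hedged sketch of a Hussey-Hughes-type GLS power calculation for a basic cross-sectional stepped wedge design; this is a deliberate simplification of what a full package must handle, and none of the names below reflect the SteppedPower API:

```python
import numpy as np
from statistics import NormalDist

def sw_power(n_clusters, n_periods, m, tau2, sigma2, effect, alpha=0.05):
    """GLS power for a cross-sectional stepped wedge design in the spirit of
    Hussey & Hughes [1], on cluster-period means: tau2 is the between-cluster
    variance, sigma2 the individual-level variance, m the subjects per
    cluster-period. Clusters cross over in equal batches, one per step."""
    steps = n_periods - 1
    per_step = n_clusters // steps             # clusters switching at each step
    # covariance of one cluster's period means: exchangeable plus residual
    V = tau2 * np.ones((n_periods, n_periods)) + (sigma2 / m) * np.eye(n_periods)
    Vinv = np.linalg.inv(V)
    p = 1 + n_periods                          # intercept, period dummies, treatment
    info = np.zeros((p, p))
    for c in range(n_clusters):
        switch = c // per_step + 1             # first treated period of cluster c
        Xc = np.zeros((n_periods, p))
        Xc[:, 0] = 1.0                         # intercept
        for t in range(1, n_periods):
            Xc[t, t] = 1.0                     # period fixed effects
        Xc[:, -1] = (np.arange(n_periods) >= switch).astype(float)  # treatment
        info += Xc.T @ Vinv @ Xc               # block-diagonal GLS information
    var_treat = np.linalg.inv(info)[-1, -1]
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z - abs(effect) / float(np.sqrt(var_treat)))
```

Because the treatment-effect variance comes straight out of the inverted information matrix, arbitrary rollout patterns or covariance structures only require changing `Xc` and `V`, which is the flexibility the explicit-matrix approach buys.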

We present the features implemented in the SteppedPower package and illustrate some of them with real world examples including cross-sectional and cohort type SWCRTs.

The user-friendly software presented fills a gap in power calculation tools for cluster randomised trials. Work is in progress to extend the visualisation tools to explore sensitivity to misspecification of the covariance structure.

Investigating the operating characteristics of clinical trials with borrowing from external data

ABSTRACT. In the era of precision medicine, novel designs are developed to deal with flexible clinical trials that incorporate many treatment strategies for multiple diseases in one trial setting. This situation often leads to small sample sizes in disease-treatment combinations and has fostered the discussion about the benefits of borrowing of external or historical information for decision-making in these trials. Several methods have been proposed that dynamically discount the amount of information borrowed from historical data based on the conformity between historical and current data. It is of major importance for regulators and clinicians to investigate the operating characteristics of trial designs that include borrowing from external information. The objective of the research is to identify correct simulation approaches to investigate the operating characteristics of trial designs that include borrowing from external information and to develop methods for informative display of results from the simulation studies. We will consider Bayesian phase II clinical trials where efficacy is evaluated on the basis of posterior probabilities. Borrowing from external data will be achieved by various methods, including hierarchical models and robust dynamical priors. Clinical trial operating characteristics will be either simulated by Monte Carlo methods or calculated analytically if this is feasible. We will illustrate appropriate and inappropriate simulation setups for investigating the operating characteristics of a clinical trial with borrowing from external data, compare the results of the setups and relate the findings to simulation studies that have been published recently. The results from the simulation studies can be used to characterize the properties of various Bayesian borrowing methods.
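As a simple example of discounted borrowing, consider a fixed-weight power prior for a binary endpoint (dynamic methods would estimate the weight from the conformity of the data sources; all numbers here are invented):

```python
def power_prior_posterior(x_hist, n_hist, x_curr, n_curr, a0, a=1.0, b=1.0):
    """Beta(a, b) prior updated with historical data discounted by a fixed
    power-prior weight a0 in [0, 1], then with the current trial data.
    Returns the parameters of the Beta posterior for the response rate."""
    post_a = a + a0 * x_hist + x_curr
    post_b = b + a0 * (n_hist - x_hist) + (n_curr - x_curr)
    return post_a, post_b

# Invented data: 20/50 historical responders, 8/20 current responders.
no_borrow = power_prior_posterior(20, 50, 8, 20, a0=0.0)
full_borrow = power_prior_posterior(20, 50, 8, 20, a0=1.0)
mean = lambda ab: ab[0] / (ab[0] + ab[1])
```

The effective number of borrowed patients is a0 * n_hist, which makes the degree of discounting explicit; a simulation study of operating characteristics would sweep the true current-trial response rate and compute the posterior efficacy probability for each simulated data set.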

Introducing GINGER - a General simulation-INterpolation tool for designing multiGroup ExpeRiments

ABSTRACT. In animal experiments it is often investigated whether there is a difference in means of an outcome variable between two different groups. If an experiment is conducted with more than two groups, usually ANOVA is used, in which first a global test of any difference between groups is performed. In a multifactor experiment, one can then test for interactions between effects and for main effects. If interactions exist (which is most often assumed), the ultimate question is if there are differences between the individual experimental groups defined by different combinations of factor levels. Therefore, the experiment should be planned in order to have sufficient power to detect the smallest biologically relevant effect between any two experimental groups, even if more than two groups were included in the experiment. Existing software for sample size calculation either focuses on power of the global ANOVA test, or on contrasts between groups, but without considering multiplicity corrections (other than simple Bonferroni-type corrections). In analyses, however, typically corrections are needed for all-pairwise comparisons (Tukey HSD correction) or many-to-one comparisons (Dunnett correction), or for a number of specific contrasts between groups. Therefore, we developed a tool that combines elements of simulation, interpolation and exact computation of sample sizes, and implemented it in GINGER, a web-based application hosted at Based on our experience with experimental design of animal trials, GINGER computes sample sizes for typical animal experiments with continuous outcome variables. In contrast to existing calculators, it also considers Tukey or Dunnett corrections. The tool is based on a simulation-interpolation approach. Results are obtained instantaneously as the tool builds on pre-simulated data. We exemplify the use of the tool by means of carrying out several sample size calculations for real experiments. 
We also compare results with existing software where this is possible. To maintain the tool, we are hosting it in a public repository allowing for full version control and bug reporting. Ideas for further developments include extensions of the range of scenarios covered, further multiplicity corrections and options to simulate repeated experiments which are often carried out to adjust for environmental confounding.
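The simulation step behind such a tool can be sketched in a few lines. The fragment below is a hypothetical illustration, not GINGER's implementation: it estimates, by Monte Carlo, the power to detect the smallest true pairwise difference using a z-test with known variance, and uses a Bonferroni correction as a conservative stand-in for the Tukey HSD correction. Function names and defaults are invented for the example.

```python
import random
import statistics
from itertools import combinations

def pairwise_power(n, means, sd, alpha=0.05, n_sim=2000, seed=1):
    """Monte-Carlo power for detecting the smallest true pairwise
    difference between group means, with a Bonferroni correction for
    all-pairwise comparisons (a conservative stand-in for Tukey HSD).
    Assumes at least one pair of unequal means and a known, common sd."""
    rng = random.Random(seed)
    k = len(means)
    n_tests = k * (k - 1) // 2
    # two-sided z critical value after Bonferroni adjustment
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    se = sd * (2 / n) ** 0.5
    # the pair with the smallest true non-zero difference
    target = min(
        (pair for pair in combinations(range(k), 2)
         if means[pair[0]] != means[pair[1]]),
        key=lambda p: abs(means[p[0]] - means[p[1]]),
    )
    hits = 0
    for _ in range(n_sim):
        xbar = [statistics.fmean(rng.gauss(m, sd) for _ in range(n))
                for m in means]
        i, j = target
        if abs(xbar[i] - xbar[j]) / se > z_crit:
            hits += 1
    return hits / n_sim
```

Iterating this over a grid of group sizes, and interpolating between grid points, gives the simulation-interpolation flavour described in the abstract.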

Substitution of study control group by historic controls: Effect on study results using the example pain therapy for endometriosis

ABSTRACT. The experience of pain is modulated by physiological and psychological factors. Therefore, determining the effect of a pain treatment usually requires a sufficiently large control population receiving placebo. Ethical considerations, in contrast, call for having as few patients as possible endure the trial in the placebo group. Efforts have been made to develop methods to reduce or substitute the control groups. The use of synthetic or historic control arms based on data from previous trials, or even on Real World Data/Real World Evidence, has gained acceptance by regulatory authorities. For pain medication for endometriosis, this method is promising but has not been tested so far. For this case study, data from an already published clinical study (1) on the use of 2 mg Dienogest daily to treat endometriosis-associated pelvic pain (EAPP) were used, and efficacy was re-evaluated with a historic control arm based on data from a published study (2) using a CCR1 antagonist to treat EAPP. First, the full treatment (1) and historic control (2) groups were compared on several efficacy parameters. In addition, propensity score (PS) matching on all baseline variables was used to match between the treatment and historic control arms. To evaluate the effect of PS matching, the same efficacy parameters were also evaluated between matched treatment and control pairs. This case study has shown that even for studies which are very similar in design, heterogeneity and between-study variation are present. With the use of a historic control arm, it was possible to reproduce results similar to those of the original study, while PS matching improved comparability considerably. For the main endpoint (pain measured on a Visual Analog Scale), PS matching was able to reproduce the original study results. The method in general has proven useful, while emphasis has to be given to the appropriate selection mechanism as well as the underlying assumptions.
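As a toy illustration of the matching step (not the case study's actual procedure), greedy 1:1 nearest-neighbour matching on already-estimated propensity scores, with a caliper, can be written as follows. The function name and default caliper are invented for the sketch.

```python
def greedy_match(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement on
    (already estimated) propensity scores. Treated units are processed
    in the given order; pairs farther apart than the caliper are left
    unmatched. Returns (treated_index, control_index) pairs."""
    available = dict(enumerate(control))   # control index -> score
    pairs = []
    for t_idx, t_score in enumerate(treated):
        if not available:
            break
        # closest still-available control unit
        c_idx = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_idx] - t_score) <= caliper:
            pairs.append((t_idx, c_idx))
            del available[c_idx]           # without replacement
    return pairs
```

Real PS-matching software additionally estimates the scores (e.g. by logistic regression on all baseline variables, as in the abstract) and checks covariate balance after matching.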

Design optimization and intermediate safety reporting for a randomized controlled biomarker trial

ABSTRACT. The specification of optimal estimands and estimators is a great biostatistical challenge for every randomized controlled trial (RCT), in particular concerning safety. We illustrate and discuss the specification of safety endpoints, analysis populations and reporting formats for an interventional rule-out biomarker trial when pilot study results are available. This work was performed for the RCT “IDEAL”, set up to analyze the effect of a biomarker to assist decisions on hospitalization for a specific population of patients presenting to the emergency department (ED). The rate of hospital admissions was reduced in the treatment arm vs. standard care, 40% vs. 60%. Safety was characterized by mortality, later (possibly delayed) hospital admission and re-presentation to the ED within 28 days. Typically, safety analysis is conducted by quantifying adverse events per study arm and comparing between arms. In the present study a biomarker was used to facilitate the stratification of patients into a low-severity group with less medical surveillance (patients not admitted to hospital, the rule-out decision) and a higher-severity group with high medical surveillance (patients admitted to hospital). Thereby, the study arms were split into two patient subgroups with different risks for adverse events. Accordingly, it seemed advisable to compute adverse event rates specifically for the study-arm subgroups of non-hospitalized patients with potentially increased risk. Severe adverse events like death could be considered too rare among rule-out patients to construct a meaningful statistical safety criterion (single-case review; not observed in the pilot).
For the less severe adverse events, ED re-presentation and later hospitalization, a good balance had to be achieved between (a) longer follow-up times increasing the number of observed events (preferable for higher statistical power) and (b) a stronger expected causal relationship between intervention and adverse events (attenuation of the treatment effect with time). The gold standard of safety analysis is a confirmatory study proving prespecified performance, typically non-inferiority. At the stage of a pilot study we recommend reporting (a) all analyzed safety endpoints with point estimates and uncertainties, (b) both study-arm-specific results and the difference between study arms, (c) uncertainty via intuitively interpretable Bayesian credible intervals and (d) verbal statements.
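One intuitively interpretable way to report rare-event uncertainty, in the spirit of the Bayesian credible intervals recommended above, is a Beta-posterior interval for each subgroup's adverse-event proportion. The sketch below is our illustration, not the study's actual analysis: the Jeffreys prior and the Monte-Carlo approach (standard-library `random.betavariate`) are assumptions of the example.

```python
import random

def beta_credible_interval(events, n, level=0.95, prior=(0.5, 0.5),
                           n_draws=20000, seed=7):
    """Equal-tailed Bayesian credible interval for an adverse-event
    proportion under a Beta prior (Jeffreys by default), obtained by
    Monte-Carlo draws from the Beta(a + events, b + n - events)
    posterior."""
    rng = random.Random(seed)
    a = prior[0] + events
    b = prior[1] + n - events
    draws = sorted(rng.betavariate(a, b) for _ in range(n_draws))
    lo_idx = int((1 - level) / 2 * n_draws)
    hi_idx = int((1 + level) / 2 * n_draws) - 1
    return draws[lo_idx], draws[hi_idx]
```

For example, 3 events among 100 rule-out patients yields an interval that still conveys meaningful uncertainty even though a formal non-inferiority test would be underpowered.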

Using routinely collected data to conduct a pragmatic randomized controlled trial: an example addressing antibiotic prescription and resistance monitoring in Swiss primary care

ABSTRACT. Background Antibiotic consumption is very high in primary care in Switzerland. Nesting intervention trials into routinely collected registry data is an innovative approach to addressing clinical problems in need of system-wide interventions, such as antibiotic overuse in primary care. Objective To reduce antibiotic use in primary care by providing personalized antibiotic prescription feedback to individual primary care physicians in Switzerland. Methods We conducted a nationwide pragmatic randomized intervention trial of routine antibiotic prescription feedback to general practitioners (GPs). We used routinely collected individual patient claims data from the three largest health insurers to prepare the interventional feedback and to assess endpoints. The target population consisted of the top 75% of antibiotic prescribers among all GPs who see at least 100 patients a year. Analyses followed the intention-to-treat principle. Prescription rates were calculated per 100 consultations for each year. Results We randomized 3,426 Swiss GPs in a 1:1 ratio to intervention and control arms. The 1,713 GPs in the intervention arm received evidence-based guidelines for the management of acute respiratory and urinary tract infections once, at the beginning of the trial, and thereafter received quarterly personalized antibiotic prescription feedback (see Figure 1). The 1,713 GPs in the control group were not actively notified about the study and received neither guidelines nor prescription feedback. The two-year intervention phase started in January 2018 and ended on December 31st, 2019. Prescription rates decreased gradually with each year of intervention, by 7% and 18% for feedback vs. control in the first and second year, respectively (see Table 1). Discussion Our trial demonstrates that, with quarterly feedback, the antibiotic prescription rate decreased by 18% in the intervention vs. control group in the second year of the intervention.

Table 1. Median antibiotic prescription rates per 100 consultations

Median rates (Q1 to Q3) per 100 consultations

                                                       Control               Feedback
Baseline year (1st Jan – 31st Dec 2017)                7.98 (7.39 to 8.49)   7.93 (7.39 to 8.42)
First year of intervention (1st Jan – 31st Dec 2018)   7.71 (7.45 to 8.76)   7.64 (7.44 to 8.66)
Second year of intervention (1st Jan – 31st Dec 2019)  7.69 (7.34 to 8.28)   7.51 (7.12 to 8.27)

Statistical considerations in using a novel consensus building technique to estimate action thresholds in clinical decision making

ABSTRACT. Background: Policy makers and health care workers are often confronted with uncertainty, both in the development of guidelines and patient management. In clinical decision making, such decisions are linked to an action threshold, e.g. the therapeutic threshold in case of treatment decisions. Because of common contention among stakeholders about the associated benefits and harms and the difficulty in weighing the harms, a predetermined agreed-upon threshold is important. Existing methods to estimate thresholds produce inconsistent results and often fail to account for harms that are hard to quantify and for the inherent variation among stakeholders.

Methods: We are piloting an adapted version of a formal consensus method in different clinical decision making settings: switching to second-line treatment for patients with presumptive HIV treatment failure, initiating treatment for tuberculosis in patients with presumptive tuberculosis, and enforcing self-isolation for presumptive SARS-CoV-2 infection. Experts and stakeholders are invited to formulate and reflect on the potential harms of wrong decisions: either taking unneeded action (false positive) or refraining from required action (false negative). The panel rates the extent to which each of these harms should be taken into account on a modified Likert scale, a process that is repeated after discussion. In the final step, the harms are weighed against each other. The action threshold is estimated as the probability of disease or infection at which the expected harms associated with false positives are equal to the expected harms of false negatives.

Results and discussion: Respondents agreed more on the ratings after panel discussion and results were similar between different expert panels presented with the same statements. While estimated action thresholds were also similar between panels, it is uncertain if individual agreement is consolidated in the weighing phase given considerable variation remains between experts within the same group. Design choices, such as the number of harm-describing statements, the sequence of discussion thereof and the time spent on each statement may be sources of bias. Given the absence of a gold standard, validation of the estimated thresholds is impossible. Considerable challenges in the data-analysis and statistical inference process of this novel technique remain.
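The indifference calculation at the core of the method can be made concrete. Assuming the panel's harms have already been weighed onto a common scale (the function names below are ours, not the authors'), the action threshold solves (1 - p) * H_FP = p * H_FN for the disease probability p:

```python
def action_threshold(harm_fp, harm_fn):
    """Probability of disease at which the expected harm of acting on a
    non-diseased patient (false positive, probability 1 - p) equals the
    expected harm of not acting on a diseased one (false negative,
    probability p): solving (1 - p) * harm_fp = p * harm_fn gives
    p = harm_fp / (harm_fp + harm_fn)."""
    return harm_fp / (harm_fp + harm_fn)

def aggregate_threshold(weights_fp, weights_fn):
    """One simple way to pool several harm-describing statements rated
    by the panel: sum the weights on each side before solving."""
    return action_threshold(sum(weights_fp), sum(weights_fn))
```

For instance, if false-negative harms are judged nine times worse than false-positive harms, action is warranted above a 10% probability of disease.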

Flexible software framework to compare Bayesian hierarchical models across basket trial designs

ABSTRACT. Context: Master protocols such as basket trials are becoming increasingly important before moving to late-phase pivotal trials. Commonly applied models for this purpose are Bayesian hierarchical models (BHM), which evaluate a trial’s outcome factoring in the similarity of the strata’s results by dynamically borrowing information across the strata. The decision to move to a pivotal trial can be formalized with a go / no-go decision-making framework, which enables the calculation of operating characteristics of a trial design. Often, the path to this decision is lined with one or several interim analyses.

Objective: The objective is a framework, and its software implementation, that allows comparison of the operating characteristics of basket trial designs across several BHMs with binary endpoints.

Methods: An R package has been built that provides functions for the simulation, analysis and evaluation of basket trials with binary endpoints. The BHMs proposed by Berry et al. (2013) and Neuenschwander et al. (2016), as well as a modified BHM that combines both approaches, are implemented in JAGS. The runtime of the simulations has been optimized by applying the BHMs only to unique trial realizations across scenarios, by parallelization, and by storing interim results.

Results: The implemented framework allows for an arbitrary number of strata, number of interim analyses, and staggered recruitment. The functions for trial evaluation enable highly customizable go / no-go decision rules for each decision point. This makes it possible to assess the decision probabilities, biases and mean squared errors of very flexible trial designs, as well as to analyse such a trial’s outcome. The implementation runs comparatively fast due to the performance optimizations and is available on CRAN.

Conclusions: The resulting R package “bhmbasket” provides a framework that facilitates the design selection for basket trials and enables the comparison of operating characteristics of different designs and BHMs.
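The dynamic borrowing idea behind these BHMs can be caricatured without MCMC. The sketch below is a simple empirical-Bayes normal-normal shrinkage of per-stratum response rates on the logit scale; it is illustrative only, not the Berry or Neuenschwander model and not part of `bhmbasket`, and the fixed between-stratum variance `tau2` is an assumption of the toy example.

```python
import math
import statistics

def logit(p):
    return math.log(p / (1 - p))

def shrunken_rates(responders, n_per_stratum, tau2=0.5):
    """Toy normal-normal shrinkage of per-stratum logit response rates
    towards their common mean -- an empirical-Bayes caricature of the
    dynamic borrowing performed by Bayesian hierarchical basket models.
    tau2 is an assumed between-stratum variance on the logit scale."""
    ests, variances = [], []
    for r, n in zip(responders, n_per_stratum):
        p = (r + 0.5) / (n + 1.0)          # continuity-corrected rate
        ests.append(logit(p))
        variances.append(1.0 / (n * p * (1 - p)))
    mu = statistics.fmean(ests)            # common mean on logit scale
    out = []
    for est, v in zip(ests, variances):
        w = tau2 / (tau2 + v)              # more data => less shrinkage
        theta = w * est + (1 - w) * mu     # shrink towards common mean
        out.append(1 / (1 + math.exp(-theta)))
    return out
```

Each stratum's estimate is pulled towards the overall mean, more strongly when the stratum is small, which is exactly the behaviour whose operating characteristics the package is designed to quantify.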

Assessment of Clinical Trial Missing Data During a Pandemic: A Tipping Point Analysis Case Study

ABSTRACT. In December 2019 the COVID-19 outbreak emerged in China and quickly spread, and was declared a global pandemic by the World Health Organization (WHO) in March 2020. For over a year quarantines, travel restrictions, interruptions to supply chains, social distancing, face coverings, and site restrictions have led to many difficulties in adhering to and completing study protocols. While some clinical trials have been halted or suspended, others have had to implement different mitigation strategies to assure the safety of participants and continued to collect data. The effects of these difficulties, even with mitigation strategies, have led to an increase in the amount of missing clinical trial data.

Missing data may have an influential impact on the primary analysis. This case example explores a device study that was midway through data collection when the pandemic was declared, and that implemented mitigation strategies such as data collection at home by the participants without visiting the clinic. Because of the missing data, several sensitivity analyses needed to be considered. Ultimately a tipping point analysis was conducted to assess how severe a departure from the missing data assumptions of the main estimator would have to be to change the conclusions of the primary analysis. Tipping point adjustments were made separately for pandemic and non-pandemic missingness, by type of data collection (in clinic versus by participant) and by reason for missingness. The assumed probability that a participant would have adhered to treatment through the end of the study, versus having had other non-pandemic-related events, was varied and the outcomes imputed accordingly. Each tipping point analysis will be presented with the data to illustrate the methodology for each set of assumptions.
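To illustrate the mechanics (not the study's actual estimator or endpoints), a tipping point search for a binary endpoint can be written as: impute the missing treatment-arm outcomes progressively more pessimistically and report where statistical significance is lost. The counts in the usage example and the pooled z-test are hypothetical choices for the sketch.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_pvalue(x1, n1, x2, n2):
    """Two-sided pooled z-test p-value for a difference in proportions."""
    p = (x1 + x2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = abs(x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(z))

def tipping_point(succ_trt, n_trt, miss_trt, succ_ctl, n_ctl, alpha=0.05):
    """Impute the treatment arm's missing outcomes as 0..miss_trt
    failures (the rest as successes) and return the smallest number of
    imputed failures at which significance is lost -- the tipping point.
    Control-arm missingness is ignored to keep the sketch short."""
    for k in range(miss_trt + 1):
        x1 = succ_trt + (miss_trt - k)        # imputed successes
        if two_prop_pvalue(x1, n_trt, succ_ctl, n_ctl) >= alpha:
            return k
    return None  # conclusion robust to every imputation considered
```

A tipping point far beyond any plausible missing-data mechanism supports the robustness of the primary conclusion; a small one flags sensitivity.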

A comparison of multiple imputation strategies for handling missing data in repeatedly measured multi-item scales

ABSTRACT. Medical research often involves using multi-item scales to assess individual characteristics, disease severity and other health-related outcomes. It is common to observe missing data in the scale scores, due to missing data in one or more of the items that make up a score. Multiple imputation (MI) is a popular method for handling missing data. However, it is not clear how best to use MI in the context of scale scores, particularly when they are assessed at multiple waves of data collection, resulting in large numbers of items. The aim of this research is to provide practical advice on how to impute missing values in a repeatedly measured multi-item scale using MI when inference on the scale score is of interest. We evaluated the performance of five MI strategies for imputing missing data at either the item or scale level using simulated data and a case study based on four waves of the Longitudinal Study of Australian Children. MI was implemented using both multivariate normal imputation and fully conditional specification, with two rules for calculating the scale score. A complete case analysis was also performed for comparison. Our results suggest that imputing at the item level across all waves, when many items are measured at each wave, can lead to biased inference even when computationally feasible.

A Hidden Markov model segmentation to identify trajectories in sleep apnoea patients

ABSTRACT. Objective Obstructive sleep apnoea (OSA) affects nearly one billion people worldwide. Continuous positive airway pressure (CPAP), the reference treatment, is used nightly by millions of individuals globally, with remote telemonitoring providing daily information on CPAP usage and efficacy. The knowledge to be gained from this avalanche of data is hampered by the lack of relevant data mining. We aimed to implement state-of-the-art data science methods to describe the heterogeneity and diversity of CPAP telemonitoring time series of the residual apnoea + hypopnoea index (rAHI). Methods We analysed a CPAP telemonitoring database of 2,860 CPAP-treated OSA patients to model and cluster rAHI trajectories. Our primary objective was to use Hidden Markov models (HMMs) as a probabilistic model-based approach to extract hidden features from rAHI time series. Our secondary goals were to identify clusters of rAHI trajectories and their relation to CPAP treatment outcomes, namely adherence and leaks. Results From the telemonitoring records of 2,860 CPAP-treated patients (age: 66·31 ± 12·92 years, male gender 69·9%), HMM modelling revealed three distinct states differing in two complementary domains: variability inside a given state and the probability of shifting from one state to another. Six clusters of rAHI telemonitoring trajectories were identified, from very well controlled CPAP-treated patients (Cluster 0: 669 (23%); mean rAHI of 0·58 ± 0·59 events per hour) to the most unstable cluster (Cluster 5: 470 (16%); mean rAHI of 9·62 ± 5·62 events per hour). CPAP adherence was significantly higher, by half an hour, in cluster 0 compared to clusters 4 and 5 (p-value<0·01). Leaks were also significantly higher in cluster 5.
Interpretation We propose a new analysis based on Hidden Markov models and supported by machine learning that might constitute a backbone for the deployment and dissemination of digital health solutions to improve the interpretation of telemonitoring data from CPAP-treated patients. The method makes it possible to visualise and reveal novel, interesting features through the graph of the HMM states, to guide tailored CPAP follow-up management.
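For readers unfamiliar with HMM decoding, segmenting a telemonitoring series into hidden states boils down to the Viterbi algorithm. The following is a generic sketch with Gaussian emissions; the number of states, parameters and data are invented for illustration and do not reproduce the authors' fitted model.

```python
import math
from statistics import NormalDist

def viterbi(obs, init, trans, emit):
    """Most likely hidden-state sequence for an HMM with Gaussian
    emissions: init[s] and trans[r][s] are probabilities, emit[s] is a
    NormalDist giving state s's emission density."""
    S = range(len(init))
    # log-probability of the best path ending in each state
    delta = [math.log(init[s]) + math.log(emit[s].pdf(obs[0])) for s in S]
    back = []
    for x in obs[1:]:
        step, ptr = [], []
        for s in S:
            r = max(S, key=lambda r: delta[r] + math.log(trans[r][s]))
            step.append(delta[r] + math.log(trans[r][s])
                        + math.log(emit[s].pdf(x)))
            ptr.append(r)
        delta = step
        back.append(ptr)
    # backtrack from the best final state
    path = [max(S, key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# a sticky two-state example: 'controlled' (low rAHI) vs 'unstable'
states = viterbi(
    obs=[0.5, 1.0, 0.7, 9.5, 8.0],
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[NormalDist(1.0, 1.0), NormalDist(9.0, 1.0)],
)
```

In practice the state parameters and transition matrix are first estimated (e.g. by the Baum-Welch algorithm) before decoding each patient's trajectory.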

A comparison of statistical methods to compensate for missing data in longitudinal cluster-randomised trials

ABSTRACT. Missing data in clinical trials can introduce bias and reduce power in analyses, potentially causing researchers to miss important intervention effects. This is a particular concern for longitudinal cluster-randomised controlled trials (LCRCTs), as missing data may accumulate longitudinally in addition to occurring at the level of individual observations, individual trial participants, or clusters of participants. Both multiple imputation (MI) and full information maximum likelihood (FIML) estimation are expected to generate similar results with appropriately adjusted standard errors when compensating for missing data in LCRCTs. Despite the advantages and robustness of these methods, little research has directly compared their performance in compensating for missing outcome data in a three-level structure. The aim of this research was to compare the performance of these two techniques in compensating for missing outcome values in both real and simulated LCRCT data. Simulated datasets were modelled on the CopeSmart Trial, an LCRCT that aimed to increase the emotional self-awareness of 560 Irish secondary school students from ten schools over an eight-week period. Missingness was introduced at varying proportions using three mechanisms: missing completely at random (MCAR); missing dependent on a covariate x (MAR); and missing not at random (MNAR). The data were then analysed using the two aforementioned techniques, and the bias, power, and coverage were averaged over 1,000 simulations. A cluster-level covariate and an individual-level covariate were also simulated to be correlated with the outcome variable. These were separately added to the MI and FIML models to determine whether the addition of correlated covariates would reduce bias when data were MNAR. When the data were MCAR or MAR, both methods produced similar, unbiased estimates of the intervention effect. Both methods were biased when the data were MNAR, though MI had reduced power compared to FIML.
When the covariates were added, both methods improved in terms of bias and coverage. However, the FIML method appeared to perform consistently better across all scenarios. When applying these methods to the CopeSmart data, FIML was again superior. We recommend that researchers employ FIML when compensating for missing outcome data in LCRCTs.

Effect of impaired vision on physical activity from childhood to adolescence

ABSTRACT. Background International physical activity (PA) guidelines are set irrespective of disabilities. Yet the levels of, and changes in, PA across the transition from childhood into adolescence among those with impaired vision are not well understood, owing to the challenges of longitudinal population-based studies of rare conditions.

Objective To determine whether children and adolescents with impaired vision can achieve PA levels equivalent to those without impaired vision.

Methods Data came from the Millennium Cohort Study of children born in the United Kingdom in 2000-01 and followed up to age 14 years (n=11,571). Participants were grouped by eye conditions causing no, unilateral, or bilateral impaired vision, based on parental report of vision and treatment coded by clinical reviewers. There were 16 types of PA reported by parents, teachers, and/or participants, covering physical education (PE) at school, organised sports, self-organised sports, and hobbies. Age-related trends in reported PA types were modelled using ordinal and logistic regression. Objective accelerometer-based time spent in moderate-to-vigorous PA (MVPA) was modelled by quantile regression to assess differences by vision status and reported PA types.

Results Bilateral impaired vision was associated with having difficulties with PE (aOR=4.67, 95% CI 2.31-9.41), finding oneself “to be bad at PE” (3.21, 1.44-7.15), and enjoying indoor PA (2.08, 1.14-3.85). Unilateral impaired vision was associated with having difficulties with PE (1.80, 1.26-2.59), below-average abilities in PE (2.27, 1.57-3.28), and lower odds of a high level of participation in organised sports (0.45, 0.56-0.98). Age-related trajectories in PA levels and time spent in MVPA did not differ by impaired vision. The internationally recommended level of ≥60 MVPA min/day was achieved by 50% of those aged seven and 41% of those aged 14. The main contributing factors were the levels of PE and organised sports.

Conclusions Our findings show that children with impaired vision can achieve healthy levels of PA equivalent to those without impaired vision, although those who had suboptimal levels of PA continued on that level into adolescence. Population-wide public health programmes to increase PA levels are needed, as well as interventions specific to children and adolescents with impaired vision to encourage participation in PE and organised sports.

Predictors of Multidrug Resistance in Nosocomial Pneumonia among Intensive Care Units’ Patients of a Tertiary Hospital, Egypt

ABSTRACT. Background: Ventilator-Associated Pneumonia (VAP) and Non-ventilated Hospital-Acquired Pneumonia (NV-HAP) remain critical public health problems. Increasing antimicrobial resistance has complicated the management, and increased the costs, of ICU patients. Objectives: To identify the predictors of Multidrug Resistance (MDR) in Nosocomial Pneumonia (NP) among ICU patients and report the microbiological profile of NP. Subjects and Methods: A prospective longitudinal study was performed at a tertiary hospital’s general ICUs from 2018-2019. We included adult patients admitted for at least 72 hours before signs appeared. We studied the causative organisms, antibiotic susceptibility, and resistance patterns of these patients. We utilized a Relative Risk (RR) binomial model to determine the predictors of MDR. Statistical analyses were conducted using SPSS® version 23 and R® software. Results: The incidence rate of MDR was 1.48 per 100 person-days (95% CI 1.21 – 1.78 per 100 person-days). While Acinetobacter baumannii (21.05%), Klebsiella pneumoniae (40.60%) and Pseudomonas aeruginosa (18.80%) were the most frequently isolated bacteria among VAP patients, Klebsiella pneumoniae (42.11%) and Pseudomonas aeruginosa (23.68%) were the predominant organisms isolated from NV-HAP. In gram-positive bacteria, overall antibiotic nonsusceptibility was high (100%) for cefuroxime, cefoxitin and cefotaxime. The highest antibiotic susceptibility of gram-positive bacteria was observed for vancomycin (93.75%). Among gram-negative bacteria, antibiotic nonsusceptibility was maximal for cefuroxime (100%). The maximum antibiotic susceptibility of gram-negative bacteria was observed for colistin (47.46%). The relative risk model indicated that the most important independent predictors of MDR were A. baumannii (RR=3.71, P=0.014), K. pneumoniae (RR=2.78, P<0.001), P. aeruginosa (RR=2.66, P=0.002), and duration in ICU (RR=0.99, P=0.001).
Notably, the effect of ICU duration was in an unexpected direction: the relative risk close to 1 suggests that the length of stay in the ICU did not materially affect multidrug resistance. Conclusion: The predictors of MDR were A. baumannii, K. pneumoniae, and P. aeruginosa, whereas duration in ICU did not alter MDR. Nonsusceptibility of both gram-positive and gram-negative bacteria was high for commonly used antibiotics. Susceptibility of gram-positive bacteria was high for vancomycin, while the highest susceptibility of gram-negative bacteria was for colistin.
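For reference, an incidence rate per 100 person-days with a large-sample confidence interval can be computed as below. This is a generic log-rate normal approximation, not necessarily the method used in the study, and the event and person-day counts in the test are invented:

```python
import math
from statistics import NormalDist

def incidence_rate_ci(events, person_days, per=100, level=0.95):
    """Incidence rate per `per` person-days, with a normal-approximation
    confidence interval built on the log-rate scale, where the standard
    error of log(rate) is approximately 1/sqrt(events)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    rate = events / person_days * per
    se_log = 1 / math.sqrt(events)
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)
```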

Modelling of longitudinal data to predict cardiovascular disease risk: a methodological review

ABSTRACT. Background Risk prediction informs understanding and management of cardiovascular disease (CVD). Many CVD risk prediction models only use one data point per patient, which does not account for temporal change in cardiovascular risk factors. Longitudinal data permits study of change in risk factors over time, within person-variance, and risk prediction based on risk factor trajectories. Analysis of longitudinal data adds complexity, such as dealing with dependence between individual observations and missing or incomplete data. The aim of this review was to identify methods used in modelling longitudinal data focussing on trajectories of risk factors to predict CVD risk. Methods The Medical Literature Analysis and Retrieval System Online (MEDLINE - Ovid) was searched from inception until 3rd June 2020 for studies meeting pre-specified inclusion and exclusion criteria: adults only; outcome including risk of CVD; and longitudinal design with ≥3 time-points. Reviews or those without individual patient data were excluded. Search terms related to “longitudinal” and “CVD” were used, including multilevel, change, slope, heart disease, myocardial infarction, and stroke. One author screened search outputs and extracted data; other authors discussed and resolved queries. Current statistical methods, assumptions, flexibility, and availability of the software were compared. Results Searches returned 2601 studies; 82 studies were included. Studies were divided into three approaches for modelling CVD risk using longitudinal data: single-stage models including basic summary measures, two-stage models using a summary measure or estimated longitudinal parameter as a covariate in a survival model, and joint models fitting longitudinal and survival data simultaneously. Single-stage models were used in 41 (50%), 30 (36%) used two-stage models, 8 (10%) used joint models, and 3 (4%) used simple statistical tests only. 
Over time, the use of two-stage and joint models became more prevalent, with an increase in CVD risk prediction models created using longitudinal data. Most studies used SAS, R or Stata. Joint and two-stage models allow for greater flexibility than single-stage models, although software for joint models is often more restricted. Conclusions Although the use of two-stage and joint models is becoming more prevalent, many studies still underutilise their longitudinal data when modelling cardiovascular risk.

Impact of model misspecification on model-based bioequivalence

ABSTRACT. To assess the bioequivalence (BE) of a generic drug, we compare its pharmacokinetics (PK) to those of a reference drug through two parameters of interest: the area under the curve of plasma concentration as a function of time (AUC), and the maximal concentration (Cmax). When conventional non-compartmental analysis (NCA) is not feasible due to sparse sampling, an alternative way to compute these parameters is the model-based (MBBE) approach. Both methods can be applied similarly to compare two drug formulations. We compared the NCA and MBBE approaches to determine the possible impact on the BE determination, using data from a biosimilarity study of a monoclonal antibody developed at Roche as an example.

For MBBE, the structural model was selected on the real data using the Bayesian Information Criterion, on both the original data and sparsified data with fewer sampling points. This analysis inspired a simulation study with rich and sparse PK designs. On rich designs, we compared NCA and MBBE in terms of type I error, also exploring the impact of the treatment effect model used. On sparse designs, we investigated the impact of using a misspecified structural model and the relevance of a model selection step prior to the BE analysis.

Both approaches were concordant on the real data. A two-compartment model with treatment effects on all PK parameters was found to best fit the data. BE of the formulations could not be shown. The design did not impact these results. In the simulations, the type I errors of the NCA and MBBE approaches were similar and close to the nominal level. The MBBE approach maintained a controlled type I error except when the structural or treatment effect model was misspecified. The structural model selection step made it possible to achieve a controlled type I error.

MBBE was a robust alternative to NCA. For the first time, a simulation study shows how model selection is key to maintaining an appropriate type I error for BE testing.
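The two NCA parameters named above are simple to compute from a concentration-time profile, and the BE decision compares a 90% confidence interval for the geometric mean ratio against the usual 0.80-1.25 limits. The sketch below is generic textbook NCA, not the analysis pipeline used in the study; the profile in the test is invented.

```python
def nca_parameters(times, conc):
    """Non-compartmental estimates from one concentration-time profile:
    Cmax and AUC(0-tlast) by the linear trapezoidal rule."""
    cmax = max(conc)
    auc = sum((t1 - t0) * (c0 + c1) / 2
              for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))
    return cmax, auc

def within_be_limits(ratio_lo, ratio_hi, lo=0.80, hi=1.25):
    """Average bioequivalence is concluded when the 90% CI for the
    geometric mean ratio of the parameter lies entirely in [lo, hi]."""
    return lo <= ratio_lo and ratio_hi <= hi
```

Model-based BE replaces the trapezoidal step by deriving AUC and Cmax from the fitted compartmental model, which is why misspecifying that model can inflate the type I error.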

Cluster randomised controlled trial of lifestyle intervention for adolescents’ health using ‘SPRAT’ programme

ABSTRACT. Background: Lifestyle modification to reduce subjective psychosomatic symptoms (SPS) is an important topic worldwide. We developed a school-based lifestyle education programme involving parents to reduce SPS in adolescents (SPRAT). The approach aimed to reinforce the role of parent participation in adolescents’ healthy lifestyle modifications to reduce SPS and increase enjoyment of school life. Objectives: This study aimed to evaluate the effectiveness of SPRAT in reducing SPS among adolescents. Design: Cluster randomised clinical trial with two intervention arms. Setting: Voluntary middle schools in Japan. Participants: Middle school students who provided informed consent. Interventions: SPRAT, a 6-month intervention, versus the usual school programme (control). Primary and secondary outcome measures: The SPS score was assessed at baseline and 2, 4, and 6 months thereafter. Proportions of lifestyle factors achieved, such as enjoyment of school life, were the secondary outcomes. Change from baseline (CFB) at 6 months was the primary endpoint. Results: The participants included in the intention-to-treat analysis were 951 (90.2%) and 1,035 (84.6%) in the SPRAT and control groups, respectively. The CFB in the 6-month SPS score adjusted for baseline was lower in the SPRAT group compared to the control group, but not statistically significantly so (-0.95, p=0.081). A favourable effect was observed for the CFB at 4 months (-1.60, 95% CI: -2.87 to -0.33). Improvements in energy intake at breakfast and lunch, and in lifestyle factors (enjoying school life, staple food consumed per breakfast, and main dishes consumed per lunch), were also observed. Conclusion: Although the results for the primary outcome were not significant, favourable effects were observed for some secondary outcomes. These findings will contribute to meeting the critical need to develop effective and practical measures to minimise SPS and its potential influence among adolescents. Trial registration number: UMIN000026715.

Reference: [1] Watanabe J, et al. BMJ Open 2018;8(2):e018938.

Assessing the role of hyperventilation in patients with traumatic brain injury: longitudinal data analysis from the CENTER-TBI

ABSTRACT. In mechanically ventilated patients with traumatic brain injury (TBI) admitted to intensive care units (ICU), the management of the blood partial pressure of carbon dioxide (paCO2) is controversial: hyperventilation leads to vasoconstriction, lowering intracranial pressure (ICP), but might also increase the risk of cerebral ischemia. Guidelines suggest that the optimal paCO2 value lies between 35-45 mmHg. Nonetheless, ICU operators manipulate these values in order to control ICP, as ICP values exceeding 20 mmHg are potentially dangerous. We used data from the CENTER-TBI study, a worldwide longitudinal prospective collection of TBI patient data, 1) to describe the management of paCO2 across centers in ICU patients with TBI and 2) to assess the impact of hyperventilation on the 6-month outcome (Extended Glasgow Outcome Scale, GOSE). To model the clustered longitudinal paCO2 profiles, a linear mixed effects model was used, while to quantify heterogeneity in paCO2 management through the median odds ratio (MOR), a logistic mixed effects model was fitted to daily hyperventilation (paCO2<30 mmHg). In both models, fixed effects were specified for baseline covariates and the daily maximum ICP value, while a two-level hierarchical structure, i.e. patients nested within centers, was used for the random intercept effects. Finally, a logistic model on GOSE was adjusted for the standard trauma-related covariates at baseline and for two covariates summarising the paCO2 and ICP longitudinal profiles (i.e. the area under the ICP trajectory above 20 mmHg over time, and the area between the paCO2 trajectory and 30 mmHg over time). We evaluated data from 1100 mechanically ventilated TBI patients collected between December 2014 and December 2017 in 36 centers. Large variability in hyperventilation usage was shown across centers (MOR=2.04), and further investigations should clarify the nature of this unexplained heterogeneity.
Moreover, paCO2 showed an important impact on the 6-month neurological outcome. We will present the clinical findings of the study and discuss the methodological challenges we encountered, mainly due to the hierarchical structure of the data and the relationship between paCO2 and ICP.
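The two longitudinal summary covariates described above can be sketched as trapezoidal areas above or below a threshold; a minimal illustration (function names and sample values are hypothetical, and clipping at the sampled points is an approximation that ignores exact threshold crossings):

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal rule, written out to avoid np.trapz/np.trapezoid naming differences."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x) / 2.0))

def area_above(times, values, threshold):
    """Area of the trajectory above a threshold, e.g. ICP above 20 mmHg.
    Values below the threshold contribute zero."""
    return _trapezoid(np.maximum(np.asarray(values, float) - threshold, 0.0), times)

def area_below(times, values, threshold):
    """Area between a threshold and the trajectory below it,
    e.g. paCO2 below 30 mmHg (hyperventilation burden)."""
    return _trapezoid(np.maximum(threshold - np.asarray(values, float), 0.0), times)

# Illustrative daily maximum ICP (mmHg) over four days
icp_burden = area_above([0, 1, 2, 3], [15, 25, 25, 15], threshold=20)
```

Each patient's two areas then enter the logistic model on GOSE as ordinary baseline-adjusted covariates.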

A new approach to measure frailty in the context of COVID-19 population

ABSTRACT. Frailty is a clinical syndrome resulting from the interaction of the age-related decline in physiologic systems with chronic diseases. It is a multidimensional biological and physiological phenomenon able to capture "chronological age-independent" health status. During the coronavirus (COVID-19) pandemic, it became clear that frail subjects are those at highest risk of mortality. Several tools to measure an individual's frailty have been proposed; the most widespread is the cumulative frailty index (FI), defined as the proportion of deficits across a large number of domains. The aim of this study was to develop a new frailty score through a latent variable approach in a sample of patients diagnosed with COVID-19. We explored the quantification of frailty status through structural equation modeling based on data from consecutive COVID-19 patients (n=448) admitted to San Gerardo hospital in Monza, Italy, from February to December 2020. A total of 41 items assessing health status at admission (e.g., comorbidities, functional abilities, habits, and laboratory tests) were considered to construct the underlying latent frailty phenotype. Confirmatory Factor Analysis (CFA) models with different structures were explored, and the diagonally weighted least squares (WLSMV) estimator was used to estimate the parameters. Models were evaluated with several goodness-of-fit and summary statistics. We fitted a correlated CFA model with one latent factor and the 41 items as indicators, and a nested CFA model with one general factor (frailty) accounting for the communality of the items and another factor accounting for the influence of specific items (functional abilities). Both models showed good fit and adequate summary indices (Comparative Fit Index, CFI, and Tucker-Lewis Index, TLI, above 0.95; Root Mean Square Error of Approximation, RMSEA, below 0.05) and suggested that the frailty score was strongly represented by a cluster of functional abilities.
Results confirmed the presence of a latent variable underlying different organ involvements. Our approach allowed us to identify different clusters of items and variables strongly associated with frailty status through a weighted approach. The prognostic ability of the proposed score for hospital mortality will be compared with standard tools, such as the cumulative FI and the Clinical Frailty Scale.
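The cumulative frailty index used above as the standard comparator is simply the proportion of deficits present among the items assessed; a minimal sketch (item names hypothetical):

```python
def frailty_index(deficits):
    """Cumulative frailty index: proportion of observed deficits present.

    `deficits` maps item name -> 1 (deficit present), 0 (absent),
    or None (not assessed); missing items are excluded from the denominator.
    """
    observed = [v for v in deficits.values() if v is not None]
    if not observed:
        raise ValueError("no observed items")
    return sum(observed) / len(observed)

# 2 deficits out of 3 observed items -> FI = 2/3
fi = frailty_index({"diabetes": 1, "hypertension": 1,
                    "adl_dependent": 0, "smoker": None})
```

The latent-variable score proposed in the abstract differs in that items are weighted by their factor loadings rather than counted equally.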

Pain Management in Immediate Life Support Ambulances

ABSTRACT. This study evaluates different approaches in the pre-hospital treatment of pain in trauma patients. Data were collected in immediate life support ambulances (ASIV), in mainland Portugal and the Azores, from March 1, 2019 to April 30, 2020. Pain management may include pharmacological measures, which have the disadvantage of possible negative effects, and non-pharmacological ones, which have also been shown to be effective in reducing pain (e.g. Pierik et al., 2015). Several non-pharmacological measures were studied: relationship-based measures (therapeutic touch, active listening, hand holding and therapeutic presence without the use of touch); cryotherapy; heat application; distraction; immobilization; extremity elevation; presence of family and friends; comfort measures (comfortable position). Pain was assessed at three moments, before (T1), during (T2) and after (T3) nurses' interventions, using an 11-point Numeric Rating Scale (NRS) validated by Bijur et al. (2003). The effect of pain management measures was first studied through the change in the patients' level of pain between the first and third moments of pain assessment (Δ_PAIN). Furthermore, linear mixed-effects models with random intercepts were used to account for the repeated measurements of pain level in the same individual. Models were adjusted for patient-related variables including age, gender, anatomical location of trauma and type of injury. Of the pharmacological measures, only morphine was shown to have a significant effect on the decrease of pain intensity between the first moment, T1, and the third, T3. Two non-pharmacological measures, relationship measures and cryotherapy, were also significantly associated with pain reduction, even after adjusting for morphine, which supports the effectiveness of these non-pharmacological measures as reliable alternatives to analgesics.
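The first analysis described above, Δ_PAIN, can be sketched as follows (the scores are hypothetical, and the sign convention, positive = pain reduction, is our assumption):

```python
def delta_pain(nrs_t1, nrs_t3):
    """Change in 11-point NRS pain score between the first (T1) and third (T3)
    assessments; positive values indicate pain reduction (assumed convention)."""
    for score in (nrs_t1, nrs_t3):
        if not 0 <= score <= 10:
            raise ValueError("NRS scores lie on a 0-10 scale")
    return nrs_t1 - nrs_t3

# Mean change for a hypothetical group of three patients
changes = [delta_pain(t1, t3) for t1, t3 in [(8, 3), (6, 4), (9, 9)]]
mean_change = sum(changes) / len(changes)
```

The mixed-effects analysis in the abstract goes further by using all three assessment moments and adjusting for patient covariates, which a simple before-after difference cannot do.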

Statistical methods for estimating sources of variability in count-based biomarkers

ABSTRACT. Background: Analysis using random effects linear models is the established method used in biological variability studies to attribute the observed variability to between-patient differences, within-patient differences, and measurement error. However, these models assume underlying normality, and thus may not be applicable for biomarkers based on counts. Aim: To present methods for estimating sources of variability in count-based biomarkers, and to apply and compare the approaches in a case study of patients with Sjogren's syndrome. Methods: Both Poisson and negative binomial models are appropriate for the analysis of count data, and methods for obtaining between- and within-patient variance estimates are described in Leckie et al [1]. We analysed the biomarker data using random effects Poisson and negative binomial models and, for comparison, using a random effects linear regression model. The intraclass correlation (ICC) was calculated as the ratio of the between-patient variance to the total variance, and was compared across the different models. The AIC and BIC criteria were used to assess each model's performance. Data from 32 patients with Sjogren's syndrome were used as a case study, considering the focus score, calculated for each salivary gland observed in each biopsy as the number of foci divided by the glandular area, multiplied by 4. Between-patient and between-gland within-patient sources of variability were estimated. Results: The ICC estimates obtained from the Poisson (0.323) and negative binomial models (0.310) were similar, and higher than from the linear regression model (0.222). AIC and BIC values were similar for the Poisson (AIC=463.63, BIC=469.84) and negative binomial models (AIC=465.55, BIC=474.87) and indicated both were a better fit than the linear regression model (AIC=632.69, BIC=642.01). Conclusion: It is important to properly model the distribution of biomarkers based on count data to correctly estimate sources of variability and measurement error.
Keywords Biomarkers, variance, random effects, count data
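The ICC as a ratio of variance components can be sketched as below. The Poisson latent-scale approximation (level-1 variance approximated by 1/E[count] on the log scale, via the delta method) is only one of several definitions reviewed by Leckie et al.; the function names are ours:

```python
def icc(var_between, var_within):
    """Intraclass correlation: the share of total variance that is
    between-patient."""
    return var_between / (var_between + var_within)

def poisson_latent_icc(var_between, expected_count):
    """Approximate log-scale ICC for a random-intercept Poisson model:
    the level-1 (within-patient) variance is approximated by 1/E[count]
    (delta method; one of several definitions in Leckie et al.)."""
    return icc(var_between, 1.0 / expected_count)
```

For the negative binomial model the level-1 variance additionally involves the overdispersion parameter, which is omitted from this sketch.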

Longitudinal progression of frailty in older population and risk of adverse events: An application of joint models

ABSTRACT. Context: A validated electronic frailty index (eFI) based on cumulative deficits in the electronic primary care record has been implemented in UK primary care to identify older people living with frailty, to better target clinical care1. Assessing the change of eFI over time could enhance understanding of eFI progression and variation, and the nature of its association with the risk of adverse events.

Objective: To evaluate the application of joint models to frailty progression and risk of death or unplanned hospital admission in elderly people using routinely collected data.

Methods: Patients identified from the UK Clinical Practice Research Datalink (CPRD GOLD), linked to Hospital Episode Statistics and the Office for National Statistics mortality register, aged over 65 on January 1st 2009 were included. eFI scores were calculated annually to assess progression of frailty for up to 5 years. Outcomes of mortality and unplanned hospital admission were assessed during the same time window. Joint modelling combines a longitudinal submodel for the repeated eFI measurements with a survival submodel for the time-to-event outcome. Joint models were fitted to assess the risk of all-cause mortality and of unplanned hospital admission, separately. Both models were adjusted for baseline age and gender in their longitudinal and survival parts.

Results: A total of 475,698 patients were included in this study. Mean baseline age was 75.1 (SD 7.4) years and 211,849 (44.5%) were male. During follow-up, 244,240 had unplanned hospital admissions and 54,107 died. The eFI score showed deterioration in 155,732 (63.8%) of those hospitalised and 43,484 (80.4%) of those who died. The eFI slope over time was 0.41 (95% CI 0.41, 0.41) for hospitalisation and 0.42 (0.42, 0.42) for death. eFI progression was associated with a higher risk of death and hospitalisation; the hazard ratio of the association term (eFI) was 1.12 (1.12, 1.12) for hospitalisation and 1.13 (95% CI 1.13, 1.14) for death.

Conclusion: This example demonstrates the applicability of joint models in this area and the potential value of using frailty trajectories over a cross-sectional assessment. They can inform health practitioners about the risk of adverse outcomes for frail patients based on their eFI progression, so that limited resources can be targeted towards the most vulnerable patients.

Semi-variogram approach to estimate within-subject variability in repeated measurements

ABSTRACT. Context: Components of variability in biomarker measurements are best estimated in Biological Variability Studies (BVS). Longitudinal BVS aim to estimate components of variability in repeated biomarker measurements, ideally taken on all subjects at the same time-points. However, conducting prospective longitudinal BVS is not always feasible. We are investigating how the measurement error and true change components of within-subject variability can be estimated retrospectively using pre-existing longitudinal data, such as clinical trial and routine datasets. Understanding these components of variability is critical in evaluating properties of biomarkers and monitoring programmes. Objective: To demonstrate a semi-variogram approach, based on the work of Diggle et al., to estimate components of within-subject variability in repeated systolic blood pressure measurements on children in the UK monitored every 6 months over 42 months. Subjects were treated for an initial episode of nephrotic syndrome as part of the PREDNOS RCT. Methods: Variability in measurements on subjects over time was attributed to three components: differences in mean measurements between subjects; differences in change-from-mean measurements between time-points within subjects, referred to as the 'signal'; and differences in measurement errors within time-points within subjects, referred to as the 'noise'. The semi-variogram approach considers differences in measurements between time-points within subjects; the semi-variances of these differences increase towards a ceiling comprised of 'signal' plus 'noise'. The semi-variogram is therefore a plot of the semi-variance of differences on the y-axis against time separation on the x-axis. Extrapolation to the intercept via a regression model estimates the 'noise'; the 'signal' is then the modelled estimate at the ceiling minus the 'noise'.
Results: In 159 subjects, the 'noise' estimate accounted for 81.0% (95% CI: 53.6%, 94.0%) of the total within-subject variability. Such results could indicate when a biomarker monitoring programme may be more likely to detect measurement error ('noise') than true change ('signal'). Conclusion: The semi-variogram approach has potential for retrospectively estimating components of within-subject variability using pre-existing datasets. However, further research is needed to investigate the precision of the estimates, and the risks of bias from missing measurements and missing documentation on the context in which measurements were taken.
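A minimal sketch of the empirical semi-variogram described above (half the mean squared within-subject difference at each time separation); the data layout is hypothetical:

```python
from collections import defaultdict

def semivariogram(series):
    """Empirical semi-variogram of within-subject differences.

    `series` is a list of subjects, each a list of (time, value) pairs.
    For each within-subject pair of time-points, half the squared
    difference is binned by time separation; the returned dict maps
    lag -> mean semi-variance. The plateau (sill) estimates
    'signal' + 'noise'; the extrapolated intercept at lag 0 estimates
    the 'noise'.
    """
    bins = defaultdict(list)
    for subject in series:
        for i in range(len(subject)):
            for j in range(i + 1, len(subject)):
                (t1, y1), (t2, y2) = subject[i], subject[j]
                bins[abs(t2 - t1)].append(0.5 * (y2 - y1) ** 2)
    return {lag: sum(v) / len(v) for lag, v in sorted(bins.items())}
```

The intercept extrapolation itself (a regression of semi-variance on lag near zero) is left out of this sketch.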

Non-linear dynamic indices summarize densely sampled longitudinal data

ABSTRACT. Background: Densely sampled longitudinal data have become ubiquitous with wearable device measurement technology, presenting a challenge for traditional statistical analysis. However, these data also present an advantage over traditional longitudinal sampling: they allow learning about the dynamics of the underlying process that generates them. Traditional and modern time-series and dynamic models can be applied to these data. The aim of this project is to assess how well established indices from non-linear dynamics represent these data and how best to use these indices within other statistical modeling frameworks. Methods: Monthly glycated hemoglobin measurements were captured for up to nine years from patients with Type I diabetes without microvascular complications in the Diabetes Control and Complications Trial. Indices from Poincare plots were estimated from these monthly series for each individual. Overall (global) and yearly (local) indices were calculated to identify the most suitable formulation for both estimation and prediction purposes. With a view to including these densely sampled longitudinal data within, for example, a joint model, we assessed the fit, via root mean squared error, of three formulations of the indices: (1) a model based on global parameters, (2) a model based on the global average and local dynamics, and (3) a model based on both the local average and local dynamics. Results: For series with larger Poincare indices (above 1), the formulation using at least one global parameter resulted in smaller errors. When indices were below 1, the formulation using both local parameters for signal reconstruction resulted in smaller errors. Discussion: Non-linear dynamic indices can be used to summarize densely sampled longitudinal data.
The results are somewhat counterintuitive: for large variation across yearly windows, global parameters provide a good fit, whereas one would expect local parameters to give a better description of the observations encompassed in the corresponding analytic window. Future work includes: (1) examining further details of the information contained in the analytic window to identify the underlying reasons for these apparently paradoxical findings, and (2) examining patients with other disease profiles to assess the robustness of the methods for different disease conditions.
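The standard Poincare plot indices SD1 (short-term variability, dispersion perpendicular to the identity line) and SD2 (long-term variability, dispersion along it) are computed from successive pairs of measurements; a sketch (that the authors use exactly these two indices is our assumption):

```python
import math

def poincare_indices(x):
    """SD1/SD2 indices of the Poincare plot of a series x_1, ..., x_n,
    i.e. the scatter of (x_t, x_{t+1}) pairs. Population variances are
    used for simplicity."""
    diffs = [b - a for a, b in zip(x, x[1:])]   # perpendicular direction
    sums = [a + b for a, b in zip(x, x[1:])]    # along the identity line
    var = lambda v: sum((u - sum(v) / len(v)) ** 2 for u in v) / len(v)
    sd1 = math.sqrt(var(diffs) / 2.0)
    sd2 = math.sqrt(var(sums) / 2.0)
    return sd1, sd2
```

A global index uses the whole series; a local index applies the same function to each yearly window.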

Kernel density estimation for circular data about COVID-19 in the Czech Republic

ABSTRACT. The term circular statistics describes a set of techniques used to model distributions of random variables that are cyclic in nature; these approaches can easily be adapted to temporal data recorded, e.g., daily, weekly or monthly. One nonparametric possibility for analysing such data is kernel estimation of circular densities, where the choice of how much to smooth, i.e. the bandwidth, is crucial.

In this presentation we describe the existing methods (the cross-validation method, smoothed cross-validation, and an adaptive method) and propose modifications of them. We apply these methods to real data from the Institute of Health Information and Statistics of the Czech Republic on the total (cumulative) numbers of persons with proven COVID-19 infection according to regional hygienic stations, of cured persons, of deaths, and of tests performed, for the whole country and for regions coded according to the Nomenclature of Territorial Units for Statistics (NUTS). The results are visualized as circular histograms (rose diagrams), and the calculated standardized characteristics are superimposed on a choropleth map in which NUTS regions are shaded with a diverging color scheme. All statistical analyses are performed in the R software.
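Circular kernel density estimation is typically done with a von Mises kernel, whose concentration parameter κ plays the role of an inverse bandwidth (larger κ means less smoothing); a minimal sketch in Python for illustration, since the analyses above are in R, and without the bandwidth selectors the abstract discusses:

```python
import numpy as np

def vonmises_kde(theta_grid, data, kappa):
    """Circular kernel density estimate with a von Mises kernel:
    f(theta) = (1/n) * sum_i exp(kappa * cos(theta - theta_i)) / (2*pi*I0(kappa)),
    where np.i0 is the modified Bessel function of order 0.
    Angles are in radians on [0, 2*pi)."""
    theta_grid = np.asarray(theta_grid, float)[:, None]
    data = np.asarray(data, float)[None, :]
    kernels = np.exp(kappa * np.cos(theta_grid - data)) / (2 * np.pi * np.i0(kappa))
    return kernels.mean(axis=1)
```

Daily or weekly data are first mapped to angles (e.g. day-of-year times 2π/365) before estimation.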

Acknowledgment: The work was supported (partly) by the long-term strategic development financing of the Institute of Computer Science (RVO:67985807) and specific research of Masaryk University as support for student projects (MUNI/A/1615/2020).

References: Horová, I., Koláček, J. and Zelinka, J. (2012), Kernel Smoothing in MATLAB: theory and practice of kernel smoothing, World scientific, Singapore. Taylor, C. C. (2008). Automatic bandwidth selection for circular density estimation. Computational Statistics & Data Analysis, 52(7), 3493–3500. Ley Ch., Verdebout T. (2019), Applied directional statistics: Modern methods and case studies, Chapman and Hall, London.

Functional analysis of temporal data about patients’ health condition after total knee replacement

ABSTRACT. Persistent knee pain while walking or at rest, often caused by osteoarthritis (destruction of cartilage and changes in its mechanical properties), leads to total knee replacement (TKR) using arthroplasty implants. Patient Reported Outcome Measure questionnaires (PROMs) are commonly used to evaluate a patient's condition. Among the most commonly used PROMs for patients after TKR are the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Knee Society Score (KSS). WOMAC was designed to measure the patient's degree of pain, stiffness and functional limitations of the affected joint through self-evaluation. KSS combines the patient's objective and functional characteristics and is completed by the attending physician. Both questionnaires are used at Martin University Hospital during examinations of patients after TKR, and the TKR data are recorded in the Slovakian Arthroplasty Register (SAR).

This study includes 2295 patients who underwent primary TKR between January 1st, 2006 and December 31st, 2020. Patients were monitored before surgery and then approximately 3, 6, 12, 24, and 36 months after surgery. If a revision of the primary TKR recorded in the SAR was performed during this follow-up, the patient was excluded from the subsequent statistical analyses. Our aim is to retrospectively evaluate patients' condition using their WOMAC scores and KSS, and to compare the condition and its changes over time among three structurally different types of knee implants (cruciate retaining, posterior stabilized condylar constrained, and hinged knee implants), using functional data analysis and cubic splines. Statistical analyses were carried out in the R software environment.
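A minimal sketch of smoothing a follow-up trajectory with a cubic spline; the abstract's analyses are in R, so SciPy's CubicSpline serves as a stand-in here, and the WOMAC values are hypothetical:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Follow-up months and hypothetical mean WOMAC scores (lower = better)
months = np.array([0, 3, 6, 12, 24, 36])
womac = np.array([60.0, 35.0, 28.0, 24.0, 22.0, 21.0])

# Interpolating cubic spline through the observed assessment points
spline = CubicSpline(months, womac)

# Evaluate the smoothed trajectory on a monthly grid
grid = np.arange(0, 37)
trajectory = spline(grid)
```

In a functional data analysis, one such smoothed curve per patient becomes the unit of analysis, and curves are then compared across the three implant types.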

Acknowledgment: The work was supported (partly) by the long-term strategic development financing of the Institute of Computer Science (RVO:67985807) and specific research of Masaryk University as support for student projects (MUNI/A/1615/2020).

References: Ramsay, J.O. and Silverman, B.W. (2005). Functional Data Analysis. New York: Springer-Verlag. Nečas, L., Katina, S. and Uhlárová, J. (2013). Survival analysis of total hip and knee replacement in Slovakia 2003–2011. Acta Chirurgiae Orthopaedicae et Traumatologiae Cechoslovaca, 80(1), 1–85.

Multivariate Analysis of Blood Biomarkers in Amyotrophic Lateral Sclerosis

ABSTRACT. Context: Amyotrophic lateral sclerosis (ALS) is a disease characterised by the selective degeneration of upper and lower motor neurons, resulting in paralysis of the respiratory muscles and death, typically within 3-5 years following symptom onset. It is a rare disease, and the pathogenesis is not well understood. The current study aims to expand on our current knowledge about the role of inflammatory biomarkers in the disease progression.

Problem and aim: There are few patients, the patients are very heterogeneous (both with regard to clinical characteristics and blood biomarkers), and the biological analysis of the samples is resource intensive, resulting in a small sample size and a large number of variables with missing data. Because of the small sample sizes, each biomarker is usually analysed separately in this type of data. Our goal is to predict disease progression (using the ALSFRS-R scale) and survival based on the inflammatory biomarkers measured at the time of diagnosis (used all together in one analysis) and other clinical characteristics.

Methods: We used multiple imputation to deal with the incompleteness, which allowed us to use data from all 85 patients. An exploratory factor analysis then identified five latent constructs, which we used in a cluster analysis to classify patients into groups differing in functional decline, based on a linear mixed model excluding clinical characteristics. We compared differences in survival between the obtained clusters with a Cox proportional hazards model.

Results: Five factors, collecting variables of specific cell types, were obtained. K-means clustering of the factor scores resulted in four clusters (each with at least ten patients). Two clusters had slow and two had faster decline in function, which was also reflected in the survival analysis. The differences between the clusters were attributed to a specific factor containing all variables of the same cell type (resting regulatory T cells).

Conclusions: Despite the sample size issues, we found clinically relevant latent constructs which allowed us to classify the patients according to functional decline.
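The k-means step applied to the factor scores above can be sketched in a few lines of plain Python (the four points are toy data, not the study's factor scores):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on factor-score vectors (lists of floats):
    alternate assigning points to the nearest center and recomputing
    centers as cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two well-separated toy groups of factor scores
centers, clusters = kmeans(
    [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]], k=2)
```

In the study, k was chosen so that each cluster retained at least ten patients; that constraint is not implemented here.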

Possibilities and challenges when analysing large longitudinal data from a population based Norwegian registry (MoBa study).

ABSTRACT. Background: Large registry-based data are a valuable source of information and are used extensively in medical research. All individuals born in Norway are given a unique ID number, which makes it possible to link information from several registries. However, analyzing such data requires a high level of expertise and collaboration between data analysts and health professionals. The Norwegian Mother and Child Cohort Study (MoBa), established in 1999, collects longitudinal data on demographic and background characteristics of infants born in Norway, their mothers, and their fathers. Child development and personality traits are assessed using questionnaires. However, different versions of the same questionnaires have been used over time, with different numbers of included items, making it challenging to use these data in longitudinal studies. Objective: To estimate associations between disruptive sleep patterns and colic at 6 months and child development later in life, assessed at 18 months, 3 years and 5 years using the ASQ (Ages and Stages Questionnaire) and CBCL (Child Behavior Checklist) questionnaires. Methods: We used z-scores to evaluate differences between groups of children, both at given time points and across the whole follow-up, as the number of included items varied for both questionnaires. The z-scores were constructed as follows: at each assessment point the dataset was divided into two groups based on the exposure variable (e.g. having a sleep problem or not), and those who did not report sleeping problems served as the reference population. Changes at given time points and across the follow-up trajectory were estimated with linear mixed models for repeated measures with an unstructured covariance matrix to account for dependencies within individuals. Results: We analysed 75,188 children using data collected at four time points.
Our data revealed very small but statistically significant differences between children with disruptive sleep patterns early in life and the reference population. Conclusion: Our analyses revealed that disruptive sleep early in life does not have a clinically relevant negative effect on child development later in life. Correct and clinically relevant interpretation of results from large registry-based studies requires close collaboration between medical professionals and statisticians.
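The z-score construction described above, standardising each child's score against the reference population of children without the exposure, amounts to the following (values hypothetical):

```python
import math

def z_scores(values, reference):
    """Standardise scores against a reference population (here, children
    without reported sleep problems): z = (x - mean_ref) / sd_ref,
    using the sample standard deviation of the reference group."""
    m = sum(reference) / len(reference)
    sd = math.sqrt(sum((r - m) ** 2 for r in reference) / (len(reference) - 1))
    return [(v - m) / sd for v in values]
```

Because the standardisation is redone at each assessment point, z-scores remain comparable across questionnaire versions with different numbers of items.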

Estimating time to confirmed disease progression in observational data sources with irregular visit schedules

ABSTRACT. Background: Multiple sclerosis (MS) is a chronic, progressive neurological disease. While long-term follow-up of patients in observational data sources offers countless opportunities for comparative effectiveness research in MS, it also entails specific challenges. For example, time to confirmed disease progression (CDP), an important outcome in MS, is defined from treatment initiation until an increase in a severity score, provided the increase is confirmed 6 months later. This definition is challenging to apply in observational data sources, where patients follow irregular visit schedules, because a progression and its confirmation may not be recorded at a visit.

Objective: To compare two single imputation approaches (last observation carried forward and rounding) to a multiple imputation approach (linear mixed model accounting for autocorrelation) for analyzing time to CDP in observational data sources.

Methods: Data were generated under various scenarios such that the informative visit mechanism depended on the observed covariates, on the treatment received or on the underlying outcome process. We compared the three imputation approaches on their ability to recover the missing Expanded Disability Status Scale (EDSS) scores, the time to CDP and the hazard ratio. In a real data example, we compared dimethyl fumarate to fingolimod on time to CDP using data from the Multiple Sclerosis Partners Advancing Technology and Health Solutions database.

Results: The linear mixed model led to the lowest mean squared error for recovering the time to CDP across all simulation scenarios. Hazard ratio estimators were generally less biased and more efficient with the linear mixed model, especially when the visit process depended on the treatment or outcome. The linear mixed model also yielded much better coverage in these situations. In the real data example, preliminary results found diverging hazard ratio estimates across the three imputation methods, although confidence intervals were wide and overlapping.

Conclusions: Last observation carried forward and rounding recover individual trajectories of disease progression, but the corresponding comparative effectiveness analysis does not account for the uncertainty of the imputations and may be inaccurate. More advanced methods based on linear mixed models can improve imputations of time to CDP and thus lead to more reliable inference on the treatment effect.
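The two single imputation approaches compared above can be sketched as follows (the EDSS values are hypothetical, and `nearest_visit` is our reading of the "rounding" approach):

```python
def locf(times, values, query):
    """Last observation carried forward: the value at the most recent
    visit at or before `query` (None if no earlier visit exists)."""
    last = None
    for t, v in zip(times, values):
        if t <= query:
            last = v
    return last

def nearest_visit(times, values, query):
    """'Rounding': the value at the visit closest in time to `query`,
    e.g. the scheduled 6-month confirmation date."""
    idx = min(range(len(times)), key=lambda i: abs(times[i] - query))
    return values[idx]
```

The multiple imputation alternative instead draws the missing scores from a linear mixed model with autocorrelated errors, propagating imputation uncertainty into the hazard ratio.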

14:45-16:15 Session IS1: INVITED : Modelling the global spread of Covid19 and impact of interventions
Nowcasting the spread of COVID-19 to inform control policies in Hong Kong

ABSTRACT. In this talk, I will discuss two of the COVID-19 nowcasting projects in my group that helped inform the Hong Kong government's pandemic control policies.

In the first project, we developed a new framework that parameterizes disease transmission models with age-specific digital mobility data. By fitting the model to case data in Hong Kong, we were able to accurately track the local effective reproduction number of COVID-19 in near real-time (i.e. no longer constrained by the delay of around 9 days between infection and reporting of cases) which is essential for quick assessment of the effectiveness of interventions on reducing transmissibility. Our findings showed that accurate nowcast and forecast of COVID-19 epidemics can be obtained by integrating valid digital proxies of physical mixing into conventional epidemic models.
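Real-time tracking of the effective reproduction number is commonly built around the renewal equation; the sketch below shows only that core and is not the authors' model, which additionally integrates age-specific digital mobility data:

```python
def rt_renewal(incidence, gen_interval):
    """Instantaneous reproduction number via the renewal equation:
    R_t = I_t / sum_s w_s * I_{t-s}, where w = (w_1, w_2, ...) is the
    generation-interval distribution. Returns None where the
    denominator is zero (e.g. at the start of the series)."""
    rts = []
    for t in range(len(incidence)):
        denom = sum(w * incidence[t - s]
                    for s, w in enumerate(gen_interval, start=1)
                    if t - s >= 0)
        rts.append(incidence[t] / denom if denom > 0 else None)
    return rts
```

Applied to reported cases this estimate lags infections by the reporting delay; replacing case counts with a timely mobility-based proxy is what removes that constraint in the project described above.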

In the second project, we assessed the relative transmissibility of two new SARS-CoV-2 lineages with the N501Y mutation in the receptor-binding domain of the spike protein, which spread rapidly in the United Kingdom in late 2020. We estimated that the earlier 501Y lineage without the amino acid deletion Δ69/Δ70, circulating mainly between early September and mid-November, was 10% (6-13%) more transmissible than the 501N lineage, and that the 501Y lineage with the deletion Δ69/Δ70, circulating since late September, was 75% (70-80%) more transmissible than the 501N lineage.

Statistical challenges in the monitoring of the SARS-CoV-2 pandemic

ABSTRACT. In England, policy decisions during the SARS-CoV-2 pandemic have relied on prompt scientific evidence on the state of the pandemic. Through participation in advisory groups to the government, we have contributed to real-time pandemic assessment for the last fifteen months, from the early phase to the current potential third wave, providing estimates of key epidemic quantities and short-to-medium-term projections of severe disease. In this talk I describe the statistical transmission modelling framework we have used throughout and the model developments introduced at various stages of the pandemic to tackle the challenges posed by the changing epidemiology, including: the changing relationship between the available data streams; the introduction of vaccination; and the ecological effects of new variants. In addition, I discuss the computational efforts made to ensure that model developments would not jeopardise the timely provision of relevant outputs to policy makers.

Modelling the COVID-19 pandemic: initial introduction, lockdown assessment, phasing out

ABSTRACT. Since its initial description in 2020, COVID-19 has totalled millions of cases worldwide. Modelling has been used at all stages of the pandemic and will be illustrated here with examples from the French situation. The initial spread was analyzed by examining the volume of exported cases jointly with the pattern of flights out of the first recognized source in China, shedding light on features of the disease. After the initial introduction, the disease spread to many countries, leading to unprecedented control decisions including lockdowns and curfews. Estimating the extent of disease spread after the first wave, and the extent to which trace-and-test policies covered the population, presented several challenges. Computing real-time trends improved short-term projections of disease spread. As vaccines were developed, optimal allocation under uncertainty and limited availability could be studied as a means of phasing out social distancing. More recently, the emergence of variants of increased transmissibility has raised new questions regarding overall spread and control.

14:45-16:15 Session OC1A: Mendelian randomisation & causal inference
How to deal with collider bias in Mendelian randomization analysis?

ABSTRACT. Background: Mendelian randomization (MR) is a method used to estimate the causal effect of a risk factor on an outcome by using genetic variants as instrumental variables. An instrumental variable must be associated with the exposure, must affect the outcome only through the exposure, and must not be associated with any confounders. When the aim of the study is to obtain a causal effect for a particular subgroup of the population rather than a population-wide causal effect (for example by stratifying on a specific variable), collider bias can be generated. Collider bias arises when we control for a variable that is influenced by two other variables. This bias can induce an association between the instrument and the outcome, violating the instrument assumptions and leading to invalid results.

Objectives: To identify potential collider bias in MR studies, to assess its impact on the causal effect estimates and to evaluate a new technique in MR as a solution for controlling the collider bias.

Methods: We propose a stratification approach in MR analysis to study the causal relationship between the risk factor and the outcome while controlling for collider bias. This method creates a new variable based on both the collider and the instrument, which is then categorized into quartiles so that stratum-specific causal effects can be estimated. With this solution, we control the collider bias and therefore the effect estimates are unbiased. We simulate several datasets considering different levels of collider bias and different strengths of the instrument, and we apply the method to a real dataset. Results: This is an on-going study, and our aim is to identify those scenarios where the collider could generate biased causal estimates and to determine whether the stratification approach is an appropriate solution. The method will be applied to estimate the causal effect of diabetes mellitus on pancreatic cancer across different levels of body mass index.
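One way to make the stratification idea concrete is the residual-based sketch below. Everything here is an illustrative assumption rather than the authors' exact procedure: the simulated data-generating model, the parameter values, and the choice of constructing the new variable as the collider residualised on the instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating model (all coefficients are illustrative)
G = rng.binomial(2, 0.3, n)            # genetic instrument (0/1/2 alleles)
U = rng.normal(size=n)                 # unmeasured confounder
X = 0.4 * G + U + rng.normal(size=n)   # exposure; instrument strength 0.4
C = 0.5 * X + rng.normal(size=n)       # collider-type variable influenced by X
Y = 0.3 * X + U + rng.normal(size=n)   # outcome; true causal effect 0.3

# Step 1: build the new variable from both the collider and the instrument
# by removing the instrument's influence from the collider. Stratifying on
# the raw collider C would open a path between G and the confounder U.
beta_cg = np.cov(C, G)[0, 1] / np.var(G)
C_star = C - beta_cg * G

# Step 2: categorise the new variable into quartiles and estimate the
# stratum-specific causal effect via the Wald ratio cov(Y, G) / cov(X, G).
strata = np.digitize(C_star, np.quantile(C_star, [0.25, 0.5, 0.75]))
for s in range(4):
    m = strata == s
    wald = np.cov(Y[m], G[m])[0, 1] / np.cov(X[m], G[m])[0, 1]
    print(f"stratum {s}: causal effect estimate = {wald:.2f}")
```

Because the stratification variable is constructed to be independent of the instrument, each stratum-specific Wald ratio should recover the true effect (0.3 in this simulation) up to sampling noise.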

Conclusions: We detail a methodological process to analyse the impact of the collider bias and a potential solution to estimate the local causal effect.

The impact of instruments’ selection on Mendelian randomization results: a case study

ABSTRACT. Mendelian randomization (MR) investigates the causal effect of modifiable exposures on health outcomes within the instrumental variable (IV) framework, using genetic variants as instruments. A genetic variant should therefore satisfy the IV assumptions to be considered a valid instrument. MR analyses performed on summary data from large genetic association studies are increasingly being used and are characterized by an increasing number of potential candidate instruments and greater power. However, when multiple instruments are available, the IV assumptions can be questionable and findings might depend strongly on how those instruments are selected [1]. The present study aims at evaluating how sensitive MR estimates are to sets of instruments selected using different criteria. Using different genetic data sources, the causal role of inflammation in Parkinson's disease (PD) is investigated as a motivating example.

Considering two-sample MR methods available for summary genetic data, two main practical issues related to instrument selection were explored: the linkage disequilibrium (LD) threshold and the pleiotropy strategy. Regarding LD, three sets of instruments were identified according to different thresholds of correlation among IVs. To rule out the presence of pleiotropy, two sets of instruments were derived by excluding: (i) IVs with an extreme Q-statistic; (ii) IVs identified as outliers by the Radial plot method. Causal effects were estimated using both fixed- and random-effects Inverse Variance Weighted methods and nine robust MR estimators [2]. The inflammation marker C-reactive protein was used as the exposure and PD as the outcome.
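For reference, the fixed-effect Inverse Variance Weighted (IVW) estimator combines per-variant Wald ratios using first-order weights, and Cochran's Q flags variants with extreme heterogeneity contributions (the basis of the Q-statistic exclusion strategy mentioned above). The summary statistics below are made up purely for illustration:

```python
import numpy as np

# Hypothetical per-variant summary statistics for five independent SNPs:
# SNP-exposure associations (beta_x), SNP-outcome associations (beta_y)
# and standard errors of the outcome associations (se_y).
beta_x = np.array([0.12, 0.08, 0.15, 0.10, 0.09])
beta_y = np.array([0.024, 0.018, 0.033, 0.019, 0.016])
se_y   = np.array([0.010, 0.012, 0.011, 0.009, 0.013])

# Per-variant Wald ratios and first-order inverse-variance weights
ratio = beta_y / beta_x
w = (beta_x / se_y) ** 2

# Fixed-effect IVW estimate and its standard error
ivw = np.sum(w * ratio) / np.sum(w)
se_ivw = 1.0 / np.sqrt(np.sum(w))
print(f"IVW causal estimate = {ivw:.3f} (SE {se_ivw:.3f})")

# Cochran's Q and its per-variant contributions: variants with extreme
# contributions are candidates for exclusion as potentially pleiotropic.
q_contrib = w * (ratio - ivw) ** 2
print(f"Q = {q_contrib.sum():.2f}; contributions: {np.round(q_contrib, 3)}")
```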

When different LD thresholds were set, no important differences in MR estimates were observed within each method, but there was no consensus between the results obtained with different methods. Moreover, different exclusion strategies for reducing pleiotropic IVs led to opposite effect estimates.

In MR studies, the findings can be sensitive to the choice of the instruments. Identification of appropriate strategies for selecting genetic variants is necessary and potential study-specific issues that may arise when different sets of instruments are considered should be accounted for.

Caution When Inferring the Effect Direction in Mendelian Randomization

ABSTRACT. In genetic association studies, Mendelian Randomization (MR) has gained popularity as a concept to assess the causal relationship between two phenotypes. Two methods, the MR Steiger and bidirectional MR approaches, have been proposed as tools to infer the causal direction between two phenotypes. Through simulation studies, we extend previous work to examine the ability of the MR Steiger and bidirectional MR approaches to correctly determine the effect direction in the presence of pleiotropy, measurement error, and unmeasured confounding. In addition, we examined the performance of these approaches when there is a longitudinal causal relationship between the two phenotypes, under weak instruments, and under differing distributions for the phenotypes (binary, Poisson, etc.). We also applied the Steiger method and the bidirectional MR approach to the COPDGene study, a case-control study of Chronic Obstructive Pulmonary Disease (COPD) in current and former smokers, to examine the role of smoking on lung function in the presence of pleiotropy and measurement error.

Tying research question and analytical strategy when variables are affected by medication use

ABSTRACT. In an epidemiological setting where measurements fluctuate over time due to intercurrent events, ill-defined research questions could be particularly problematic. Medication use is one important cause for this change, as it is prescribed to target specific measures. When a research question fails to specify how the medication use should be handled methodologically, arbitrary decisions may be made during the analysis phase, which likely leads to a mismatch between the intended research question and the performed analysis. The mismatch can result in vastly different interpretations of the estimated effects depending on how one handled medication use. In some cases, the estimated effect may not even provide any clinical relevance. Thus, a research question such as ‘what is the effect of X on Y?’ requires further elaboration, and it should take into account whether and how medication use has affected the measurements of interest.

The importance of well-defining a research question when intercurrent events occur has been stressed by causal inference experts (Young et al., 2020). In the field of randomized trials, several estimands for intercurrent events such as post-randomization medication use have been proposed (ICH E9 addendum, 2020). Despite this, a literature review we recently conducted on the handling of medication use in medical studies (in preparation) showed that the research question was formulated vaguely in the majority of studies and that their intended aims were unclear.

Therefore, in our study, we will discuss how well-defined research questions can be formulated when medication use is involved in observational studies. We will distinguish between situations where a (possibly time-varying) exposure is affected by medication use and where the outcome of interest is affected by medication use. For each setting, we will give examples of different research questions that could be asked depending on how medication use is taken into account in the estimand, and discuss methodological considerations under each research question.

Identification of causal effects in case-control studies

ABSTRACT. Introduction: Case-control designs are an important yet commonly misunderstood tool in the epidemiologist's arsenal for causal inference. We reconsider classical concepts, assumptions and principles and explore when the results of case-control studies can be endowed with a causal interpretation.

Methods: We present a framework that is rich enough to articulate and study various target causal quantities (i.e., estimands) relating to intention-to-treat or per-protocol effects. We then establish how, and under which conditions, these estimands can be identified based on the data that are collected under popular sampling schemes (case-base, survivor, and risk-set sampling, with or without matching).

Results: We present a concise summary of our identification results that link the estimands to the (distribution of the) available data and articulate under which conditions these links hold.

Conclusion: The modern epidemiologist’s arsenal for causal inference is well-suited to make transparent for case-control designs what assumptions are necessary or sufficient to endow the respective study results with a causal interpretation and, in turn, help resolve or prevent misunderstanding. Our approach may inform future research on different estimands, other variations of the case-control design or settings with additional complexities.

14:45-16:15 Session OC1B: Bayesian clinical trial design (1)
Modular components in basket trials and connections among the applied tools

ABSTRACT. Basket trials have been a hot topic in medical and statistical research throughout the last ten years. The goal of basket trials is to analyse one treatment across several diseases in one trial, based on the medical justification of a common (genetic) pathway targeted by the treatment. For this setting, many different designs with different statistical techniques have been proposed in the literature, resulting in an unclear picture of the methodological opportunities.

Therefore, a modular approach is presented to introduce structure to basket trial designs and to the available techniques. Currently used designs employ either only frequentist techniques [1], only Bayesian methods [2], or a combination of the two throughout the course of the trial. Within these three general approaches a variety of different tools are applied. To clarify the situation, the available tools are classified and simplified, and connections are presented which show differences, similarities or even equivalence of tools that at first appear different. Knowing about connections between the tools facilitates the planning of basket trials and also their conduct in hands-on medical research, because researchers have the opportunity to design the basket trial in the sense of 'as simple as possible and as complex as necessary', which makes a basket trial more tangible for all stakeholders of new clinical trials. We consider the presented work a necessary basis to promote knowledge about basket trials, to motivate further research and ultimately to set the path for individualized treatments that have shown their benefit in basket trials.

1. Cunanan K M, Iasonos A, Shen R, Begg C B, Gönen M. An efficient basket trial design. Statistics in Medicine. 2017; 36(10): 1568-1579.

2. Fujikawa K, Teramukai S, Yokota I, Daimon T. A Bayesian basket trial design that borrows information across strata based on the similarity between the posterior distributions of the response probability. Biometrical Journal. 2020; 62: 330-338.

Seamless Master Protocol with account for correlations within subgroups

ABSTRACT. Introduction: Master Protocol designs (Basket, Umbrella, Platform) allow simultaneous comparison of multiple treatments or disease subgroups. Master protocols can also be designed as seamless studies, in which two or more clinical phases are considered within the same trial. They can be divided into two categories: operational seamless, in which the two phases are run as two independent studies, and inferential seamless, in which the interim analysis is considered an adaptation of the study. Bayesian designs taking into account the correlation between treatments and doses have scarcely been studied.

Aim: To propose and compare two Bayesian seamless Phase II/III designs (operational and inferential) using a binary endpoint for the first stage and a time-to-event endpoint for the second stage.

Methods: For the first stage, we developed a Bayesian hierarchical model accounting for multiple doses of multiple treatments while taking the partial ordering of doses into account in the correlation structure. After treatment and dose selection, based on posterior and predictive probabilities, the results of the first phase were incorporated into prior distributions of a time-to-event model. Extensive simulations were performed in order to compare the robustness and operating characteristics of the two seamless designs depending on several prior variabilities or effective sample sizes.

Results: The inferential seamless design has on average better operating characteristics in terms of required sample size and precision. To obtain the same operating characteristics under the operational seamless design, a larger sample size is needed.

Conclusion: When using stage-one data to build the prior distributions for the time-to-event model, care should be taken not to overpower the posterior distributions and unduly influence the trial results. Our proposal avoids this kind of issue.

Monotonicity Rules for Inference in Basket Trials

ABSTRACT. A basket trial is a new type of clinical trial where several subgroups are exposed to a new treatment. They are especially popular in oncology, where the subgroups, so-called baskets, usually comprise patients with different primary tumor sites but a common biomarker or mutation. Most basket trials are single-arm phase II trials that investigate a binary endpoint such as tumor response. Several designs for such trials have been proposed. In simple approaches, the results of subgroups are pooled before the analysis if the results are similar. Advanced designs allow a more nuanced combination of results, many of which use Bayesian tools to share information between baskets. An exciting method was recently proposed by Fujikawa et al. (2020). In their design, the individual baskets are initially analyzed using a beta-binomial model. Information is then shared by computing the posterior distribution parameters as a weighted sum of the parameters of the individual posterior distributions. The weights are derived from a similarity measure between the individual posterior distributions and further tuning parameters. Compared to other basket trial designs with Bayesian components, this design is computationally very inexpensive. Operating characteristics can be calculated directly without having to rely on simulations. While information sharing increases the power, it can also lead to undesired results. Using the example of Fujikawa et al.'s design, we show that the number of null hypotheses that can be rejected is not always monotonically increasing in the number of observed responses. As a consequence, there are scenarios where the treatment is declared futile in all baskets even when there are at least as many responses in all baskets as in other scenarios where the treatment is declared efficacious in at least one basket. Furthermore, there are scenarios where the treatment is declared efficacious in some baskets but futile in other baskets with more responses. We define monotonicity rules for inference in basket trials and assess how the choice of the tuning parameter values that determine the amount of information sharing influences whether these rules hold.
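As a rough illustration of the information-sharing mechanism in this type of design, the sketch below analyzes hypothetical basket data with beta-binomial models and combines posterior parameters via similarity weights. The data, prior, tuning value epsilon and the density-overlap similarity measure are simplifying assumptions: Fujikawa et al.'s actual design uses a Jensen-Shannon-based similarity with an additional threshold.

```python
import numpy as np
from math import lgamma

def beta_pdf(p, a, b):
    """Beta(a, b) density evaluated on the array p."""
    logc = lgamma(a + b) - lgamma(a) - lgamma(b)
    return np.exp(logc + (a - 1) * np.log(p) + (b - 1) * np.log(1 - p))

# Hypothetical basket data: x responses out of n patients in each basket
x = np.array([1, 5, 6])
n = np.array([15, 15, 15])
a0, b0 = 0.5, 0.5     # Beta prior parameters
epsilon = 2.0         # tuning parameter: larger values mean less sharing
p0 = 0.2              # null response rate

# Individual beta-binomial posteriors per basket
a, b = a0 + x, b0 + n - x
K = len(x)

grid = np.linspace(1e-6, 1 - 1e-6, 4001)
dx = grid[1] - grid[0]
pdfs = np.array([beta_pdf(grid, a[i], b[i]) for i in range(K)])

# Pairwise similarity: area of overlap between posterior densities,
# raised to epsilon; each basket keeps full weight on its own data.
w = np.ones((K, K))
for i in range(K):
    for j in range(K):
        if i != j:
            w[i, j] = (np.minimum(pdfs[i], pdfs[j]).sum() * dx) ** epsilon

# Information sharing: shared posterior parameters are weighted sums
# of the individual posterior parameters.
a_shared, b_shared = w @ a, w @ b

# Posterior probability that each basket's response rate exceeds p0
for i in range(K):
    dens = beta_pdf(grid, a_shared[i], b_shared[i])
    prob = dens[grid > p0].sum() * dx
    print(f"basket {i}: Pr(p > {p0}) = {prob:.3f}")
```

Baskets with similar posteriors borrow heavily from each other, while a basket with a clearly different response rate is left largely alone; the non-monotonicity discussed above stems from how these weights change with the observed responses.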

Response-adaptive randomization in clinical trials: from myths to practical considerations

ABSTRACT. Response-adaptive randomization (RAR) is part of a wider class of data-dependent sampling algorithms, for which clinical trials have commonly been used as a motivating application. In that context, patient allocation to treatments is defined using the accrued data on responses to alter randomization probabilities, in order to achieve different experimental goals. RAR has received abundant theoretical attention from the biostatistical literature since the 1930s and has been the subject of numerous debates. Recently it has received renewed consideration from the applied community due to successful practical examples and its widespread use in machine learning. Many position papers on the subject each give a specific or partial view on its usefulness, and these views may be difficult for the non-expert to navigate. This work aims to address this gap by providing a unified and updated review of methodological and practical issues to consider when debating the use of RAR in clinical trials.

Treatment allocation strategies for umbrella trials in the presence of multiple biomarkers: A comparison of methods

ABSTRACT. Background: Umbrella trials are an innovative trial design where different treatments are matched with subtypes of a disease, with the matching typically based on a set of predictive biomarkers. In practice, patients can test positive for multiple targeted biomarkers and hence be eligible for multiple treatment arms. Consequently, different approaches could be applied to allocate patients with multiple biomarkers to a specific treatment arm. However, it is currently unclear how the method used to account for multiple biomarkers affects the statistical properties of the trial. Methods: We conduct a simulation study to compare five approaches that have been or could be implemented to guide treatment allocation in the presence of multiple biomarkers: equal randomisation; randomisation with fixed probability of allocation to control; Bayesian adaptive randomisation (BAR); constrained randomisation (CR); and a hierarchy of biomarkers. We evaluate these approaches on six operating characteristics under different scenarios in the context of a hypothetical phase II biomarker-guided umbrella trial. We assume a binary endpoint and restrict our focus to a setting of four targeted biomarkers and their linked treatments plus a single control. We define the pairings representing the pre-trial expectations on efficacy as linked pairs. Results: The hierarchy and BAR approaches have the highest power to detect a treatment-biomarker linked interaction. However, the BAR method is more robust to the biomarker ordering being invalid, a scenario in which the hierarchy procedure performs poorly. When a treatment delivers an unanticipated detrimental effect, the BAR method allocates a higher proportion of multiple-biomarker patients to the most promising treatments. On the other hand, the CR procedure allocates on average a higher proportion of patients to experimental treatments and thus best balances allocation across all treatment arms. Notably, all methods show reasonable bias in all scenarios. Conclusion: The choice of approach to treatment allocation in the presence of multiple biomarkers may considerably influence the trial's operating characteristics. However, no method is optimal in all settings. Thus, pre-specification of an approach is important and should be considered carefully in the context of the trial sample size, the prevalence of the biomarkers, and the prevalence of individual overlaps.
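A minimal sketch of how Bayesian adaptive randomisation could work for a patient eligible for several arms: allocation probabilities are taken proportional to the posterior probability that each eligible arm has the highest response rate. The interim data, prior, and eligibility set below are hypothetical, and real BAR implementations often temper these probabilities with a tuning exponent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical interim data: responses x out of n patients per arm
# (arm 0 = control, arms 1-4 = biomarker-linked treatments).
x = np.array([4, 9, 6, 3, 7])
n = np.array([20, 20, 20, 20, 20])

# A patient testing positive for biomarkers 1 and 3 is eligible for
# the two linked treatment arms plus control.
eligible = [0, 1, 3]

# Draw from each eligible arm's Beta(1 + x, 1 + n - x) posterior
draws = rng.beta(1 + x[eligible, None],
                 1 + (n - x)[eligible, None],
                 size=(len(eligible), 10_000))

# Posterior probability that each eligible arm is best; use these
# directly as allocation probabilities for this patient.
p_best = np.bincount(np.argmax(draws, axis=0),
                     minlength=len(eligible)) / draws.shape[1]
alloc = p_best / p_best.sum()
arm = rng.choice(eligible, p=alloc)
print(dict(zip(eligible, np.round(alloc, 3))), "-> allocate to arm", arm)
```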

14:45-16:15 Session OC1C: Network meta analysis
Loop-splitting in network meta-analysis: a new approach to evaluating loop inconsistency

ABSTRACT. In a network meta-analysis, clinical trials evaluating multiple different treatment comparisons are modelled simultaneously, and estimation is informed by a combination of direct and indirect evidence. Network meta-analysis relies on an assumption of consistency, meaning that direct and indirect evidence should agree for each treatment comparison. Tests for inconsistency may be local or global. Existing tests do not handle treatments symmetrically and global tests based on a design-by-treatment interaction approach lack power. Here we propose new local and global tests for inconsistency and demonstrate their application to two example networks. We apply the local test to a loop of treatments in the network meta-analysis. We define a model with one inconsistency parameter that can be interpreted as loop inconsistency. The model builds on the existing ideas of node-splitting or side-splitting in network meta-analysis. To provide a global test for inconsistency, the model can be extended across multiple independent loops with one degree of freedom per loop. We describe an algorithm for identifying independent loops within a network meta-analysis. The models are applied first to a small network meta-analysis comparing 4 treatments for promoting smoking cessation. Local tests for inconsistency are applied to each loop and show no evidence of local loop inconsistency. Global tests for loop inconsistency are applied to every combination of independent loops. We demonstrate the invariance of the global model to choice of loops and find no global evidence of inconsistency (p=0.67). Next, the models are applied to a large network meta-analysis comparing 12 antidepressant drugs in adults with major depressive disorder. We describe how to identify a set of 31 independent loops in this network. The global model is applied and shows no global evidence of inconsistency (p=0.51). 
Our proposed models handle treatments symmetrically and are invariant to choice of reference treatment, which makes interpretation easier. The global model is invariant to choice of independent loops and we have shown how to identify a set of independent loops. In comparison with the existing approach to testing for global inconsistency in network meta-analysis, our model uses fewer degrees of freedom and is expected to improve power.
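One standard way to identify a set of independent loops in a network is via a spanning tree: every non-tree edge closes exactly one loop, giving E - V + C independent loops for C connected components. The sketch below is a generic cycle-basis construction, not necessarily the authors' algorithm, applied to a hypothetical fully connected four-treatment network of the same size as the smoking cessation example.

```python
from collections import defaultdict, deque

def independent_loops(edges):
    """Spanning-tree cycle basis: each non-tree edge closes exactly one
    independent loop, giving E - V + C loops in total."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    parent, depth, tree = {}, {}, set()
    for root in adj:                      # BFS tree per component
        if root in parent:
            continue
        parent[root], depth[root] = None, 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v], depth[v] = u, depth[u] + 1
                    tree.add(frozenset((u, v)))
                    queue.append(v)
    loops = []
    for u, v in edges:
        if frozenset((u, v)) in tree:
            continue
        # walk the two endpoints up to their lowest common ancestor
        pu, pv, a, b = [u], [v], u, v
        while a != b:
            if depth[a] >= depth[b]:
                a = parent[a]; pu.append(a)
            else:
                b = parent[b]; pv.append(b)
        loops.append(pu + pv[-2::-1])
    return loops

# Hypothetical network of 4 treatments with all 6 direct comparisons:
# 6 - 4 + 1 = 3 independent loops.
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("C", "D")]
for loop in independent_loops(edges):
    print(loop)
```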

Decision curve analysis for treatment benefit in a network meta-analysis framework

ABSTRACT. Background: Predicting individualized treatment effects is of great importance, so that a treatment might be targeted to individuals who benefit and avoided by those who do not. Traditional measures of predictive accuracy, such as discrimination, do not evaluate the clinical usefulness of a model. Decision curve analysis can be used to determine whether a model should be applied in clinical practice, and is well described in the literature for models that compare two treatments in a randomized clinical trial [1]. Objectives: Our main objective is to extend the decision curve analysis methodology to a network meta-analysis framework, where several treatment options are compared across several trials. We also exemplify our methodology on a prediction model for treating patients diagnosed with relapsing-remitting multiple sclerosis (RRMS). Methods: A threshold probability, based on considerations about the harms associated with each treatment and with an event, needs to be determined for each treatment. Our methodology includes 12 steps to estimate the net benefit, i.e. the difference between the absolute benefits and the harms, under the possible strategies: treat all patients with the most effective treatment on average, treat patients based on a prediction model about multiple treatments, and treat none. The net benefit per strategy can then be plotted over a range of threshold probabilities to reveal the most clinically useful strategy. Results: We applied our methodology to a network meta-analysis individualized prediction model which we previously developed for RRMS [2]. We illustrated our decision curve analysis methodology using two different options: a common threshold probability for all treatments, and different threshold probabilities for each treatment. Our model appears to be clinically useful for a small range of threshold probabilities under both options. Conclusions: As individualized prediction models comparing multiple treatments become widely used, our methodology could be an important tool for assessing the impact of such models in clinical practice.
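The net benefit calculation at the heart of decision curve analysis can be sketched for the basic two-option case (the methodology above generalises this to multiple treatments in a network). The simulated risks and the calibration assumption below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical, well-calibrated setting: events are generated from the
# model's own predicted risks.
risk = rng.beta(2, 5, n)            # predicted event probabilities
event = rng.random(n) < risk        # observed binary events

def net_benefit(pred, event, pt):
    """Net benefit of treating patients whose predicted risk is >= pt:
    true-positive rate minus false-positive rate weighted by the odds
    of the threshold probability pt."""
    treat = pred >= pt
    tp = np.mean(treat & event)
    fp = np.mean(treat & ~event)
    return tp - fp * pt / (1 - pt)

for pt in (0.1, 0.2, 0.3):
    nb_model = net_benefit(risk, event, pt)          # treat by model
    nb_all = net_benefit(np.ones(n), event, pt)      # treat everyone
    print(f"pt={pt:.1f}: model {nb_model:+.3f}, "
          f"treat-all {nb_all:+.3f}, treat-none +0.000")
```

Plotting these quantities over a grid of thresholds gives the decision curve; the strategy with the highest net benefit at the clinically relevant threshold is preferred.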

The Risk Of Bias due to Missing Evidence in Network meta-analysis (ROB-MEN) tool: web application and implementation in a network of antidepressant drugs

ABSTRACT. The risk of bias due to missing evidence, also called reporting bias, threatens the validity of systematic reviews and meta-analyses and, therefore, potentially affects clinical decision-making. Various methods are available to assess selective outcome reporting and publication bias separately, but a rigorous framework to evaluate the impact of both sources of bias on the meta-analysis results of a network of interventions has been lacking. We developed a framework and tool to assess the Risk Of Bias due to Missing Evidence in Network meta-analysis (ROB-MEN). We built on the tool for assessing Risk Of Bias due to Missing Evidence (ROB-ME) developed by Page et al. and expanded the ROB-ME framework to network meta-analysis (NMA). We used qualitative and quantitative methods to combine the risk of bias due to missing evidence in pairwise comparisons with their impact on the network estimates. Our framework first evaluates the risk of bias due to missing evidence for each direct comparison separately. We consider possible bias due to the presence of studies with unavailable results (known unknowns) and the potential for unpublished studies (unknown unknowns) before reaching an overall judgement about the risk of bias due to missing evidence in each comparison. Then, we evaluate the risk of bias due to missing evidence in each NMA estimate, which is our tool's final output. The bias and contributions from direct comparisons to the NMA are thus combined with the likelihood of small-study effects as evaluated by network meta-regression and the bias from unobserved comparisons. We present the ROB-MEN tool and illustrate its application in an NMA of 18 antidepressants from head-to-head studies* using the R Shiny app that we developed to facilitate the assessment process. The ROB-MEN tool will also be implemented in the open-source CINeMA web application to supplement the reporting bias domain.

*Cipriani A, Furukawa TA, Salanti G, et al. Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. The Lancet 2018;391:1357–66. doi:10.1016/S0140-6736(17)32802-7

Network meta-analysis and random walks

ABSTRACT. Network meta-analysis (NMA) has been established as a central tool of evidence synthesis in clinical research. The results of an NMA depend critically on the quality of evidence being pooled. In assessing the validity of an NMA, it is therefore important to know the proportion contributions of each direct treatment comparison to each network treatment effect. To construct the matrix of contributions, Papakonstantinou et al (2018) presented an algorithm that identifies the flow of evidence in each evidence ‘path’ and decomposes it into direct comparisons. This method is based on the observation by König et al (2013) that each row of the hat matrix represents an evidence flow network for each treatment comparison. However, in certain cases, the algorithm presented by Papakonstantinou et al. is associated with ambiguity according to the selection of paths. The aim of our work is to demonstrate the analogy between NMA and random walks. We also aim to illustrate the clinical application of this analogy in deriving the proportion contribution matrix. A random walk on a graph is a stochastic process that describes a succession of random ‘hops’ between vertices which are connected by an edge. The weight of an edge relates to the probability that the walker moves along that edge. In statistical mechanics, there exists a well-established connection between electrical networks and random walks. We use the existing analogy between meta-analytic and electrical networks to construct the transition matrix for a random walk on the network of evidence. We show that the net number of times a walker crosses each edge of the network is directly related to the evidence flow graph in König et al. By then defining a random walk on the directed evidence flow network, we derive analytically the matrix of proportion contributions. The interdisciplinary analogy between NMA and random walks provides new insight into NMA methodology. 
In particular, the analogy leads to a derivation of the proportion contribution matrix without the ambiguity of previous algorithms. Our approach can therefore be used to reliably quantify the contribution of individual study limitations to the resulting network treatment effects.
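The electrical-network analogy underlying this random-walk construction can be sketched numerically: treating inverse variances as conductances, the Laplacian pseudoinverse yields both the variance of a network estimate (the effective resistance) and the evidence flow through each edge (the current when a unit current is injected between the two treatments being compared). The three-treatment network and its edge weights below are hypothetical.

```python
import numpy as np

# Hypothetical triangle network: treatments A, B, C, with direct
# comparisons weighted by the inverse variance of the direct estimates
# (in the electrical analogy, edge weights act as conductances).
nodes = ["A", "B", "C"]
weights = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "C"): 1.0}

K = len(nodes)
L = np.zeros((K, K))                 # weighted graph Laplacian
for (u, v), wt in weights.items():
    i, j = nodes.index(u), nodes.index(v)
    L[i, i] += wt; L[j, j] += wt
    L[i, j] -= wt; L[j, i] -= wt

Lplus = np.linalg.pinv(L)            # Moore-Penrose pseudoinverse

# Variance of the network estimate for A vs B = effective resistance
i, j = nodes.index("A"), nodes.index("B")
var_AB = Lplus[i, i] + Lplus[j, j] - 2 * Lplus[i, j]
print(f"variance of the A vs B network estimate: {var_AB:.3f}")

# Evidence flow for A vs B: the current through each edge when a unit
# current is injected at A and extracted at B.
potential = Lplus @ (np.eye(K)[i] - np.eye(K)[j])
for (u, v), wt in weights.items():
    flow = wt * (potential[nodes.index(u)] - potential[nodes.index(v)])
    print(f"evidence flow {u}->{v}: {flow:+.3f}")
```

In this toy network, 80% of the evidence for A vs B flows through the direct comparison and 20% through the indirect path via C; a negative flow simply indicates the opposite direction along that edge.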

Bayesian network meta-analysis methods for combining IPD and aggregate data from single-arm studies and RCTs

ABSTRACT. Background: In health technology assessment, evidence from a single-arm study assessing effectiveness of a new treatment, where individual participant data (IPD) are available, may need to be synthesised with aggregate data in a network of randomised controlled trials (RCTs) assessing existing treatments.

Objective: We aim to develop methods to perform a network meta-analysis (NMA) combining IPD and aggregate data from single-arm studies and RCTs, under either contrast- or arm-based parametrisations, and to compare the methods using an applied example in rheumatoid arthritis (RA).

Methods: We extend the contrast- and arm-based NMA methods by Hong et al [1] for IPD only, using the approach by Saramago et al [2] to combine IPD and aggregate data via shared model parameters. We apply the methods to a network of RCTs assessing biologic therapies as first-line treatments for RA, with American College of Rheumatology (ACR20) response criteria as the outcome measure. The network consists of aggregate-level data from 10 RCTs and IPD from one single-arm study. We also apply a method with independent baseline effects [2], to understand the impact of assuming exchangeable baseline effects.

Results: For tocilizumab compared to placebo, there was an increase in uncertainty when incorporating IPD from an additional single-arm study vs. an additional RCT; for the contrast-based approach, the pooled log odds ratio (OR) was 1.55 (0.16, 2.85) vs. 1.54 (0.38, 2.74), and for the arm-based approach, the log OR was 1.50 (0.32, 2.76) vs. 1.51 (0.38, 2.62). However, the difference between the estimates was not large enough to change conclusions, as neither of the 95% credible intervals contained zero. The estimates were also similar to those obtained when assuming independent baseline effects; log OR = 1.56 (0.35, 2.81).

Conclusions: Incorporating IPD from a single-arm study into a network of RCTs, to estimate the relative effect of new vs. existing treatments, can be achieved via an assumption of exchangeable baseline effects. This assumption may introduce some bias (discrepancy). However, in the example presented here it did not result in significant bias. Further research is required to understand when bias can arise and to what extent it can be adjusted for.

14:45-16:15 Session OC1D: Omics and genetic studies
ATLASQTL and EPISPOT: two joint hierarchical approaches for detecting and interpreting hotspots in molecular QTL studies

ABSTRACT. We present ATLASQTL and EPISPOT, two complementary Bayesian sparse regression approaches for joint analysis of molecular quantitative trait locus (QTL) data. ATLASQTL performs QTL mapping on a genome-wide scale, while EPISPOT refines this mapping and generates mechanistic hypotheses by exploiting large panels of epigenetic annotations as predictor-level information. Both approaches consider a series of parallel regressions combined in a hierarchical manner to flexibly accommodate high-dimensional responses (molecular levels) and predictors (genetic variants). This novel framework allows information-sharing across outcomes and variants, and directly models the propensity of variants to be trans hotspots, i.e., to remotely control the levels of many gene products, via a dedicated top-level representation. EPISPOT also couples QTL mapping with a hypothesis-free selection of annotations which contribute to the primary QTL effects. Both methods implement efficient annealed variational inference procedures that improve exploration of multimodal spaces and allow simultaneous analysis of data comprising hundreds of thousands of predictors, and thousands of responses and samples. This unified learning boosts statistical power and sheds light on the mechanistic basis of the uncovered hits. ATLASQTL and EPISPOT therefore mark a step forward in improving the thus far challenging detection and functional interpretation of trans-acting genetic variants, including hotspots. We illustrate the advantages of our framework in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and reveals new ones, as well as plausible mechanisms of action. Software for our methods is publicly available as packages implemented in R and C++.

Explained Variation in the Linear Mixed Model

ABSTRACT. The coefficient of determination is a standard characteristic in linear models with quantitative response variables. It is widely used to assess the proportion of variation explained, to determine the goodness-of-fit and to compare models with different covariates. For models with categorical covariates only, the coefficient of determination reduces to the explained sum of squares known from ANOVA. However, no agreement has yet been reached on a similar quantity for the class of linear mixed models. We introduce a natural extension of the well-known adjusted coefficient of determination in linear models to the variance components form of the linear mixed model. This extension is dimensionless, has an intuitive and simple definition in terms of variance explained, is additive for several random effects and reduces to the adjusted coefficient of determination in the linear model without random effects. To this end, we prove a full decomposition of the sum of squares of the dependent variable into the explained and residual variance. Based on the restricted maximum likelihood equations, we propose a novel measure for the explained variation which we allocate specifically to the contribution of the fixed and the random covariates of the model. In particular, hierarchical data (clustered data, repeated measurements, longitudinal data) can be investigated by this approach. We illustrate the usefulness of our approach on a typical real dataset with repeated measures, where we are able to partition the variability in the clinical endpoint into fixed effects (such as age, sex, health status), as well as random effects (patients). Another important application is the estimation of the single nucleotide polymorphism heritability in genome-wide association studies with complex traits. We compare our approach with existing approaches (GCTA-GREML, LD score regression) on real datasets of model organisms such as Arabidopsis thaliana.
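As a schematic illustration of the variance-partitioning idea (not the authors' exact estimator), the share of variation attributable to fixed and random effects can be computed from estimated variance components; the component values below are hypothetical:

```python
# Sketch: partitioning explained variance in a linear mixed model from
# estimated variance components (hypothetical values). This follows the
# general idea of attributing variance to fixed vs. random effects; it
# is not the estimator proposed in the abstract.

def explained_variation(var_fixed, var_random, var_residual):
    """Return (share explained by fixed effects, by random effects, total)."""
    total = var_fixed + sum(var_random) + var_residual
    r2_fixed = var_fixed / total
    r2_random = sum(var_random) / total
    return r2_fixed, r2_random, r2_fixed + r2_random

# Hypothetical REML estimates: variance of the fixed-effect predictor,
# one random intercept per patient, and residual variance.
r2_f, r2_r, r2_total = explained_variation(2.0, [1.0], 1.0)
print(round(r2_f, 2), round(r2_r, 2), round(r2_total, 2))  # 0.5 0.25 0.75
```

Note that additivity over several random effects follows directly from summing their variance components in the numerator.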

Reconstructing KIR haplotypes taking ambiguous and missing data into account

ABSTRACT. Background The role of KIR (killer-cell immunoglobulin-like receptors) genes in improving outcomes after allogeneic hematopoietic stem cell transplantation is under debate. This is hard to assess, since the KIR region is genetically complex: it exhibits copy number variation and huge allelic diversity, and alleles are often measured with ambiguities. To better model the biological impact of KIR genes, haplotypes have to be reconstructed from these complex data.

Objectives We present a method to reconstruct haplotype gene content based on genotype calls from independently genotyped KIR genes with a high percentage of ambiguities, i.e., partially missing data. We developed an Expectation-Maximization (EM)-algorithm to deal with the missing data components, taking linkage disequilibrium between different KIR genes into account, to estimate haplotype frequencies. A simulation study was performed to evaluate the effect of combining the information of multiple genes into one analysis and to compare the EM-algorithm with a naïve imputation method.

Methods The complete data is obtained by grouping the donor’s possible diplotypes of all KIR genes. Due to the high data dimensionality, the EM-algorithm has to be refined via heuristic grouping strategies, e.g., summarizing rare information. To further improve efficiency, haplotype frequencies are estimated iteratively, by adding genes one at a time. Our strategy ensures accurate estimates combined with a user-controlled dimensionality.

Results Estimated haplotype frequencies with this new method are approximately unbiased in simulations. Real data analysis estimates are close to those obtained in an independent study. Our simulation study shows that the EM-algorithm combines information from multiple genes when linkage disequilibrium is high, whereas increased ambiguity levels increase standard errors. In comparison with a naïve imputation method, the EM-algorithm is more efficient.

Conclusions Our new EM-algorithm based method is the first to account for the full genetic architecture of the KIR loci. This algorithm can handle the numerous observed ambiguities, and allows for the grouping of haplotypes to perform implicit dimension reduction. Combining information from multiple genes improves the accuracy of estimates and allows for better haplotype reconstruction. This method can also be applied to other sets of genes with missing data.
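As a toy illustration of the underlying EM idea (two biallelic loci and hypothetical unphased genotypes; the presented method additionally handles copy number variation, ambiguous calls across many KIR genes, and grouping), haplotype frequencies can be estimated from phase-ambiguous genotypes as follows:

```python
from collections import Counter
from itertools import product

# Toy EM for two-locus haplotype frequencies from unphased genotypes.
# Hypothetical illustration only: double heterozygotes are phase-ambiguous,
# and the E-step distributes them over compatible diplotypes.

haplotypes = ['AB', 'Ab', 'aB', 'ab']

def diplotypes(genotype):
    """All unordered haplotype pairs consistent with an unphased
    two-locus genotype, e.g. ('Aa', 'Bb')."""
    g1, g2 = genotype
    pairs = set()
    for a1, b1 in product(g1, g2):
        a2 = g1.replace(a1, '', 1)  # partner haplotype takes the remaining alleles
        b2 = g2.replace(b1, '', 1)
        pairs.add(tuple(sorted((a1 + b1, a2 + b2))))
    return pairs

def em_hap_freqs(genotypes, n_iter=200):
    freqs = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(n_iter):
        counts = Counter()
        for g in genotypes:
            pairs = list(diplotypes(g))
            # E-step: posterior weight of each compatible diplotype under HWE
            w = [freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
                 for h1, h2 in pairs]
            tot = sum(w)
            for (h1, h2), wi in zip(pairs, w):
                counts[h1] += wi / tot
                counts[h2] += wi / tot
        # M-step: renormalise expected haplotype counts
        n = sum(counts.values())
        freqs = {h: counts[h] / n for h in haplotypes}
    return freqs

# Hypothetical sample: strong linkage between A-B and a-b, plus two
# phase-ambiguous double heterozygotes.
genos = [('AA', 'BB')] * 4 + [('aa', 'bb')] * 4 + [('Aa', 'Bb')] * 2
f = em_hap_freqs(genos)
print(round(f['AB'], 2), round(f['ab'], 2))  # both approach 0.5
```

The linkage disequilibrium in the unambiguous individuals resolves the phase of the double heterozygotes, which is the mechanism the abstract exploits across KIR genes.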

COMET: an R package to identify sample cross-contamination in whole genome sequencing studies

ABSTRACT. Identification of sample cross-contamination is crucial in next generation sequencing (NGS) studies because undetected contamination may lead to bias in association studies. In PCR-free germline multiplexed whole genome sequencing (WGS) studies, sample cross-contamination may be investigated by studying the excess of non-matching reads at homozygous sites compared to the expected sequencing error fraction. In this presentation, we propose a probabilistic method to infer contaminated samples and their contaminant for low levels of contamination. The distance on the well plate between the contaminant and the contaminated sample may be penalized. The method is implemented in a freely available R package. We compare it with the three alternative methods ART-DeCo, VerifyBamID2 and the built-in function in Illumina's DRAGEN platform and demonstrate its accuracy on simulated data. We illustrate the method using real data from the pilot phase of a large-scale NGS experiment with 9000 whole genome sequences. In the real data, our method was able to successfully identify cross-contamination. Sample cross-contamination in NGS studies can be identified using a simple-to-use R package.
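A minimal sketch of the general idea (not COMET's actual model): treat non-matching reads at homozygous sites as Binomial with success probability e + alpha/2, sequencing error plus a heterozygous-like contaminant fraction alpha, and estimate alpha by maximising the likelihood over a grid. The site data and error rate below are hypothetical:

```python
import math

# Sketch of excess-mismatch contamination estimation (hypothetical model
# and data, not COMET's implementation): at homozygous sites, mismatched
# reads ~ Binomial(depth, e + alpha/2), assuming the contaminant looks
# heterozygous on average.

def log_lik(alpha, sites, e=0.001):
    p = e + alpha / 2
    ll = 0.0
    for depth, mism in sites:
        ll += (math.log(math.comb(depth, mism))
               + mism * math.log(p) + (depth - mism) * math.log(1 - p))
    return ll

def estimate_alpha(sites, grid=None):
    """Grid-search MLE of the contamination fraction alpha (0% to 20%)."""
    grid = grid or [i / 1000 for i in range(0, 201)]
    return max(grid, key=lambda a: log_lik(a, sites))

# 50 homozygous sites at depth 100 with one mismatched read each:
sites = [(100, 1)] * 50
print(estimate_alpha(sites))  # 0.018
```

Here the observed mismatch rate of 1% exceeds the assumed 0.1% error rate, so the excess is attributed to a contamination fraction of about 1.8%.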

Evaluating DNA sequencing performance: concordance-discordance model and latent class model

ABSTRACT. Although the use of next generation sequencing (NGS) has been rapidly growing, evaluation of the performance of NGS often encounters difficulties, such as a lacking or biased gold standard, especially when concerning real individuals' sequencing data. To evaluate the accuracy of NGS sequencing and reduce error, researchers often use technical replicates, biological replicates or results from multiple pipelines. The concordance rates between these replicates are also used as an indicator of sequencing accuracy. However, the appropriateness of these substitute criteria has rarely been questioned. This study aimed to analyze whether the concordance rate is an adequate criterion for evaluating DNA sequencing performance, as well as how to model the association between covariates and the error rates.

Two approaches were compared to estimate error rate given DNA characteristics and other covariates. The appropriateness of using concordance/discordance criteria as an alternative in the absence of a gold standard was first analyzed. Situations with different values of sensitivity and specificity of NGS as well as different prevalences of variants were studied. The contribution of latent class models was then investigated, the true status of base-pairs being the latent variable. Finally, the clinical contribution of these two approaches for medical practice was analyzed according to the clinical context of DNA sequencing prescription.
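A simple calculation illustrates why concordance between replicates can overstate accuracy when variants are rare: two conditionally independent replicates with modest sensitivity still agree almost always, because most base-pairs are true negatives. The sensitivity, specificity and prevalence figures below are hypothetical:

```python
# Hypothetical illustration: concordance probability of two conditionally
# independent replicate calls, given sensitivity (se), specificity (sp)
# and variant prevalence (prev).

def concordance(prev, se, sp):
    """Probability that two conditionally independent replicate calls agree."""
    agree_pos = se ** 2 + (1 - se) ** 2  # both calls agree given a true variant
    agree_neg = sp ** 2 + (1 - sp) ** 2  # both calls agree given no variant
    return prev * agree_pos + (1 - prev) * agree_neg

# Rare variants: concordance stays near 1 despite 70% sensitivity.
print(round(concordance(prev=0.001, se=0.70, sp=0.999), 4))  # 0.9976
```

A concordance rate of 99.8% here coexists with a 30% miss rate for true variants, which is the kind of discrepancy a latent class model can expose.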

14:45-16:15 Session OC1E: Multiple testing and randomization tests
Confidence intervals for the treatment effect in the Magnusson-Turnbull adaptive enrichment design

ABSTRACT. Establishing treatment efficacy in patient subgroups presents statistical challenges due to multiplicity and the potential for small samples which could lead to misleading conclusions. Thus, adaptive enrichment group sequential designs have been proposed for phase II/III trials. Such designs focus on ensuring maximum power to detect a treatment effect in either the whole population or a selected subgroup. However, the adaptive nature of the procedure makes quantification of uncertainty in treatment effects difficult. We therefore consider the problem of constructing individual and simultaneous confidence intervals for the treatment effects for subgroups at the termination of an adaptive enrichment trial. Focussing on the two-stage version of the enrichment design proposed by Magnusson and Turnbull (2013), our approach involves devising a suitable p-value function for the combined statistics based on space ordering methods. By inverting the relevant p-value function, we obtain one-sided individual confidence intervals with exact coverage for either the selected group or an individual subgroup. The construction of simultaneous confidence intervals for every group in the trial, using either a simple Bonferroni approach or the weighted Bonferroni method of Brannath and Schmidt (2014), is also explored. Future research should focus on the extension to other adaptive enrichment designs.

Graphical approaches for the control of generalized error rates

ABSTRACT. When simultaneously testing multiple hypotheses, the usual approach in the context of confirmatory clinical trials is to control the familywise error rate (FWER), which bounds the probability of making at least one false rejection. In many trial settings, these hypotheses will additionally have a hierarchical structure that reflects the relative importance and links between different clinical objectives. The graphical approach of Bretz et al (2009) is a flexible and easily communicable way of controlling the FWER while respecting complex trial objectives and multiple structured hypotheses. However, the FWER can be a very stringent criterion that leads to procedures with low power, and may not be appropriate in exploratory trial settings. This motivates controlling generalized error rates, particularly when the number of hypotheses tested is no longer small. We consider the generalized familywise error rate (k-FWER), which is the probability of making k or more false rejections, as well as the tail probability of the false discovery proportion (FDP), which is the probability that the proportion of false rejections is greater than some threshold. We also consider asymptotic control of the false discovery rate, which is the expectation of the FDP. In this presentation, we show how to control these generalized error rates when using the graphical approach and its extensions. We demonstrate the utility of the resulting graphical procedures on clinical trial case studies.
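For reference, the simplest k-FWER procedure is the single-step generalized Bonferroni of Lehmann and Romano, which rejects all hypotheses with p ≤ kα/m. This is not the graphical approach itself, but the baseline criterion that such procedures refine; the p-values below are hypothetical:

```python
# Single-step generalized Bonferroni (Lehmann-Romano) for k-FWER control.
# Hypothetical p-values; the graphical procedures in the abstract refine
# this baseline by propagating significance levels between hypotheses.

def k_fwer_rejections(pvalues, k, alpha=0.05):
    """Reject all hypotheses with p <= k*alpha/m; this bounds the
    probability of k or more false rejections by alpha."""
    m = len(pvalues)
    thresh = k * alpha / m
    return [i for i, p in enumerate(pvalues) if p <= thresh]

pvals = [0.001, 0.004, 0.012, 0.03, 0.2]
print(k_fwer_rejections(pvals, k=1))  # [0, 1]  (classical Bonferroni)
print(k_fwer_rejections(pvals, k=2))  # [0, 1, 2]  (tolerating 1 false rejection)
```

Relaxing from k = 1 to k = 2 raises the per-hypothesis threshold and admits an additional rejection, which is the power gain that motivates generalized error rates.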

Evaluation of the Fill-it-up design to combine data from observational trials and RCTs

ABSTRACT. Context The most appropriate method to investigate the effects of interventions in clinical research is to conduct a randomised controlled clinical trial (RCT). However, conducting an RCT is often challenging. Multicentre and multinational trials, in particular, pose constraints for clinical researchers. The EPISTOP trial was conducted to compare conventional and preventive therapy for epilepsy in children with tuberous sclerosis complex [1]. In some centres, however, permission to conduct an RCT was not granted by the ethics board due to different guidelines for conducting clinical trials in children, resulting in observational data on the one hand and randomised data on the other. The statistical analysis and combination of the randomised and observational data poses a great challenge.

Objectives The aim is to extend the Fill-it-up design for the combination of observational data and data from RCTs. To avoid biased estimates of the treatment effect, observational data should be similar to randomised data to a reasonable extent and with a certain confidence.

Method The combination of observational and randomised data should only be considered if their equivalence is confirmed in an equivalence pretest. We therefore propose to pause the originally planned trial when a certain sample size of all study arms is reached in order to conduct the equivalence pretest. If equivalence is confirmed, the observational and randomised data will be pooled and no further recruitment is carried out. If equivalence cannot be confirmed, the observational data will be described separately and the final statistical analysis will be performed based on the randomised data only. For this purpose, the recruitment of the original study is continued. We investigate the performance of this study design in terms of familywise error rate control and overall power.

Results We show how the significance levels of the separate tests need to be adjusted to maintain the overall type-I-error probability and overall power of our design within acceptable limits while reducing the total randomised sample size in case of equivalence.

[1] Kotulska, K. et al. (2020), Prevention of Epilepsy in Infants with Tuberous Sclerosis Complex in the EPISTOP Trial. Ann Neurol, 89: 304-314.
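The equivalence pretest at the heart of such a design can be sketched as a two one-sided tests (TOST) decision. The normal-approximation version below, with hypothetical numbers, illustrates only the pooling rule, not the authors' exact test:

```python
import math

# Sketch of an equivalence pretest driving a pooling decision
# (hypothetical margin and estimates; normal approximation, no df
# correction -- not the authors' exact procedure).

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def tost_pooling_decision(diff, se, margin, alpha=0.05):
    """TOST for |true difference| < margin between observational and
    randomised arms; pool the data sources only if equivalence is shown."""
    p_lower = 1 - norm_cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = norm_cdf((diff - margin) / se)      # H0: diff >= +margin
    return max(p_lower, p_upper) <= alpha

# Observed obs-vs-randomised difference 0.1, SE 0.2, margin 0.5:
print(tost_pooling_decision(0.1, 0.2, 0.5))  # True -> pool
```

Because the pretest and the final analysis both consume type I error, the abstract's point about adjusting the separate significance levels applies on top of this decision rule.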

Improved group sequential Holm procedures for testing multiple correlated hypotheses over time

ABSTRACT. Clinical trials can typically feature two different types of multiple inference: testing of more than one null hypothesis and testing at multiple time points. These modes of multiplicity are closely related mathematically but distinct statistically and philosophically. Regulatory agencies require strong control of the family-wise error rate (FWER), the risk of falsely rejecting any null hypothesis at any analysis. The correlations between test statistics at interim analyses and the final analysis are therefore routinely used in group sequential designs to achieve less conservative critical values. However, the same type of correlations between different comparisons, endpoints or sub-populations are less commonly used. As a result, FWER is in practice often controlled conservatively for commonly applied procedures.

Repeated testing of the same null hypothesis may give changing results, when an efficacy boundary is crossed at an interim but not at the final analysis. The mathematically correct overall rejection is at odds with an inference theoretic approach and with common sense. We discuss these two issues, of incorporating correlations and how to interpret time-changing conclusions, and provide case studies where power can be increased while adhering to sound statistical principles.

Randomization tests to address disruptions in clinical trials

ABSTRACT. Background In early 2020, the World Health Organization declared the novel coronavirus disease (COVID-19) a pandemic. Besides prompting various trials of treatments and vaccines for COVID-19, the pandemic also had numerous consequences for ongoing clinical trials. People around the globe restricted their daily activities to minimize contagion, which led to missed visits and cancelling or postponing of elective medical treatments. For some clinical indications, COVID-19 may lead to a change in the patient population or treatment effect heterogeneity.

Methods We present three models for clinical trial disruptions. The first model will account for the change in patient population based on chronological bias. The second will model the disruption based on the assessment of an early biomarker that is correlated with the primary outcome. The third model will account for missed visits. We will measure the effect of the disruption on randomization tests. Randomization tests are a non-parametric, design-based method of inference. We derive a methodological framework for randomization tests that allows for the assessment of clinical trial disruptions, and we will conduct a simulation study to assess the impact of disruptions on type I error probability and power in practice. Finally, we will illustrate the results with a simulation study and a case study based on a clinical trial that was interrupted by COVID-19.

Results We show that randomization tests are robust against clinical trial disruptions in certain scenarios, namely if the disruption can be considered an ancillary statistic to the treatment effect. As a consequence, randomization tests maintain type I error probability and power at their nominal levels.

Conclusions Randomization tests can provide a useful sensitivity analysis in clinical trials that are affected by clinical trial disruptions.
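A minimal re-randomization test for a difference in means, assuming simple random allocation with fixed group sizes (the framework above covers more general designs and disruption models), might look like:

```python
import random

# Sketch of a randomization (re-randomization) test: the p-value is the
# fraction of re-drawn assignments whose effect is at least as extreme
# as the observed one. Hypothetical data; fixed group sizes assumed.

def randomization_test(outcomes, assignment, n_perm=10000, seed=1):
    """Two-sided re-randomization p-value for a difference in means."""
    def abs_diff(assign):
        treat = [y for y, a in zip(outcomes, assign) if a == 1]
        ctrl = [y for y, a in zip(outcomes, assign) if a == 0]
        return abs(sum(treat) / len(treat) - sum(ctrl) / len(ctrl))

    rng = random.Random(seed)
    observed = abs_diff(assignment)
    count = 0
    for _ in range(n_perm):
        perm = assignment[:]
        rng.shuffle(perm)  # re-draw the allocation under the design
        if abs_diff(perm) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

y = [5.1, 4.9, 5.3, 5.2, 1.0, 1.2, 0.8, 1.1]
a = [1, 1, 1, 1, 0, 0, 0, 0]
print(round(randomization_test(y, a), 3))
```

Because inference is based only on the re-randomization distribution of the actual design, the test remains valid when a disruption is ancillary to the treatment effect, as stated in the results above.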

14:45-16:15 Session OC1F: Methods for survival analysis
Testing for ignorable sampling bias under random double truncation

ABSTRACT. The issue of random double truncation is ubiquitous in clinical and epidemiological research. It occurs, for instance, in Survival Analysis with interval sampling, when the observation of the target lifetime is restricted to events within two given calendar dates (Zhu and Wang, 2014). Unlike one-sided truncation, which always shifts the target distribution in the observable world, double truncation may result in an ignorable sampling bias. This is because the left and right truncation variables may balance each other, in the sense of giving the lifetimes the same chances of observation regardless of their size. In such a case, the ordinary empirical distribution function is consistent, and no correction for double truncation is needed. This is interesting, since the recovery of a distribution function from doubly truncated data typically suffers from a large variance. See de Uña-Álvarez and Van Keilegom (2021) for nonparametric maximum likelihood estimation (NPMLE) with doubly truncated data. Even though testing for ignorable sampling bias from doubly truncated data is highly relevant, no methods for this goal have been proposed so far. In this work I introduce a formal, omnibus test for the null hypothesis that the double truncation induces no observational bias on the target. The test is based on a distance between the NPMLEs of the sampling probability function under the null and under the alternative. The asymptotic null distribution of the test is derived and the consistency of the test is proven. A bootstrap resampling procedure to approximate the null distribution of the test in practice is introduced. This is not immediate, however, since the null hypothesis does not characterize the distribution of the truncation couple and, thus, Wang’s obvious bootstrap for truncated data is not applicable. The finite sample performance of the test is investigated through simulations.
The method is applied to two different studies with interval sampling: (a) age at diagnosis for childhood cancer, and (b) age at diagnosis for Parkinson’s disease. While no relevant sampling bias is found in the first study, the hypothesis of ignorable sampling bias is largely rejected for the Parkinson’s disease data. Practical recommendations are given.

A martingale based approach for modelling the alternating recurrences in Cystic Fibrosis patients

ABSTRACT. Alternating recurrent events often arise in a variety of clinical studies (Kalbfleisch and Prentice, 2002). In the present article, we use martingale-based methods to handle the dependence between the two recurrences in such data. Using this method, the present article aims to analyze the Cumulative Mean Functions (CMF) for the alternating recurrent events. Several techniques have been developed for the analysis of alternating recurrent events. However, in most cases the association between the two alternating recurrences is modelled using marginal methods (Yan and Fine, 2008; Sen Roy and Chatterjee, 2015 and many more). A few attempts have been made to incorporate the association (Li et al., 2010; Chatterjee and Sen Roy, 2018). Nelson and Doganaksoy (1989) illustrate a method of estimating the CMFs for recurrent events from processes which are identically distributed. This was later generalized by Lawless and Nadeau (1995) by incorporating regressors, who also provided nonparametric estimation methods for CMFs. However, only a few attempts have been made to model CMFs for alternating recurrent episodes. In the present article, we consider the counting process approach, and the idea of combining two martingales (Fleming and Harrington, 1991) is used to bring in the association between the two recurrent episodes. This also helps us extend the existing log rank test to alternating recurrent events, using the combination of the martingales. For illustration purposes, data on the disease cystic fibrosis (as reported by Fuchs et al., 1994) will be considered. Patients suffering from this disease often alternately face recurrent episodes of illness and cure. The study reports time to relapse and time to cure of 647 patients treated with either placebo (325) or rhDNase (322).
The data had previously been analyzed by many (Yan and Fine, 2008; Chatterjee and Sen Roy, 2018, 2020 and many more). Our newly proposed distribution-free method involving the mean functions can potentially give a new direction towards characterising the nature of the dependence between the cure and illness events.

Impact of measurement error in time-varying prescription-based drug exposures in time-to-event analysis

ABSTRACT. Administrative databases have grown in popularity as sources of data for pharmaco-epidemiological studies since these often include drug prescription registries used to determine individual drug exposures. Yet, an inherent problem is that a filled prescription does not guarantee actual medication intake. A consequence of this non-adherence to the prescribed treatment is that true episodes of drug exposure potentially diverge from those reconstructed based on the history of filled prescriptions recorded in the study database. It is well known from the statistical measurement error (ME) literature that analyses using error-prone exposures may result in biased estimates of their associations with the outcome, and incorrect inference. Despite this, ME in prescription-based exposures is often overlooked and, even when researchers are aware of the presence of ME, very few studies aim to correct for it.

We aim to illustrate the impact of ME due to prescription non-adherence on estimated drug associations, with specific focus on the estimation of complex treatment effects in time-to-event analyses with time-varying exposures.

Specifically, we will investigate through simulation studies the effects of exposure ME on estimated hazard ratios (HR) from the Cox model. The first simulation study will analyze Berkson ME in (i) time-fixed and (ii) time-varying exposures considering both linear and non-linear effects of exposure on the log hazard. Then, to evaluate the effects of prescription non-adherence, a plasmode simulation study will be conducted in which the observed prescription histories will be resampled from a real pharmaco-epidemiological dataset and then 'true' unobserved time-varying exposure patterns will be simulated under various assumed patterns of non-adherence. Finally, biases caused by existing ad hoc methods to reconstruct drug exposure episodes (e.g. filling gaps) will also be investigated. Based on preliminary results, we expect to show non-negligible bias of the estimated HR and distortion of the estimated non-linear effect of exposure on the log-hazard.

In conclusion, we aim to show that exposure ME due to assumed exposures based on prescription registries is a non-negligible source of bias that should not be overlooked in time-to-event analyses using time-varying exposures, and thus bridge the gap to a more warranted implementation of ME correction methods.

Developing a Goodness of fit test for a joint model: The case of clustered survival and count data

ABSTRACT. In Epidemiology, infectious viral diseases such as dengue, encephalitis and hepatitis often result in responses on the survival and count of patients. Many studies have found these two responses to be correlated and thus joint models are recommended over univariate models. Often this scenario is more complex with data clustered within geographical units. A method that models the responses using a GLMM with a single random effect for the geographical units is considered. This model assumes an exchangeability structure for the covariance matrix. The objective of this research was to develop a goodness of fit test, examine its type I error and power, and to apply this GOF test to infectious disease data clustered within geographical units. For the survival data a Discrete Time Hazard Model (DTHM) is used and a Negative Binomial (NB) model is used for the count data. These two responses were jointly modeled using a GLMM model. Once the joint model was fitted, the Hosmer-Lemeshow test was generalized to this scenario in order to group the expected values and get indicator variables. Under the null hypothesis of a well-fitting model the coefficients of the indicator variables have to be simultaneously equal to zero. A detailed simulation study has been used to examine the type I error and power of this test. A variety of sample sizes have been examined together with several intra-cluster correlation (ICC) values. A real data set on the infectious disease Dengue from Sri Lanka is used to illustrate the theory. The important conclusions from the study are that the test maintains type I error for almost all the scenarios considered. The power of the test is good for at least moderate sample sizes (around 500 patients). The cluster effect was taken to be district and the model is shown to fit well. Theoretically, the importance of this research is that it fills a gap in the literature as there is no known GOF test for this situation.
Practically, there are many applications that can make use of this development, particularly in Epidemiology and other areas of Medicine.
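For orientation, the classical Hosmer-Lemeshow construction that this test generalizes groups observations by predicted probability and compares observed with expected events. The sketch below shows only this classical step on hypothetical data, not the joint-model extension:

```python
# Classical Hosmer-Lemeshow statistic (hypothetical data): sort by
# predicted probability, split into g groups, and compare observed vs
# expected events. Approximately chi-square with g-2 df under a
# well-fitting model. The abstract's test generalizes this grouping
# idea to a joint GLMM, which this sketch omits.

def hosmer_lemeshow(y, p, g=10):
    pairs = sorted(zip(p, y), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        chunk = pairs[k * n // g:(k + 1) * n // g]
        if not chunk:
            continue
        obs = sum(yi for _, yi in chunk)       # observed events in group
        exp = sum(pi for pi, _ in chunk)       # expected events in group
        nk = len(chunk)
        pbar = exp / nk
        stat += (obs - exp) ** 2 / (nk * pbar * (1 - pbar))
    return stat

# Perfectly calibrated toy data gives a statistic of zero:
print(hosmer_lemeshow([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5], g=2))  # 0.0
```

In the generalized version described above, the group indicators enter the joint model as covariates and the GOF test checks that their coefficients are simultaneously zero.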

Confidence bands of the MRL function in some right censored prevalent cohort studies via the empirical likelihood

ABSTRACT. Survival data collected in a cohort of prevalent cases may be used to draw statistical inference on the natural progress of a disease. Since non-random sampling of subjects is involved, the data collected in this sampling scheme are biased. The most common case for modelling this bias is known as length-bias. It assumes the initiating event follows a stationary Poisson process. It is also often necessary to take into account the inability to follow up some subjects, i.e. the presence of censored data. Length-bias is modelled in the context of survival data described above but can be seen in other sampling schemes. Life expectancy is a key concept in survival analysis. In this talk, an empirical likelihood-based procedure is proposed to obtain a simultaneous confidence band for the mean residual life (MRL) function of the unbiased survival time of interest where the available sample includes right censored length-biased data. The empirical log-likelihood is revealed to converge weakly to a mean zero Gaussian process. As a direct result, it is also shown that the weak convergence implies that the limiting distribution of the log-likelihood ratio statistic is chi-square. These asymptotic results are employed to obtain confidence bands and intervals for the MRL function. The finite sample performance of the proposed method is inspected through a simulation study. The proposed method is illustrated by modelling the MRL and the respective confidence bands/intervals for elderly residents of a retirement centre.
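For intuition, the empirical counterpart of the MRL function for complete, unbiased data is simply the average remaining lifetime among survivors. The talk's method additionally corrects for length-bias and right-censoring, which this sketch omits; the lifetimes below are hypothetical:

```python
# Empirical mean residual life at time t for complete (uncensored,
# unbiased) data: average remaining lifetime among subjects who survive
# past t. A sketch only -- the proposed procedure handles length-biased,
# right-censored samples via empirical likelihood.

def mrl(sample, t):
    survivors = [x for x in sample if x > t]
    return sum(x - t for x in survivors) / len(survivors)

lifetimes = [2.0, 3.0, 5.0, 7.0, 11.0]
print(round(mrl(lifetimes, 4.0), 2))  # 3.67
```

The confidence bands described above quantify the uncertainty of this function simultaneously over a range of t, rather than pointwise.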

14:45-16:15 Session OC1G: Mixed effects models
Bayesian multi-response nonlinear mixed-effect model: application of two recent HIV infection biomarkers

ABSTRACT. Since the discovery of the human immunodeficiency virus (HIV) 35 years ago, the epidemic is still ongoing in France. To monitor the dynamics of HIV transmission and assess the impact of prevention campaigns, the main indicator is the incidence. One method to estimate HIV incidence is based on the knowledge of the dynamics of two recent HIV infection biomarkers. Estimating HIV incidence from biomarkers first requires modeling their dynamics since infection using external longitudinal data.

The main objective was to jointly model the dynamics of two recent HIV infection biomarkers using a nonlinear mixed-effect model. We considered one random effect for each biomarker and a correlation of random effects to take into account the correlation of the biomarkers. Parameters were estimated using the Hamiltonian Monte Carlo method.

This procedure was first applied to the real data of the PRIMO cohort. This cohort recruited 298 volunteers with primary HIV infection. The patients were examined biologically at inclusion, at 1, 3 and 6 months, then every 6 months. At each visit, the concentration of the two antibodies was measured. We also simulated 200 datasets close to the PRIMO cohort data.

The goodness of fit of our model was assessed by comparing the observed individual trajectories and those predicted by the model using real data. The Bayesian estimation procedure was validated through a simulation study using conventional indicators (bias, coverage rate and RMSE).

The results on the real data and on the simulation study indicate that the proposed approach gives satisfactory results for the estimation of the dynamics of the two biomarkers. For the real data, the predicted trajectories are close to the observed trajectories for all individuals. For the simulation study the absolute relative bias is between 0% and 8%, RMSEs are between 0.11 and 2.1, and the coverage rate is between 90% and 98%.

To our knowledge, this work is the first attempt to jointly study the dynamics of two biomarkers in a Bayesian nonlinear mixed-effect model. This modeling can potentially be used to estimate HIV incidence from HIV diagnoses surveillance data with values of markers at diagnosis.

Confidence, Prediction, Tolerance Intervals in Linear Mixed Models: Applications in (Non)-Clinical Trials

ABSTRACT. Confidence intervals (CI) have been widely accepted and used in the medical literature. However, such intervals mainly focus on uncertainty of “average” effects. In practice, it is often useful to predict the primary outcome for future patients, and to predict the ranges where most patients will lie. This is where prediction intervals (PI) and tolerance intervals (TI) can provide much more information to medical researchers and to the patients. These intervals focus on future patients with an exchangeable interpretation under both the frequentist and Bayesian paradigms. The literature about PI and TI in linear mixed models is usually developed for continuous response variables with some specific designs, which is a main limitation to their use.

We propose a formula for the two-sided PI which is generalizable under a wide variety of designs in mixed models (one random factor; nested and crossed designs with multiple random factors; balanced or unbalanced designs). Construction of two-sided TIs is also detailed by using the expected mean squares for random effects. Computation of prediction and tolerance intervals can also be performed for a continuous clinical trial endpoint with repeated measures.

A simulation study is carried out to compare the widths and coverage probabilities of CI, PI and TI, to their nominal levels. Finally, these intervals are applied to real datasets from an orthopedic surgery study (intralesional resection risk). While marginal prediction and tolerance intervals are not implemented in most statistical software, it will be shown how to calculate and interpret these intervals.

[1] Francq B, Lin D, Hoyer W. Confidence, Prediction and Tolerance in Linear Mixed Models. Statistics in Medicine 2019; 38: 5603-5622. [2] Francq B, Berger M, Boachie C. To Tolerate or To Agree: A Tutorial on Tolerance Intervals in Method Comparison Studies With BivRegBLS R Package. Statistics in Medicine 2020; 39:4334–4349.
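As a simple illustration of how a PI differs from a CI (a sketch only, using a normal quantile with no degrees-of-freedom correction; the proposed formula covers general mixed designs), the predictive variance adds the future subject's own variance to the uncertainty of the estimated mean:

```python
import math

# Sketch of a two-sided prediction interval for one future subject
# (hypothetical values; normal quantile, no df correction -- not the
# paper's general mixed-design formula). var_total is the sum of the
# between- and within-subject variance components.

def prediction_interval(ybar, var_total, n, z=1.96):
    half = z * math.sqrt(var_total * (1 + 1.0 / n))
    return ybar - half, ybar + half

lo, hi = prediction_interval(ybar=10.0, var_total=4.0, n=100)
print(round(lo, 2), round(hi, 2))  # 6.06 13.94
```

Unlike a CI for the mean, whose width shrinks to zero as n grows, this interval's width approaches 2z times the subject-level standard deviation, which is why PI and TI convey more to an individual patient.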

Bayesian multivariate longitudinal data analysis assuming different association structures

ABSTRACT. Studies in life course epidemiology often involve different types of outcomes being collected on individuals, who are followed over time. These outcomes are mainly analysed separately, although it may be of scientific interest to study their associations. To model the correlation of multiple longitudinal outcomes, it is common to assume a multivariate normal distribution for the corresponding random effects. This approach, however, has its limitations in terms of interpreting the strength of association between the outcomes. To overcome this, we can include several longitudinal outcomes, as time-dependent covariates, in the model of the main longitudinal outcome. Another advantage is that several features of these longitudinal predictors could be used, namely the current value, the slope, and the area under the curve. We propose a multivariate mixed model that incorporates different functional forms to link multiple outcomes assuming the Bayesian framework. This approach was motivated by a dataset of Dutch adult patients with Late-onset Pompe disease. Late-onset Pompe disease is an autosomal recessive metabolic disorder and is characterized by progressive muscle weakness and loss of respiratory function. This disease is caused by an accumulation of glycogen in the lysosome due to deficiency of the lysosomal acid alpha-glucosidase enzyme. For these patients we have physical and patient-reported outcomes. Clinicians are interested in investigating how the physical outcomes (e.g., Forced vital capacity (FVC)) are associated with the patient-reported outcomes over time.

The use of mixed logistic modelling in the analysis of HIV latency study data in the context of low sample size and low outcome rates

ABSTRACT. Background: The goal of Human Immunodeficiency Virus (HIV) research is to find a cure. Although treatment greatly reduces viral levels to clinically undetectable levels, a latent reservoir of virus lies dormant during treatment and is reactivated if treatment is stopped. The low numbers of latent virus make studying and measuring this reservoir difficult and intrusive. Studies on HIV latency are therefore characterised by few participants, with as much data extracted from biological samples as possible. These data are frequently analysed with mixed logistic regression; however, the low participant numbers, low rates of virus, and high across-participant variability present a number of challenges in using a mixed logistic model. Method: This project investigates the use of mixed logistic models in this context, when the sample size at participant level and the rate of outcome are very low, and through simulation outlines the effects that this has on the accuracy of parameter and variance estimates and their associated standard errors. The impact of sample size on power at low outcome rates is explored, in combination with varying random effect sizes, reflecting the nature of the data seen in HIV latency studies. Results: We find that model performance is adversely affected when infection frequencies are below 1 cell per million in combination with low sample sizes (10 participants or fewer) and large variance components. Simulation demonstrates failure in model convergence, increased variability in simulated parameter estimates, and inadequate power. Conclusions: The low infection frequencies used in the simulations in this study are seen in latency studies when analysing intact HIV virus in participants on anti-retroviral therapy. When applying the results of this study to latency data, issues with model performance indicate that when participant numbers are fewer than 10 in combination with such low outcome rates, other methods should be explored for analysing such data. Considering this, we also use the developed simulations to estimate sample size requirements for future HIV latency studies to guide future study design.
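The convergence failures described above arise in part because, at very low infection frequencies, many simulated studies contain participants (or entire datasets) with zero positive wells. A minimal sketch of that phenomenon is below; it is an illustrative simulation, not the authors' design, and the function names, the Poisson approximation to the binomial, and all parameter values are assumptions for illustration.

```python
import math
import random

def simulate_dataset(n_participants=10, wells_per_participant=1_000_000,
                     base_rate_per_million=1.0, re_sd=1.5, rng=None):
    """Simulate positive-well counts for one hypothetical latency study.

    Each participant's infection frequency is drawn on the log-odds scale
    around a very low baseline (about `base_rate_per_million` infected
    cells per million), with a participant-level random effect of SD
    `re_sd`. Returns the list of positive-well counts per participant.
    """
    rng = rng or random.Random()
    base_logit = math.log(base_rate_per_million / 1e6)
    counts = []
    for _ in range(n_participants):
        logit = base_logit + rng.gauss(0.0, re_sd)
        p = 1.0 / (1.0 + math.exp(-logit))
        # Binomial(wells, p) approximated by Poisson, valid since p is tiny.
        lam = wells_per_participant * p
        k, threshold, prod = 0, math.exp(-lam), rng.random()
        while prod > threshold:          # Knuth's Poisson sampler
            k += 1
            prod *= rng.random()
        counts.append(k)
    return counts

def prop_degenerate(n_sims=200, **kwargs):
    """Fraction of simulated studies in which every participant has zero
    positive wells -- datasets on which a mixed logistic model cannot
    estimate anything."""
    rng = random.Random(42)
    bad = sum(1 for _ in range(n_sims)
              if all(c == 0 for c in simulate_dataset(rng=rng, **kwargs)))
    return bad / n_sims
```

Lowering `base_rate_per_million` toward the sub-1-per-million regime discussed in the abstract makes such degenerate datasets, and hence convergence failures, increasingly common.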

Mixed Modeling of Regional Infant Mortality Data over Twenty Years in North Rhine-Westphalia.

ABSTRACT. Infant mortality is one of the major indicators of quality of health and health provision. In North Rhine-Westphalia (NRW), as in other parts of Germany, infant mortality is declining. However, regions vary in how infant mortality declines. Hence, it is important to choose a model that accounts for the variability of infant mortality in each region to draw valid conclusions. In this paper, we have built an appropriate statistical model using a random coefficients approach and exemplified how the proposed model can be beneficially used for exploring trends in administratively aggregated regional data on infant mortality in NRW from 1988 to 2010. We have measured the region-specific character by calculating the area under the infant mortality curves to identify which regions of NRW had higher infant mortality and which had lower over the specified period. Our analysis reveals that the regions Gelsenkirchen, Mönchengladbach, Duisburg, Oberhausen, Krefeld, Oberbergischer Kreis, Hagen and Siegen-Wittgenstein had higher infant mortality, whereas Höxter, Solingen, Rheinisch-Bergischer Kreis, Rhein-Sieg-Kreis, Münster, Warendorf, Gütersloh and Herford had lower infant mortality. Moreover, we have found that the rural areas of NRW had lower infant mortality than the urban areas.


ABSTRACT. The pharmaceutical industry gives rise to square tables of ordinal categorical data from clinical trials of newly manufactured drugs. The data arising from such trials have three dimensions: drug × pre-treatment score × post-treatment score. The proportional odds model is assumed to fit the data. The model is motivated by an underlying grouped continuous unobservable random variable: the ordinal categories of the post-treatment score are taken as a grouped continuous random variable. The data are stratified according to the pre-treatment score for the purpose of homogeneity. Statistical tests of the null hypothesis are formulated through the parameters of the proportional odds model. Three methods for testing the null hypothesis are examined: the regular Wilcoxon test, the classical analysis of covariance (ANACOVA), and the Wilcoxon-Van Elteren test for stratified data. A simulation study was carried out to compare these methods, and the three methods are also evaluated on a real data set of square ordinal responses from the clinical trial of a new drug. Conclusions: The following conclusions can be drawn from the study of statistical methods on square tables of ordinal data. The Wilcoxon-Van Elteren test proves to be the most efficient in the presence of a pre- and post-treatment relationship when compared with the regular Wilcoxon test and the ANACOVA. The ANACOVA appears to be robust in that it gives the correct nominal significance level, but it loses power relative to the Wilcoxon-Van Elteren test when there is a strong relationship between the pre- and post-treatment scores. In analyzing ordinal data from a drug-induced symptom change study, the post-treatment score can be identified as a sensitive variable for detecting treatment differences. Theoretical calculations on the parameter 'q' give consistency to the application of the Wilcoxon-Van Elteren test.

Causal inference with skewed outcome data: Moving beyond the “ignore or transform” approach

ABSTRACT. With continuous outcomes, the average causal effect is typically defined using a contrast of mean potential outcomes. However, in the presence of skewed outcome data, the mean may no longer be a meaningful summary statistic and the definition of the causal effect should be considered more closely. When faced with this challenge in practice, the typical approach is to either “ignore or transform” – ignore the skewness in the data entirely, or transform the outcome to obtain a more symmetric distribution for which the mean is interpretable. In many practical settings, neither approach is entirely satisfactory. An appealing alternative is to define the causal effect using a contrast of median potential outcomes. Despite being a widely acknowledged concept, there is currently limited discussion or availability of confounder-adjustment methods to generate estimates of this parameter.

Within this study, we identified and evaluated potential confounder-adjustment methods for the difference in medians to address this gap. The methods identified are multivariable quantile regression, adaptations of the g-computation approach, weighted quantile regression, and an inverse probability weighted (IPW) estimator. The performance of these methods was assessed within a simulation study and applied in the context of an empirical study based on the Longitudinal Study of Australian Children. Results indicated that the performance of the proposed methods varied considerably depending on the simulation scenario, including the severity of skewness of the outcome variable. Nonetheless, the proposed methods provide appealing alternatives to the common "ignore or transform" approach, enhancing our capability to obtain meaningful causal effect estimates with skewed outcome data.

Marginal Structural Models with Latent Class Growth Modeling of Time-varying Treatment.

ABSTRACT. Statins are a well-known treatment to prevent cardiovascular disease (CVD). Several randomized controlled trials (RCTs) have shown the efficacy of statins in preventing a first CVD event, that is, for primary prevention. Despite current evidence, it is still not clear whether these conclusions can be applied to older adults, as they are often excluded from RCTs. Moreover, there is little evidence concerning the effectiveness of statins for primary prevention among older adults in a real-life setting. Analysis of observational data could add crucial information on the benefits of actual patterns of statin use. However, the number of unique treatment trajectories increases exponentially with the length of follow-up in longitudinal studies. Latent class growth models (LCGM) are increasingly proposed as a solution to summarize the observed longitudinal treatment in a few distinct groups. It is known that LCGM cannot be effectively combined with standard approaches, such as covariate adjustment in an outcome regression model, as these fail to control confounding bias. Marginal structural models (MSMs) are popular for their ability to deal correctly with time-varying treatment and covariates. We propose to use LCGM to classify individuals into a few latent classes based on their medication adherence pattern, then choose a working MSM that relates the outcome to these groups. We showed that the data-driven estimation of the trajectory groups can be ignored. As such, parameters can be estimated using inverse probability of treatment weights, and conservative inferences can be obtained using a standard robust variance estimator. Simulation studies are used to illustrate our approach and compare it with unadjusted, baseline-covariate-adjusted, time-varying-covariate-adjusted and inverse probability of classes weighting adjusted models. We found that our proposed approach yielded estimators with little or no bias. We will apply our LCGM-MSM approach to a database composed of 572,822 Quebecers aged 66 or older who are statin initiators to estimate the effect of statin-usage trajectories on a first CVD event. Our proposal is relatively simple to implement, and we expect it to yield results that are clinically meaningful, easy to interpret and statistically valid.

Multidimensional mediators: Comparison between statistical methods using data simulation

ABSTRACT. Background: Various statistical methods exist to evaluate the mediation effect of a multidimensional score. Based on our experience with quality of life and health literacy data, few differences in estimated mediation effects are observed between methods. Data simulations could help to confirm these results using broader examples. Objectives: This study compares, through simulations, natural effect models (NEM) and structural equation modelling (SEM). Methods: Data were simulated using a normal distribution, considering different correlation values between the independent variable (X), the dependent variable (Y) and the multidimensional scores M1,2,3. A 0.4 correlation between M1,2,3 was assumed. The mediation effect was assessed with NEM using all scores (model 1) and a sum of the scores (model 2), and with SEM (model 3). The number of simulations was fixed at 2000 and sample sizes at 30, 50, 150 and 500. The percentage of direct and indirect effect rejection was calculated, and agreement between models was assessed using the Bowker test and Cohen's kappa coefficient. Results: First, we simulated data without correlation between X and, respectively, M1,2,3 and Y. As expected, all models concluded non-significant direct and indirect effects. Kappa values were higher when comparing models 1 and 2 and increased with sample size (from 0.4 to 0.9). When adding moderate correlation between only X and Y, less than 1% (down to 0%) of the simulated datasets led to a significant indirect effect, while more than 85% (up to 100%) concluded a significant direct effect (kappa > 0.7). Secondly, data were simulated with a zero to moderate correlation between X and Y and moderate correlations between M1,2,3 and Y. The larger the sample size, the more the mediation effect was found to be partial rather than total, with better agreement between models 1 and 2 (kappa > 0.5). The distribution of the mediation effect was, however, different according to the methods (p < 0.0001). For larger sample sizes, all models concluded partial mediation. Conclusion: Differences between methods appear especially for small samples and when a mediation effect was assumed. Further simulations should be carried out by varying the number of mediators, adding covariates, and using other distributions.

The role of the matching algorithm in an analysis of the effect of hemoadsorption in patients with sepsis

ABSTRACT. In an observational study of 208 patients with therapy-refractory septic shock, collected at the intensive care unit of the University Hospital Zurich, the objective was to estimate the treatment effect of a hemoadsorption device (Cytosorb®) on in-hospital mortality. Septic shock is a life-threatening condition, incepted by a dysregulated hyperinflammatory immune response, the so-called cytokine storm, which is associated with highly elevated mortality. In this context, hemoadsorption is discussed as an effective treatment to reduce cytokine levels and inflammatory mediators in the blood. Lacking internationally validated protocols or guidelines, the decision to employ hemoadsorption remains fully at the discretion of the treating clinician. A set of predefined confounder variables was used for matching on the treatment decision. The research question led us to set up a simulation study, following Morris et al. [1], to evaluate five different matching algorithms (nearest, optimal, caliper, full and genetic matching), and to compare their results regarding covariate balance as measured with the standardized mean difference, coverage, and precision of estimated odds ratios [2]. The odds ratios were estimated by unadjusted and adjusted logistic regression, employing the same covariates for adjustment as were used for matching. Results of the simulation study showed that good balance could be achieved with full, genetic, and caliper matching, and these matching algorithms also yielded estimated treatment effects closest to the true values. Full matching was the only matching algorithm that could not achieve a coverage rate of more than 90%. We could show that additional adjustment for covariates provided better estimates. A considerable proportion of patients was lost during the matching process: in our simulation study, 18% of the treatment cohort was discarded after caliper matching, leading to a substantial loss of power. In conclusion, matching is a powerful tool for the analysis of observational studies; nonetheless, the choice of matching algorithm, the potential loss of power, and unmeasured confounding should be considered when assessing the results for generalizability.

References
[1] Morris, T. P., White, I. R., and Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38, 2074-2102.
[2] Simulation study protocol:
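The caliper matching evaluated above can be sketched as greedy 1:1 nearest-neighbour matching on the propensity score, discarding treated units with no control within the caliper; this discarding is precisely the mechanism behind the 18% loss of the treatment cohort reported in the abstract. The sketch below is a simplified illustration, not the MatchIt-style implementation the study presumably used, and its names and the sort-order heuristic are assumptions.

```python
def caliper_match(ps_treated, ps_control, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    Treated units with no unmatched control within `caliper` are
    discarded. Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(ps_control)))
    pairs = []
    # process treated units in order of propensity score (one common heuristic)
    for i in sorted(range(len(ps_treated)), key=lambda i: ps_treated[i]):
        best, best_d = None, caliper
        for j in available:
            d = abs(ps_treated[i] - ps_control[j])
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            pairs.append((i, best))
            available.remove(best)   # each control used at most once
    return pairs
```

Comparing `len(pairs)` to `len(ps_treated)` directly quantifies the proportion of the treated cohort discarded at a given caliper width.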

Estimating the causal effect of direct-acting antiviral agents on kidney function in a clinical cohort of chronic Hepatitis C patients

ABSTRACT. Introduction. Treatment with direct-acting antiviral (DAA) agents is highly effective at clearing Hepatitis C virus (HCV) among individuals with chronic HCV-infection: the cure rate of DAAs in clinical trials is >95%. However, the effect of DAAs on kidney function remains uncertain. Previous studies describe differential changes in glomerular filtration rate (GFR), a measure of kidney function, before and after receipt of DAA[1]. Interpretation of these results is limited by lack of a contemporary comparator group not receiving DAA. We aim to use causal inference methods to estimate 2-year change in GFR under two counterfactual interventions: had all versus no patients received DAA.

Methods. We used electronic health records (EHR) from Boston Medical Center, an urban safety-net hospital in the US. We included DAA-naive patients with chronic HCV-infection who were engaged in care and had a Fibrosis-4 score and GFR measurements between 2014 and 2018. We used the g-computation algorithm[2], which works by first fitting parametric models for the densities in the observed data, and then simulating a dataset under each counterfactual intervention by fixing the treatment variable, DAA. All models were adjusted for baseline demographic and time-varying clinical confounders. We estimated and compared the mean 2-year change in GFR in each simulated dataset, using 500 bootstrap samples to obtain 95% confidence intervals. Sensitivity analyses were run to ensure the parametric assumptions were appropriate, given the complex nature of EHRs and the fact that implementing this method with a continuous outcome is fairly novel.
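In its simplest point-treatment form, g-computation amounts to standardization: fit the outcome model, predict each person's outcome with treatment fixed to "all treated" and "none treated", and contrast the averages, with bootstrap resampling for confidence intervals. The sketch below uses one binary confounder and a saturated (nonparametric) outcome model rather than the parametric longitudinal models of the study; it is an illustration of the algorithm's shape, not the authors' implementation.

```python
import random
from statistics import mean

def g_computation(data):
    """Point-treatment g-computation (standardization) with one binary
    confounder L. `data` is a list of (l, a, y) tuples with binary a.
    Returns the estimated E[Y^1] - E[Y^0]."""
    def cond_mean(a, l):
        return mean(y for (li, ai, y) in data if ai == a and li == l)
    # average the fitted means over the empirical distribution of L
    ey1 = mean(cond_mean(1, l) for (l, _, _) in data)
    ey0 = mean(cond_mean(0, l) for (l, _, _) in data)
    return ey1 - ey0

def bootstrap_ci(data, n_boot=500, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the g-computation estimate, mirroring
    the abstract's 500-sample bootstrap."""
    rng = random.Random(seed)
    ests = sorted(g_computation([rng.choice(data) for _ in data])
                  for _ in range(n_boot))
    lo = ests[int(alpha / 2 * n_boot)]
    hi = ests[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The study's version replaces `cond_mean` with parametric models for time-varying confounders and a continuous GFR outcome, but the fix-treatment, simulate, contrast structure is the same.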

Results. The study included 2307 individuals with median age [IQR] 54 [42-60], of whom 67% were male, 18% had advanced fibrosis or cirrhosis, and 41% had chronic kidney disease (GFR≤90) at baseline. Estimated 2-year mean change in GFR was -5.8 ml/min/1.73m2 (95%CI:-7.7;-4.0) under DAA receipt and -4.9 (-7.3;-1.9) under no DAA receipt, with a mean difference of -0.9 (-4.4;2.3).

Conclusions. We found no causal effect of DAA on kidney function in this sample. This finding suggests continued clinical monitoring of patients’ kidney function is important, even after receipt of DAA. In future work, we intend to optimize methods for handling the challenges that come with utilizing EHRs to answer causal questions.

Should multiple imputation be stratified by exposure when estimating causal effects via outcome regression?

ABSTRACT. Despite recent advances in methods for causal inference in epidemiology, outcome regression remains the most widely used method for estimating causal effects in the simple time-fixed exposure and outcome setting. Missing data are common in epidemiologic studies and complete-case analysis (CCA) and multiple imputation (MI) are two commonly used methods for handling them. In randomized controlled trials (RCTs), it has been shown that MI should be conducted separately by treatment group, but the question of whether to impute by exposure group has not been addressed for observational studies, in which causal inference is understood as the task of emulating an RCT. We designed a simulation study to evaluate and compare the performance of five missingness methods: CCA, MI on the whole cohort, MI including an interaction between exposure and outcome in the imputation model, MI including interactions between exposure and all incomplete variables in imputation models, and MI conducted separately by exposure group. Bias, precision and confidence interval coverage were investigated. We generated data based on an example from the Victorian Adolescent Health Cohort Study, where interest was in the causal effect of adolescent cannabis use on young adulthood depression and anxiety in females. Three exposure prevalence scenarios and seven outcome generation models were considered, the latter ranging from no interaction to a strong positive or negative interaction between exposure and a strong confounder. Two missingness scenarios were examined: one with the incomplete outcome, the other with incomplete outcome and confounders, each with three levels of complexity in terms of the variables or interaction on which missingness depended. The simulation results show the relative bias of analysis approaches across all scenarios considered. MI by exposure group usually led to the least bias. 
Considering the overall performance, MI by exposure group is recommended if MI is adopted in settings such as these.
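The recommended strategy of imputing separately by exposure group, with results pooled by Rubin's rules, can be sketched as follows. For brevity this draws missing values from a normal fit within each group without drawing the parameters themselves (so it is improper MI, unlike a real implementation such as mice); all names and the simplification are illustrative assumptions.

```python
import random
from statistics import mean, variance

def impute_by_group(y, a, m=20, seed=0):
    """Multiple imputation of a continuous outcome, done separately by
    exposure group: each missing y (None) is drawn from a normal
    distribution fit to the observed y's in the same exposure group.
    Returns m completed outcome lists."""
    rng = random.Random(seed)
    fits = {}
    for g in (0, 1):
        obs = [yi for yi, ai in zip(y, a) if ai == g and yi is not None]
        fits[g] = (mean(obs), variance(obs) ** 0.5)  # (mean, sd)
    return [[yi if yi is not None else rng.gauss(*fits[ai])
             for yi, ai in zip(y, a)]
            for _ in range(m)]

def rubin_pool(estimates):
    """Rubin's rules: pooled estimate and total variance from
    per-imputation (estimate, variance) pairs."""
    m = len(estimates)
    qs, us = zip(*estimates)
    qbar, ubar = mean(qs), mean(us)
    b = variance(qs) if m > 1 else 0.0       # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b
```

Imputing within each exposure stratum lets the imputation model reflect any exposure-confounder interaction in the outcome, which is exactly the mechanism behind the reduced bias reported in the simulation.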

Pathways to inequalities in child mortality: a population level study in Wales

ABSTRACT. Context There has been an unprecedented rise in infant mortality rates in the UK since 2014, especially in disadvantaged areas. This trend is concerning since infant mortality is a sensitive indicator of the prevailing social conditions affecting health across the life course. Identifying potentially modifiable factors on the pathway linking childhood socio-economic conditions (SECs) to child mortality is important to inform public health policies to reduce health inequalities. Objectives To assess the extent to which intervening on maternal health, perinatal factors and/or birth outcomes might reduce inequalities in child mortality. Methods We conducted a causal analysis of linked population level data from the SAIL Databank on all singletons born in Wales between 2000 and 2019 and their mothers. The exposure of interest was mother’s quintile of small area deprivation 3-years prior to pregnancy; the outcome was child mortality between birth and age 15-years. The data included gestational age, birthweight, parity, maternal age, maternal health conditions before and during pregnancy, pregnancy complications, congenital anomalies, smoking during pregnancy and perinatal maternal mental health. Using the framework of interventional disparity measures, we estimated the contribution of a block of factors relating to maternal health and perinatal factors, and that of a block of birth outcomes to inequalities in child mortality. Confidence intervals will be calculated by non-parametric bootstrap. Results Initial results are based on a complete-case analysis of data on 463,200 live births out of which 1,719 died by age 15. The probability of having died by age 15 was 1.37 times as high in the most deprived quintile compared to the least. 
After shifting the distribution of maternal health, perinatal factors and birth outcomes in the most deprived population quintile to that in the least deprived quintile, the mortality probability ratio between the most and least deprived children was reduced to 1.09. Conclusions Child mortality is a rare event but with clear socio-economic patterning. Initial results indicate that maternal health, perinatal factors and birth outcomes may explain most of the observed inequalities. Further analyses will aim to disentangle the contributions of these mediating blocks to identify potential public health policy entry points.

Target Trial Emulation and Missing Eligibility Data: A study of Palivizumab for child respiratory illness

ABSTRACT. Target trial emulation (TTE) seeks to apply the principles of a Randomised Controlled Trial (RCT) to analyses of observational data in order to improve the quality of analysis. However, selection into the target trial requires that the eligibility data are observed, so in practice a TTE typically includes only those with complete-case eligibility data. This presents a problem when eligibility data are Missing at Random (MAR) or Missing Not at Random (MNAR). When the estimand of interest is the Average Treatment Effect (ATE), an analysis of complete-case TTEs will often suffer from bias and issues of internal validity if the treatment has a heterogeneous effect on the outcome. The objective of this work is to investigate the source and size of any bias caused by recruiting only complete-case eligible individuals when eligibility is either MAR or, in particular, MNAR. We propose a solution whereby Multiple Imputation (MI) is used to predict missing eligibility data, which are then used to recruit individuals into a target trial, a method which, to our knowledge, is rarely proposed. This will be investigated using both simulation and practical clinical data. Notable aspects of the work include imputation of eligibility criteria for inclusion into a target trial, and the multiple imputation of eligibility data using MNAR imputation models. We apply this work to an analysis of the effect of Palivizumab on hospitalisation due to Respiratory Syncytial Virus (RSV) in premature infants in England, by constructing a target trial using MNAR multiple imputation of gestational age. Results are yet to be finalised.
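The key move, imputing a partially missing eligibility variable and then applying the trial's eligibility rule within each imputed dataset, can be sketched as below. The delta-adjustment shown is a standard device for MNAR sensitivity analysis, not necessarily the authors' MNAR model; the threshold, function names and the simple normal imputation are illustrative assumptions.

```python
import random

def recruit_with_imputed_eligibility(gest_age, threshold=35.0, m=10,
                                     delta=0.0, seed=0):
    """Recruit into a hypothetical target trial when the eligibility
    variable (gestational age, in weeks) is partly missing (None).

    Draws m imputations of each missing value from a normal fit to the
    observed values, shifted by `delta` to represent an MNAR mechanism
    (delta-adjustment), and applies the eligibility rule (< threshold)
    in each imputed dataset. Returns m boolean eligibility vectors.
    """
    rng = random.Random(seed)
    obs = [g for g in gest_age if g is not None]
    mu = sum(obs) / len(obs)
    sd = (sum((g - mu) ** 2 for g in obs) / (len(obs) - 1)) ** 0.5
    eligibility = []
    for _ in range(m):
        complete = [g if g is not None else rng.gauss(mu + delta, sd)
                    for g in gest_age]
        eligibility.append([g < threshold for g in complete])
    return eligibility
```

Each imputed eligibility vector defines one emulated trial; the downstream treatment-effect analysis is then run per imputation and pooled, with `delta` varied to probe sensitivity to the MNAR assumption.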

Targeted causal quantile estimation for measurement of postdischarge opioid use in a text-to-web survey

ABSTRACT. Background: We developed an automated system for recruiting discharged surgical patients via text message to report opioid consumption in an online survey. We planned to assess its effectiveness for estimating opioid use in surgical cohorts while addressing the potential for nonresponse bias.

Methods: Patients who underwent a surgical procedure at our institution between 2019 and 2020 were surveyed to quantify opioids consumed after discharge. Opioid consumption was measured through either survey response or electronic health record (EHR) documentation when no opioids were prescribed. Factors sourced from the EHR were tested for association with survey response and with opioid use, and were used to estimate a propensity score (PS) for measurement of opioid consumption among those prescribed opioids. The PS was estimated using logistic regression (LR) and ensemble machine learning (stratification, linear regression, LR, lasso, random forest, BART), and was evaluated by its area under the curve (AUC). The median and 75th percentile of opioid consumption were estimated for the 10 most common surgical procedure groups in morphine milligram equivalents (MMEs), including adjustment for observed nonresponse patterns using inverse probability weighting (IPW) and doubly robust targeted learning (TL) using quantile regression. SMS-recruited survey results were benchmarked against estimates calculated from a previous phone-based survey.

Results: We evaluated 6,553 surgical patients, of whom 71% were prescribed opioids, 24% completed the survey, and 44% had opioid consumption measured. Characteristics significantly associated with opioid measurement included age, length of stay, and current tobacco use. The ensemble predicted opioid measurement with AUC (95% CI) = 0.593 (0.577-0.608), compared to 0.575 (0.560-0.591) for logistic regression (p = 0.017). Unadjusted opioid consumption was substantially underestimated when compared to earlier phone-based results. IPW and TL adjustment reduced the underestimation bias of median MMEs for 60% of surgical procedures, and for 90% of procedures when estimating the 75th percentile of MME consumption.

Conclusion: Adjustment for nonresponse using TL or IPW resulted in increased ranges for estimated opioid consumption. Without appropriate statistical adjustment, nonresponse can strongly bias estimates of typical opioid consumption as collected in patient surveys.

Does early weight gain mediate the causal effects of severity and duration of depression on the onset of metabolic syndrome?

ABSTRACT. In clinical research, the metabolic syndrome (MetS) has thus far been used as an observed variable. We conducted a confirmatory factor analysis of the MetS in order to identify whether there is a single latent MetS factor suggestive of a common pathophysiology, and to use SEM to analyze whether early weight gain mediates the causal effects of severity and duration of depression on the onset of MetS.

In the prospective METADAP cohort, 260 non-overweight patients with a Major Depressive Disorder (MDD) were assessed for early weight gain (EWG) (>5%) after one month of antidepressant treatment, and for the later incidence of MetS after three and six months of treatment. Measurement models of MetS at 3 and 6 months were tested. The variables were chosen based on a modified version of the International Diabetes Federation definition. An exploratory factor analysis (EFA) using principal components analysis was performed to assess the dimensionality of the constructs. Confirmatory factor analysis (CFA) was subsequently used to test the structure of the extracted factor models. Two SEMs were fitted to test the mediating role of EWG on the effects of the severity and the duration of depression on the onset of MetS at 3 and 6 months.

EFA of MetS at each of 3 and 6 months revealed a unidimensional structure. CFA showed that the two one-factor measurement models significantly reflected the constituent observed variables. An error-term correlation path between triglycerides and HDL-C was added, producing measurement models of moderate to good fit (CFI of 1 and 0.889, RMSEA of 0 and 0.089, and SRMR of 0.034 and 0.064, respectively). SEMs incorporating either model (CFI of 0.738 and 0.661, RMSEA of 0.086 and 0.096, and SRMR of 0.069 and 0.079, respectively) revealed that the relationship between the duration and severity of depression, on the one hand, and MetS, on the other, is totally mediated by EWG. EWG was more positively and significantly correlated with MetS6 than with MetS3 (standardized coefficients of 0.23 versus 0.3; p<0.05).

Clinicians should monitor their patients for EWG in addition to the severity of their conditions, in order to minimize risk of onset of MetS.

Pathway specific population attributable fractions.

ABSTRACT. Population attributable fractions (PAFs) represent the relative change in disease prevalence expected if an exposure were absent from the population. What percentage of this effect acts through particular pathways may be of interest; for example, the effect of a sedentary lifestyle on stroke may be mediated by blood pressure, BMI and several other mediators. Path-specific PAFs (PS-PAFs) represent the relative change in disease prevalence from an intervention that, conditional on observed covariates, shifts the distribution of the mediator to its expected distribution in a hypothetical population where the risk factor was eliminated. A related (more mechanistic) definition examines the disease prevalence expected from an individual-level intervention assigning each individual the mediator they would have received had the risk factor been eliminated.

Our aim here is not to decompose the total PAF for a risk factor into an additive sum over mediating pathways, but instead to fairly compare the disease burden attributable to differing mediating pathways and thereby gain insight into the dominant mechanisms by which the risk factor affects disease at the population level. While PS-PAFs corresponding to differing pathways (mediating the same risk factor-outcome relationship) will each usually be less than the total PAF, they will often sum to more than the total PAF. In this manuscript, we present definitions, identifiability conditions and estimation approaches for PS-PAFs under various study designs. We illustrate results using INTERSTROKE [1], an international case-control study designed to quantify disease burden attributable to a number of known causal risk factors. An R package will be available.

References
[1] M. J. O'Donnell, S. L. Chin, S. Rangarajan, D. Xavier, L. Liu, H. Zhang, P. Rao-Melacini, X. Zhang, P. Pais, S. Agapay, et al., "Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study," The Lancet, vol. 388, no. 10046, pp. 761-775, 2016.
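For orientation, the ordinary (total) PAF that PS-PAFs refine can be estimated by standardization: compare observed disease prevalence with the prevalence expected had the risk factor been eliminated, averaging unexposed risks over the confounder distribution. The sketch below is the plain PAF with one binary confounder, not the path-specific estimators of the manuscript, and its names are illustrative assumptions.

```python
from statistics import mean

def paf(data):
    """Population attributable fraction by standardization.

    `data` is a list of (c, a, d) tuples: binary confounder c, binary
    exposure a, binary disease indicator d.
    PAF = (P(D) - P(D | exposure eliminated)) / P(D), where the second
    term standardizes the unexposed risk over the distribution of C.
    """
    p_obs = mean(d for (_, _, d) in data)
    def risk(a, c):
        return mean(d for (ci, ai, d) in data if ai == a and ci == c)
    # prevalence had everyone been unexposed, standardized over C
    p0 = mean(risk(0, c) for (c, _, _) in data)
    return (p_obs - p0) / p_obs
```

A PS-PAF replaces the "set everyone unexposed" step with a shift of only the mediator's distribution to its eliminated-risk-factor counterpart, holding the exposure itself fixed, which is why PS-PAFs for separate pathways need not sum to the total PAF.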

17:00-18:30 Session IS2: INVITED : Personalized Medicine with Dynamic Predictions
Personalized Schedules for Invasive Diagnostic Tests: With Applications in Surveillance of Chronic Diseases

ABSTRACT. Benchmark surveillance tests for diagnosing disease progression (biopsies, endoscopies, etc.) in early-stage chronic non-communicable disease patients (e.g., cancer, lung diseases) are usually invasive. To detect progression in a timely manner, patients undergo, over their lifetime, numerous invasive tests planned in a fixed one-size-fits-all manner (e.g., biannually). We present personalized test schedules based on the risk of disease progression that aim to optimize the number of tests (burden) and the time delay in detecting progression (shorter is beneficial) better than fixed schedules. Our motivation comes from the problem of scheduling biopsies in prostate cancer surveillance studies.

Using joint models for time-to-event and longitudinal data, we consolidate patients' longitudinal data (e.g., biomarkers) and the results of previous tests into an individualized future cumulative risk of progression. To use this predicted risk profile for scheduling invasive tests, we propose three different approaches. First, minimizing loss functions motivated by Bayesian decision theory, under the predicted risk profile, to obtain the optimal time of conducting a future invasive test. Second, planning tests at those future visits where the predicted cumulative risk of the patient is above a particular threshold (e.g., 5% risk). Third, estimating the expected number of tests (burden) and the expected time delay in detecting progression (shorter is beneficial) for all possible test schedules, and then optimizing a utility function of the expected number of tests and delay to find a mathematically optimal schedule. Since we estimate the expected number of tests and delay in a personalized manner, they can be used directly by patients and doctors to compare various test schedules for their benefit and burden and make a shared decision on a suitable schedule. In all three approaches, a common feature of our schedules is that they automatically update at each follow-up with newly gathered data.

We compare our methodologies with currently used heuristic schedules and with schedules based on partially observable Markov decision processes. We also evaluate criteria for selecting the risk threshold (e.g., Youden index, F1 score, net benefit) in threshold-based schedules. Last, we implement our methodology in a web application for prostate cancer patients.
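A minimal sketch of the second (risk-threshold) approach described above, assuming a predicted cumulative-risk profile is already available from a fitted joint model; the function names and the risk curve are illustrative, not the authors' implementation:

```python
def next_test_time(pred_risk, visit_times, threshold=0.05):
    """Return the first future visit time at which the predicted cumulative
    risk of progression reaches the threshold (None if no visit qualifies)."""
    for t in visit_times:
        if pred_risk(t) >= threshold:
            return t
    return None

# Hypothetical risk profile: risk grows with time (years) since the last
# negative test; this stands in for a joint-model prediction.
risk = lambda t: 1 - 0.98 ** t

print(next_test_time(risk, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0], threshold=0.05))  # → 3.0
```

On each follow-up visit, the risk profile would be re-predicted from the newly gathered longitudinal data and the schedule recomputed, which is how the schedules "automatically update".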

Comparing risk prediction models in a longitudinal context

ABSTRACT. In this presentation, I will review our recent research on comparing risk prediction models in longitudinal cohort studies. The research was conducted in the commonly encountered clinical context where patient characteristics are measured at baseline, and biomarkers and clinical history are collected over time as repeated measures until censoring or the occurrence of a terminal clinical event of predictive interest. First, in a comparison between static and dynamic prediction models, the latter always showed equivalent or improved prediction accuracy, and the magnitude of improvement depended critically on the concept of baseline and how the at-risk population varies over time. This result suggests that when longitudinal data are available, dynamic prediction is a useful tool that can replace the widely used static prediction approaches. Second, in a comparison between two dynamic prediction approaches, landmark modeling vs. joint modeling, neither approach dominated the other in terms of prediction accuracy, and their performance in the simulation context depended on model misspecification. This comparison was made feasible by a novel algorithm to simulate longitudinal data that satisfy infinitely many landmark models simultaneously. The result suggests that further research on both types of models is warranted. Third, in a comparison among various joint models for dynamic prediction, it was found that misspecification of longitudinal trajectories could decrease prediction accuracy, which highlights the importance of future research on joint models with multivariate longitudinal data and flexible trajectories. A computational approach to facilitate the estimation of such joint models will be presented.

Testing for Heterogeneity in the Utility of a Surrogate Marker

ABSTRACT. For many clinical outcomes, randomized clinical trials to evaluate the effectiveness of a treatment or intervention require long-term follow-up of participants. In such settings, there is substantial interest in identifying and using a surrogate marker that can be measured earlier and used to evaluate the treatment effect. Several statistical methods have been proposed to evaluate potential surrogate markers; however, these methods generally do not account for or address the potential for a surrogate to vary in utility by certain patient characteristics. Such heterogeneity is important to understand, particularly if the surrogate is to be used in a future trial to potentially predict or replace the primary outcome. Here, we develop a nonparametric approach to examine and test for heterogeneity in the strength of a surrogate marker of interest. Specifically, we propose an approach and estimation procedure to measure surrogate strength as a function of a baseline covariate, W. We then propose a testing procedure to test for evidence of heterogeneity and, if there is heterogeneity, an approach to identify a region of interest, i.e., a subgroup of W where the surrogate is useful. Lastly, we extend these methods to a setting with multiple baseline covariates. We examine the performance of these methods using a simulation study and illustrate the methods using data from the Diabetes Prevention Program clinical trial.

17:00-18:30 Session OC2A: Mediation analysis
Nonlinear mediation analysis with high‐dimensional mediators whose causal structure is unknown

ABSTRACT. Clinical research problem and statistical challenges: In settings that involve multiple possible mediators, fine-grained decompositions of path-specific effects are only valid under stringent assumptions. These assumptions are violated when – as is often the case – the causal structure among the mediators is unknown, or there is unobserved confounding among the mediators. For example, the effect of a microRNA expression on the three-month mortality of brain cancer patients may potentially be mediated by a set of gene expression values whose internal causal structure is unknown. In contrast, interventional indirect effects for multiple mediators can be identified under much weaker conditions, while providing scientifically relevant causal interpretations. Nonetheless, current estimation approaches require (correctly) specifying a model for the joint mediator distribution, which can be difficult in practice when there is a high‐dimensional set of possibly continuous and noncontinuous mediators, such as in gene regulation networks or protein signalling networks.

Objective: In this article, we develop a definition of interventional effects that avoids the need to model the mediators’ joint distribution, by building on a previous suggestion for longitudinal mediation [1]. We exploit the definitions to propose a novel estimation strategy that uses nonparametric estimates of the (counterfactual) mediator distributions. Noncontinuous outcomes can be accommodated using nonlinear outcome models.

Statistical Methods: Estimation proceeds via Monte Carlo integration and requires an outcome model. When treatment is non-randomly assigned, a model for the treatment given the observed baseline covariates is required. No models for the mediators’ (joint) distribution are required.

Results: The validity of the methods is examined via simulation studies. The method is illustrated using a reanalysis of a high-dimensional mediation analysis by Huang and Pan [2]. Huang and Pan determined that the effect of a microRNA expression on mortality within three months among 490 patients suffering from a malignant brain tumor was mediated jointly via each of nine subsets of 1220 gene expression values in the tumor genome.

Conclusion: Analysis of the brain cancer study using the proposed interventional indirect effects detected the individual genes (within each of the nine subsets) whose expression value mediated the effect of the microRNA expression on three-month mortality.

Separable Causal Effects as alternative Estimands in Epidemiology and Biostatistics

ABSTRACT. Separable effects have been proposed as an alternative approach to causal mediation analysis, especially with a view to time-dependent settings. The basic idea is to elaborate our causal model in order to better motivate a mediational research question. Specifically, separable effects are concerned with situations where the treatment (or exposure) can be decomposed into two or more components that could (possibly hypothetically) be intervened upon independently and thus separately 'activate' different causal pathways. Formulating time-dependent mediational research questions in this way is appealing as it sidesteps the 'cross-world' notions and assumptions involved in the analysis of natural (in)direct effects based on nested counterfactuals. Crucially, separable effects turn out to be especially useful in time-dependent settings, e.g. when the outcome is survival and mediation is via a process (Didelez, 2019). Moreover, they lead to a novel estimand in competing events settings, where the separable direct effect on the event of interest is the effect of the treatment component that only activates causal paths avoiding the competing event (Stensrud et al., 2020). The presentation will discuss the practical use of separable effects and compare them in detail to more traditional approaches. Their interpretation, structural assumptions and estimation will be illustrated in the context of three examples typical of epidemiological and biostatistical applications: (i) In an application of causal mediation analysis we aim to investigate how an exposure might affect cognitive decline in the elderly – these are processes taking place over time, which will be addressed by the statistical analysis. (ii) Typically, survival analyses in cancer research are faced with competing risks, e.g. death due to other causes, which no approach can ignore; we discuss the separable effect estimand and methods of analysis for such settings.
(iii) Finally, we consider the analysis of an RCT that is affected by an undesirable intercurrent event, such as an adverse drug reaction; here, we show that separable effects specifically address research questions relevant to future drug development as they explicitly represent effects of modified treatments. This is based on joint work with Mats Stensrud.

Simulating hypothetical interventions on multiple mediators: Extending methods and practical guidance

ABSTRACT. Many epidemiological questions concern the pathways presumed to mediate an association, particularly in life course and social epidemiology. Invariably, the translational intent of such research questions is to inform potential intervention targets, but until recently causal mediation analysis methods did not define mediation effects in a way that acknowledged this interventional intent. In recent work [1,2] we have proposed a novel framework conceptualising mediation effects by mapping to a target randomised trial evaluating mediator interventions. This approach is particularly relevant in the context of mediators that do not correspond to well-defined interventions, a scenario where it is plausible, and indeed perhaps necessary, to consider hypothetical interventions that would shift the mediators’ distributions. The approach consists of specifying a target trial relevant to the research question, regarding the impact of shifting joint mediator distributions to user-specified distributions. This estimand definition is distinguished from the identifiability assumptions, which are needed to emulate the effects of those shifts with the observed data. Estimation is done by simulation via a g-computation approach. By its very nature, the approach is context-specific: the target effects must be tailored to each specific question and context. Drawing on learnings from its application to several longitudinal cohort studies, some of which are already published, this paper presents further developments of the method to tackle practical issues encountered in its application. This includes alternative approaches to effect definition according to the question, consideration of different types of exposures and mediators, and guidance on handling issues such as missing data, clustering and reporting. Findings will assist researchers in applying the method while tackling practical issues as soundly and easily as possible.

References 1. Moreno-Betancur M, Moran P, Becker D, Patton G, Carlin J. Mediation Effects That Emulate a Target Randomised Trial: Simulation-Based Evaluation of Ill-Defined Interventions on Multiple Mediators. Statistical Methods in Medical Research (in press) 2. Moreno-Betancur M. The target trial: A powerful device beyond well-defined interventions. Epidemiology. 2020;32(1):291-294.
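The simulation-based g-computation estimation used in such frameworks can be sketched in a toy form: draw mediator values from their (here empirical) distribution under one exposure level and average outcome-model predictions under another. The outcome model, its coefficients, and the mediator draws below are all illustrative assumptions, not taken from the cited work:

```python
import random
random.seed(1)

# Hypothetical fitted outcome model E[Y | A=a, M=m] (illustrative coefficients)
def outcome_mean(a, m):
    return 0.2 + 0.3 * a + 0.1 * m

# Empirical mediator draws observed under each exposure level (illustrative)
mediators = {0: [1.0, 1.2, 0.8, 1.1], 1: [1.6, 1.9, 1.4, 1.7]}

def simulate_effect(a_out, a_med, n_mc=10000):
    """Mean outcome when exposure is set to a_out while the mediator is
    drawn from its distribution under exposure a_med (Monte Carlo
    g-computation)."""
    draws = [outcome_mean(a_out, random.choice(mediators[a_med]))
             for _ in range(n_mc)]
    return sum(draws) / n_mc

total = simulate_effect(1, 1) - simulate_effect(0, 0)
indirect = simulate_effect(1, 1) - simulate_effect(1, 0)   # via shifted mediator
direct = simulate_effect(1, 0) - simulate_effect(0, 0)     # mediator held at A=0 law
print(total, indirect, direct)
```

Shifting the mediator distribution to a user-specified (e.g., intervention-target) distribution amounts to replacing the empirical draws above with draws from that distribution.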

Mediation with a survival outcome and time-varying mediator: empirical comparison and use in observational data

ABSTRACT. Mediation analyses seek an understanding of the mechanisms through which an exposure affects an outcome, but there are statistical challenges for survival outcomes with longitudinal covariates from observational data. We are motivated by the study of cystic fibrosis-related diabetes (CFRD), a common comorbidity of cystic fibrosis (CF). CFRD negatively affects survival but the mechanisms are not well understood. We aim to illustrate and compare two recently proposed methods for mediation analysis in this setting using observational data from the UK CF Registry. Further, the sensitivity of each method to model misspecification and data availability is examined via a simulation study. In causal mediation analyses, analytical and identification challenges arise when working with a time-dependent mediator and covariates. Survival outcomes pose another difficulty: because the exposure affects the mediator both directly and indirectly through its effect on survival time, survival is a post-treatment confounder. In the observational data, individuals have different entry times and, when the exposure is the diagnosis of a disease, there is no natural time zero. We propose a stacked analysis dataset, constructed similarly to a landmark dataset, to maximally exploit the longitudinal data. Two mediation methods are employed. Aalen et al. (Biometrical Journal 2018) described a method based on exposure splitting and an additive hazards model for quantifying indirect effects. Vansteelandt et al. (Statistics in Medicine 2019) proposed a method based on path-specific effects that accommodates time-varying confounders and flexible outcome models. In the study population of 5,553 individuals there were 1,180 incident cases of CFRD.
The indirect effect of CFRD on survival via lung function (FEV1%) increased slightly over time, with the proportion mediated by FEV1% estimated at 9% [95% CI: 4%, 18%] at 5 years post CFRD diagnosis by the method of Aalen and at 7% [95% CI: -14%, 18%] by the method of Vansteelandt. The simulation study suggests that both methods may suffer from bias when mediator measurements are infrequent; analogous to measurement error, the indirect effect estimates are attenuated. Also, if the assumption of no time-varying confounders is not valid, Aalen's method may produce significant bias in effect estimates.

Mediation with Irregular Longitudinal Biomarkers: An Application to Examining Obesity and Severe Disease in COVID-19 Patients

ABSTRACT. Obesity is an established risk factor for severe disease in patients with COVID-19. As inflammatory biomarkers, such as C-reactive protein (CRP), have been associated with both obesity and severe disease, effects of obesity on severe disease may be mediated in part through CRP. Data on laboratory values from electronic health records, which are repeatedly measured at irregular times during hospitalization, can help evaluate this hypothesis. However, common approaches to mediation analysis in causal inference generally assume that mediators are measured either at a single time point or longitudinally over regularly-spaced intervals. Though summary measures, such as the maximum over the course of follow-up, could be used, loss of information is expected from neglecting the full longitudinal trajectory.

We consider an approach that accounts for an irregular longitudinal mediator through functional regression modeling when estimating natural direct and indirect effects (NDE and NIE). The longitudinal mediator is regarded as sampled points from an underlying mediator stochastic process over time that has time-varying effects on the outcome and exposure. A functional logistic regression (Müller 2005) is used to estimate the probability of exposure given the mediator process and covariates, modeling the cumulative effects of the mediator process over time. These propensity score estimates are then applied in estimators for the NDE and NIE based on inverse-probability weighting representations of the mediation formula (Pearl 2001), assuming nonparametric identification conditions.

We analyzed data from 983 patients hospitalized with COVID-19 at Massachusetts General Hospital from March through May 2020. In preliminary results, we find that there are significant population-average natural direct and indirect effects of obesity (BMI>30 kg/m2) through CRP on a binary outcome for intensive care unit admission within 28 days (NDE=0.051, 95% CI: -0.002 to 0.103; NIE=0.043, 95% CI: 0.021 to 0.069), with 46% of the effect mediated through longitudinal CRP measurements. In contrast, using only the maximum value of CRP over follow-up resulted in a lower degree of mediated effects (NDE=0.078, 95% CI: 0.023 to 0.136; NIE=0.017, 95% CI: 0.004 to 0.034), with 18% of the effect mediated. These results suggest that functional modeling can be a promising approach to incorporating irregular longitudinal biomarkers in mediation analysis.

17:00-18:30 Session OC2B: Bayesian clinical trial design (2)
Early completion of phase I cancer clinical trials with Bayesian optimal interval design

ABSTRACT. Novel designs for phase I cancer clinical trials have been proposed, including algorithm-based, model-based, and model-assisted designs. Model-based and model-assisted designs have a higher identification rate of the maximum tolerated dose (MTD) than algorithm-based designs, but are limited by the fact that the sample size is fixed. Hence, it would be very attractive to estimate the MTD with sufficient accuracy and complete the trial early. Among model-based designs, O'Quigley [1] proposed early completion of a trial with the continual reassessment method (CRM) when the MTD is estimated with sufficient accuracy. However, this early completion method, based on binary outcome trees, has a high computational cost when the number of remaining patients is large. Among model-assisted designs, the Bayesian optimal interval (BOIN) design provides the simplest approach for dose adjustment. We propose a novel early completion method for clinical trials with the BOIN design when the MTD is estimated with sufficient accuracy. This completion method can be easily calculated. In addition, the method does not require many additional patients to be treated before early completion can be determined. Through simulations based on over 30,000 scenarios, we confirm that the BOIN design with the early completion method has almost the same MTD identification rate as the standard BOIN design.
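For context, the BOIN dose-adjustment rule mentioned above compares the observed DLT rate at the current dose with two fixed boundaries. A minimal sketch using the default boundary formulas of Liu and Yuan (2015) for a target toxicity rate of 0.30; the early-completion criterion proposed in the abstract is not implemented here:

```python
from math import log

def boin_boundaries(phi, phi1=None, phi2=None):
    """Default BOIN escalation/de-escalation boundaries (Liu & Yuan, 2015)."""
    phi1 = 0.6 * phi if phi1 is None else phi1  # highest sub-therapeutic rate
    phi2 = 1.4 * phi if phi2 is None else phi2  # lowest overly toxic rate
    lam_e = log((1 - phi1) / (1 - phi)) / log(phi * (1 - phi1) / (phi1 * (1 - phi)))
    lam_d = log((1 - phi) / (1 - phi2)) / log(phi2 * (1 - phi) / (phi * (1 - phi2)))
    return lam_e, lam_d

def boin_decision(n_dlt, n_treated, phi=0.30):
    """Compare the observed DLT rate at the current dose to the boundaries."""
    lam_e, lam_d = boin_boundaries(phi)
    p_hat = n_dlt / n_treated
    if p_hat <= lam_e:
        return "escalate"
    if p_hat >= lam_d:
        return "de-escalate"
    return "stay"

print(boin_boundaries(0.30))  # ≈ (0.236, 0.359)
print(boin_decision(1, 3))    # 1/3 lies between the boundaries → "stay"
```

The simplicity of this rule (fixed boundaries, no model refitting between patients) is what makes an easily calculated early-completion criterion attractive for this design.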

Decision rules for identifying combination therapies in open-entry, randomized controlled platform trials

ABSTRACT. The design and conduct of platform trials have become increasingly popular for drug development programs, attracting interest from statisticians, clinicians and regulatory agencies. Many statistical questions related to designing platform trials remain unanswered, such as the impact of decision rules, information sharing across cohorts, and allocation ratios on operating characteristics and error rates. In many platform trials, the definition of error rates is not straightforward, as classical error rate concepts are not applicable. In particular, strict control of the family-wise type I error rate may not be applicable in certain settings. For an open-entry, exploratory platform trial design comparing combination therapies to the respective monotherapies and standard-of-care, we define a set of error rates and operating characteristics and then use these as a measure to compare a set of design parameters under a range of simulation assumptions. When setting up the simulations, we aimed for realistic trial trajectories; e.g., if one compound is found to be superior to standard-of-care, it could become the new standard-of-care in future cohorts. Our results indicate that the method of data sharing, the exact specification of decision rules and the quality of the biomarker used to make interim decisions all strongly contribute to the operating characteristics of the platform trial. Together with the potential flexibility and complexity of a platform trial, which also impact the achieved operating characteristics, this implies that utmost care needs to be given to the evaluation of different assumptions and design parameters at the design stage.

A Bayesian decision-theoretic approach to outcome-adaptive sequential multiple assignment randomised trials (SMARTs) with distinct intermediate binary endpoints

ABSTRACT. The ‘COVID-19 prevention and treatment in cancer: a sequential multiple assignment randomised trial (SMART)’ is an innovative multi-stage design that randomises high-risk cancer patients to prophylaxis and, if they develop COVID-19, re-randomises them to an experimental treatment conditional on their disease severity (NCT04534725).

Although SMARTs are excellent designs to identify personalised treatment sequences that are tailored to a patient’s evolving clinical status, also known as ‘dynamic treatment regimens’, they are typically analysed at completion so the dynamic treatment regimens are optimised only for future patients. But identifying and implementing an efficacious COVID-19 prophylaxis and treatment regimen for cancer patients is an immediate priority. Outcome-adaptive randomisation is one approach that could increase the chance that patients are randomised to the most promising treatment and also completely stop randomisation to a clearly inferior treatment, enabling rapid clinical implementation.

Routinely used outcome-adaptive randomisation algorithms do not account for potential treatment effects in the later stages of SMART designs. Such approaches may randomise patients to suboptimal regimens of prophylaxis and treatment. Methods that are typically used to optimise dynamic treatment regimens from SMART data could be used to inform the adaptive randomisation, but they are complicated, so very few guiding examples exist.

Q-learning, a statistical dynamic programming method that can be used to analyse SMART data, is one of the few algorithms that has been used to perform outcome-adaptive randomisation for a SMART. In its simplest form, Q-learning uses stage-wise statistical models and backward induction to incorporate later-stage ‘payoffs’ (i.e., clinical outcomes) into early-stage ‘actions’ (i.e., treatments). We propose a Bayesian decision-theoretic Q-learning method to perform outcome-adaptive randomisation. This approach allows dynamic treatment regimens with distinct binary endpoints at each stage to be evaluated, a known limitation of the Q-learning method.

Our simulation study, motivated by the cancer trial, aims to examine whether the Bayesian decision-theoretic Q-learning method can expedite treatment optimisation and, compared to routinely used adaptive randomisation approaches that do not consider later stages of the SMART, assign more trial participants to optimal dynamic treatment regimens.
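The backward-induction step of Q-learning described above can be sketched for a toy two-stage SMART with binary treatments. The payoff model and the tabular (cell-mean) Q-estimates are illustrative simplifications, not the Bayesian decision-theoretic method proposed in the abstract:

```python
import random
random.seed(7)

# Simulate a toy two-stage SMART: binary treatments a1, a2 and a final
# binary outcome; the true response probabilities are illustrative only.
def simulate_patient():
    a1 = random.randint(0, 1)
    a2 = random.randint(0, 1)
    p = 0.2 + 0.2 * a1 + 0.3 * a2  # illustrative payoff model
    y = 1 if random.random() < p else 0
    return a1, a2, y

data = [simulate_patient() for _ in range(5000)]

def mean_y(rows):
    return sum(r[2] for r in rows) / len(rows)

# Stage 2: estimate Q2(a1, a2) = E[Y | a1, a2] by cell means
q2 = {(a1, a2): mean_y([r for r in data if r[:2] == (a1, a2)])
      for a1 in (0, 1) for a2 in (0, 1)}

# Backward induction: the stage-1 value of a1 assumes the best stage-2 action
q1 = {a1: max(q2[(a1, 0)], q2[(a1, 1)]) for a1 in (0, 1)}
best_a1 = max(q1, key=q1.get)
best_a2 = max((0, 1), key=lambda a2: q2[(best_a1, a2)])
print(best_a1, best_a2)  # expect 1 1 under this payoff model
```

In an outcome-adaptive version, the Q-estimates would be refreshed as each patient's outcome accrues and used to skew the randomisation probabilities toward the currently best regimen.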

Modelling time-varying recruitment rates and site activation prediction in multicentre clinical trials: A comparison study

ABSTRACT. Multicentre Phase II/III clinical trials are large-scale operations that often include hundreds of recruiting sites (centres) in several countries. Planning of operational aspects of a clinical trial requires selection of sites and countries to adhere to study protocol and recruitment requirements. It is thus critical to accurately predict site activation and recruitment timelines to optimize the success of a trial. Such predictions, made prior to trial initiation, help study teams monitor trial progress and take appropriate actions during the trial when recruitment data indicate deviations from the study plan.

In this work we showcase our experience from modelling recruitment in clinical trials sponsored by AstraZeneca between 2010-2020. We show that recruitment rates tend to vary during a trial, depending on therapeutic area and country. However, the industry standard has often employed a homogeneous Poisson model (Anisimov et al 2007), which models patient recruitment rates as a time-constant function. Instead, we show how a non-homogeneous Poisson modelling approach (Urbas et al 2020) is used to account for time-varying recruitment rates, and we demonstrate improved accuracy in trial prediction timelines. The latter approach utilises an ensemble of five models, four of which explicitly model a time-varying recruitment rate and one of which assumes a homogeneous process. Bayesian model averaging is used to combine the estimates from these models.

We will present a thorough descriptive analysis of our data per therapeutic area and investigate the impact of model misspecification under the following scenarios: (a) when recruitment rates are modelled as constant but change during the trial, (b) when recruitment rates are modelled as time-varying on homogeneous data, and (c) when the time-varying effect is not correctly specified by the model. Additionally, we will investigate how errors in predicted site activation and in the number of sites can significantly change predictions against existing study plans.
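One standard way to simulate patient arrivals from a non-homogeneous Poisson process of the kind discussed above is Lewis-Shedler thinning. A minimal sketch with an illustrative ramp-up rate function (the rate and horizon are assumptions for the example, not AstraZeneca data):

```python
import random
random.seed(42)

def thinning(rate, rate_max, horizon):
    """Simulate a non-homogeneous Poisson process on [0, horizon] by
    Lewis-Shedler thinning: propose events from a homogeneous process
    with intensity rate_max, accept each at time t with prob rate(t)/rate_max."""
    t, events = 0.0, []
    while True:
        t += random.expovariate(rate_max)
        if t > horizon:
            return events
        if random.random() < rate(t) / rate_max:
            events.append(t)

# Illustrative site ramp-up: recruitment rate grows, then plateaus
rate = lambda t: min(2.0, 0.5 + 0.1 * t)  # patients per month

arrivals = thinning(rate, rate_max=2.0, horizon=24.0)
print(len(arrivals))
```

A time-constant rate is the special case rate(t) = c, which is why misspecifying a ramp-up as constant (scenario (a) above) distorts predicted timelines.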

Compromise Bayesian test decisions under type I error rate constraint

ABSTRACT. Bayesian clinical trials can benefit from historical information through the elicitation of informative prior distributions. Concerns are however often raised about the potential for prior-data conflict and the impact of Bayes test decisions on frequentist operating characteristics, in particular type I error rates. Indeed, power gains through incorporation of historical information are typically not possible when requiring strict conditional type I error rate control, even when historical information is dynamically discounted based on the observed degree of prior-data conflict (Kopp-Schneider et al., 2020). This observation motivates the development of principled borrowing mechanisms, which strike a balance between frequentist and Bayesian decisions. Ideally, the trust assigned to historical information defines the degree of robustness to prior-data conflict one is willing to sacrifice. However, such a relationship is often not directly available when explicitly considering inflation of type I error rates.

We thus investigate a rationale for inflation of conditional type I error rate in a one-arm one-sided test situation which explicitly and linearly relates the amount of borrowing and the amount of frequentist type I error rate inflation, while satisfying Bayesian optimality criteria under the (inflated) type I error rate constraint. To this aim, we exploit the known duality between test error costs and prior probabilities (e.g. Berger, 1985). The solution is equivalent to a slight modification of the restricted Bayes solution of Hodges and Lehmann (1952), and a characterization is made in the spirit of Efron and Morris (1971), who addressed a closely related problem in the context of estimation. Connections with the robust mixture prior approach, particularly in relation to the choice of the mixture weight and robust component, are made. Simulations are performed to show the properties of the approach for normal and binomial outcomes, and extensions to two-arm and two-sided situations are discussed.

17:00-18:30 Session OC2C: Epidemic modelling of COVID19
Dynamic predictive modelling of the first wave of the COVID-19 pandemic in Canada using a deterministic density-dependent susceptible-exposed-infected-recovered (SEIR) model that accounts for age-stratified ageing, reporting delays and mortality risks.

ABSTRACT. Background: As the COVID-19 pandemic continues to evolve, examining its differential dynamics by demographic and behaviour patterns is crucial. However, intersectional models are limited.

Objectives: Develop a dynamic epidemic model for COVID-19 in Canada that accounts for the impacts of age and health capacity, to uncover related associations and trends. Methodology: Using age-specific COVID-19 mortality risk estimates and sociodemographic data, I modelled COVID-19 cases with a deterministic age-stratified density-dependent Susceptible-Exposed-Infectious-Recovered (SEIR) model that accounts for ageing, reporting and testing delays, and mortality by age group. Cases were grouped into three age strata, namely 0-39, 40-59 and 60+, according to disease-related mortality risks. Stratum-specific reporting delays were assigned based on assumptions about socio-behavioural patterns and timelines of symptom onset. The model was calibrated using line-list data from Statistics Canada and Worldometer.

Results: The model modestly overestimates cumulative cases but underestimates two-thirds of all deaths, and the 60+ age group is most affected. After calibration, the fit between model-predicted time-series data and surveillance data improves to near-perfection. Age-stratum-specific risks of death from COVID-19 in Canada differ from global statistics. People aged 60+ are overrepresented in COVID-19 deaths despite a relatively even distribution of cases by age stratum. The reported burden of COVID-19 is affected by reporting delays, which vary significantly by age category.

Conclusion: Modelling incidence and prevalence data reveals that the course of the epidemic and disease outcomes vary significantly by age, and the reported burden of COVID-19 is affected by reporting delays, which also vary by age category. Advocacy message: The near-perfect fit between the calibrated model and line-list surveillance data validates the pertinence of a comprehensive age-stratified SEIR model to contextualize the pandemic. The model can be applied to other contexts and can facilitate identifying high-risk populations, super-spreaders and limitations in health capacity, which would serve to mitigate the spread of COVID-19. Nonetheless, as the pandemic and public health response continue to evolve, epidemic trends may change.
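As a point of reference for the model class used above, a minimal single-stratum deterministic SEIR model integrated with forward Euler steps; all parameter values are illustrative, and the full model additionally stratifies by age and adds ageing, reporting delays and mortality:

```python
def seir(beta, sigma, gamma, n, i0, days, dt=0.1):
    """Deterministic SEIR integrated with forward Euler.
    beta: transmission rate, sigma: 1/latent period, gamma: 1/infectious period."""
    s, e, i, r = n - i0, 0.0, i0, 0.0
    for _ in range(int(days / dt)):
        new_exposed = beta * s * i / n * dt     # S -> E
        new_infectious = sigma * e * dt         # E -> I
        new_recovered = gamma * i * dt          # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return s, e, i, r

# Illustrative parameters: R0 = beta/gamma = 2.8, 5-day latency, 7-day infectiousness
s, e, i, r = seir(beta=0.4, sigma=1 / 5, gamma=1 / 7, n=1e6, i0=10, days=180)
print(round(r))  # cumulative recovered after 180 days
```

An age-stratified version replaces the scalar compartments with per-stratum vectors and beta with a contact matrix between strata.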

Nowcasting CoVID-19 Deaths in England by Age and Region

ABSTRACT. Understanding the trajectory of the daily numbers of deaths in people with CoVID-19 is essential to decisions on the response to the CoVID-19 pandemic. Estimating this trajectory from data on numbers of deaths is complicated by the delay between deaths occurring and their being reported to the authorities. In England, Public Health England receives death reports from a number of sources and the reporting delay is typically several days, but can be several weeks. Delayed reporting results in considerable uncertainty about the number of deaths that occurred on the most recent days. Adjusting for reporting delays is complicated by day-of-the-week reporting effects, changes over calendar time in the delay distribution, and excess variability (overdispersion) in the daily numbers of reports.

Our aim is to estimate the number of deaths on each day in each of five age strata within seven English regions. Such estimates are known as ‘nowcasts’. We use a Bayesian hierarchical model that involves a submodel for the number of deaths per day and a submodel for the reporting delay distribution. This model accounts for reporting-day effects and longer-term changes over time in the delay distribution. There is also a computationally efficient way to fit the model when the delay distribution is the same in multiple strata, e.g. over a wide range of ages. In this presentation, we shall describe this model and show an example of the resulting nowcasts for England.

Reference: Nowcasting CoVID-19 Deaths in England by Age and Region. Seaman SR, Samartsidis P, Kall M, De Angelis D. medRxiv. doi:
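In its simplest (non-Bayesian) form, the delay adjustment underlying nowcasting inflates recent counts by the estimated probability that a death has been reported by now. The hierarchical model in the abstract goes well beyond this sketch, and the delay distribution and counts below are purely illustrative:

```python
def naive_nowcast(reported, delay_cdf):
    """Inflate reported death counts for the most recent days by the
    probability that a death occurring on that day has been reported by now.

    reported: counts by day of death, last element = today
    delay_cdf[d]: P(report delay <= d days)
    """
    n = len(reported)
    nowcast = []
    for day, count in enumerate(reported):
        days_elapsed = n - 1 - day
        p_reported = delay_cdf[min(days_elapsed, len(delay_cdf) - 1)]
        nowcast.append(count / p_reported)
    return nowcast

# Illustrative delay distribution: 25% reported same day, 95% within 4 days
cdf = [0.25, 0.55, 0.80, 0.90, 0.95, 1.0]
print(naive_nowcast([40, 42, 38, 30, 18, 9], cdf))  # recent days inflated most
```

The Bayesian hierarchical model improves on this by modelling the delay distribution jointly with the death counts, borrowing strength across strata and handling day-of-the-week effects and overdispersion.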

An evaluation of the stability, precision and performance of phenomenological models applied to COVID-19 cases and deaths in South Africa

ABSTRACT. Background: Phenomenological models present users with a purely data-driven approach for modelling infectious disease data. Although these models are well established and have been successfully applied to other outbreaks, their use for estimating epidemiological parameters and making short- and long-term predictions for COVID-19 cases and deaths has been limited. Objectives: We present an evaluation of models fitted to South African data by investigating fit to data, convergence, stability of model parameters over time, use of piecewise models, change in predictions as the estimation period changes, and sensitivity of the models to perturbations in data. Models are fitted using least squares estimation under normal as well as Poisson assumptions on the underlying conditional distribution. A simple SEIR model may be used to compare important model estimates. Methods: Publicly available data on COVID-19 cases and deaths in South Africa were used. Well-known nonlinear phenomenological models, including the logistic, Gompertz and Richards models, were fitted using nonlinear least squares. Robust confidence intervals for model parameters were estimated using a parametric bootstrap. Piecewise models were fitted where the data dictated their use. Models are fitted to daily as well as weekly data in order to improve convergence. Fit criteria are used to compare models. Provincial models may be investigated. Results: The model that fitted the data most consistently was the three-parameter logistic. Greater stability in parameters was observed for models fitted to case data, whereas death data were perturbed by noise affecting model fitting. The Gompertz model tended to overestimate the final size of the pandemic, whereas the Richards model underestimated it. Conclusion: Phenomenological models may provide a robust way to support findings from mathematical models, as well as a tool to introduce complex disease-modelling concepts to the general public.
Phenomenological models are a useful tool for short term prediction, but may provide unreliable long term predictions early in the outbreak. Further results will be presented on the use of the piecewise models, the Poisson vs Normal distributions as well as the modelling of the second wave.
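As a minimal illustration of the nonlinear least squares fitting described above (entirely synthetic data, not the South African series; all parameter values are invented), a three-parameter logistic can be fitted with `scipy.optimize.curve_fit`:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    # Three-parameter logistic: cumulative cases saturating at final size K
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(1)
t = np.arange(120.0)
cases = logistic(t, 50000, 0.12, 60) + rng.normal(0, 300, t.size)  # synthetic noisy counts

# Nonlinear least squares fit of the cumulative-case curve
popt, pcov = curve_fit(logistic, t, cases, p0=[cases.max(), 0.1, t.mean()])
K_hat, r_hat, t0_hat = popt
se_K = float(np.sqrt(pcov[0, 0]))  # asymptotic standard error of the final size
```

The same pattern extends to the Gompertz and Richards models by swapping the model function; a parametric bootstrap, as in the abstract, would refit the model to data simulated from `popt` to obtain robust confidence intervals.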

Estimation of incubation time and latency time distribution of SARS-CoV-2: the impact of distributional assumptions

ABSTRACT. Background

The distribution of incubation time (from infection to symptom onset) and latency time (from infection to start of infectiousness) are key quantities in the analysis of infectious diseases. Both guide decisions on contact tracing and quarantine policies. Typically, the event time of symptom onset is observed exactly, whereas for the time origin (infection) only an exposure interval is known. Assuming that the risk of infection is constant within that interval, the data can be made single interval-censored by transforming the time scale. Although the role of pre- and asymptomatic transmission is evident, estimates of the latency time distribution are lacking. Since the start of infectiousness is determined by sequential test results, data on latency time are doubly interval censored. To simplify estimation of incubation and latency time, it is common practice to use parametric distributions. The appropriateness of such distributions remains unclear, especially for the right tail of the distribution, which informs quarantine policy. Since there are fewer observations in the tail, its estimate will strongly depend on the assumed parametric distribution. Hence, we hypothesize that a nonparametric approach is more appropriate.


Incubation and latency time distributions are estimated from Vietnamese data. Intensive contact tracing and large-scale testing have created a unique data set. We use smoothed nonparametric maximum likelihood methods and compare them with their parametric counterparts. We also consider an alternative approach based on renewal processes (Deng et al., 2020*). We compare all approaches and their sensitivity to the imposed assumptions in a simulation study.
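To make the single-interval-censoring idea concrete, here is a small sketch (synthetic data; a lognormal is one common parametric choice for incubation time, not necessarily the authors') of maximum likelihood estimation when each incubation time is only known to lie in an interval:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
n = 400
incub = rng.lognormal(np.log(5.5), 0.45, n)       # true incubation times, median ~5.5 days
width = rng.uniform(1.0, 4.0, n)                  # exposure-window widths (days)
lo = np.maximum(incub - width * rng.uniform(0, 1, n), 0.0)
hi = lo + width                                   # T is only known to lie in [lo, hi]

def negloglik(par):
    mu, log_sigma = par
    F = stats.lognorm(s=np.exp(log_sigma), scale=np.exp(mu)).cdf
    # Single interval censoring: each subject contributes P(lo < T < hi)
    return -np.sum(np.log(np.clip(F(hi) - F(lo), 1e-300, None)))

fit = optimize.minimize(negloglik, x0=[np.log(4.0), np.log(0.5)], method="Nelder-Mead")
median_hat, sigma_hat = np.exp(fit.x[0]), np.exp(fit.x[1])
```

The nonparametric alternative discussed in the abstract replaces the lognormal CDF with an estimated (smoothed) distribution function; the doubly interval-censored latency-time case censors both endpoints.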


Accurate estimates of the incubation time and latency time distribution are important to optimize quarantine policy in Vietnam and elsewhere. We contribute to methodological literature, by comparing (non)parametric approaches, as well as to clinical literature: estimation of latency time for SARS-CoV-2 is a novelty.

*) Deng Y, You C, Liu Y et al. Estimation of incubation period and generation time based on observed length-biased epidemic cohort with censoring for COVID-19 outbreak in China. Biometrics 2020; DOI: 10.1111/biom.13325.

Regional estimates of reproduction numbers with application to COVID-19

ABSTRACT. Context: In an ongoing epidemic, public health decisions are based on real-time monitoring of the spread of the disease. To this end, one often considers reproduction numbers, which measure the number of secondary cases produced by a single infectious individual. While non-pharmaceutical interventions are applied on a subnational level, estimation of reproduction numbers on this level may be difficult due to low incidences.

Objectives: The study aims to provide reasonable estimates of reproduction numbers on the county level during periods of low incidence.

Methods: We start with the well-known renewal equation and, using techniques from small-area estimation, assume county-level reproduction numbers to be random with a common distribution. Under this model we use maximum likelihood estimation to obtain estimates of reproduction numbers on both the county and national level. Both estimators are analysed in a simulation study and applied to German case data from the Robert Koch Institute, with a focus on a local outbreak in summer 2020.

Results: The simulation study shows that the estimator yields sensible estimates of both the national and county reproduction numbers. It can handle low case counts, and may be used to distinguish local outbreaks from more widespread ones. For scenarios where incidences are low it handles local outbreaks, such as the one considered in the German case data, better than previous methods.

Conclusions: The new estimator provides insight on the spread of an epidemic on the subnational level despite low case counts.
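The renewal equation underlying this approach can be illustrated with a deliberately simplified sketch (generation-interval weights assumed known, a single region, no random effects — the small-area model in the abstract goes well beyond this):

```python
import numpy as np

rng = np.random.default_rng(3)
w = np.array([0.2, 0.5, 0.3])            # generation-interval weights (assumed known)
R_true = 1.4
I = [10.0, 12.0, 15.0]                   # seed incidence
for t in range(3, 40):
    # Renewal intensity: Lambda_t = sum_s w_s * I_{t-s}
    lam = float(np.dot(w, I[t - 3:t][::-1]))
    I.append(float(rng.poisson(R_true * lam)))
I = np.array(I)

# Poisson maximum likelihood over a trailing 7-day window:
# R_hat = sum(I_t) / sum(Lambda_t), the simplest renewal-equation estimator
lam_t = np.array([np.dot(w, I[t - 3:t][::-1]) for t in range(3, I.size)])
R_hat = I[-7:].sum() / lam_t[-7:].sum()
```

The proposed method replaces the single `R` above by county-level reproduction numbers drawn from a common distribution, which stabilises the estimates when county-level counts are low.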

17:00-18:30 Session OC2D: Meta-analysis: network and other
Quantifying the robustness of primary analysis results: a case study on missing outcome data in pairwise and network meta-analysis

ABSTRACT. Conducting sensitivity analyses is an integral part of the systematic review process to explore the robustness of results derived from the primary analysis. When the primary analysis results can be sensitive to assumptions concerning a model's parameters (e.g. the missingness mechanism being missing at random), sensitivity analyses become necessary. However, what can be concluded from sensitivity analyses is not always clear. For instance, in pairwise and network meta-analysis, conducting sensitivity analyses usually boils down to examining how ‘similar’ the estimated treatment effects from different re-analyses are to the primary analysis, or placing undue emphasis on statistical significance. To establish objective decision rules regarding the robustness of the primary analysis results, we propose an intuitive index, which uses the whole distribution of the estimated treatment effects under the primary and alternative re-analyses. This novel index summarises the robustness of primary analysis results in a single number per treatment comparison, and it is compared with an objective threshold to infer the presence or lack of robustness. In the case of missing outcome data, we additionally propose a graph that contrasts the primary analysis results with those of alternative scenarios about the missingness mechanism in the compared arms. When robustness is questioned according to the proposed index, the suggested graph can demystify the scenarios responsible for producing results inconsistent with the primary analysis. The proposed decision framework is immediately applicable to a broad set of sensitivity analyses in pairwise and network meta-analysis. We illustrate our framework in the context of missing outcome data in pairwise and network meta-analysis using published systematic reviews.

Meta-analysis of randomised trials with continuous outcomes: methods that adjust for baseline should be used

ABSTRACT. We revisit a methodological case study on methods for meta-analysis of trials reporting aggregate/summary results for continuous outcomes measured at baseline and follow-up. The Trowman method, proposed in the original methodological case study to adjust for baseline imbalance, is compared with three aggregate data (AD) meta-analytic approaches that synthesize: 1) follow-up scores; 2) change scores; or 3) ANCOVA estimates recovered from reported data; and also with a novel individual participant data (IPD) meta-analysis approach, in which we generate pseudo IPD (based on the reported AD) followed by ANCOVA. The pseudo IPD analysis provides estimates identical to the true IPD analysis, given that it is a likelihood-based approach making use of the appropriate sufficient statistics for ANCOVA. The methods are demonstrated on two real datasets: the original Trowman example, where considerable baseline imbalance occurred, and a second example, a meta-analysis of obstructive sleep apnea studies with moderate baseline imbalance and substantial effect modification by the baseline measurements. The Trowman method makes strong, unrealistic assumptions and may lead to erroneous conclusions regarding within- and across-trial effects. ANCOVA methods yield more precise treatment effect estimates than standard AD approaches and are recommended for the meta-analysis of continuous outcomes, with or without baseline imbalance. The pseudo IPD approach, generating pseudo IPD and fitting standard IPD models under one- or two-stage methods, is further advocated as it can additionally investigate potential differential responses to treatment. The one-stage approach comes with a plethora of modelling options, e.g., study-stratified or random study intercepts and adjustment terms, and allows one to adopt more clinically plausible scenarios for the within-trial residual variances in a straightforward manner.
The two-stage pseudo ANCOVA approach will often give very similar results to the one-stage approach (under the same modelling assumptions) and is easier to perform, as it requires less statistical expertise. Recovering ANCOVA estimates from AD can serve as a good alternative if no effect modification is expected. To initiate a paradigm shift towards undertaking more ANCOVA meta-analyses in the future, we developed an interactive tool using R Shiny, in which the user is guided through the algebraic data calculations followed by the appropriate analysis.
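The pseudo IPD idea can be sketched as follows (illustrative summary numbers, one trial, two arms): the generated pairs reproduce the arm-level means, SDs and baseline–follow-up correlation exactly, which is why ANCOVA on pseudo IPD matches the analysis of the true IPD.

```python
import numpy as np

rng = np.random.default_rng(4)

def pseudo_ipd(n, mean_b, sd_b, mean_f, sd_f, corr):
    """n (baseline, follow-up) pairs reproducing the arm-level summaries exactly."""
    z = rng.standard_normal((n, 2))
    z -= z.mean(axis=0)
    # Whiten the empirical covariance, then impose the target covariance
    z = z @ np.linalg.inv(np.linalg.cholesky(np.cov(z, rowvar=False))).T
    cov = np.array([[sd_b**2, corr * sd_b * sd_f],
                    [corr * sd_b * sd_f, sd_f**2]])
    return z @ np.linalg.cholesky(cov).T + [mean_b, mean_f]

# Illustrative aggregate data: mean/SD at baseline and follow-up, correlation 0.7
ctrl = pseudo_ipd(120, 30.0, 5.0, 29.0, 5.5, 0.7)
trt = pseudo_ipd(110, 31.0, 5.2, 26.5, 5.4, 0.7)

base = np.r_[ctrl[:, 0], trt[:, 0]]
follow = np.r_[ctrl[:, 1], trt[:, 1]]
treat = np.r_[np.zeros(120), np.ones(110)]

# ANCOVA on the pseudo IPD: follow-up adjusted for baseline
X = np.column_stack([np.ones(base.size), base, treat])
beta, *_ = np.linalg.lstsq(X, follow, rcond=None)
effect = beta[2]   # adjusted treatment effect
```

Because the sufficient statistics are matched exactly, `effect` is fully determined by the reported summaries, not by the random draws.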

Flexible generic framework for evidence synthesis in health technology assessment

ABSTRACT. Background: Network meta-analysis (NMA) is commonly used to simultaneously assess multiple competing interventions. It facilitates the quantitative synthesis of the evidence base, which is reported using different data formats: individual participant data (IPD) or aggregate data (AD). Moreover, the evidence can come from non-randomized studies (NRS) or randomized controlled trials (RCT). RCTs are considered the highest quality of evidence due to their low risk of selection bias; however, the restricted settings of RCTs limit the generalizability of their results. Combining RCT and NRS evidence in NMA may help overcome some of these limitations. Our aim is to develop a generic NMA framework to synthesise IPD and AD evidence from RCTs and NRS.

Methods: We built a Bayesian generic NMA model as an extension of the three-level hierarchical model that combines IPD and AD, to incorporate both RCT and NRS evidence in four different ways. Namely, we compared: (a) a naïve approach; (b) a model that uses NRS as prior information; and (c, d) two different bias-adjustment models. These models were used to analyze a network of 3 pharmacological interventions and placebo for patients diagnosed with relapsing-remitting multiple sclerosis. The dataset consists of 2 AD and 4 IPD studies, for a total of 4181 patients, enrolled in either phase III RCTs or a cohort study (SMSC). The models were implemented in R. We conducted a network meta-regression with age as a covariate. Across studies, we assumed the relative treatment effects are exchangeable and the covariate effects are fixed.

Results: The four models described above were compared to a model that considered evidence from RCTs only. Dimethyl fumarate’s and natalizumab’s posterior estimates agree to a large extent. For glatiramer acetate, penalization of AD studies assessed as being at high or unclear risk of bias produced variable estimates of its effect.

Conclusions: The inclusion of RCT and NRS evidence in NMA is needed to consider all the available evidence. Arbitrarily ignoring RCT or NRS may lead to biased results and misleading conclusions.

The HTx project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825162.

Component Network Meta-Analysis Including Individual Participant Data and Summary Aggregate Data

ABSTRACT. Background: In many settings, particularly public health, interventions are ‘complex’, meaning that they comprise multiple components. Component network meta-analysis (CNMA) is an evidence synthesis method developed to identify the combinations of intervention components that are potentially most effective, including combinations that have not been evaluated in previous research.

Methods: To identify which interventions should be recommended for particular sub-populations of patients, existing CNMA methods were adapted to include covariates using individual participant data (IPD), where it was available, and summary aggregate data (SAD). The method was applied to a Cochrane Collaboration systematic review dataset that investigated interventions for promoting the use of safety practices for preventing childhood poisonings at home. The interventions in this dataset were made up of the following components: usual care (UC), education (Ed), free or low cost equipment (Eq), installation (In) and home safety inspection (HSI).

Results: A network meta-analysis of the SAD identified the Ed+Eq+HSI intervention to be the most effective. The CNMA approach allowed the effect of each component and combinations of components to be estimated. The adapted CNMA method including IPD and SAD has the potential to reduce the uncertainty in the component effect estimates compared to the CNMA using only SAD, minimise ecological bias and identify the intervention combination that has the potential to be the most effective for specific subgroups of the population.

Conclusions: This research will demonstrate how evidence on complex interventions can be synthesised to provide better information to policy decision makers for making evidence-based recommendations for sub-populations, and to direct future research. CNMA can identify intervention combinations that have not been tested in trials but may be more effective than those that have.
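The additive CNMA model can be sketched with hypothetical numbers: each intervention's effect versus usual care is assumed to be the sum of its component effects, so a design matrix maps interventions to components and weighted least squares recovers the component estimates, including predictions for untested combinations.

```python
import numpy as np

# Interventions coded by components: Ed, Eq, In, HSI (UC = reference, all zeros)
components = ["Ed", "Eq", "In", "HSI"]
X = np.array([
    [1, 0, 0, 0],   # Ed vs UC
    [1, 1, 0, 0],   # Ed+Eq vs UC
    [1, 1, 1, 0],   # Ed+Eq+In vs UC
    [1, 1, 0, 1],   # Ed+Eq+HSI vs UC
])
y = np.array([0.20, 0.35, 0.42, 0.55])   # study-level log-odds ratios vs UC (invented)
se = np.array([0.10, 0.12, 0.15, 0.11])  # their standard errors (invented)

# Additive CNMA: weighted least squares gives component-effect estimates
W = np.diag(1 / se**2)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Predict an untested combination, e.g. Ed+In+HSI
pred = np.array([1, 0, 1, 1]) @ beta
```

The adapted method in the abstract additionally brings in IPD covariates; this sketch shows only the core additivity assumption on aggregate estimates.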

Bayesian multivariate network meta-analysis model for the difference in restricted mean survival times

ABSTRACT. Network meta-analysis (NMA) is essential for medical decision-making. NMA enables inference for all pairwise comparisons between healthcare interventions available for the same indication, by using both direct and indirect evidence. In randomized trials with time-to-event outcome data, such as lung cancer trials, conventional NMA methods rely on the hazard ratio and the proportional hazards assumption, and ignore the varying follow-up durations across trials. We introduce a novel multivariate NMA model for the difference in restricted mean survival times (RMST). Our model synthesizes all the available evidence from multiple time points simultaneously and borrows information across time points via the within-study and between-study covariances of the differences in RMST. We analytically derived the within-study covariance and estimated the model under the Bayesian framework. We evaluated our model in a simulation study. Our multiple-time-point model yields lower mean squared error than the conventional single-time-point model at all time points, especially when the availability of evidence decreases. We illustrated the model on a network of randomized trials of second-line treatments for advanced non-small-cell lung cancer. Our multiple-time-point model yielded increased precision and detected evidence of benefit at earlier time points compared with the single-time-point model. Our model has the advantage of providing clinically interpretable measures of treatment effects.
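As a sketch of the outcome measure itself (not of the multivariate NMA model), the RMST at horizon tau is the area under the Kaplan-Meier curve up to tau; the toy data below assume distinct event times:

```python
import numpy as np

def rmst(time, event, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve up to tau.
    Sketch assuming distinct event times (no ties)."""
    order = np.argsort(time)
    t, d = np.asarray(time)[order], np.asarray(event)[order]
    surv, prev, area = 1.0, 0.0, 0.0
    for ti, di in zip(t, d):
        if ti > tau:
            break
        area += surv * (ti - prev)   # rectangle under the current survival step
        prev = ti
        if di:
            surv *= 1.0 - 1.0 / np.sum(t >= ti)   # Kaplan-Meier step at an event
    area += surv * (tau - prev)
    return area

# Difference in RMST between two arms at a common horizon tau (toy data)
t1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0]); e1 = np.array([1, 1, 0, 1, 0])
t0 = np.array([0.5, 1.5, 2.0, 3.0, 6.0]); e0 = np.array([1, 1, 1, 0, 1])
diff = rmst(t1, e1, 5.0) - rmst(t0, e0, 5.0)
```

The model in the abstract synthesizes such differences computed at several horizons jointly, with an analytically derived within-study covariance linking them.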

17:00-18:30 Session OC2E: Some thoughts on research in biostatistics
Phases of methodological research in biostatistics – a proposal to increase transparency and reproducibility

ABSTRACT. When a new data analysis method is applied in clinical research, it has usually already undergone some evaluations to demonstrate its suitability and possibly optimality or robustness. The stages of establishing a new method cover theoretical justification, investigations of analytical and asymptotic properties and numerical efficiency of estimation algorithms, and performance evaluations in simulated and real-world data sets. Here we propose to frame this process as a series of ‘phases’, in analogy to the well-established drug development pipeline. Using examples, we explain our proposal and discuss implications for the practice of biostatistical research. In Phase I, a method’s ‘safety’ may be under study: does it produce a meaningful result in some simple cases? Phase II may cover typical proof-of-concept studies in which a method’s inventors demonstrate ‘efficacy’, i.e., that their method outperforms others under idealised conditions. Phase III studies could be protocol-based methodological comparisons, with clearly defined in- and exclusion criteria for the domain of application, and could even include a blinding mechanism to separate the roles of data generator and data analyst in simulations. Phase IV studies, finally, may cover the ‘post-marketing surveillance’ needed to develop guidelines for when to use and when to avoid a new method, and to understand its pitfalls. In both Phase III and Phase IV studies, disclosure statements should clarify if and how a method’s inventors were involved in the design and conduct of the comparison. While this classification is an initial attempt and needs further development, we are confident that the transparency and reproducibility of methodological research may greatly benefit from such a systematic process of development and evaluation. In particular, it may emphasize the importance of well-conducted Phase III or Phase IV studies in methodological research.
Moreover, it may make transparent biases towards methods favoured by the study authors, avoid over-optimistic interpretation, and raise awareness of the value of further comparison studies. By way of conclusion, we propose that the phase of research should become an essential attribute of methodological research studies in clinical biostatistics.

The p-value conundrum: how can a Bayesian analysis help? A case study in reproductive and maternal-fetal medicine

ABSTRACT. There is a natural desire to interpret results from randomised controlled trials (RCTs) in a definitive way. Thus, the overall interpretation of results from RCTs tends to focus on whether the result is statistically significant, even while attempting to abide by good-practice reporting recommendations to provide estimated effect sizes along with confidence intervals. In the evaluation of treatment effects with binary primary outcomes, any small change in effect will often be important. Yet RCTs will have limited power to detect these small effects due to statistical uncertainty. This means that drawing conclusions based on statistical significance becomes problematic. Firstly, there is the possibility of conflating a non-significant result with evidence of no difference – possibly leading to abandoning treatments that might actually work (overly definitive interpretation). Secondly, there is the risk of placing too much emphasis on statistical significance and conflating a result which is statistically non-significant, yet suggestive of an effect, with an inconclusive result – perhaps leading to a failure to adopt treatments in a timely way (overly cautious interpretation). Bayesian estimates of the probability of the treatment having a positive effect offer a means to resolve this conundrum.

In this presentation we report the findings from a review where we systematically apply Bayesian methods to a sample of randomised trials in one particular field of medicine. To this end, we report results of a systematic review of a contemporary sample of two-arm individually randomised superiority trials in reproductive and maternal-fetal medicine, published in high-impact general medical and specialty journals between January 2015 and December 2020.

For each identified RCT, we express the treatment effect for the primary outcome using relative differences (with 95% confidence interval) using a frequentist method and contrast this with a reanalysis using a Bayesian approach (with uninformative priors) to obtain the Bayesian posterior probability of a positive impact. We demonstrate how the Bayesian approach can resolve the p-value conundrum and lead to qualitatively different conclusions in an important area of medicine. We offer practical guidance for journal editors, reviewers, statisticians and trialists to avoid misinterpreting trial results.
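A minimal sketch of such a reanalysis for a binary outcome (invented trial counts; conjugate Beta(1,1) priors as one choice of uninformative prior):

```python
import numpy as np

rng = np.random.default_rng(5)
# events / n per arm (invented numbers for a "non-significant" trial)
e_t, n_t = 30, 200
e_c, n_c = 42, 200

# Conjugate Beta(1,1) posteriors for each arm's event risk
p_t = rng.beta(1 + e_t, 1 + n_t - e_t, 100_000)
p_c = rng.beta(1 + e_c, 1 + n_c - e_c, 100_000)

prob_benefit = float(np.mean(p_t < p_c))   # posterior P(treatment reduces risk)
rr = p_t / p_c                             # posterior draws of the relative risk
ci = np.percentile(rr, [2.5, 97.5])        # 95% credible interval
```

With these numbers the credible interval for the relative risk crosses 1 ("non-significant"), yet the posterior probability of benefit is high — exactly the conundrum described above.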

On the marginality principle, with a focus on ratios and interactions

ABSTRACT. The marginality principle states that, for estimation of interactions or other higher-order effects, it is illegitimate to exclude their main (or otherwise lower-order) effects. Some accounts state a second aspect of the principle: for estimation of main (or otherwise lower-order) effects, it is illegitimate to ignore their interactions or any other higher-order effects. For both aspects it is assumed to be a priori not known that the effects are zero. Many statisticians obey the first aspect, even if they do not know the marginality principle by name, but the second is either not known or considered secondary. Variables derived as the ratio of two measured variables commonly appear in research data. Examples include body mass index, total cholesterol to HDL, waist–hip ratio, left ventricular ejection fraction, and heart rate. Using the example of a regression model that includes a ratio as a covariate, we show how the two aspects of the marginality principle can be understood as two sides of the same coin. Using some of the listed examples of ratios, we argue that adhering to the first aspect while ignoring the second can lead to absurd modelling choices. The example generalises directly to interactions and less directly to polynomials. We conclude that for statistical models including continuous covariates: 1) awareness of both aspects of the marginality principle avoids making arbitrary default modelling choices; 2) it will be necessary to violate the principle in some way for many real modelling tasks; 3) the terms it is reasonable to omit will depend on context, but may be the ‘lower-order’ terms.
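The ratio argument can be made concrete with a small simulation (hypothetical variables): a model with covariate x/z is exactly an interaction x·(1/z) with both main effects omitted, and when the outcome depends on numerator and denominator separately it can fit far worse than the model that respects marginality.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.normal(70, 10, n)                      # e.g. a numerator variable
z = rng.normal(1.7, 0.1, n)                    # e.g. a denominator variable
y = 0.5 * x + 2.0 / z + rng.normal(0, 1, n)    # outcome depends on each separately

ratio = x / z

def sse(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Model A: covariate x/z only -- the interaction x * (1/z) without its main effects
sse_ratio = sse(np.column_stack([np.ones(n), ratio]))
# Model B: main effects x and 1/z plus their interaction (marginality respected)
sse_full = sse(np.column_stack([np.ones(n), x, 1 / z, ratio]))
```

Model A is nested in Model B, so its residual sum of squares can never be smaller; here it is substantially larger because the ratio-only model forces the numerator's effect to scale with 1/z.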

An extension to reporting guidelines for systematic reviews of prediction model studies (TRIPOD-SRMA)

ABSTRACT. Introduction Guidelines exist for reporting the development, validation and updating of risk prediction models (TRIPOD), and for the reporting of systematic reviews (PRISMA). However, no specific guidance exists for reporting systematic reviews of prediction models which can have different aims, ranging from identifying prediction models through to comparing predictive performance of models. Therefore, existing reporting guidelines require modification to be more suitable for reporting systematic reviews and meta-analyses of prediction model studies.

Objectives To develop an extension to the TRIPOD reporting guidelines, specific to systematic reviews of prediction model studies.

Methods Existing reporting guidelines were reviewed. Relevant guideline items were combined and assessed for suitability by two researchers, considering the different aims of systematic reviews: i) identification of prediction models within a broad clinical field, ii) identification of prediction models for a target population, iii) identification of prediction models for a particular outcome, iv) assessing the performance of a particular prediction model, and v) comparison of prediction models (in terms of predictive performance), where aims iv) and v) may also include meta-analysis. Item suitability and wording were discussed within the working group and a draft extension to TRIPOD was produced. Online Delphi surveys are currently being conducted, involving researchers with experience in systematic reviews and prediction modelling to provide feedback on the proposed items. Results from the Delphi surveys will inform the selection and wording of items in TRIPOD-SRMA.

Results PRISMA and TRIPOD-Cluster (submitted) were identified as the most relevant reporting guidelines. They contained many overlapping items; while PRISMA contained some items specific to systematic reviews, TRIPOD-Cluster contained some items specific to prediction models. Items from both guidelines were combined, resulting in many items being merged and modified, while other items specific to model development or individual participant data were removed. Results from the Delphi survey will be presented and a complete draft of our extension (TRIPOD-SRMA) proposed.

Conclusions TRIPOD-SRMA is an extension of existing reporting guidelines to provide more tailored guidance for reporting systematic reviews of prediction model studies.

Reference based multiple imputation for trials – what’s the right variance and how to estimate it?

ABSTRACT. Reference-based multiple imputation (MI), such as jump to reference, has become a popular approach to handling missing data in clinical trials. One feature of these methods is that estimates of variance using Rubin’s rules are larger than the true repeated sampling variance, raising the question of whether use of Rubin’s rules is still appropriate. Recently, Cro et al. have argued that it is, since this results in so-called information anchoring – the variance is (approximately) the same as under a missing at random (MAR) analysis.

I argue that reference based imputation methods are not truly information anchored, when information is judged in terms of repeated sampling variability, and that the repeated sampling variance is the 'right' one. That this variance reduces as the proportion of missing data increases is a logical consequence of strong assumptions made by reference-based methods. The question of which variance is correct is critical, as it materially affects the statistical power of clinical trials using this method.

For estimating the true repeated sampling variance, I describe and illustrate a simple-to-apply and computationally efficient combination of bootstrapping and MI. I present simulation results demonstrating the performance of this approach for reference-based imputation with both repeatedly measured continuous endpoints and recurrent event endpoints. Finally, I argue that if true repeated sampling information anchoring is desired, new methods must be devised that satisfy this criterion.
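The bootstrap-then-impute ordering can be sketched schematically (toy normal imputation model for a simple mean, MCAR missingness — far simpler than the repeated-measures and recurrent-event settings in the talk): the spread of the per-replicate MI point estimates targets the repeated-sampling variance.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(10.0, 2.0, 200)
miss = rng.random(200) < 0.3          # ~30% of outcomes missing (MCAR, for illustration)
y_obs = y[~miss]
n_mis = int(miss.sum())

def mi_estimate(obs, M=5):
    """Point estimate from M imputations (Rubin's rule for the point estimate)."""
    ests = []
    for _ in range(M):
        imputed = rng.normal(obs.mean(), obs.std(ddof=1), n_mis)  # toy imputation model
        ests.append(np.r_[obs, imputed].mean())
    return float(np.mean(ests))

# Bootstrap-then-impute: resample the data, then run MI within each replicate
B = 500
boot = [mi_estimate(rng.choice(y_obs, y_obs.size, replace=True)) for _ in range(B)]
se_boot = float(np.std(boot, ddof=1))   # repeated-sampling SE of the MI estimator
```

In a reference-based analysis, the toy imputation step would be replaced by the reference-based imputation model; the bootstrap layer is unchanged.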

17:00-18:30 Session OC2F: Methods for the analysis of electronic health records
Immortal time bias for life-long conditions in retrospective observational studies using electronic health records

ABSTRACT. Background: Immortal time bias is common in real-world observational studies using electronic health records but is typically described for pharmacoepidemiology studies where there is a delay between cohort entry and treatment initiation.

Aims: For this study, immortal time bias is described in the context of electronic health record observational studies where the exposure is a life-long condition/disability.

Methods: Using intellectual disability as an example, one million patients from the UK primary care database (Clinical Practice Research Datalink [CPRD]) linked with national deaths data (Office for National Statistics) were selected to compare four different approaches to handling immortal time bias and their impact on life expectancy in patients (aged 10+ years) with and without the exposure of interest (intellectual disability) from 2000 to 2019. The four approaches were: (i) treatment of immortal time as observational time; (ii) exclusion of immortal time before date of first exposure diagnosis; (iii) exclusion of immortal time before assumed date of entry (by the clinician) of first exposure diagnosis; and (iv) treatment of exposure as a time-dependent measure. Smoothed life expectancy curves were estimated using flexible parametric survival models, stratified by exposure status.

Results: When not included in cohort entry criteria (Method 1), disproportionately high life expectancy was observed for the exposed population over the earliest calendar period compared with later periods. This effect attenuated but remained when date of diagnosis was incorporated into the entry criteria (Methods 2 and 4). Setting cohort entry to the assumed date of entry of diagnosis (Method 3) resulted in a substantial loss of subjects and person-time and provided a poor proxy measure for date of entry of diagnosis in this CPRD cohort.

Conclusions: Immortal time bias presents a significant problem for comparing studies of life-long conditions/disabilities using electronic health record data. Implications of the findings and recommendations are discussed.
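Method (iv), treating exposure as time-dependent, amounts to splitting each person's follow-up at the diagnosis date so that the otherwise "immortal" pre-diagnosis person-time is classified as unexposed. A minimal sketch (hypothetical function; dates only, no other covariates):

```python
from datetime import date

def split_follow_up(entry, diagnosis, exit):
    """Split follow-up at the diagnosis date for a time-dependent exposure.
    Returns (start, stop, exposed) rows; pre-diagnosis time counts as unexposed."""
    if diagnosis is None or diagnosis >= exit:
        return [(entry, exit, 0)]                # never diagnosed during follow-up
    if diagnosis <= entry:
        return [(entry, exit, 1)]                # already diagnosed at entry
    return [(entry, diagnosis, 0),               # immortal time, correctly unexposed
            (diagnosis, exit, 1)]                # exposed from diagnosis onwards
```

Treating all of the follow-up as exposed (Method i) would instead assign the `(entry, diagnosis)` interval — during which the person cannot, by design, have died before being diagnosed — to the exposed group, inflating its apparent survival.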

Identifying high-risk groups for BMI change using electronic health records from 2.3 million adults

ABSTRACT. Despite the urgent need to develop new and targeted strategies for population approaches to obesity prevention, current policy has largely been informed by cross-sectional studies that do not allow identification of the population groups at highest risk of BMI gain. In this study, we calculated longitudinal changes in BMI over 1, 5 and 10 years and investigated transitions between BMI categories using population-based electronic health records of 2,328,477 adults in England (1998-2016) from CALIBER. To estimate the 1-, 5- and 10-year BMI change we selected at random one pair of measurements per individual within window intervals of 6 months to 2 years, 4 to 6 years and 8 to 12 years, respectively. The problem of missing values arises because not all individuals had two BMI measurements within the specific windows. We assumed that the missingness mechanism for BMI change was missing not at random (MNAR) and applied multiple imputation with delta adjustment: we applied multiple imputation and then added delta values to our imputed datasets of BMI change. The assumption underlying the calculation of the delta values was that the average 10-year BMI estimates per age group and sex from CALIBER, after multiple imputation, would be the same as the corresponding estimates from an annual cross-sectional survey, the Health Survey for England. We then used logistic regression models to estimate the relationship of age, sex, social deprivation, ethnicity and region with BMI change. Young age was the most important sociodemographic factor for BMI change. The odds ratio of transitioning from normal weight to overweight or obesity in the youngest (18-24 years) compared with the oldest (65-74 years) individuals was 4.22 (3.85-4.62), from overweight to obesity 4.60 (4.06-5.22), and from non-severe to severe obesity 5.87 (5.23-6.59).
Among the youngest, socially deprived adult men, 72% transitioned from normal weight to overweight and 68% from overweight to obesity over 10 years. Multiple imputation using delta adjustment provides a flexible and transparent means to impute missing data under MNAR mechanisms, especially when these delta values can be calculated using information from another study.
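The delta-adjustment step can be sketched as follows (toy numbers; the imputation model is deliberately simplified, and in the study the deltas were calibrated so that imputed estimates matched the external survey):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
change = rng.normal(1.0, 2.0, n)        # hypothetical true 10-year BMI change
observed = rng.random(n) < 0.5          # only half have a usable pair of measurements
obs = change[observed]

delta = 0.5   # MNAR shift (invented here; calibrated against external data in practice)

ests = []
for _ in range(20):                     # M = 20 imputations
    # MAR step (toy model: draw from the observed-change distribution),
    # then shift the imputed values by delta to encode the MNAR assumption
    imputed = rng.normal(obs.mean(), obs.std(ddof=1), n - obs.size) + delta
    ests.append(np.r_[obs, imputed].mean())
mnar_mean = float(np.mean(ests))
```

Setting `delta = 0` recovers a MAR analysis, so the same code doubles as a sensitivity analysis over a range of delta values.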

Multiple imputation of sporadically-missing continuous time data by Brownian bridge stochastic interpolation

ABSTRACT. Context The LIBERATES randomised controlled trial utilised continuous glucose monitoring (CGM) sensors to collect primary outcome data. The CGM aims to collect high-frequency data at regularly spaced intervals, several times per hour over a period of weeks, potentially generating over 1300 observations within a 2-week period. As was expected with wearable devices, there were many intervals in the final dataset with no observed data, occurring at different times of day and of differing lengths, with an unsynchronised resumption of the data stream. Conventional statistical analyses rely on single imputation methods that make often-implausible assumptions (such as linear or constant trends, ignoring cyclical trends within the data) and don’t account for inherent extra uncertainty. Multiple imputation is an established approach to analysis of data that is partially missing, but methods to impute longitudinal data are generally suited to simple rectangular datasets with discrete assessment time points. Although stream data may be summarised to a single measure – or a series of repeated measures, since data streams are partially-observed, imputing at the summary level may lead to implausible imputations, incompatible with the data observed. Objective 1. To demonstrate multiple forms of the Brownian Bridge stochastic interpolation approach to multiply-impute plausible paths across intervals of missing continuous data. 2. To start from a simple Brownian Bridge moving to more complex interpolation incorporating additional time-varying information derived from the input dataset. Methods By treating the underlying glucose values as a continuous-time stochastic process, missing data in the stream was addressed using Brownian Bridge interpolation. Multiple plausible paths were created to allow for the inherent additional uncertainty due to missing data. 
The method is developed by incorporating more observed data from the dataset, allowing for volatility varying over a 24-hour cycle, different volatility profiles per participant and differing drift patterns per participant over a 24-hour cycle. Results Comparisons are made, using the LIBERATES dataset, between simpler single imputation methods, such as last observation carried forward and simple linear interpolation, and the Brownian Bridge interpolation methods. Limitations relating to the data collection are highlighted, including distinguishing between partially and completely missing outcome measures, and measurement limits.
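As an illustration of the core idea, a Brownian bridge pinned at the observed values on either side of a gap can be sampled sequentially across the missing time points. The volatility parameter and anchor values below are hypothetical, and the real method additionally lets volatility and drift vary over the 24-hour cycle and per participant:

```python
import numpy as np

def brownian_bridge_impute(t_left, x_left, t_right, x_right, t_gap, sigma, rng):
    """Draw one plausible path across a gap of missing glucose readings.

    The bridge is pinned to the observed values at both ends of the gap;
    sigma is the per-unit-time volatility (an illustrative value here).
    """
    path = []
    t_prev, x_prev = t_left, x_left
    for t in t_gap:
        # Conditional distribution of the bridge at time t, given the
        # current position and the right-hand anchor.
        frac = (t - t_prev) / (t_right - t_prev)
        mean = x_prev + frac * (x_right - x_prev)
        var = sigma ** 2 * (t - t_prev) * (t_right - t) / (t_right - t_prev)
        x_prev = rng.normal(mean, np.sqrt(var))
        t_prev = t
        path.append(x_prev)
    return path

rng = np.random.default_rng(0)
# Gap between readings at t=0 (5.6 mmol/L) and t=5 (7.2 mmol/L)
imputations = [brownian_bridge_impute(0.0, 5.6, 5.0, 7.2, [1, 2, 3, 4], 0.4, rng)
               for _ in range(10)]  # ten multiply-imputed paths
```

Each draw yields a different plausible path, which is what carries the between-imputation uncertainty into the downstream analysis.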

Handling missing data from wearable devices in clinical trials

ABSTRACT. Accelerometers and other wearable devices are increasingly being used in clinical trials to provide an objective measure of the impact of an intervention on physical activity. These devices measure physical activity on very fine intervals of time called epochs, which are typically aggregated to provide daily or weekly step counts. In this setting, missing data is common as participants may not wear the device as per protocol, or the device may fail due to low battery or water damage. However, there is no consensus on key issues in handling missing data from such devices. Controversy remains for even the most basic aspects, such as how to determine whether a measurement is missing or not.

We propose an analysis framework that uses wear time to define missingness on the epoch and day level, and propose a multiple imputation approach, at an aggregated level, which treats partially observed daily step counts as right censored. This flexible approach allows the inclusion of auxiliary variables as well as sensitivity analysis to be performed. We illustrate its application to the analysis of exercise trials including the 2019 MOVE-IT trial.
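The right-censoring idea above can be sketched as follows: a day with incomplete wear time only bounds the true count from below, so each imputation is drawn from a distribution truncated at that bound. The log-normal model and all numbers are illustrative assumptions, not the trial's actual imputation model:

```python
import numpy as np

def impute_censored_days(complete_days, censored_lower_bounds, m=5, seed=1):
    """Multiply impute right-censored daily step counts.

    A day with incomplete wear time yields only a lower bound on the true
    count. Here we fit a log-normal to fully observed days (a simplifying
    assumption) and draw each imputation from that distribution truncated
    below at the observed partial count.
    """
    rng = np.random.default_rng(seed)
    log_steps = np.log(complete_days)
    mu, sd = log_steps.mean(), log_steps.std(ddof=1)
    imputations = []
    for _ in range(m):
        imputed = []
        for lb in censored_lower_bounds:
            # Rejection sampling from the truncated log-normal
            draw = lb
            while draw <= lb:
                draw = np.exp(rng.normal(mu, sd))
            imputed.append(draw)
        imputations.append(imputed)
    return imputations

# Illustrative data: five complete days, two days censored at partial counts
imps = impute_censored_days([8000, 9000, 10000, 7500, 11000], [6000, 9500])
```

In practice auxiliary variables (e.g. wear time, weekday) would enter the imputation model; this sketch shows only the censoring mechanics.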

References: Ismail K, Stahl D, Bayley A, Twist K, Stewart K, Ridge K, Britneff E, Ashworth M, de Zoysa N, Rundle J, Cook D, Whincup P, Treasure J, McCrone P, Greenough A, Winkley K. Enhanced motivational interviewing for reducing weight and increasing physical activity in adults with high cardiovascular risk: the MOVE IT three-arm RCT. Health Technol Assess. 2019 Dec;23(69):1-144. doi: 10.3310/hta23690. PMID: 31858966; PMCID: PMC6943381.

Using wrist-worn devices to assess the impact of COVID-19 lockdown on physical activity

ABSTRACT. Wearable activity trackers such as Fitbit have grown in popularity in recent times. Although less accurate than research-grade accelerometers, Fitbit devices can nonetheless produce very useful data for measuring physical activity. During the last 12 months, various lockdown measures have been introduced in different parts of the world in attempts to slow the spread of the virus. While arguably effective at slowing the pandemic, COVID-19 lockdowns have had profound impacts on the physical and psychological well-being of those affected.

Using Fitbit step-count data collected from multiple individuals at 1-minute intervals, I assess the impact of lockdown on physical activity using a segmented generalized linear model (GLM) that allows for differences in mean level and in diurnal and seasonal trends of physical activity before and after lockdown. Results from both linear and negative binomial (NB) regressions will be presented and compared. The results show significant between-individual heterogeneity in the impact of lockdown, with an overall negative impact of lockdown on physical activity.
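A minimal sketch of the segmented idea, using ordinary least squares with one harmonic for the 24-hour cycle (the analysis described uses a GLM with seasonal terms and negative binomial errors; all data below are simulated):

```python
import numpy as np

def segmented_diurnal_fit(t_hours, steps, lockdown_start):
    """Least-squares fit of a segmented model: the mean step count shifts
    at the lockdown time, with one 24-hour harmonic for the diurnal cycle."""
    post = (t_hours >= lockdown_start).astype(float)
    X = np.column_stack([
        np.ones_like(t_hours),              # pre-lockdown mean level
        post,                               # shift in level after lockdown
        np.sin(2 * np.pi * t_hours / 24),   # diurnal cycle
        np.cos(2 * np.pi * t_hours / 24),
    ])
    beta, *_ = np.linalg.lstsq(X, steps, rcond=None)
    return beta  # beta[1] estimates the lockdown effect

# Simulated hourly step counts over 60 days; lockdown at day 30 lowers
# mean activity by 20 steps per hour.
rng = np.random.default_rng(2)
t = np.arange(24 * 60, dtype=float)   # time in hours
lockdown = 24 * 30
steps = (50 + 10 * np.sin(2 * np.pi * t / 24)
         - 20 * (t >= lockdown) + rng.normal(0, 5, t.size))
beta = segmented_diurnal_fit(t, steps, lockdown)
```

With a negative binomial GLM the same design matrix would enter the linear predictor on the log scale rather than the mean directly.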

In the second part of the talk, I will highlight several statistical issues inherent in Fitbit data, including strong autocorrelation and measurement error, and propose methods to calibrate Fitbit data using data from a research-grade accelerometer.

17:00-18:30 Session OC2G: Analysis of gene expression and omics data
Prostate cancer intratumor heterogeneity assessment by depth measures analysis on imaging texture features

ABSTRACT. Background and Motivations: Personalized treatment has become a crucial point of modern medicine. Specifically, in patients with cancer, the optimization of therapeutic decisions based on prognostic risk assessment is essential. In this regard, preliminary findings have shown imaging-derived biomarkers for spatial intratumor heterogeneity to be fundamental in understanding tumor severity and evolution, influencing pre-treatment clinical-pathological prognosis [1]. However, a consensus about a quantitative definition of heterogeneity has not yet been reached and, without it, informed clinical decisions cannot be implemented. Although quantitative tumor characterization from tomographic PET/CT imaging data, namely radiomics, is catching on, the redundancy and high dimensionality of imaging biomarkers prevent its translation to medical practice, calling for an agnostic and lossless feature transformation resulting in an exhaustive lesion texture profiling. Objective: In this work, we propose a depth-based metric for lesion profiling in order to quantitatively define intratumor heterogeneity in patients with metastatic prostate cancer. Statistical methods: 84 patients with multi-metastatic recurrent prostate cancer enrolled in a clinical trial were analyzed. All patients underwent whole-body [18F]FMCH PET/CT for restaging; lesions were segmented and radiomic texture features were extracted from ROIs. The 37 radiomic features were sliced into 6 semantic groups and the Mahalanobis depth was computed on each group [2]. Each lesion was thus described by a 6-dimensional vector, namely its radiomic profile. Lesion profiles were clustered using a k-means algorithm (k=2) and clusters were tagged as risk classes. Finally, patients’ intratumor heterogeneity was evaluated in terms of the concordance of lesion risk classes within patients.
Results: Preliminary results showed that texture analysis of [18F]FMCH PET/CT with depth measures is able to overcome redundancy and dimensionality issues with no information loss, characterizing intratumor heterogeneity in patients with recurrent prostate cancer. The ultimate goal is to provide an imaging biomarker for risk stratification, thus guiding patients’ treatment decision making.
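The depth-based profiling step can be sketched as follows: compute a Mahalanobis depth within each semantic feature group, stack the per-group depths into one profile vector per lesion, and cluster the profiles with k-means (a simplified stand-in; group sizes and data below are illustrative):

```python
import numpy as np

def mahalanobis_depth(X):
    """Mahalanobis depth of each row relative to the sample:
    D(x) = 1 / (1 + (x - mu)' S^{-1} (x - mu))."""
    mu = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)
    return 1.0 / (1.0 + d2)

def lesion_profiles(feature_groups):
    """One depth per semantic feature group -> one profile vector per lesion."""
    return np.column_stack([mahalanobis_depth(G) for G in feature_groups])

def kmeans2(X, n_iter=20):
    """Minimal k-means with k=2 (Lloyd's algorithm), initialised at the
    extremes of the first profile coordinate."""
    centers = X[[np.argmin(X[:, 0]), np.argmax(X[:, 0])]]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(d, axis=1)
        # Keep the old center if a cluster would become empty
        centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                            else centers[k] for k in range(2)])
    return labels

# Illustrative data: 40 lesions, 6 semantic groups of texture features
rng = np.random.default_rng(1)
groups = [rng.normal(size=(40, 5)) for _ in range(6)]
profiles = lesion_profiles(groups)   # shape (40, 6): one radiomic profile per lesion
risk_class = kmeans2(profiles)       # lesion-level risk classes
```

Patient-level heterogeneity would then be summarised as the concordance of `risk_class` among each patient's lesions.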

Investigating Down syndrome by integrating methylation and glycomics using supervised PO2PLS

ABSTRACT. Background: Down syndrome (DS) is a condition that leads to premature or accelerated aging in affected subjects. They develop diseases that are typically observed at a higher age. Studies at the molecular level of DS have reported several alterations in methylation and glycomics. However, these studies were conducted on each omics level separately, overlooking the relationship between omics levels. Joint analysis of methylation and glycomics in the context of DS is needed to gain insight from a multi-omics perspective. Our motivating datasets were measured on 29 DS patients and their healthy siblings and mothers.

Aim: We aim to investigate the premature/accelerated aging in DS by analyzing methylation and glycomics jointly, and identify relevant CpG sites and glycans, using our newly proposed method supervised probabilistic O2PLS (supervised PO2PLS).

Method: For dimension reduction of high-dimensional correlated omics data, we consider two-way orthogonal partial least squares (O2PLS), which constructs a few joint latent variables that explain the covariance, while taking into account the heterogeneity between the two omics datasets. However, O2PLS does not model the outcome variable, i.e., the DS status, thus needs additional steps to link the latent variables to DS. We propose a probabilistic framework, supervised PO2PLS that combines omics integration and outcome regression in one model. It constructs joint latent variables that explain the covariance between omics and also the variance in the outcome variable. On the ‘global’ level, it allows for statistical inference on the relationship between the omics datasets, and between the omics data and the outcome as well. On the individual variable level, the assigned weights make it easy to identify important features. All the parameters in the model are estimated using maximum likelihood, taking into account the omics data and the outcome variable simultaneously. A simulation study to evaluate the performance of supervised PO2PLS and results of the DS data analysis will be presented.

Conclusion: To conclude, we study aging in DS by considering methylation and glycomics data together. Our proposed method that jointly analyzes multiple omics data with outcome variable may provide new insight into the underlying mechanism of premature/accelerated aging at multi-omics level.

Model selection characteristics when using MCP-Mod for dose-response gene expression data

ABSTRACT. In the context of drug development, understanding the dose-response relationship of a candidate drug is crucial to determine a target dose of interest. Classical approaches in clinical Phase II dose-finding trials rely on pairwise comparisons between doses and placebo.

A methodological improvement to this is the MCP-Mod (Multiple Comparison Procedure and Modeling) approach, originally developed for Phase II trials (Bretz et al., 2005). MCP-Mod combines multiple comparisons with modeling approaches in a multistage procedure. First, for a set of pre-specified candidate models, it is tested if any dose-response signal is present. Second, considering models with detected signal, either the best model is selected to fit the dose-response curve or model averaging is performed.

We extend the scope of application of MCP-Mod to in-vitro gene expression data and assess its model-selection characteristics for concentration-gene expression curves. Specifically, we apply MCP-Mod to single genes of a high-dimensional gene expression dataset in which human embryonic stem cells were exposed to eight concentration levels of the compound valproic acid (VPA). As candidate models we consider the sigmoid Emax, linear, quadratic, Emax, exponential and beta models. Through simulations we investigate the impact of omitting one or more models from the candidate model set to uncover possibly redundant models and to evaluate the precision and recall rates of selected models.
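A simplified sketch of the model-selection step, comparing a few candidate dose-response models by AIC (the real MCP-Mod procedure first tests for a dose-response signal with multiple contrast tests; the candidate set and data below are illustrative):

```python
import numpy as np

def aic(rss, n, k):
    """Gaussian AIC from a residual sum of squares and k parameters
    (counting the residual variance as one parameter)."""
    return n * np.log(rss / n) + 2 * k

def select_dose_response_model(dose, resp):
    """Fit candidate dose-response models and return the lowest-AIC one."""
    n = len(dose)
    fits = {}
    for name, deg in [("linear", 1), ("quadratic", 2)]:
        coef = np.polyfit(dose, resp, deg)
        rss = ((np.polyval(coef, dose) - resp) ** 2).sum()
        fits[name] = aic(rss, n, deg + 2)
    # Emax: E0 + Emax * dose / (ED50 + dose); ED50 profiled over a grid
    best_rss = np.inf
    for ed50 in np.geomspace(dose[dose > 0].min() / 10, dose.max() * 10, 60):
        X = np.column_stack([np.ones(n), dose / (ed50 + dose)])
        beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
        best_rss = min(best_rss, ((X @ beta - resp) ** 2).sum())
    fits["emax"] = aic(best_rss, n, 4)
    return min(fits, key=fits.get), fits

# Illustrative concentration-response data simulated from an Emax curve
rng = np.random.default_rng(4)
dose = np.repeat([0.0, 0.05, 0.2, 0.6, 1.0], 10)
resp = 0.2 + dose / (0.2 + dose) + rng.normal(0, 0.05, dose.size)
selected, fits = select_dose_response_model(dose, resp)
```

Repeating this over simulated genes, with models omitted from the candidate set, is how precision and recall of the selection step can be evaluated.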

Our results clearly support the consideration of various dose-response models when analyzing dose-dependent gene expression data. These include, in particular, the often-neglected non-monotone models such as the quadratic model.

Measured by the AIC, every model performs best for a considerable number of genes. For less noisy measurements the popular sigmoid Emax model is frequently selected. For noisier data, simpler models such as the linear model are often selected, but mostly without a relevant performance advantage over the second-best model. It is also noticeable that the commonly used standard Emax model has an unexpectedly low performance.

Information sharing across genes for improved parameter estimation in dose-response curves

ABSTRACT. Determining the right dose is a critical goal in drug development, from the pre-clinical phase (e.g. in toxicology) to phase 3 studies. In order to determine an optimal dose of interest without being restricted to the experimental doses, a common approach is to fit a parametric curve to the dose-response data.

In toxicological assays and for human in vivo data, increasing the number of considered doses or the number of replicates or patients per dose yields a higher quality of the fitted curve, but incurs critical additional costs. However, technologies for measuring high-dimensional gene expression data are well established. Thus, when gene expression is the target of interest, a statistical approach to obtaining higher-quality fits of dose-response curves is to exploit similarities within high-dimensional gene expression data. This idea can also be called information sharing across genes. Parameters of the fitted curves can be linked, according to either a priori assumptions or estimates of the distributions of the parameters, in a Bayesian framework.

Here, we consider the special case of the sigmoidal 4pLL model for estimating the curves associated with single genes in a toxicological in vitro assay, and we are interested in the concentration at which the half-maximal effect is reached, the EC50. This value can be considered a reasonable indicator for a relevant expression effect of the corresponding gene. We introduce an empirical Bayes method for information sharing across genes in this situation, by modelling the distribution of the EC50 values across all genes. Based on this distribution, for each gene, a weighted mean of the corresponding individually estimated parameter and of the overall mean of the estimated parameters of all genes is calculated, hence shrinking parameter estimates to the overall mean. We evaluate our approach using a simulation study that is based on the structure of a real gene expression dataset. Results show that the Bayesian method works well in terms of reduction of the mean squared error between true underlying value and estimate. Finally, the method is also applied to the real gene expression dataset to demonstrate the influence of the analysis strategy on the results.
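The shrinkage step can be sketched with a normal-normal empirical Bayes model on the log scale (a simplification of the method described; the standard errors and data below are illustrative):

```python
import numpy as np

def shrink_ec50(log_ec50_hat, se):
    """Shrink per-gene log-EC50 estimates toward the overall mean.

    Normal-normal model: weight w = tau^2 / (tau^2 + se^2), where tau^2 is
    a method-of-moments estimate of the between-gene variance. Each shrunken
    estimate is the weighted mean of the individual estimate and the grand mean.
    """
    log_ec50_hat = np.asarray(log_ec50_hat, float)
    se = np.asarray(se, float)
    grand_mean = log_ec50_hat.mean()
    tau2 = max(log_ec50_hat.var(ddof=1) - np.mean(se ** 2), 1e-8)
    w = tau2 / (tau2 + se ** 2)
    return w * log_ec50_hat + (1 - w) * grand_mean

# Illustration: noisy per-gene estimates; shrinkage reduces the MSE
rng = np.random.default_rng(5)
true_log_ec50 = rng.normal(0.0, 1.0, 300)
raw = true_log_ec50 + rng.normal(0.0, 1.0, 300)   # se = 1 for every gene
shrunk = shrink_ec50(raw, np.ones(300))
mse_raw = np.mean((raw - true_log_ec50) ** 2)
mse_shrunk = np.mean((shrunk - true_log_ec50) ** 2)
```

This is the classic bias-variance trade-off behind the reported reduction in mean squared error: individual estimates are biased toward the grand mean, but their variance drops by more.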

A Bayesian approach to estimating dynamic models of co-regulated gene expression

ABSTRACT. Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response, disease progression, and organ development. It is of interest to identify genes with similar expression patterns over time because such genes often share similar biological characteristics. For instance, they may be co-regulated by the same transcription factors. However, identifying genes with similar temporal expression patterns is challenging because gene expression datasets consist of thousands of genes, measured at a small number of time points, and the time dynamics of gene expression are highly nonlinear. We propose a Bayesian approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive new metrics that capture the similarity in the time dynamics of two genes. These metrics, which are based on the familiar R^2 value, are simple and fast to compute and can be used for generating clusters or networks of closely related genes. The salient feature of our method is that it leverages external biological databases that document known interactions between genes; the method automatically uses this information to define informative prior probability distributions on the parameters of the ODE model. This ultimately encourages genes with known relationships to receive higher similarity scores, and allows us to infer the functionality of under-studied genes whose time dynamics are very similar to more well-studied ones. We also derive optimal, data-driven shrinkage parameters that balance the ODE model's fit to both the data and to external biological information. Using real gene expression datasets collected from fruit flies, we demonstrate that our approach produces clusters and networks of genes with clear biological interpretations. By doing so, our method is able to reduce the dimensionality of gene expression datasets and reveal new insights about the dynamics of biological systems.
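A toy version of an R²-based dynamic-similarity score: fit a simple linear ODE to one gene by least squares and measure how well the fitted dynamics explain another gene's derivative. This is an illustrative stand-in only; the method described is Bayesian, with informative priors built from external biological databases:

```python
import numpy as np

def r2(y, yhat):
    """The familiar R^2 value of predictions yhat for observations y."""
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def fit_linear_ode(t, x):
    """Least-squares fit of x'(t) = b0 + b1 * x(t), with the derivative
    approximated by finite differences."""
    dx = np.gradient(x, t)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    return beta

def dynamic_similarity(t, g1, g2):
    """R^2-style score: how well gene 1's fitted dynamics explain the
    derivative of gene 2."""
    b0, b1 = fit_linear_ode(t, g1)
    return r2(np.gradient(g2, t), b0 + b1 * g2)

# Two genes with the same exponential-decay dynamics score near 1;
# a gene with oscillatory dynamics scores much lower.
t = np.linspace(0.0, 3.0, 200)
s_same = dynamic_similarity(t, np.exp(-t), 2 * np.exp(-t))
s_diff = dynamic_similarity(t, np.exp(-t), np.sin(2 * t))
```

Scores like these can feed directly into clustering or network construction over all gene pairs.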

A methodological approach to assess data quality from the Clinical Practice Research Datalink

ABSTRACT. Introduction: Electronic health records (EHRs) are databases that store routinely collected anonymised patient data. EHR data are increasingly being used for research. However, data validation and quality assessments are often not performed. Previous systematic reviews have identified barriers and highlighted the need for generalised approaches to assess EHR data quality. We derived a methodological approach to assess data quality from the UK Clinical Practice Research Datalink (CPRD), demonstrated with application to the full blood count (FBC) blood test, which consists of up to 20 parameters. We provide recommendations for researchers who wish to access and analyse EHR data.

Methods: Laboratory data from CPRD were accessed for primary care patients aged at least 40 years at study entry with at least one FBC blood test. Medical codes and entity codes, two coding systems used within CPRD, were used to identify FBC blood test records (step 1) and cross-checked (step 2), with mismatches explored (step 3) and, where possible, rectified (step 4). The reliability of units of measurement is described (step 5) and reasons for missing data are discussed (step 6).
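The cross-checking of step 2 can be illustrated with a toy lookup; all codes and mappings below are invented placeholders, not real CPRD codes:

```python
# Step 2 sketch: verify that the medical code and the entity code on each
# test record map to the same FBC parameter. Hypothetical code tables:
med_to_param = {"MED_HB": "haemoglobin", "MED_PLT": "platelets"}
ent_to_param = {101: "haemoglobin", 102: "platelets"}

# Each record carries one medical code and one entity code
records = [("MED_HB", 101), ("MED_PLT", 102), ("MED_HB", 102)]

# Flag records where the two coding systems disagree (step 3 then explores these)
mismatches = [r for r in records if med_to_param[r[0]] != ent_to_param[r[1]]]
match_rate = 1 - len(mismatches) / len(records)
```

At CPRD scale this becomes a grouped join over hundreds of millions of rows, but the consistency check itself is exactly this comparison.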

Results: There were 138 medical codes and 14 entity codes for the FBC in the data (step 1). Medical and entity codes consistently corresponded to the same FBC parameter in 95.2% (n=217,752,448) of parameters (step 2). In the 4.8% (n=10,955,006) mismatches, the suggested FBC data were often already present elsewhere in the patient’s record (step 3). The most common parameter rectified was mean platelet volume (n=2,041,360), which we identified as not having an existing entity code, and 1,191,540 could not be rectified (step 4). Units of measurement were often missing, partially entered, or did not correspond to the blood value (step 5). One FBC parameter, red blood cell distribution width, had the most missing data (98% of 16,537,017 FBC tests), which we identified to be a result of haematology laboratories suppressing output before delivering test results to primary care practices (step 6).

Conclusion: Performing data quality checks can help to understand the extent of any issues in the dataset. We emphasise balancing large sample sizes with reliability of the data.

Modeling Child Mortality in the presence of Clustering

ABSTRACT. Introduction: Though the Under-Five Child Mortality (U5CM) rate has reduced significantly globally, it is still a major public health problem in Low- and Middle-Income Countries (LMICs) compared to High-Income Countries. In 2019, for instance, 1 in 13 children died before reaching their fifth birthday in Sub-Saharan Africa. Kenya is among the countries that have recorded high U5CM rates in the world. In 2020, U5CM in Kenya stood at 43.2 deaths per 1,000 live births. This rate is almost twice the newly set third Sustainable Development Goal (SDG) target that seeks to reduce U5CM to about 25 per 1,000 live births in all countries by 2030. The study analyses data from the Kenya Demographic and Health Survey (KDHS) conducted in 2014, which is associated with challenges such as clustering, variable selection, and a high level of missingness, among others. We model U5CM in the presence of clustering. When observations are clustered within groups or multiple event times are clustered within individuals, dependence between event times in a cluster is of interest.

Objective: To establish the determinants of U5CM accounting for clustering at household level.

Methods: Two Random Survival Forest (RSF) models are fitted: one using the log-rank split rule and one using the log-rank score split rule. This is an attempt to handle the problem of non-proportional hazards that is often associated with survey data. We focus attention on the gamma frailty model with households as the random effects (frailty).

Results and Conclusion: We apply the two RSF models to the data. Both models identified the number of children under five years in the household as the most important predictor of U5CM, and both identified the same covariates affecting mortality of children under five: the number of children under five in the household, duration of breastfeeding, births in the past five years, total children ever born, total living children, mother's education, and toilet facility. The gamma frailty model shows an increase in the effect of these covariates on the outcome (U5CM). If dependence within the household is ignored, the effects of these covariates are underestimated.

Nguyen’s Information Criteria (NIC)

ABSTRACT. Objective: To develop a new strategy for the selection of variables for Big Data. Methods: ROP classification trees have the particularity of including neurons in their nodes [1], leading to perfect trees (PTs) with no classification errors. By assembling PTs in large quantities, a new family of ten statistical information criteria, spanning three dimensions, was developed to describe and understand how a feature contributes to a PT classification [2]. The first category represents the informative and predictive quality of a feature and includes 6 criteria. Among them, the first criterion, NIC1, is defined as the probability that a feature will obtain a PT. The second category represents information measuring the proximity of a feature in the pathogenic sequence to a state Y and comprises 2 criteria. Among them, NIC8 is defined as the one-node model probability. The third category represents information measuring the complexity of the relationship between a feature and state Y and comprises 2 criteria. Among them, NIC10 is defined as the probability of obtaining a unique solution. With these 10 criteria, the variables can be ranked in order of importance, according to a hierarchical strategy or a score. Two public databases were used (Wisconsin Breast Cancer: 569 observations, 30 features; and GSE22513, paclitaxel resistance genomic expression: 54,675 probes and 14 duplicate observations) to compare the performance of this new statistical information in the selection of variables against Gini information and against the SVM-RFE (support vector machines – recursive feature elimination) cross-validation procedure. Results: The first NIC allowed the Akaike information criterion to be minimized more quickly than with the Gini index when the features were introduced into a logistic regression model. The features selected based on the NICScore showed a slight advantage compared to the SVM-RFE procedure.
Conclusion: This work opens a field of research for the development of criteria evaluating the pathophysiological proximity of features to a biological event of interest and the complexity of their relationships. With this information, a mapping of the cascade of features leading to the event of interest can be expected.

Long-term oral prednisolone exposure for bullous pemphigoid: a population-based study using ‘big data’ and missing data algorithms

ABSTRACT. Oral prednisolone is the mainstay of treatment for bullous pemphigoid, a rare autoimmune blistering skin disorder of older people. Moderate- to high-dose treatment is often initiated in secondary care, but then continued in primary care. The aim is to use ‘big data’ and missing data algorithms to describe, for the first time, long-term oral prednisolone prescribing in UK primary care for adults with bullous pemphigoid.

A cohort study using routinely collected data from the Clinical Practice Research Datalink was undertaken to identify people (≥18 years) with an incident diagnosis of bullous pemphigoid in the UK between 1998 and 2017. Oral prednisolone exposure was characterised in terms of the proportion of individuals with bullous pemphigoid prescribed oral prednisolone following diagnosis. Prednisolone dose and duration were extracted when available and imputed when missing. These data were often missing when prescriptions were issued with information restricted to the free-text field, such as “Take as indicated by your dermatologist”. Implausible and missing values were handled using the DrugPrep algorithm, with the decisions described and validated by Joseph et al. (2019).

Following the diagnosis of bullous pemphigoid, 2,312 (69.6%) of 3,322 people with incident bullous pemphigoid received a prescription for oral prednisolone in primary care. Of the users, only 321 (13.9%) people had complete data for all prescriptions. For the remaining people, the dose, start date, or treatment duration were imputed for at least one prescription. The median duration of exposure was 10.6 months (IQR 3.4 to 24.0). Of those prescribed prednisolone, 71.5% were continuously exposed to prednisolone for >3 months, 39.8% for >1 year, 14.7% for >3 years, 5.0% for >5 years, and 1.7% for >10 years. The median cumulative dose was 2,974mg (IQR 1,059 to 6,456).

A high proportion of people with incident bullous pemphigoid are treated with oral prednisolone in UK primary care. The potential iatrogenic risks posed to this population of predominantly older people are high. Clear communication between primary and secondary care and consideration of steroid-sparing alternatives may be appropriate, and, where prednisolone is deemed the safest option, appropriate monitoring and prophylaxis for potential side effects are important measures.

Exploring risk stratification in cardiomyopathies using a deep learning approach for survival prediction

ABSTRACT. Cardiomyopathies are primary myocardial disorders characterized by an important genetic background. The most severe outcomes are cardiovascular (CV) death and life-threatening arrhythmias. An appropriate risk stratification, which can be relevant for the clinical management of patients, is still lacking. This is a preliminary study aimed at providing a survival prediction model for better risk stratification. We consider a cohort of 1453 patients enrolled in the Heart Muscle Disease Registry of Trieste (Italy), one of the largest and best characterized cardiomyopathy cohorts. Cause-specific survival models will be estimated for two endpoints: 1) CV death; 2) the occurrence of sudden cardiac death (SCD) or major ventricular arrhythmias (MVA). Baseline predictor factors will include 18 phenotypic variables spanning the clinical, echocardiographic, ECG and Holter-monitoring domains. A subgroup of 467 patients has been screened for rare genetic variants in 35 candidate genes with Next Generation Sequencing (NGS) technology. The resulting variants will be annotated using multiple pathogenicity scores and then aggregated at the gene level. Preliminary analysis includes comparison of cumulative incidence curves with the appropriate test for competing risks. Given the relevant number of candidate predictors and the possible presence of non-linear effects and interaction effects (especially for genetic factors), in the next stage of the study we will apply the recently published multilayer deep neural network (DNN) for survival prediction by Sun et al. (2020). The algorithm consists of a feed-forward neural network with partial likelihood estimation using Efron’s approach and an L1 penalty on the resulting loss function. Prediction importance measures can be obtained using the LIME method. The cohort appears appropriate for investigating long-term outcomes: the median follow-up is 110 months (IQR 26-208). The numbers of events are 180 and 241 for CV death and SCD/MVA, respectively.
At the current stage of the analysis, we observe an unexpected trend towards a reduction in arrhythmic events for patients carrying variants of uncertain significance (p=0.087), which is stronger in males (p=0.049). Our first results support the need for a more complex multi-parametric prognostic model accounting for the impact of distinct genes. The application of interpretability strategies will enhance the identification of clinically meaningful subgroups of patients.

Can animal studies on rodents help better understand Alzheimer’s disease in humans?

ABSTRACT. Alzheimer’s disease (AD), the major cause of dementia, is a widespread neurodegenerative disease induced by misfolded protein tau. The neurofibrillary degeneration significantly correlates with disease progression, thus better understanding of the tau cascade represents an important target for potential therapeutic strategies for the clinical course of AD.

In this study we investigate and compare protein expression in choroid plexus tissue from a transgenic rodent model expressing human truncated tau with three microtubule-binding repeats (SHR24) and from spontaneously hypertensive rats (SHR) used as controls. The SHR24 line satisfied several key histological criteria typical of neurofibrillary degeneration in human AD; the rats developed tau pathology in the spinal cord, partially in the brainstem, and in the motor cortex.

Choroid plexus tissue was isolated from rats aged 4 to 14 months (in 2-month steps). Our data contain normalized abundances of 11,245 proteins in 52 animal samples. Raw abundances, spectral counts and other variables are also included. 2,209 proteins were identified by 3 or more peptides. On average, a protein was identified by 9 peptides.

Data were assessed using multidimensional statistical methods such as PCA, hierarchical clustering and PLS-DA in order to find differences in protein expression between transgenic and control rats. The R software was used to carry out the statistical analysis.
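The PCA step can be sketched as follows (shown in Python rather than R purely for illustration; the data are simulated, with a hypothetical mean shift in a subset of proteins for the transgenic group):

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA scores via SVD of the column-centred abundance matrix
    (samples in rows, proteins in columns)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

# Hypothetical example: 10 transgenic + 10 control samples, 200 proteins,
# with an abundance shift in the first 20 proteins for the transgenic group
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 200))
X[:10, :20] += 3.0
scores = pca_scores(X)
# The two groups separate along the first principal component
pc1_gap = abs(scores[:10, 0].mean() - scores[10:, 0].mean())
```

PLS-DA would instead use the group label to guide the projection, which typically sharpens the separation when the between-group signal is weak relative to the total variance.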

Acknowledgment: The work was supported (partly) by specific research of Masaryk University as support for student projects (MUNI/A/1615/2020).

References: Zilka N, Z Kazmerova, S Jadhav, P Neradil, A Madari, D Obetkova, O Bugos, M Novak. Who fans the flames of Alzheimer's disease brains? Misfolded tau on the crossroad of neurodegenerative and inflammatory pathways. Journal of Neuroinflammation 9,1 (2012): 1–9. Levarska, L, N Zilka, S Jadhav, P Neradil, M Novak. Of rodents and men: the mysterious interneuronal pilgrimage of misfolded protein tau in Alzheimer's disease. Journal of Alzheimer's Disease 37,3 (2013): 569–577. Novak P, S Katina, et al. Safety and immunogenicity of the tau vaccine AADvac1 in patients with Alzheimer's disease: a randomised, double-blind, placebo-controlled, phase 1 trial. The Lancet Neurology. 2017 Feb 1;16(2):123–34.

Non-linear and non-additive associations between the pregnancy exposome and birthweight

ABSTRACT. Birthweight is an indicator of fetal growth, and environment-related alterations of birthweight have been linked with multiple disorders and conditions progressing into adulthood. Although a few studies have assessed the association between birthweight and the totality of exposure, herein the ‘exposome’, in maternal urine and cord blood, no prior research has considered a) the maternal serum prenatal exposome, which is enriched for hormones, and b) non-linear and synergistic associations among exposures. We measured the maternal serum exposome during pregnancy using an untargeted metabolomics approach and the birthweight for gestational age (BWGA) z-score in 410 mother-child dyads enrolled in the PRogramming of Intergenerational Stress Mechanisms (PRISM) cohort. We leveraged a Bayesian factor analysis for interaction to select the most important metabolites associated with the BWGA z-score and to evaluate their linear, non-linear and non-additive associations. We also assessed the primary biological functions of the identified metabolites using MetaboAnalyst, a centralized repository of curated functional information. We compared our findings with those of a traditional metabolite-wide association study (MWAS) in which metabolites are individually associated with the BWGA z-score. Among 1110 metabolites, 46 showed evidence of U-shaped associations with the BWGA z-score. Most of the identified metabolites (85%) were lipids, primarily enriched for pathways central to energy production, immune function, and androgen and estrogen metabolism, which are essential for pregnancy and parturition processes. Metabolites within the same class, i.e. steroids and phospholipids, showed synergistic relationships with each other. Our results support that aspects of the metabolic exposome during pregnancy contribute linearly, non-linearly and synergistically to variation in newborn birthweight.

Machine learning to support Reinke’s edema diagnosis from voice recordings

ABSTRACT. Voice diseases represent an important disorder affecting people’s communication. One of the most prevalent voice disorders is Reinke’s edema (RE), an inflammatory disease of the vocal folds, which makes the voice become deep and hoarse. Machine learning techniques fed by acoustic features extracted from voice recordings can be used as a non-invasive and low-cost tool to diagnose RE. Specifically, several feature selection and classification models have been implemented and evaluated to determine how accurately they distinguish between healthy and pathological voices, as well as to discover the most relevant features for the detection of RE. Two databases have been considered: i) the commercial MEEI database [1] (53 healthy subjects and 25 RE patients), and ii) our own database collected at the San Pedro de Alcántara Hospital in Cáceres, Spain (30 healthy subjects and 30 RE patients). Voices were recorded following a research protocol based on the sustained phonation of the vowel /a/. Different acoustic features were extracted from the voice recordings. After that, feature selection employing the recursive feature elimination technique and several classifiers were applied to the selected predictors. Ten models were analyzed, classified into ensemble and non-ensemble ones. They include a broad variety of methods, such as decision trees, nearest neighbors, neural networks, support vector machines, Bayesian classification, regression analysis and linear discriminant analysis. All classification models were validated using 10-fold stratified cross-validation; this process was replicated a total of 500 times, taking a new partition of the data in each repetition, so that the mean and standard deviation of each evaluation metric could be calculated. The best performance was achieved with an ensemble model based on neural networks, obtaining an accuracy of 100% on the MEEI database and 95.49% on our own database.
The obtained results are competitive with those found in the scientific literature, so these techniques could be successfully used within a computer-aided system to support the diagnosis of Reinke’s edema.
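The abstract does not include code; purely as an illustration, the described pipeline (recursive feature elimination followed by a classifier, evaluated with repeated stratified 10-fold cross-validation) could be sketched with scikit-learn roughly as follows, using synthetic data in place of the real acoustic features:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Stand-in for the acoustic feature matrix (60 recordings, 30 features)
X, y = make_classification(n_samples=60, n_features=30, n_informative=5,
                           random_state=0)

# RFE inside the pipeline so feature selection is refit on every training fold
pipe = Pipeline([
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=8)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The study used 500 repetitions; 5 keeps this sketch quick to run
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Nesting the feature selection in the pipeline (rather than selecting once on all data) is what keeps the repeated cross-validation estimates honest.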

The Need for Expanded Standardized Reporting for Machine Learning Methods in Clinical Prediction

ABSTRACT. Reporting of clinical prediction modeling is known to be poor, which hampers risk-of-bias assessment and validation and ultimately hinders translation of this research into clinical practice. Tools such as TRIPOD (Collins et al., 2015) and PROBAST (Moons et al., 2019) aim to improve reporting and enhance understanding of such studies; however, the current surge in popularity of machine learning (ML) may compound these reporting problems. Although Collins and Moons announced the development of an ML-specific TRIPOD in The Lancet in 2019, researchers in this field need basic guidance today. In this work we aim to: (1) highlight the major reporting deficiencies according to current guidelines, (2) describe the ML-specific analysis details currently being reported, and (3) make suggestions on which current items require communication in greater detail and which additional items should be introduced. This work is based on an ongoing Cochrane Review of clinical prediction models in multiple sclerosis (MS). Of the 58 studies included after screening the initial database search results, 24 utilize ML methods to predict future clinical outcomes in MS. Even though a majority of these ML studies were published after the introduction of TRIPOD in 2015, unclear reporting of data sources, participants, missing data/loss to follow-up, and final models persists. The details reported regarding model development and performance evaluation vary greatly, suggesting that authors may not be aware of the details necessary for study assessment or of best practice regarding data pre-processing, hyperparameter tuning, nested resampling, and clinically relevant evaluation. While current tools provide a strong foundation, ML-based clinical prediction studies require reporting of additional details which are not yet addressed by these tools. Several PROBAST items require re-interpretation for fair assessment of ML studies.
Specific guidance with respect to sample size requirements, data leakage, hyperparameter tuning, optimism-adjusted model performance, and model sharing is necessary. A major challenge in creating such guidance will be to capture all algorithms in use, as well as those yet to be created, while also helping the user to assess studies within the context of each specific algorithm.
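As a small illustration of one item such guidance would cover: nested resampling separates hyperparameter tuning (inner loop) from performance estimation (outer loop), avoiding the optimism that arises when the same folds are used for both. A minimal scikit-learn sketch on synthetic data (not from any study discussed here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=10, random_state=0)

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # tunes C
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)  # estimates performance

# The tuner itself is treated as the estimator, so tuning is repeated
# from scratch inside every outer training fold (no data leakage)
tuned = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner)
outer_scores = cross_val_score(tuned, X, y, cv=outer)
print(f"nested CV accuracy: {outer_scores.mean():.3f}")
```

Reporting the outer-loop scores, rather than the inner-loop tuning scores, is what yields an optimism-adjusted performance estimate.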

Virtual biopsy in action: a radiomic-based model for CALI prediction

ABSTRACT. Background & Aims: Chemotherapy-associated liver injuries (CALI) have a major clinical impact, but their non-invasive diagnosis is still an unmet need. The present work aims at elucidating the contribution of radiomic analysis to diagnosis of sinusoidal dilatation, nodular regenerative hyperplasia (NRH) and non-alcoholic steatohepatitis (NASH).

Methods: Patients undergoing liver resection for colorectal metastases after oxaliplatin- or irinotecan-based chemotherapy between January 2018 and February 2020 were retrospectively analyzed. Radiomic features were extracted from a standardized volume of non-tumoral liver parenchyma outlined in the portal phase of preoperative post-chemotherapy computed tomography (CT). Multivariable logistic regression models and CART were applied to identify predictors of CALI and internally validated.

Results: Overall, 78 patients were analyzed. Three fingerprints derived from radiomic features were considered as independent predictors of grade 2-3 sinusoidal dilatation: GLRLM_f3 (OR=12.25), NGLDM_f1 (OR=7.77), and GLZLM_f2 (OR=0.53). The combined clinical/radiomic predictive model had 82% accuracy, 64% sensitivity, and 91% specificity (AUC=0.87 vs AUC=0.77 for the model without radiomics). Three radiomic parameters were independent predictors of NRH: conventional_HUQ2 (OR=0.76), GLZLM_f2 (OR=0.05), and GLZLM_f3 (OR=7.97). The combined clinical/radiomic model had 85% accuracy, 81% sensitivity, and 86% specificity (AUC=0.91 vs AUC=0.85 without radiomic features). One radiomic feature was associated with NASH: conventional_HUQ2 (OR=0.79). Steatohepatitis was predicted with 91% accuracy, 86% sensitivity, and 92% specificity (AUC=0.93 vs AUC=0.83 without radiomic features). In the validation set, accuracy was 72%, 71%, and 91% for sinusoidal dilatation, NRH, and NASH, respectively.

Conclusions: Radiomic analysis of liver parenchyma may provide a signature that, in combination with clinical and laboratory data, improves diagnosis of CALI.
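The modelling strategy described, a multivariable logistic regression combining clinical and radiomic predictors and compared by AUC against a clinical-only model, can be outlined as follows. This sketch uses simulated data only; `clinical` and `radiomic` are invented stand-ins for the study's real features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 78  # matching the study's cohort size
clinical = rng.normal(size=(n, 3))   # placeholder clinical predictors
radiomic = rng.normal(size=(n, 3))   # placeholder radiomic fingerprints

# Simulated outcome driven by one clinical and one radiomic signal
y = (clinical[:, 0] + radiomic[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

clin_model = LogisticRegression().fit(clinical, y)
X_full = np.hstack([clinical, radiomic])
full_model = LogisticRegression().fit(X_full, y)

auc_clin = roc_auc_score(y, clin_model.predict_proba(clinical)[:, 1])
auc_full = roc_auc_score(y, full_model.predict_proba(X_full)[:, 1])
odds_ratios = np.exp(full_model.coef_[0])  # per-feature ORs, as reported in the study
print(f"AUC clinical-only: {auc_clin:.2f}, clinical+radiomic: {auc_full:.2f}")
```

In the study itself the AUC comparison was internally validated; this in-sample sketch only shows the mechanics of adding radiomic features to a clinical baseline model.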

Creation of an adverse drug reaction assessment tool

ABSTRACT. Clinical research problem (context): International health organizations are actively pursuing pharmacovigilance by collecting data on adverse drug reactions (ADRs). The ADR profile and clinical manifestations are most important when choosing the best treatment options and making patient-related drug safety decisions. Due to different levels of drug use, peculiarities of medical practice, population age and comorbidities, the epidemiological incidence of ADRs varies. Various scales and classifiers are used for the analysis of ADRs, but there is no universally agreed tool in clinical practice. A detailed analysis of ADRs may contribute to the country's pharmacovigilance and ease the burden on the health care system.

Statistical challenges (objective): We aimed to develop a new ADR assessment tool for use in clinical practice by adapting the most common ADR assessment scales. The purpose of this tool is to tell the clinician whether an ADR is clinically relevant. Thus, the number of items needs to be minimized for convenience in everyday use, with minimal loss of accuracy.

Statistical methods: The study analyzes the documentation of patients for whom an ADR was confirmed. Epidemiological data, drug classes and patient risk factors were analyzed. The ADR assessment scales and questions of ADR type, causality, predictability, severity and preventability most commonly described in the literature were selected. Data were evaluated on the selected scales by two independent investigators, and statistical analysis was performed to assess inter-rater agreement. Machine learning was used to reduce the set of items and create a combined, reduced questionnaire as the new ADR assessment tool. Data were analyzed using the R software.
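Although the abstract reports analyses in R, two of the steps described, inter-rater agreement and importance-based item reduction, can be sketched in any language. A Python/scikit-learn illustration with invented placeholder data (Cohen's kappa stands in for whichever agreement statistic the study used):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of 12 cases by two independent investigators
# (1 = clinically relevant ADR, 0 = not relevant)
rater1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
kappa = cohen_kappa_score(rater1, rater2)
print(f"inter-rater kappa: {kappa:.2f}")

# Item reduction: rank 20 candidate questionnaire items by importance
# and keep only the most informative ones (synthetic data)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 20))
y = (X[:, 0] + X[:, 3] + rng.integers(0, 2, size=100) >= 2).astype(int)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
keep = np.argsort(rf.feature_importances_)[::-1][:5]  # top 5 items retained
```

The same logic, score agreement first, then prune items whose removal costs little accuracy, underlies the combined reduced questionnaire described above.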

Results and Conclusions: The significant variables from the different scales were determined using a machine learning algorithm. This method was used to create a new ADR assessment tool which, after validation, will be implemented in the health information system and used in clinical practice.

Use of machine learning models combined with innovative interpretation methods to identify prognostic factors

ABSTRACT. OBJECTIVES. The objective of this exploratory analysis is to identify prognostic factors of pathological complete response (pCR) after neoadjuvant treatment using machine learning methods. This research complements a retrospective national observational study including HER2+ early breast cancer patients receiving trastuzumab-based neoadjuvant and adjuvant therapy in France.

METHODS. Prognostic factors were identified using a two-step method. The first step consists of training a Support Vector Machine (SVM) model, optimized through grid search with cross-validation on a learning dataset (75% of patients) and then tested on a validation dataset (25% of patients), including the clinical characteristics of the patients as well as the characteristics of the centers. The model was then applied to the analysable population (n = 301) to calculate overall accuracy, sensitivity and specificity. The second step consists of identifying the variables influencing the prediction of pCR, using three model-agnostic interpretation methods: SHAP (Shapley values) to identify the most important variables and the magnitude of their impact on the model, Partial Dependence Plots (PDP) to visualize the overall (positive or negative) impact of each variable on the model, and Local Interpretable Model-agnostic Explanations (LIME) to further explain the prediction of pCR for each patient.

RESULTS. The final SVM model presented an accuracy of 0.65, with a sensitivity of 0.47 and a specificity of 0.80. The SHAP results for this model showed that the variables with the most impact on pCR status are the stage at diagnosis, the T classification, the presence of invaded lymph nodes and estrogen receptor status. Individual results from the LIME and PDP approaches tended to reinforce the global results from the SHAP method. The results are consistent with knowledge from the literature.

CONCLUSION. We showed the value of new interpretation tools for identifying potential prognostic factors from more complex machine learning models. These models allow consideration of all variables without a priori assumptions, which is a strength for a more in-depth analysis of medical data. Other machine learning models (k-nearest neighbors, decision tree, random forest, XGBoost, AdaBoost) are being tested and will be compared to the SVM in order to improve the predictive quality of the model.