Location: Erasmus MC, Rotterdam
Room: Collegezaal 4
Time: 13:00 – 17:00
This year’s Spring Meeting of the Dutch Biostatistics Society features plenary talks by the jury members of the Hans van Houwelingen Award 2018:
and our former BMS-ANed president
For more information on the talks, see below. During this meeting the winner of the Hans van Houwelingen Award 2018 will be announced. More information about the program of June 1st will follow.
The 2018 General Assembly of the BMS (Biometric Section of the Dutch Society for Statistics and Operations Research – VVS) and ANed (Dutch region of the International Biometric Society – IBS) will take place during this Spring meeting. The following documents are available here:
Clelia di Serio
Big data and biomedicine, from reproducibility to causal interpretation: a statistical challenge
Reproducibility of science is nowadays one of the major challenges for modern biomedical research. Indeed, dealing with big data from OMICS and NGS technologies, or with data from electronic medical records (EMR), leads to a crucial question: how can “big data” be translated into “big information”? Issues such as the rapid evolution of technologies or the low representativeness of samples from large surveys very often make results incomparable with those of other studies. Literally, reproducibility is the ability of an entire experiment or study to be replicated. The definition looks straightforward, but it carries many implications. First, one should note that replicability is not a synonym for repeatability. Repeatability means that an experiment can be repeated under the same conditions, with the same measurement procedure and the same analytic methods. Replicability has much to do with the “generalizability” of results, which are also expected to be replicable with independent data, analytical methods, laboratories and technologies, that is, in the presence of different sources of variation, with both small samples (basic research) and large samples (epidemiological and clinical studies). The close connection with the statistical inference paradigm is therefore immediately apparent. Since medical studies, in both basic and clinical research, are commonly used to quantify the effects of risk factors on simple or complex outcomes of interest for general health and personalized care, replication is clearly of critical importance where results can inform substantial policy decisions. However, general considerations about the sustainability of research in terms of time, expense and the rapid evolution of biomedical technology very often make it impossible to fully replicate studies. Approaching replicability from a fully statistical perspective may help to introduce corrections through appropriate statistical procedures, thus reducing the cost of repeating experiments and studies.
In this contribution we will consider statistics as acting in two main directions: first, as a tool to “evaluate replicability” of a study when changes in biomedical techniques could dramatically affect the comparability of longitudinal data; second, as a method to “induce replicability”. Two examples are presented: gene therapy data and a large observational study on prostate cancer.
Prediction and explanation in studies with rare events: problems and solutions
We discuss problems arising in the analysis of logistic regression models with sparse data sets, considering the case where interest lies in both prediction and effect estimation. The first problem is separation, where perfect separation of the outcomes in a data set by the covariates hampers effect estimation (Mansournia et al., 2018). Another problem is low accuracy despite a large sample size, when events are rare. Several approaches to deal with these problems have been proposed, e.g. methods based on bias reduction such as Firth’s logistic regression, or the use of weakly informative priors in a Bayesian framework. Because of its many advantageous properties, Firth’s logistic regression has become a standard approach for the analysis of binary outcomes with small samples. It is implemented in the R package logistf and in many other statistical software systems. Whereas it reduces the bias in maximum likelihood estimates of the coefficients, it introduces bias towards one-half in the predicted probabilities. The stronger the imbalance of the outcome, the more severe the bias in the predicted probabilities. We propose a simple modification of Firth’s logistic regression resulting in unbiased predicted probabilities (Puhr et al., 2017). While this method introduces a small bias in the regression coefficients, this is compensated by a decrease in their mean squared error. We demonstrate the properties of our proposed method in a comparative simulation study that also includes other methods, and exemplify its use with real data.
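Firth’s method maximizes the penalized log-likelihood l(β) + ½ log|I(β)|, where I(β) = XᵀWX is the Fisher information; equivalently, it solves the modified score equations Xᵀ(y − p + h(½ − p)) = 0, with h the diagonal of the weighted hat matrix. The intercept row of these equations explains the bias in predictions: at the solution Σp_i = Σy_i + Σh_i(½ − p_i), so when events are rare and every p_i < ½, the predicted probabilities sum to more than the observed number of events. The NumPy sketch below is illustrative only and is not the logistf implementation. `expit`, `penalized_loglik`, `firth_logistic` and `flic` are hypothetical helper names, and fitting uses Newton-Raphson with a step cap and step halving. Because the abstract does not spell out its modification, the sketch assumes the intercept correction that Puhr et al. (2017) call FLIC: keep the Firth slopes and re-estimate only the intercept by maximum likelihood, with the rest of the linear predictor as an offset. This forces the mean predicted probability to equal the observed event rate.

```python
import numpy as np


def expit(eta):
    # logistic function written via tanh, which avoids overflow in exp()
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def penalized_loglik(X, y, beta):
    # Firth's penalized log-likelihood: l(beta) + 0.5 * log|X' W X|
    eta = X @ beta
    p = expit(eta)
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    _, logdet = np.linalg.slogdet(X.T @ (X * (p * (1 - p))[:, None]))
    return loglik + 0.5 * logdet


def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Hypothetical helper: Firth-penalized logistic regression.

    X must contain an intercept column in position 0.
    """
    beta = np.zeros(X.shape[1])
    current = penalized_loglik(X, y, beta)
    for _ in range(max_iter):
        p = expit(X @ beta)
        w = p * (1 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))
        # diagonal of the hat matrix W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score X'(y - p + h(1/2 - p)), scaled by the inverse information
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        largest = np.max(np.abs(step))
        if largest > max_step:  # cap very large steps
            step *= max_step / largest
        for _ in range(30):  # halve until the penalized likelihood does not drop
            if penalized_loglik(X, y, beta + step) >= current:
                break
            step /= 2.0
        beta = beta + step
        current = penalized_loglik(X, y, beta)
        if np.max(np.abs(step)) < tol:
            break
    return beta


def flic(X, y, beta_firth, lo=-50.0, hi=50.0):
    """Hypothetical helper (assumed FLIC-style intercept correction).

    Keeps the Firth slopes and refits the intercept (column 0) by ML with the
    slope part of the linear predictor as an offset. The ML intercept solves
    sum(expit(alpha + offset)) = sum(y); the left side increases in alpha,
    so bisection on [lo, hi] finds it.
    """
    offset = X[:, 1:] @ beta_firth[1:]
    target = np.sum(y)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.sum(expit(mid + offset)) < target:
            lo = mid
        else:
            hi = mid
    beta = beta_firth.copy()
    beta[0] = 0.5 * (lo + hi)
    return beta
```

Usage: with `X` holding a column of ones followed by the covariates and `y` a 0/1 vector, call `b = firth_logistic(X, y)` and then `flic(X, y, b)`. The two fits share their slopes; only the intercept moves. On completely separated data, where ordinary maximum likelihood estimates diverge, the Firth fit still returns finite coefficients.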
Using propensity scores to adjust for residual confounding in small area studies
Small area ecological studies are commonly used in epidemiology to assess the impact of area-level risk factors on health outcomes when data are only available in aggregated form. However, the resulting estimates are often biased due to unmeasured confounders, which are typically not available from the standard administrative registries used for these studies. Extra information on confounders can be provided by external datasets such as surveys or cohorts, where the data are available at the individual level rather than at the area level; however, such data typically lack the geographical coverage of administrative registries. We develop a framework of analysis that combines ecological and individual-level data from different sources to provide a less biased, adjusted estimate of the effect of area-level risk factors. Our method (i) summarises all available individual-level confounders into an area-level scalar variable, which we call the ecological propensity score (EPS); (ii) implements a hierarchical structured approach to impute the values of the EPS wherever they are missing; and (iii) includes the estimated and imputed EPS in the ecological regression linking the risk factors to the health outcome. Through a simulation study we show that integrating individual-level data into small area analyses via the EPS is a promising method to reduce the bias due to unmeasured confounders that is intrinsic to ecological studies; we also apply the method to real case studies in environmental epidemiology.
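A heavily simplified, non-spatial sketch of the three steps follows. It is not the authors’ model, and every function name is hypothetical. It assumes a binary area-level exposure, one row of survey-derived confounder means per area (NaN where an area has no survey coverage), and observed and expected case counts per area. Step (i) takes the EPS, in the usual propensity-score sense, to be the probability that an area is exposed given its confounder summaries. Step (ii) is replaced by plain mean imputation as a placeholder. It is NOT the hierarchical structured imputation model described above. Entering the EPS as a single linear term in the Poisson regression of step (iii) is also an assumption.

```python
import numpy as np


def _expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def logistic_fit(X, z, n_iter=50):
    """Maximum-likelihood logistic regression by Newton-Raphson (no step control)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = _expit(X @ beta)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (z - p))
    return beta


def ecological_propensity_score(conf_means, exposure):
    """Step (i), assumed form: EPS = P(area exposed | area-level confounder means).

    conf_means: (areas x p) array, rows all-NaN where the area has no survey data.
    """
    observed = ~np.isnan(conf_means).any(axis=1)
    X_obs = np.column_stack([np.ones(observed.sum()), conf_means[observed]])
    beta = logistic_fit(X_obs, exposure[observed])
    eps = np.full(len(exposure), np.nan)
    eps[observed] = _expit(X_obs @ beta)
    return eps


def impute_eps(eps):
    """Step (ii) PLACEHOLDER: overall mean imputation.

    This is NOT the hierarchical structured imputation of the abstract.
    """
    filled = eps.copy()
    filled[np.isnan(filled)] = np.nanmean(eps)
    return filled


def poisson_fit(X, y, offset, n_iter=50):
    """Step (iii): Poisson regression with a log-offset, via Newton-Raphson."""
    # start from least squares on the log scale (0.5 guards against log(0))
    beta = np.linalg.lstsq(X, np.log(np.maximum(y, 0.5)) - offset, rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        beta = beta + np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
    return beta
```

To chain the steps, build `X = np.column_stack([np.ones(n), exposure, impute_eps(eps)])` and call `poisson_fit(X, observed_counts, np.log(expected_counts))`. The coefficient on the exposure column is then the EPS-adjusted log relative risk under these assumptions.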
Jeanine Houwing-Duistermaat and Haiyan Liu
Statistical challenges in functional data analysis
The growing availability of temporal datasets provides many opportunities for methods development, for example the integration and joint analysis of multiple temporal datasets and the modelling of sparse and irregular data from Electronic Health Records (EHR). One of our motivating studies aims to build a prediction tool for the progression of scleroderma using data from EHRs. Scleroderma is a rare, clinically heterogeneous multisystem disorder that greatly affects patients’ physical and psychological functioning. Since only 15% of patients show progression of the disease, predicting progression is important for clinicians and patients when deciding on follow-up and treatment strategies. One outcome of disease progression is a drop in DLCO (diffusing capacity of the lung for carbon monoxide), an index of lung function. In our dataset, we have DLCO measurements for 152 patients with 2 to 7 visits over 60 months. DLCO appears to change continuously over time, so these measurements are (sparse) functional data. In addition to the historical DLCO measurements, we have access to measurements of four biomarkers. Our aim is to predict scleroderma progression based on a patient’s historical data, the information from all other patients, and the biomarkers. The methodological challenges here are the sparsity and irregularity of the data.
To address these challenges, we propose a functional principal component analysis (FPCA) method and a scalar-on-function regression method. Restricted maximum likelihood is used to estimate the eigenelements of the underlying covariance function, and the scores are estimated by conditional expectation. The DLCO trajectories are then recovered using the truncated Karhunen-Loève decomposition based on the estimated eigenelements and scores. A similar FPCA procedure is also applied to predict a patient’s last-visit DLCO value by borrowing information from all other patients and from the patient’s own history (with the last-visit DLCO value removed).
We will present our methods, results of the data analysis and discuss future challenges for modelling temporal datasets.
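The truncated Karhunen-Loève reconstruction and the conditional-expectation scores can be written as X_i(t) ≈ μ(t) + Σ_{k=1}^{K} ξ_ik φ_k(t) and ξ̂_i = Λ Φ_iᵀ Σ_i⁻¹ (Y_i − μ_i), with Σ_i = Φ_i Λ Φ_iᵀ + σ²I. Here Φ_i holds the K eigenfunctions evaluated at patient i’s visit times, Λ = diag(λ_1, …, λ_K), and σ² is the measurement-error variance. This is the standard conditional-expectation form under a Gaussian assumption. The sketch below is only an illustration, and `pace_scores` and `reconstruct` are hypothetical names. It takes μ, the φ_k, the λ_k and σ² as already estimated, so the REML step is not shown. The scores combine the patient’s own visits with eigenelements estimated from all patients, so evaluating the reconstruction at a held-out time, such as the last visit, is one way to read the prediction described above.

```python
import numpy as np


def _basis(eigenfns, t):
    # evaluate K eigenfunctions at time points t -> matrix of shape (len(t), K)
    return np.column_stack([f(t) for f in eigenfns])


def pace_scores(t_obs, y_obs, mean_fn, eigenfns, eigenvalues, noise_var):
    """Conditional-expectation FPC scores for one patient's sparse visits.

    mean_fn, eigenfns, eigenvalues and noise_var are assumed to be estimated
    beforehand from all patients (hypothetical inputs in this sketch).
    """
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    phi = _basis(eigenfns, t_obs)
    lam = np.diag(np.asarray(eigenvalues, dtype=float))
    # covariance of the noisy observations: Phi Lambda Phi' + sigma^2 I
    cov = phi @ lam @ phi.T + noise_var * np.eye(len(t_obs))
    return lam @ phi.T @ np.linalg.solve(cov, y_obs - mean_fn(t_obs))


def reconstruct(t_new, scores, mean_fn, eigenfns):
    """Truncated Karhunen-Loeve reconstruction mu(t) + sum_k xi_k phi_k(t)."""
    t_new = np.asarray(t_new, dtype=float)
    return mean_fn(t_new) + _basis(eigenfns, t_new) @ scores
```

The conditional expectation shrinks a patient’s scores towards zero when the measurement error is large relative to λ_k. This shrinkage is what keeps patients with only two visits usable. For example, take a zero mean, a single constant eigenfunction φ_1(t) = 1, λ_1 = 1 and σ² = 2. Two observations equal to 2 then give ξ̂ = 1, half of the value 2 returned in the noise-free case.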
Please register for the event through the form below. The meeting is free for members of the VVS-OR and/or the Dutch Region of the IBS; non-members pay 10 euros. Note that PhD students can join ANed for free, and senior members for only 27 euros per year, via our membership page.