Stijn Vansteelandt

Ghent University & London School of Hygiene and Tropical Medicine

Title: Machine learning for the evaluation of treatment effects: challenges, solutions and improvements

Abstract

The evaluation of treatment effects from observational studies typically requires adjustment for high-dimensional confounding. This need arises from a lack of comparability between treated and untreated subjects in possibly many (pre-treatment) factors that are also related to the outcome. While such adjustment is routinely achieved via parametric modelling, this approach is not entirely satisfactory, as model misspecification is likely, and even relatively minor misspecifications over the observed data range may induce large bias in the treatment effect estimate. Over the past two decades, there has therefore been growing interest in the use of machine learning methods to assist with this task. This is not surprising if one considers the enormous contributions that the machine learning literature has offered on how to predict outcomes based on possibly high-dimensional predictors or features. In this talk, I will therefore focus on the use of machine learning for the evaluation of (causal) treatment effects. This turns out to be a challenging task: while the prediction performance of a given machine learning algorithm can be measured by contrasting observed and predicted outcomes, such evaluation becomes impossible when machine learning is used for treatment effect estimation, since the true treatment effect is always unknown. I will demonstrate that naive use of existing machine learning algorithms is problematic for treatment evaluation and explain why that is the case. I will next give a gentle introduction to pioneering work on Targeted Learning and on Double Machine Learning, and will discuss improvements that we have made to these techniques. Throughout the talk, machine learning will be considered in the broad sense as any algorithm that uses data to learn a proper model for the data, thus including (though not being limited to) routine variable selection procedures.
The talk is based on joint work with Oliver Dukes (Ghent University) and will be accessible to attendees without a detailed understanding of machine learning algorithms.

Stijn Vansteelandt is Professor of Statistics at Ghent University (Belgium) and Professor of Statistical Methodology at the London School of Hygiene and Tropical Medicine (UK). As a causal inference expert, he primarily develops methods for causal machine learning, mediation analysis, time-varying confounding control, and the handling of intercurrent events in randomised experiments. He has authored over 150 peer-reviewed publications in international journals on a variety of topics in biostatistics, epidemiology and medicine, such as the analysis of longitudinal and clustered data, missing data, mediation and moderation/interaction, instrumental variables, family-based genetic association studies, analysis of outcome-dependent samples, phylogenetic inference, meta-analysis, post-selection inference and interim analysis. He is currently Associate Editor of the Journal of the Royal Statistical Society (Series B) and has previously served as Co-Editor of Biometrics, the flagship journal of the International Biometric Society, and as Associate Editor for the journals Biometrics, Biostatistics, Epidemiology, Epidemiologic Methods and the Journal of Causal Inference.