21 May 2021
# Graphical Models and their Applications in Genomics

This spring the BMS-ANed organises an online meeting in honour of the Hans van Houwelingen Award 2020:

Date: 21 May 2021

Time: 14:00-16:05 (CET)

Location: online

Registration is required via the form below.

*see below for the titles and affiliations of the speakers*

| Start | End | Session |
|---|---|---|
| 14:00 | 14:05 | Opening by Mark van de Wiel (president of BMS-ANed) |
| 14:05 | 14:30 | Pariya Behrouzi |
| 14:30 | 15:00 | Marco Scutari |
| 15:00 | 15:05 | Break |
| 15:05 | 15:35 | Marloes Maathuis |
| 15:35 | 16:05 | Sach Mukherjee |
| 16:05 | 16:10 | Closing |

**Pariya Behrouzi** – WUR Wageningen

*Detecting epistatic selection with partially observed genotype data using copula graphical models*

In this talk, I address several problems related to modeling complex systems. Fields such as genetics and genomics often involve large-scale models in which thousands of components are linked in complex ways. What is perhaps most distinctive about the graphical model approach is its suitability for formulating probabilistic models of complex phenomena while maintaining control over the computational cost associated with these models. In the real world, not all datasets are continuous: ordinal data and mixed ordinal-and-continuous data routinely arise in many fields. I will introduce a copula graphical model for reconstructing a conditional independence network for such data. As a motivating example, I focus on detecting loci (locations on a genome) in A. thaliana that do not segregate independently conditional on other loci, thus leading to various plant disorders.

**Marco Scutari** – SUPSI, Switzerland

*Mapping complex data with Bayesian networks*

Bayesian networks are an important model in machine learning due to their flexibility and intuitive graphical representation. They have been adapted to handle several types of data with structures that are more complex than the complete, discrete data they were originally defined on. In this talk we will discuss how to learn and apply them to incomplete data, time series, and collections of related data sets.

**Marloes Maathuis** – ETH, Switzerland

*False Discovery Rate Control for Gaussian Graphical Models*

We propose a method to control the finite sample false discovery rate (FDR) when learning the structure of a Gaussian graphical model. Our method builds on the recently proposed knockoff idea of Barber and Candès for linear models. We extend their approach to the graphical model setting by using a local (node-based) and a global (graph-based) step: we construct knockoffs and feature statistics for each node locally, and then solve a global optimization problem to determine the threshold for each node. We then estimate the neighborhood of each node by comparing its feature statistics to its threshold, resulting in our graph estimate. We establish finite sample FDR control of this procedure. Our proposed method is very flexible, in the sense that one has a lot of freedom in the choice of the feature statistics, the optimization problem and the way in which the final graph estimate is obtained. For any given data set, it is not clear a priori what choices of these hyperparameters are optimal. We therefore use a sample-splitting-recycling procedure that first uses half of the sample to select the hyperparameters, and then learns the graph using all samples, in such a way that the finite sample FDR control still holds. Finally, we compare our method to the state of the art in simulations and on a real data set.

**Sach Mukherjee** – DZNE, Germany and University of Cambridge, UK

*Graphical models for heterogeneous data*

Contemporary large-scale biomedical data are often heterogeneous, in the sense of spanning multiple, possibly latent, groups (such as disease subtypes, population strata etc.), which may be statistically non-identical. I will discuss the estimation of graphical models in this setting, with an emphasis on high-dimensional data. I will also discuss connections to clustering and regression modelling, showing how a combination of suitably defined mixtures, data reduction and regularised estimation can allow effective analysis in very high dimensions, even when mean signals (i.e. cluster-like mean shifts between latent groups) are weak.