We run a research program in statistical genomics focused on approaches for identifying novel genetic etiologies of rare diseases. We develop statistical and computational methods for biological inference from omics data and clinical data on rare disease patients, particularly those with non-malignant hematological or immune disorders. We also investigate the role of non-coding mutations and of unconventional genetic architectures in the etiologies of unexplained rare diseases. Our aim is to improve the diagnostic and prognostic utility of genome sequencing. The research themes of our group include:

(1) Modeling of intermediate molecular phenotypes: It is now feasible to generate omics data from disease-relevant tissues in hundreds of study participants. Statistically integrating genome sequencing data with measurements of molecular mediators of disease (e.g. gene expression) in rare disease cases and controls can increase the power to identify rare variants with deleterious consequences.

(2) Integrating polygenic scores in the modeling of high-penetrance rare variants: Some rare diseases are diagnosed and clinically characterised by reference to a quantitative trait that acts as a causal intermediate (or close proxy) for pathology and symptoms. The extremes of a heritable quantitative trait include individuals with an extreme polygenic load and individuals carrying a small number of rare variants with a large effect on the expected value of the trait. A case whose extreme trait value is already explained by a high polygenic load is less likely to carry a causal rare variant, so accounting for the polygenic component of risk in rare variant association studies of rare diseases should boost their statistical power.
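The power argument in theme (2) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not our group's method: it assumes a linear model for a quantitative trait, a standardised polygenic score (PGS) independent of rare-variant carrier status, and made-up effect sizes. The helper `carrier_t_stat` is hypothetical. It compares the ordinary least squares t-statistic for the carrier effect with and without the PGS as a covariate.

```python
import numpy as np


def carrier_t_stat(trait, carrier, covariates=()):
    """Hypothetical helper: OLS t-statistic for the carrier indicator.

    Fits trait ~ intercept + carrier + covariates and returns the estimated
    carrier coefficient divided by its standard error.
    """
    X = np.column_stack([np.ones_like(trait), carrier, *covariates])
    beta, _, _, _ = np.linalg.lstsq(X, trait, rcond=None)
    resid = trait - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)   # OLS covariance of coefficients
    return beta[1] / np.sqrt(cov_beta[1, 1])


# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 5000
pgs = rng.standard_normal(n)                       # standardised polygenic score
carrier = (rng.random(n) < 0.02).astype(float)     # ~2% carry a large-effect rare variant
trait = 0.8 * pgs + 1.0 * carrier + rng.normal(0.0, 0.6, n)

t_unadjusted = carrier_t_stat(trait, carrier)
t_adjusted = carrier_t_stat(trait, carrier, covariates=(pgs,))
print(f"carrier t-statistic without PGS: {t_unadjusted:.1f}")
print(f"carrier t-statistic with PGS:    {t_adjusted:.1f}")
```

Under these assumptions the PGS explains most of the non-carrier variance in the trait, so adjusting for it shrinks the residual variance and the standard error of the carrier effect, and the adjusted t-statistic is larger. Real rare disease studies are typically case-control or extreme-sampled designs, where the adjustment would take a different form.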

(3) Identifying non-coding mutations that cause rare diseases: Over 200,000 coding mutations have been causally implicated in rare diseases. The number of causal non-coding mutations, however, is three orders of magnitude smaller, in part because their functional effects are often cell type specific and are predicted inaccurately by existing algorithms. Developing computational methods that estimate purifying selection from large genetic datasets and that locate regulatory elements in disease-relevant cell types will accelerate the discovery of novel causes of disease in the non-coding regions of the human genome.

(4) Investigating the genetics of non-malignant blood-related disorders: Unexplained inherited hematological and immune disorders are attractive targets for methodological omics research because blood samples can be obtained from study participants non-invasively. Furthermore, understanding the etiologies of some of these disorders (e.g. platelet disorders) may be relevant to very common medical conditions such as heart attack and stroke. The falling cost of whole-genome sequencing and the recent publication of reference transcriptomic and epigenetic maps of mature blood cells and blood cell progenitors make this an opportune time to study the etiologies of hematological disorders. Such studies can be carried out in collaboration with clinical researchers in the USA, the UK (through the 100,000 Genomes project) and beyond.