The PRS Lab

Methods Development & Applications of Polygenic Risk Scores

Welcome to our lab

Our lab develops statistical and computational approaches to understand how human genetic variation, in combination with the environment, leads to disease. Because Polygenic Risk Scores (PRS) act as a proxy for genetic liability itself, our primary focus is on their theory and application. In 2015, we published the popular PRS software PRSice (‘precise’), followed in 2019 by PRSice-2 (PRSice website here). We have run several PRS workshops (e.g. our PRS Summer School), and in 2020 we published our Guide to PRS paper, with an accompanying PRS tutorial.

We believe that genetic liability to disease is more complex than implied by the additive model of present polygenic risk scores, that the interplay between the genome and the environment in causing disease needs to be better understood, and that analysis of diverse populations across diverse environments can provide the greatest power to understand the causes of disease. 

The research in our lab follows 4 key themes:

1) Pathway-specific, function-informed, polygenic risk scores

2) Polygenic risk scores for diverse and admixed populations

3) Using genetics to infer the environmental causes of disease

4) The Statistical Genetics of Brain Disorders

We need to bring together the fields of statistical genetics, GWAS, functional genomics, population genetics and epidemiology in order to understand how individual genetic profiles combine with the environment to produce human traits and disease – so if you are interested in any of these fields and the research of our lab then please feel free to email Paul (paul.oreilly@mssm.edu) to discuss more or to enquire about our open student and postdoc positions.

 

Contact us

O’Reilly Lab

Location: Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, NYC

Email: paul.oreilly@mssm.edu 

Initiatives that we’re involved in:

Data Ark: a Mount Sinai Data Commons, which the O’Reilly Lab helped to set sail!

CEYE provides research experience for talented NYC high school students from underrepresented backgrounds.

Lab Themes

Pathway-specific, function-informed, PRS

The standard polygenic model, which assumes that everyone lies on a linear spectrum from low to high risk, is an over-simplification given the known heterogeneity of disease and functional sub-structure of the genome. A primary focus in our lab is on developing pathway-specific, function (i.e. multi-omics) informed, polygenic risk scores. We believe that these will better reflect how genetic liability manifests and leads to disease, as compared to standard PRS that only consider genome-wide aggregated risk irrespective of specific genetic profiles. Moreover, we believe that pathway-specific PRS will pave a clearer path towards stratified medicine given their focus on functional variability of individual genetic profiles.
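As a toy illustration of the contrast described above, the sketch below computes a genome-wide score as a weighted sum of effect-allele counts and then restricts the same sum to the SNPs annotated to each pathway. It is not the PRSice-2 or PRSet implementation: the function name, SNP identifiers, effect sizes and pathway annotations are all hypothetical, and real analyses involve steps (e.g. handling LD) that are omitted here.

```python
# Illustrative sketch only: toy data and hypothetical names,
# not code from PRSice-2 or PRSet.

def polygenic_score(genotypes, weights, snps=None):
    """Additive score: sum of effect-allele counts times per-SNP weights.

    genotypes: dict mapping SNP id -> effect-allele count (0, 1 or 2)
    weights:   dict mapping SNP id -> GWAS effect size (e.g. log odds ratio)
    snps:      optional collection of SNP ids restricting the sum (e.g. a pathway)
    """
    ids = weights.keys() if snps is None else snps
    return sum(weights[s] * genotypes[s]
               for s in ids if s in weights and s in genotypes)

# Hypothetical effect sizes and SNP-to-pathway annotation.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20, "rs4": 0.15}
pathways = {"synaptic": ["rs1", "rs3"], "immune": ["rs2", "rs4"]}

# Two toy individuals chosen to have the same genome-wide score.
person_a = {"rs1": 2, "rs2": 0, "rs3": 1, "rs4": 0}
person_b = {"rs1": 1, "rs2": 0, "rs3": 0, "rs4": 2}

for name, person in [("A", person_a), ("B", person_b)]:
    genome_wide = polygenic_score(person, weights)
    by_pathway = {p: polygenic_score(person, weights, snps)
                  for p, snps in pathways.items()}
    print(name, round(genome_wide, 3),
          {p: round(v, 3) for p, v in by_pathway.items()})
```

In this toy example both individuals sit at the same point on the genome-wide risk spectrum, yet their risk is concentrated in different pathways, which is the kind of distinction a single aggregated score cannot capture.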

 

 

 

PRS for diverse and admixed populations

PRS are mostly derived from European-ancestry GWAS, making their predictive power lower when computed in individuals of non-European ancestry. This problem has been widely reported in recent years, yet PRS in e.g. recent African-ancestry individuals are still often computed using European GWAS and PRS methods applied to European data. This takes no account of known population genetic factors affecting the data, such as LD, genetic drift, natural selection and G*E interactions. The clinical utility and aetiological insights provided by PRS may have limited relevance to individuals of recent African and other non-European ancestry unless PRS methods are developed specifically for application to diverse and admixed populations. Moreover, performing research in diverse populations living in diverse locations is an ideal way to better understand the genetic and environmental causes of disease.

Using genetics to infer the environmental causes of disease

Almost all environmental risk factors for disease have a genetic component, which itself must be a genetic component of the disease. This creates a complex interplay between genetic and environmental causes of disease, which needs to be investigated carefully in order to disentangle the effects of each. While genetic variants that are convincingly associated with disease (controlling for population structure) have the convenient feature that they must be causal, they may only be causal because of environmental risk factors that they interact with or trigger, and so could be non-causal in other environmental settings or if the environment changes (e.g. due to health policy intervention or social changes). However, because of this, genetics provides an opportunity to test the causality between putative environmental risk factors and disease – but only if the complex network between genetics and the environment (at both the cellular and societal level) is accounted for and accurately modelled.

Paul O’Reilly

I am an Associate Professor in Statistical Genetics at the Icahn School of Medicine at Mount Sinai, NYC, having joined in 2019 from King’s College London (previously at Imperial College London, where I did my PhD supervised by David Balding and where I became faculty in 2011). In order to gain insights into how the human genome evolves and gives rise to disease, I have developed Statistical Genetics methods and software in Genetic Epidemiology (e.g. multi-trait GWAS: MultiPhen, Polygenic Risk Scores: PRSice) and Population Genetics (e.g. detecting selection: Ped/Pop method, simulating inversions: invertFREGENE), while my applied work focuses on the statistical genetics of brain disorders.

Sam Choi

I am a computational scientist focused on producing methods, algorithms and open source software in statistical genetics. My PhD focused on developing statistical methods for heritability estimation, supervised by Pak Sham at the University of Hong Kong. I joined the O’Reilly lab in 2016 at King’s College London and have since focused on the theory and application of polygenic scores, implementing PRSice-2, one of the popular polygenic score software packages, and producing our guide to PRS (see below). Since then, I have focused on the development of PRSet, a set-based polygenic score software package, and have also developed EraSOR, a software tool to adjust for sample overlap in PRS studies. My current research includes the development of computational algorithms and software to advance risk prediction and thus guide precision medicine. See my GitHub for software and scripts.

Beatrice Wu

I am a computational scientist focused on the application of statistical genetics methods and software to brain disorders, in particular schizophrenia and Alzheimer’s disease. My PhD focused on identifying genetic risk factors of schizophrenia, supervised by Pak Sham at the University of Hong Kong. I joined the O’Reilly lab in 2016 at King’s College London and have since focused on the application of polygenic scores to psychiatric disorders and Alzheimer’s disease.

Clive Hoggart

I am a statistician focused on Bayesian approaches and predictive modelling in statistical genetics, having completed a PhD in Bayesian methods in forensic science supervised by Adrian Smith. I have produced numerous methods in statistical genetics, including methods for admixture mapping (including the software ADMIXMAP), novel methodology and software for the joint modelling of all SNPs genome-wide in genetic association studies (HyperLASSO), estimation of missing heritability attributable to allelic heterogeneity, and identification of parent-of-origin effects using genetic data from unrelated individuals. Since moving to the Icahn School of Medicine in 2020, my research focus has been on methods to improve the portability of polygenic risk scores across different populations and ancestries.

Conrad Iyegbe

I am a computational biologist with a background in both experimental and epidemiological research in infectious disease and psychiatric genetics. I did my PhD in infectious disease genetics at King’s College London. I am interested in understanding how a person’s environmental experiences and genetics converge to shape disease risk and other long-term outcomes. A particular recent focus has been on the longstanding association between influenza and schizophrenia (e.g. see our paper). I recently joined the O’Reilly lab, where my primary focuses will be the development and application of statistical genetics methods for diverse populations and the functional aspects of pathway-specific PRS.

Judit García-González

I am a computational biologist with a background in experimental and statistical genetics research. I received my PhD in genetics from Queen Mary University of London, where I investigated the interplay between smoking and genetic vulnerability to psychiatric disorders using animal models and statistical genetics approaches in human cohorts. In late 2020, I joined the O’Reilly lab to develop methods that incorporate functional genomic information into polygenic risk scores, with a focus on psychiatric disorders.

Hawa Diallo

I am a senior high school student from the High School for Math, Science and Engineering at the City College NYC. I am a participant in the Biomedical Research Program that partners my school and the Mount Sinai Center for Excellence in Youth Education (CEYE). I joined the O’Reilly Lab in summer 2020 to investigate the connection between race and genetics and to find out whether race as defined by society corresponds at all to human genetics.

Selected Publications

Investigating the effects of genetic risk of schizophrenia on behavioural traits

Socrates, Maxwell,…, O’Reilly. 2021. NPJ Schizophrenia.

  

Tutorial: a Guide to Performing Polygenic Risk Score Analyses

Choi, Mak, O’Reilly. 2020. Nature Protocols

PRSice-2: Polygenic Risk Score Software for large-scale data

Choi & O’Reilly. 2019. GigaScience

 

PRSice: Polygenic Risk Score Software

Euesden, Lewis, O’Reilly. 2015. Bioinformatics

 

 

Using genetic data to strengthen causal inference in observational research

Pingault JB, O’Reilly et al. 2018. Nature Reviews Genetics

  

 

The emerging molecular architecture of schizophrenia, polygenic risk scores and the clinical implications for GxE research

Iyegbe et al. 2014. Social Psychiatry and Psychiatric Epidemiology

Effect of Damaging Rare Mutations in Synapse-Related Gene Sets on Response to Short-term Antipsychotic Medication in Chinese Patients With Schizophrenia: A Randomized Clinical Trial

Wang*, Wu* et al. 2018. JAMA Psychiatry  

Multivariate simulation framework reveals performance of multi-trait GWAS

Porter HF and O’Reilly. 2017. Scientific Reports

 

Novel approach identifies SNPs in SLC2A10 and KCNK9 with evidence for parent-of-origin effect on body mass index

Hoggart et al. 2014. PLoS Genetics

  

Genome-Wide Association Study Identifies Six New Loci Influencing Pulse Pressure and Mean Arterial Pressure

Wain*, Verwoert*, O’Reilly*, Shi*, Johnson* et al. 2011. Nature Genetics 

 

Confounding between recombination and selection, and the Ped/Pop method for detecting selection

O’Reilly, Birney, Balding. 2008. Genome Research

 

InvertFREGENE: Software for Simulating Inversions in Population Genetic Data

O’Reilly, Coin, Hoggart. 2010. Bioinformatics

  

Admixture provides new insights into recombination

O’Reilly & Balding. 2011. Nature Genetics

Fine-scale estimation of location of birth from genome-wide SNP data 

Hoggart*, O’Reilly* et al. 2012. Genetics

 

Simultaneous Analysis of all SNPs in Genome-Wide and Re-sequencing association studies

Hoggart, De Iorio, Whittaker, Balding. 2008. PLoS Genetics

Software

 

    

     Methods and software for GWAS and PRS analyses: 

  • PRSice-2: Polygenic Risk Score Software for large-scale data
  • PRSet: Software for calculating and analysing pathway-specific PRS
  • MultiPhen: Method and R package for performing multi-trait GWAS on individual-level genetic data
  • HyperLASSO: Method for performing GWAS on all SNPs genome-wide simultaneously
  • ADMIXMAP: Method for performing admixture association studies

 

    

     Methods and software for population genetic simulation and inference:

  • invertFREGENE: Software for simulating inversion polymorphisms in population genetic data
  • pcLOCATE: Method for inferring location of birth from genetic principal components
  • Ped/Pop: Method for detecting recent positive selection by comparing pedigree and population recombination rates 
  • BAYESFST: Bayesian method and software for detecting selection based on the Fst statistic

    Open Positions

     

    We need to bring together the fields of statistical genetics, GWAS, functional genomics, population genetics and epidemiology in order to understand how individual genetic profiles combine with the environment to produce human traits and disease – so if you are interested in any of the lab themes and the research of our lab then please feel free to email Paul (paul.oreilly@mssm.edu) to discuss more or to enquire about our open student and postdoc positions.