The Ma’ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.
We lead two NIH-funded Centers: the BD2K-LINCS Data Coordination and Integration Center (DCIC), and the Knowledge Management Center for the Illuminating the Druggable Genome program.
Assemble the Largest Novel Collection of Gene Set Libraries
Gene Set Enrichment Analysis and Gene Ontology analysis are central to biological investigations that measure gene and protein expression at the global scale. Until recently, such analyses were limited to pathway enrichment and/or Gene Ontology enrichment. We showed that enrichment analysis can be expanded to use data from many biological domains. By developing the tools Kinase Enrichment Analysis (KEA), ChIP-X Enrichment Analysis (ChEA), Lists2Networks and Enrichr, we demonstrated that many resources can be converted to useful gene set libraries, and that these libraries can better inform analyses of genome-wide expression studies. So far, over 100,000 unique users have used the enrichment analysis software tools we developed.
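To make the idea of enrichment against a gene set library concrete, here is a minimal sketch of an over-representation test that uses a right-tailed hypergeometric p-value. It is an illustration only, not the scoring used by KEA, ChEA, Lists2Networks or Enrichr. The function names, the toy library, the placeholder gene identifiers and the background size are all assumptions made for this example.

```python
from math import comb


def enrichment_p_value(input_genes, gene_set, background_size):
    """Right-tailed hypergeometric p-value for the overlap of two gene lists.

    Assumes every gene in both lists belongs to a shared background of
    `background_size` genes (an assumption of this sketch).
    """
    input_genes = set(input_genes)
    gene_set = set(gene_set)
    overlap = len(input_genes & gene_set)
    n = len(input_genes)   # number of genes drawn (the input list)
    k = len(gene_set)      # number of "marked" genes in the background
    total = comb(background_size, n)
    # P(X >= overlap): sum the probability of every overlap at least this large.
    tail = sum(
        comb(k, i) * comb(background_size - k, n - i)
        for i in range(overlap, min(n, k) + 1)
    )
    return tail / total


def rank_library(input_genes, library, background_size):
    """Score every term in a gene set library; smallest p-value first."""
    scored = [
        (term, enrichment_p_value(input_genes, genes, background_size))
        for term, genes in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy library: term name -> member genes (placeholder identifiers).
toy_library = {
    "term_A": ["G1", "G2", "G3"],
    "term_B": ["G7", "G8", "G9"],
}
ranking = rank_library(["G1", "G2", "G5", "G6"], toy_library, background_size=20)
```

In this sketch any resource that can be written as a mapping from a term to a list of genes can serve as a library, which is the sense in which many kinds of data can be converted into gene set libraries. A real analysis would also correct the p-values for multiple testing across terms.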
Develop Novel Methods to Identify Differentially Expressed Genes, Perform Gene Set Enrichment Analysis and Set Up Benchmarks for Such Methods
Two key statistical tasks in systems biology and genomics are identifying differentially expressed genes and performing gene set enrichment analysis to extract biological themes from gene expression data. We develop multivariate methods to better identify the more “correct” differentially expressed genes from genomics studies, and to better perform enrichment analysis. Using a novel benchmarking strategy that we developed, we show that methods such as limma, SAM and DESeq for differential expression, and GSEA for gene set enrichment analysis, can be compared fairly on their performance.
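For readers unfamiliar with differential expression, the following is a minimal sketch of the simplest univariate baseline: ranking genes by a per-gene Welch t-statistic between two conditions. It is not the multivariate method described above, and it is not how limma, SAM or DESeq work. The function names, the data layout and the toy values are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, treated):
    """Welch t-statistic for one gene (treated minus control).

    Each group needs at least two values, and at least one group must have
    non-zero variance, otherwise the standard error is zero.
    """
    standard_error = sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error


def rank_genes(expression):
    """Order genes by absolute t-statistic, strongest change first.

    `expression` maps a gene identifier to a pair of value lists:
    (control_values, treated_values).
    """
    scores = {
        gene: welch_t(control, treated)
        for gene, (control, treated) in expression.items()
    }
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)


# Hypothetical toy data with placeholder gene identifiers.
toy_expression = {
    "G1": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "G2": ([1.0, 2.0, 3.0], [1.5, 2.5, 3.5]),
}
ordered = rank_genes(toy_expression)
```

A univariate score like this treats every gene independently, which is the limitation that multivariate approaches aim to address. The ranked list it produces is the kind of output that can then be passed to an enrichment analysis.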
Understand the Structure and Dynamics of Cellular Regulatory Networks
Analysis of the gene sets and networks we have collected can uncover design principles and modules that make up the complex organizational structure of mammalian cells. Analysis of Big Data in the field also points to the existence of various experimental biases and computational limitations. We aim to develop new theories and new algorithms to better extract knowledge from such complex data. Many of the theoretical observations we extracted from the topologies of biological networks and gene sets are manifestations of general design principles observed in many complex systems, not just in biological networks, and we are interested in understanding how such principles emerge and how they relate to one another.