Research

The Ma’ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.

NIH-funded Centers

Largest and Most Diverse Collection of Annotated Gene Sets

Gene set enrichment analysis is central to many biological and biomedical projects that measure mRNA and protein expression at the whole-genome scale. It is typically limited to a few literature-based background knowledge libraries, such as those created from the Gene Ontology and from pathway databases such as KEGG, WikiPathways, and Reactome. We have demonstrated that enrichment analysis can be expanded to use data from many other biological domains. To develop the tools Enrichr, Enrichr-KG, Rummagene, Rummageo, kinase enrichment analysis (KEA), ChIP-seq enrichment analysis (ChEA), and Harmonizome, we integrated data from many key biomedical resources into useful gene set libraries. These libraries better inform enrichment analyses of omics studies. So far, over 2 million unique users have used these bioinformatics software applications, currently at a rate of about 4,000 unique users per day.
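For readers unfamiliar with the idea, the core of an over-representation style enrichment analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by Enrichr or any of the tools named above: it scores each gene set with a one-sided hypergeometric test, and the function names, the toy library, and the background size are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(overlap, query_size, set_size, background_size):
    """Probability of seeing at least `overlap` annotated genes when
    `query_size` genes are drawn at random from `background_size` genes,
    of which `set_size` belong to the gene set (one-sided test)."""
    max_overlap = min(query_size, set_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_gene_sets(query_genes, library, background_size):
    """Score every gene set in `library` (a dict of name -> gene list)
    against the query and return results sorted by ascending p-value."""
    query = set(query_genes)
    results = []
    for name, genes in library.items():
        members = set(genes)
        shared = query & members
        p = hypergeom_enrichment_pvalue(
            len(shared), len(query), len(members), background_size
        )
        results.append((name, sorted(shared), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy library and query; real libraries hold thousands of sets.
toy_library = {
    "set_a": ["G1", "G2", "G3"],
    "set_b": ["G8", "G9"],
}
ranked = rank_gene_sets(["G1", "G2", "G3"], toy_library, background_size=10)
for name, shared, p in ranked:
    print(name, shared, p)
```

A real analysis would also correct the p-values for multiple testing across all gene sets in a library, and the choice of background gene universe strongly affects the results.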

Original Methods to Identify Differentially Expressed Genes, Perform Gene Set Enrichment Analyses, and Benchmark these Data Analysis Methods

One of the key statistical tasks in the field of transcriptomics is the identification of differentially expressed genes. We developed a multivariate method called the Characteristic Direction to better identify the “correct” differentially expressed genes. The Characteristic Direction method was then extended to perform improved enrichment analysis using a similar concept. Using a unique benchmarking strategy, we can objectively evaluate the Characteristic Direction method against many other leading methods for differential expression and enrichment analyses, such as limma, GSEA, and DESeq.

Translational Computational Research in Cancer and Kidney Disease

In collaboration with other experimental and computational biology laboratories, we have made great strides in the past several years in studying kidney disease, diabetes, HIV, and cancer. We have developed unique computational methods that led to the identification of potential targets and drugs for attenuating kidney fibrosis, diabetic kidney disease, and HIV-associated nephropathy (HIVAN). Our collaborative work also proposed treatment combinations for early-stage kidney disease intervention. These advances were made possible by applying algorithms that we developed, including Expression2Kinases, SigCom LINCS, and TargetRanger.

Innovative Bioinformatics Software Infrastructure

To lower the barrier to entry for bioinformaticians and to streamline the development of bioinformatics software applications, we developed Appyters. With Appyters, bioinformaticians can rapidly develop full-stack, web-based bioinformatics applications from their Jupyter Notebooks. Currently, over 100 Appyters are available from the Appyters Catalog. For a CFDE Partnership project, our team developed the Playbook Workflow Builder, a platform that facilitates the visual, dynamic construction of bioinformatics workflows. Alongside these efforts, we also created FAIRshake, a flexible framework for performing manual and automated evaluation of digital objects for adherence to defined, community-established standards.