Enrichment Analysis Tools | Ma'ayan Laboratory, Computational Systems Biology

The Ma’ayan Lab pioneered the development of several enrichment analysis tools. The community uses these tools to interpret large-scale omics datasets by identifying statistically significant overlaps between new gene sets generated experimentally and curated gene sets of pathways, ontologies, transcription factors, diseases, phenotypes, cells and tissues, and other annotations. Our enrichment analysis tools help researchers generate hypotheses about the molecular mechanisms behind molecular changes in key biological and pathophysiological processes.
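As a rough illustration of what a "statistically significant overlap" can mean, the sketch below computes a right-tailed hypergeometric p-value for the overlap between an input gene set and one curated gene set. This is a generic textbook test, not necessarily the exact statistic any of the tools listed here use. The function name, the gene symbols in the usage line, and the background size are all assumptions made for the example, and both sets are assumed to be drawn from the same background.

```python
from math import comb


def overlap_pvalue(input_genes, library_genes, background_size):
    """Return (overlap size, right-tailed hypergeometric p-value).

    Assumes both gene sets are subsets of a shared background of
    `background_size` genes. Illustrative only.
    """
    input_genes, library_genes = set(input_genes), set(library_genes)
    k = len(input_genes & library_genes)  # observed overlap
    n = len(input_genes)                  # size of the input set
    K = len(library_genes)                # size of the curated set
    N = background_size                   # genes in the background
    total = comb(N, n)
    # P(X >= k): sum the probability of every overlap at least as large as k
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total
```

For example, `overlap_pvalue({"A", "B", "C"}, {"B", "C", "D", "E"}, 10)` reports an overlap of 2 genes. In practice the p-values for many curated sets would also need a multiple-testing correction.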

Browse through the following categories to explore the various original resources we have developed:

Enrichr

Gene-List Enrichment Analysis Tool
An integrative web-based and mobile gene-list enrichment analysis tool that includes 225 gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library Data-Driven Documents (D3). Enrichr is freely available online. The software can also be embedded easily into any tool that performs gene list analysis.
PMID: 23586463
PMID: 27141961
PMID: 33780170

Enrichr-KG

Knowledge Graph Implementation of Enrichr
Enrichr-KG is a knowledge graph database and a web-server application that combines selected gene set libraries from Enrichr for integrative enrichment analysis and visualization. The enrichment results are presented as subgraphs made of nodes and links that connect genes to their enriched terms. In addition, users of Enrichr-KG can add gene-gene links, as well as predicted genes, to the subgraphs. This graphical representation of cross-library results with enriched and predicted genes can illuminate hidden associations between genes and annotated enriched terms from across datasets and resources.
PMID: 37166973

ChEA3

ChIP-X Enrichment Analysis Version 3
A transcription factor enrichment analysis tool that ranks TFs associated with user-submitted gene sets. The ChEA3 background database contains a collection of gene set libraries generated from multiple sources including TF–gene co-expression from RNA-seq studies, TF–target associations from ChIP-seq experiments, and TF–gene co-occurrence computed from crowd-submitted gene lists.
PMID: 31114921

ChEA-KG

ChIP-X Enrichment Analysis – Knowledge Graph
ChEA-KG takes a different approach to reconstructing the human gene regulatory network (GRN). By submitting thousands of gene sets from the RummaGEO resource for transcription factor enrichment analysis with ChEA3, we distill signed and directed edges that connect human transcription factors into a high-quality human GRN. The GRN has 130,793 signed and directed edges between 703 source and 1,543 target transcription factors. The network is accessible via an interactive web-based application called ChEA-KG. ChEA-KG enables users to query the GRN by searching for a single transcription factor or a pair of transcription factors, as well as by submitting gene sets to perform transcription factor enrichment analysis with ChEA3 and then placing the enriched transcription factors in the context of ChEA-KG.
PMID: 40832173
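To make the idea of a signed, directed network and the single-factor and pair queries concrete, here is a minimal sketch of one way such a graph could be stored and searched. It is not the ChEA-KG implementation or its API. The edge list is toy data with placeholder names (`TF_A`, `TF_B`, `TF_C`), and the function names are hypothetical.

```python
from collections import defaultdict

# Toy edges (source TF, target TF, sign). Placeholder names, not real ChEA-KG data.
EDGES = [
    ("TF_A", "TF_B", "+"),
    ("TF_A", "TF_C", "-"),
    ("TF_B", "TF_C", "+"),
    ("TF_C", "TF_A", "-"),
]


def build_index(edges):
    """Index edges by source and by target so both directions are fast to look up."""
    out_edges = defaultdict(dict)  # source -> {target: sign}
    in_edges = defaultdict(dict)   # target -> {source: sign}
    for source, target, sign in edges:
        out_edges[source][target] = sign
        in_edges[target][source] = sign
    return out_edges, in_edges


def query_single(tf, out_edges, in_edges):
    """Return the signed targets and signed regulators of one transcription factor."""
    return {
        "targets": dict(out_edges.get(tf, {})),
        "regulators": dict(in_edges.get(tf, {})),
    }


def query_pair(tf1, tf2, out_edges):
    """Return every direct edge between two transcription factors, in either direction."""
    found = []
    for source, target in ((tf1, tf2), (tf2, tf1)):
        sign = out_edges.get(source, {}).get(target)
        if sign is not None:
            found.append((source, target, sign))
    return found
```

Keeping the sign on the edge rather than on the node lets the same transcription factor activate one target and repress another.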

KEA3

Kinase Enrichment Analysis Version 3
Infers upstream kinases whose putative substrates are overrepresented in a user-submitted list of genes or differentially phosphorylated proteins. The KEA3 database contains putative kinase-substrate interactions collected from publicly available datasets. Gene sets of putative kinase substrates are used as the primary units of analysis in KEA3. These gene sets are organized in gene set “libraries.” Libraries are supersets of kinase substrate sets that are aggregated based on the database from which they are derived.
PMID: 34019655

Rummagene

Massive Mining of Gene Sets from Supporting Materials of Biomedical Research Publications
Rummagene is a web server application that provides access to hundreds of thousands of human and mouse gene sets extracted from supporting materials of publications listed on PubMed Central (PMC). To create Rummagene, we first developed a softbot that extracts human and mouse gene sets from supporting tables of PMC publications. So far, the softbot has scanned 6,327,912 PMC articles and found 147,611 articles that contain 793,703 gene sets. These gene sets are served for enrichment analysis, free-text search, and table-title search. Users of Rummagene can submit their own gene sets to find matching gene sets ranked by their overlap with the input gene set. In addition to providing the extracted gene sets for search, we investigated the massive corpus of these gene sets for statistical patterns. We show how Rummagene can be used for transcription factor and kinase enrichment analyses, for universal predictions of cell types for single cell RNA-seq data, and for gene function predictions. Finally, by combining gene set similarity with abstract similarity, Rummagene can be used to find surprising relationships between unexpected biological processes, concepts, and named entities.
PMID: 38643247
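The overlap-ranked search described above can be pictured with a small sketch. It ranks stored gene sets purely by how many genes they share with the query, which is a simplification for illustration rather than Rummagene's actual scoring or code. The function name and the example set names are hypothetical.

```python
def rank_by_overlap(query, gene_sets, top_n=10):
    """Rank stored gene sets by the number of genes shared with the query.

    `gene_sets` maps a set name to an iterable of gene symbols.
    Sets with no shared genes are dropped; ties are broken by name.
    """
    query = set(query)
    hits = []
    for name, genes in gene_sets.items():
        shared = query & set(genes)
        if shared:
            hits.append((name, len(shared), sorted(shared)))
    # Largest overlap first, then alphabetical by set name for a stable order
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top_n]
```

Called with a query of `["A", "B", "C"]` against `{"set1": ["A", "B"], "set2": ["A", "B", "C"], "set3": ["X"]}`, it returns `set2` ahead of `set1` and omits `set3`. A real search over hundreds of thousands of sets would likely use a significance test and a faster index instead of a linear scan.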

RummaGEO

Massive Mining of Gene Expression Signatures from the Gene Expression Omnibus (GEO)
RummaGEO is a web server application that enables gene expression signature search against all human and mouse RNA-seq studies deposited in GEO. To enable such a search engine, we performed offline automatic identification of conditions from uniformly aligned GEO studies available from ARCHS4, and then computed differential expression signatures to extract gene sets from these signatures. In total, RummaGEO currently contains 178,975 human and 203,427 mouse gene sets from 30,576 GEO studies. Overall, RummaGEO provides an unprecedented resource for the biomedical research community, enabling hypothesis generation for many future studies.
DOI: 10.1016/j.patter.2024.101072

blitzGSEA

Efficient Computation of GSEA through Gamma Distribution Approximation
blitzGSEA is an algorithm that is based on the same running sum statistic as GSEA, but instead of performing permutations, blitzGSEA approximates the enrichment score probabilities based on Gamma distributions. blitzGSEA achieves significant improvement in performance compared with prior GSEA implementations, while approximating small P-values more accurately.
PMID: 35143610
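For readers unfamiliar with the running sum statistic mentioned above, the sketch below computes the classic weighted GSEA enrichment score for one gene set over a ranked gene list. It covers only the running sum. The Gamma distribution approximation that blitzGSEA uses for p-values is not shown, and this is not the blitzGSEA package's code. The function name and the weighting exponent `p` are assumptions made for the example.

```python
def running_sum_es(ranked_genes, scores, gene_set, p=1.0):
    """Classic GSEA-style enrichment score for one gene set.

    `ranked_genes` and `scores` are parallel lists, already sorted by score.
    Walking down the list, the sum rises at genes in the set (weighted by
    |score| ** p) and falls by a constant at genes outside it. The score
    is the running-sum value that deviates most from zero.
    """
    gene_set = set(gene_set)
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_weight = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if hit_weight == 0:
        raise ValueError("all genes in the set have a zero score")
    running, es = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        running += abs(s) ** p / hit_weight if hit else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the most extreme deviation seen so far
    return es
```

A gene set concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1. Permutation-based GSEA would recompute this score many times to estimate significance, which is the step that blitzGSEA replaces with an approximation.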

modEnrichr

A Suite of Gene Set Enrichment Analysis Tools for Model Organisms
An expansion of Enrichr for four model organisms: fish, fly, worm and yeast. The modEnrichr suite of tools provides the ability to convert gene lists across species using an ortholog conversion tool that automatically detects the species.
PMID: 31069376