Recently Published | Ma'ayan Laboratory, Computational Systems Biology

These tools and digital resources come from the latest peer-reviewed publications and preprints from the Ma’ayan Lab. They feature new computational methods, datasets, and visualization and analysis tools produced by the lab’s interdisciplinary team. Each entry highlights our commitment to open science, reproducibility, and collaboration across the biomedical research community.

Browse through the following categories to explore the various original resources we have developed:

Rummagene

Massive Mining of Gene Sets from Supporting Materials of Biomedical Research Publications
Rummagene is a web server application that provides access to hundreds of thousands of human and mouse gene sets extracted from supporting materials of publications listed on PubMed Central (PMC). To create Rummagene, we first developed a softbot that extracts human and mouse gene sets from supporting tables of PMC publications. So far, the softbot has scanned 6,327,912 PMC articles to find 147,611 articles that contain 793,703 gene sets. These gene sets are served for enrichment analysis, free-text search, and table title search. Users of Rummagene can submit their own gene sets to find matching gene sets ranked by their overlap with the input gene set. In addition to providing the extracted gene sets for search, we investigated the massive corpus of these gene sets for statistical patterns. We show how Rummagene can be used for transcription factor and kinase enrichment analyses, for universal predictions of cell types for single cell RNA-seq data, and for gene function predictions. Finally, by combining gene set similarity with abstract similarity, Rummagene can be used to find surprising relationships between unexpected biological processes, concepts, and named entities.
PMID: 38643247
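
To illustrate what ranking gene sets by their overlap with an input gene set can look like, here is a minimal Python sketch. It is not Rummagene's implementation: the function name `rank_by_overlap`, the Jaccard tie-breaker, and the toy library with its made-up table names are all assumptions chosen for illustration.

```python
def rank_by_overlap(query, library):
    """Rank library gene sets by the size of their overlap with the query.

    Hypothetical helper for illustration only; the real service may use a
    different statistic. Ties on overlap size are broken by Jaccard index.
    """
    query_set = {gene.upper() for gene in query}
    results = []
    for name, genes in library.items():
        gene_set = {gene.upper() for gene in genes}
        overlap = query_set & gene_set
        if not overlap:
            continue  # skip gene sets that share nothing with the query
        jaccard = len(overlap) / len(query_set | gene_set)
        results.append((name, len(overlap), jaccard))
    # Largest overlap first, then highest Jaccard, then name for stability.
    results.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return results


# Toy library; the table names are invented placeholders.
toy_library = {
    "PMC_A-table_S1": ["STAT3", "JAK2", "IL6", "SOCS3"],
    "PMC_B-table_S2": ["TP53", "MDM2", "CDKN1A"],
    "PMC_C-table_S4": ["IL6", "TNF", "STAT3"],
}

hits = rank_by_overlap(["stat3", "il6", "jak2"], toy_library)
for name, n_overlap, jaccard in hits:
    print(f"{name}\toverlap={n_overlap}\tjaccard={jaccard:.2f}")
```

Gene symbols are upper-cased before comparison so that differences in capitalization do not hide a match. Note that this is only safe within one species, since human and mouse symbols often differ only by case.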

RummaGEO

Massive Mining of Gene Expression Signatures from the Gene Expression Omnibus (GEO)
RummaGEO is a web server application that enables gene expression signature search against all human and mouse RNA-seq studies deposited into GEO. To enable such a search engine, we performed offline automatic identification of conditions from uniformly aligned GEO studies available from ARCHS4, and then computed differential expression signatures to extract gene sets from these signatures. In total, RummaGEO currently contains 178,975 human and 203,427 mouse gene sets from 30,576 GEO studies. Overall, RummaGEO provides an unprecedented resource for the biomedical research community, enabling hypothesis generation for many future studies.
DOI: 10.1016/j.patter.2024.101072

L2S2

LINCS L1000 Signatures Search Engine
As part of the LINCS initiative, 248 cell lines were profiled with the L1000 transcriptomics assay to measure the response to 33,621 small molecules and 7,508 single gene CRISPR knockouts (KOs). From this massive dataset, we computed sets of up- and down-regulated genes. These gene sets are served for search by the LINCS L1000 Signature Search (L2S2) platform. L2S2 provides instant results when searching across over 1.678 million chemical perturbation and CRISPR KO signatures. The platform includes filters for FDA-approved drugs and signature directionality. With the L2S2 search engine, users can identify small molecules and single gene CRISPR KOs that produce gene expression profiles similar or opposite to their submitted gene sets.
PMID: 40308216
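
The idea of finding perturbations whose profiles are similar or opposite to a submitted pair of up and down gene sets can be sketched as follows. This is an assumption-laden toy, not the L2S2 scoring method: the concordant-minus-discordant score, the function names, and the placeholder signature names are all hypothetical.

```python
def direction_score(query_up, query_down, sig_up, sig_down):
    """Concordant minus discordant overlaps (an illustrative score only)."""
    concordant = len(query_up & sig_up) + len(query_down & sig_down)
    discordant = len(query_up & sig_down) + len(query_down & sig_up)
    return concordant - discordant


def classify_signatures(query_up, query_down, signatures):
    """Split signatures into mimickers (score > 0) and reversers (score < 0)."""
    mimickers, reversers = [], []
    for name, (sig_up, sig_down) in signatures.items():
        score = direction_score(query_up, query_down, sig_up, sig_down)
        if score > 0:
            mimickers.append((name, score))
        elif score < 0:
            reversers.append((name, score))
    mimickers.sort(key=lambda r: -r[1])  # strongest mimicker first
    reversers.sort(key=lambda r: r[1])   # strongest reverser first
    return mimickers, reversers


# Placeholder signatures: name -> (up-regulated genes, down-regulated genes).
toy_signatures = {
    "compound_X": ({"GENE_A", "GENE_B"}, {"GENE_C", "GENE_D"}),
    "compound_Y": ({"GENE_C", "GENE_D"}, {"GENE_A"}),
    "ko_GENE_Z": ({"GENE_E"}, {"GENE_F"}),
}

mimickers, reversers = classify_signatures(
    query_up={"GENE_A", "GENE_B"},
    query_down={"GENE_C"},
    signatures=toy_signatures,
)
print("mimickers:", mimickers)
print("reversers:", reversers)
```

A signature that shares no genes with the query scores zero and is reported in neither list, which mirrors the intuition that it is neither similar nor opposite to the input.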

Playbook Workflow Builder

Interactive Construction of Bioinformatics Workflows
The Playbook Workflow Builder (PWB) is a web-based platform to dynamically construct and execute bioinformatics workflows by utilizing a growing network of input datasets, semantically annotated API endpoints, and data visualization tools contributed by an ecosystem of collaborators. Via a user-friendly interface, workflows can be constructed from contributed building blocks without technical expertise. The output of each step of the workflow is added into reports containing textual descriptions, figures, tables, and references. To construct workflows, users can click on cards that represent each step in a workflow, or construct workflows via a chat interface that is assisted by a large language model (LLM). Completed workflows are compatible with Common Workflow Language (CWL) and can be published as research publications, slideshows, and posters.
PMID: 40179105

lncRNAlyzr

Enrichment Analysis for lncRNA Sets
lncRNAlyzr is a web server application designed for lncRNA enrichment analysis. It has a database containing 33 lncRNA set libraries created by computing correlations between lncRNAs and annotated coding gene sets. After users submit a set of lncRNAs to lncRNAlyzr, the enrichment analysis results are visualized as ball-and-stick subnetworks where nodes are lncRNAs connected to enrichment terms from across selected lncRNA set libraries.
PMID: 40133794

sc2DAT

scRNA-seq 2 Drugs and Targets
Upload a single cell RNA-seq matrix, or a bulk RNA-seq matrix, and select a corresponding single cell reference to identify cell-surface targets and LINCS L1000 compounds specific to automatically identified cell types.
PMID: 41079221

ChEA-KG

ChIP-X Enrichment Analysis – Knowledge Graph
Here we present a different approach to reconstruct the human gene regulatory network (GRN). By submitting thousands of gene sets from the RummaGEO resource for transcription factor enrichment analysis with ChEA3, we are able to distill signed and directed edges that connect all human transcription factors to construct a high-quality human GRN. The GRN has 130,793 signed and directed edges between 703 source and 1,543 target transcription factors. The network is made accessible via an interactive web-based application called ChEA-KG. ChEA-KG enables users to query the GRN by searching for single transcription factors or pairs of transcription factors, as well as by submitting gene sets to perform transcription factor enrichment analysis with ChEA3 and then place the enriched transcription factors in the context of ChEA-KG.
PMID: 40832173
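
A signed and directed network of this kind, queried by a single transcription factor or by a pair, can be represented with a small data structure such as the one below. This is a sketch under stated assumptions rather than the ChEA-KG backend: the `SignedGRN` class, its method names, the +1/-1 sign encoding, and the placeholder transcription factor names are all hypothetical.

```python
from collections import defaultdict


class SignedGRN:
    """Toy signed, directed regulatory network (illustration only).

    Each edge is (source, target, sign), where sign is +1 for activation
    and -1 for repression. This encoding is an assumption of the sketch.
    """

    def __init__(self, edges):
        self.out_edges = defaultdict(dict)  # source -> {target: sign}
        self.in_edges = defaultdict(dict)   # target -> {source: sign}
        for source, target, sign in edges:
            if sign not in (1, -1):
                raise ValueError(f"sign must be +1 or -1, got {sign!r}")
            self.out_edges[source][target] = sign
            self.in_edges[target][source] = sign

    def query_single(self, tf):
        """Return the targets and regulators of one transcription factor."""
        return {
            "targets": dict(self.out_edges.get(tf, {})),
            "regulators": dict(self.in_edges.get(tf, {})),
        }

    def query_pair(self, tf_a, tf_b):
        """Return direct edges between two factors, in either direction."""
        found = []
        for source, target in ((tf_a, tf_b), (tf_b, tf_a)):
            sign = self.out_edges.get(source, {}).get(target)
            if sign is not None:
                found.append((source, target, sign))
        return found


# Placeholder edges; these are not real regulatory relationships.
grn = SignedGRN([
    ("TF_A", "TF_B", 1),
    ("TF_B", "TF_A", -1),
    ("TF_A", "TF_C", -1),
])

print(grn.query_single("TF_A"))
print(grn.query_pair("TF_A", "TF_B"))
```

Storing both outgoing and incoming adjacency maps costs extra memory but makes lookups of a factor's regulators as cheap as lookups of its targets.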