Data Portals | Ma'ayan Laboratory, Computational Systems Biology

The Ma’ayan Lab has developed several NIH-funded data and information portals that integrate, standardize, and share diverse biomedical datasets. These portals lower the barrier to data access and promote collaboration across the scientific community.

Browse through the following categories to explore the various original resources we have developed:

Harmonizome

Biological Knowledge Engine
A biological knowledge engine built on top of information about genes and proteins from 114 datasets. To create the Harmonizome, we distilled information from original datasets into attribute tables that define significant associations between genes and attributes, where attributes could be genes, proteins, cell lines, tissues, experimental perturbations, diseases, phenotypes, or drugs, depending on the dataset. Gene and protein identifiers were mapped to NCBI Entrez Gene Symbols and attributes were mapped to appropriate ontologies. We also computed gene-gene and attribute-attribute similarity networks from the attribute tables. These attribute tables and similarity networks can be integrated to perform many types of computational analyses for knowledge discovery and hypothesis generation.
PMID: 27374120
PMID: 39565209
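
To make the idea of deriving a gene-gene similarity network from an attribute table concrete, here is a minimal sketch. It assumes a toy table in which each gene maps to the set of attributes it is associated with, and it uses the Jaccard index as a stand-in similarity measure. The gene names, attribute names, and the choice of Jaccard are illustrative assumptions, not the measure or data used by the Harmonizome.

```python
from itertools import combinations

# Toy gene-attribute table: each gene maps to the set of attributes it is
# associated with. All gene and attribute names here are made up.
attribute_table = {
    "GENE_A": {"liver", "kidney", "drug_x"},
    "GENE_B": {"liver", "kidney"},
    "GENE_C": {"brain"},
}


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def gene_similarity_network(table, threshold=0.0):
    """Return {(gene1, gene2): similarity} for pairs scoring above threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(table), 2):
        score = jaccard(table[g1], table[g2])
        if score > threshold:
            edges[(g1, g2)] = score
    return edges


network = gene_similarity_network(attribute_table)
for (g1, g2), score in network.items():
    print(f"{g1}\t{g2}\t{score:.3f}")
```

In this toy table only GENE_A and GENE_B share attributes, so the network has a single edge. An attribute-attribute network could be sketched the same way by inverting the table so that each attribute maps to its set of genes.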

ARCHS4

All RNA-seq and ChIP-seq Signature Search Space
ARCHS4 provides access to gene counts from HiSeq 2000 and HiSeq 2500 platforms for human and mouse experiments from GEO and SRA. The website enables downloading of the data in H5 format for programmatic access, and it offers a 3-dimensional view of the sample and gene spaces. Search features allow users to browse the data by metadata annotation, submit their own up and down gene sets, and explore matching samples enriched for annotated gene sets. Selected sample sets can be downloaded into a tab-separated text file through auto-generated R scripts for further analysis. Reads are aligned with Kallisto using a custom cloud computing platform. Human samples are aligned against the GRCh38 human reference genome, and mouse samples against the GRCm38 mouse reference genome.
PMID: 29636450
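
The workflow of selecting samples by metadata and exporting their counts as tab-separated text can be sketched as follows. This is not the ARCHS4 H5 layout or its auto-generated R script. The sample identifiers, metadata fields, counts, and helper names (`select_samples`, `write_tsv`) are all hypothetical and exist only to illustrate the idea with in-memory data.

```python
import csv
import io

# Toy stand-ins for sample metadata and gene counts. All identifiers and
# numbers are invented for illustration.
samples = [
    {"id": "S1", "tissue": "liver"},
    {"id": "S2", "tissue": "brain"},
    {"id": "S3", "tissue": "liver"},
]
counts = {  # gene -> {sample id -> read count}
    "GENE_A": {"S1": 10, "S2": 0, "S3": 7},
    "GENE_B": {"S1": 3, "S2": 25, "S3": 4},
}


def select_samples(samples, field, term):
    """Return ids of samples whose metadata field contains term (case-insensitive)."""
    return [s["id"] for s in samples if term.lower() in s[field].lower()]


def write_tsv(counts, sample_ids, handle):
    """Write a gene-by-sample count matrix as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + sample_ids)
    for gene in sorted(counts):
        writer.writerow([gene] + [counts[gene][s] for s in sample_ids])


liver_ids = select_samples(samples, "tissue", "liver")
buffer = io.StringIO()
write_tsv(counts, liver_ids, buffer)
tsv_text = buffer.getvalue()
print(tsv_text, end="")
```

Writing to a file instead of an in-memory buffer only requires passing an open file handle to `write_tsv`.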

CFDE Workbench

Data and Information Portals for the Common Fund Data Ecosystem
The CFDE Workbench provides data and information portals that enable users to access Common Fund data, query biological entities, and engage with standardized, FAIR, AI-ready resources for biomedical research collaboration.
bioRxiv 2025.02.04.636535

LINCS

Information Portal for the Library of Integrated Network-based Cellular Signatures Consortium
This website serves as the entry point for researchers to access data and information about the LINCS program.
PMID: 29199020

D2H2

Diabetes Data and Hypothesis Hub
D2H2 is a platform that facilitates data-driven hypothesis generation for the diabetes and related metabolic disorder research community. It contains hundreds of high-quality curated transcriptomics datasets relevant to diabetes, accessible via a user-friendly web-based portal. The collected and processed datasets are curated from the Gene Expression Omnibus (GEO). Each curated study has a dedicated page that provides data visualization, differential gene expression analysis, and single-gene queries. To enable the investigation of these curated datasets and to provide easy access to bioinformatics tools that serve gene and gene set-related knowledge, we developed the D2H2 chatbot.
PMID: 38107655

BD2K LINCS DCIC

Data Coordination and Integration Center for LINCS
The BD2K-LINCS Data Coordination and Integration Center is part of the NIH Big Data to Knowledge (BD2K) initiative and serves as the data coordination center for the NIH Common Fund’s Library of Integrated Network-based Cellular Signatures (LINCS) program. LINCS aims to characterize how a variety of human cells and tissues, as well as the entire organism, respond to perturbations by drugs and other molecular factors.
Video of Center Overview on YouTube

Datasets2Tools

Repository and Search Engine for Bioinformatics Datasets, Tools and Canned Analyses
Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as for evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles.
PMID: 29485625

Adhesome

Literature-based Protein-Protein Interaction Network
A comprehensive literature-derived biochemical network developed in collaboration with Benny Geiger’s Lab. The network is made of known interactions and cellular components composing the focal adhesion complex in mammalian cells. The Adhesome website provides a reference to and supporting materials for the analysis published in Nature Cell Biology.
PMID: 17671451

GEN3VA

Gene Expression and Enrichment Vector Analyzer
A web-based system that enables the integrative analysis of aggregated collections of tagged gene expression signatures identified and extracted from GEO. Each tagged collection of signatures is presented in a report that consists of heatmaps of the differentially expressed genes; principal component analysis of all signatures; enrichment analysis with several gene set libraries across all signatures, which we term enrichment vector analysis; and global mapping of small molecules that are predicted to reverse or mimic each signature in the aggregate.
PMID: 27846806
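
As a rough illustration of enrichment vector analysis, the sketch below scores each signature against every gene set in a small library and collects the resulting p-values into one vector per signature. It assumes a one-sided hypergeometric test, which is a common choice for gene set enrichment; the text above does not state which test GEN3VA uses. The gene names, library, signatures, and universe size are all invented for the example.

```python
from math import comb


def hypergeom_pvalue(overlap, set_size, sig_size, universe):
    """P(X >= overlap) when sig_size genes are drawn from a universe in
    which set_size genes belong to the gene set."""
    upper = min(set_size, sig_size)
    total = comb(universe, sig_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, sig_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrichment_vector(signature, library, universe):
    """One p-value per gene set, ordered by sorted gene set name."""
    return [
        hypergeom_pvalue(
            len(signature & library[name]),
            len(library[name]),
            len(signature),
            universe,
        )
        for name in sorted(library)
    ]


# Hypothetical gene set library and signatures over a 10-gene universe.
UNIVERSE_SIZE = 10
library = {"set_1": {"G1", "G2", "G3"}, "set_2": {"G8", "G9"}}
signatures = {
    "sig_a": {"G1", "G2", "G4"},
    "sig_b": {"G8", "G9", "G5"},
}

vectors = {
    name: enrichment_vector(genes, library, UNIVERSE_SIZE)
    for name, genes in signatures.items()
}
for name, vec in vectors.items():
    print(name, [round(p, 4) for p in vec])
```

Stacking these vectors row by row gives a signature-by-gene-set matrix, which is the kind of object that can then be clustered or drawn as a heatmap across a collection of signatures.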