Ma’ayan Lab – Computational Systems Biology

Computational and Mathematical Methods to Study the Complexity of Regulatory Networks in Mammalian Cells

The Ma’ayan Laboratory applies machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation.

Our research team develops software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.


Featured Publication


Rummagene: massive mining of gene sets from supporting materials of biomedical research publications
Many biomedical research publications contain gene sets in their supporting tables, and these sets are currently not available for search and reuse. By crawling PubMed Central, the Rummagene server provides access to hundreds of thousands of such mammalian gene sets. These sets are served for enrichment analysis, free text, and table title search. Investigating statistical patterns within the Rummagene database, we demonstrate that Rummagene can be used for transcription factor and kinase enrichment analyses, and for gene function predictions. By combining gene set similarity with abstract similarity, Rummagene can find surprising relationships between biological processes, concepts, and named entities. Overall, Rummagene brings to the surface the ability to search a massive collection of published biomedical datasets that are currently buried and inaccessible. The Rummagene web application is available at rummagene.com.

Citation: Clarke DJB, Marino GB, Deng EZ, Xie Z, Evangelista JE, Ma’ayan A. Rummagene: massive mining of gene sets from supporting materials of biomedical research publications. Communications Biology 2024 7(1):482.
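
The abstract above mentions gene set similarity without naming a measure. One common choice is the Jaccard index, sketched below. This is an illustrative assumption, not a description of how Rummagene computes similarity; the function name and the example gene symbols are hypothetical.

```python
def jaccard_similarity(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of gene symbols."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Two empty sets share nothing; define their similarity as 0.
        return 0.0
    return len(a & b) / len(union)


# Toy gene sets chosen only for illustration (not taken from Rummagene).
example_a = {"STAT3", "JAK2", "IL6", "SOCS3"}
example_b = {"STAT3", "JAK2", "MYC", "SOCS3", "TP53"}

# 3 shared genes out of 6 distinct genes overall.
score = jaccard_similarity(example_a, example_b)
print(score)
```

A score near 1 means the two sets are nearly identical, and a score near 0 means they barely overlap.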

Positions Available

The Ma’ayan Laboratory conducts multi-disciplinary NIH-funded research that uses Big Data analytics to develop a better understanding of drug action in human cells, build molecular regulatory networks from high-content genome-wide data, and predict optimized therapeutics for individual patients across several complex diseases.

Bioinformatics Software Engineer

Posted November 2025

A full-time position as a Bioinformatics Software Engineer is available in the Ma’ayan Laboratory of Computational Systems Biology and the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai in New York.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. As a member of our team, you will be expected to:

    • Work independently to identify and define technical requirements for tasks and timelines
    • Design, build, test, and deploy scalable bioinformatics web-based applications in a cloud environment
    • Develop, document, and maintain version-controlled code
    • Mock, develop, and enhance interactive UI designs
    • Build Docker containers for various bioinformatics workflows
    • Maintain and enhance efficient solutions to reproducible workflow orchestrations on the cloud and local HPC
    • Author and manage technical documentation that concisely describes design and implementation details
    • Manage, publish, and maintain code repositories (e.g., GitHub) and container repositories (e.g., DockerHub)
    • Respond to new feature requests and assist with issues raised by the user base as needed
    • Report project status regularly to the Principal Investigator

What you’ll bring:

    • Bachelor’s or Master’s degree in Computer Science, Informatics, Mathematics, Statistics, Engineering or Biomedical Science
    • Knowledge of open-source bioinformatics tools and workflows
    • Experience working with high performance clusters and cloud technologies
    • Experience developing web-based applications with front ends utilizing frameworks such as React, NodeJS, RShiny, Flask, or Dash
    • Experience with building and orchestrating containers (Docker) using technologies like Kubernetes
    • Extensive experience with Git or other version control systems
    • Experience in more than one programming language such as Python, JavaScript, Java, C/C++, R
    • Working knowledge of relational and non-relational databases
    • Strong communication (written and verbal) and organizational skills

To apply, please e-mail your CV/resume and the contact information of three references to: sherry.jenkins@mssm.edu

Postdoctoral Fellow, Big Data Science and Computational Systems Biology

Posted November 2025

A full-time postdoctoral position is available in the Ma’ayan Laboratory of Computational Systems Biology and the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai in New York.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team to develop tools and algorithms for the analysis, integration, and visualization of large scale biological omics datasets. The datasets include genomics, transcriptomics, epigenomics, proteomics, and metabolomics. In addition, the position involves the application of machine learning, including deep learning, to mining electronic medical records and combining such data with omics datasets.

What you’ll bring:

Candidates are required to have a recent PhD in Biomedical Science, Computer Science, Mathematics, Biostatistics, Statistics, Physics, or Engineering, and relevant experience with applications to biology.

    • Experience with machine learning, multithread programming, and cloud computing
    • Experience developing and deploying web-based and mobile apps
    • Experience with bioinformatics research projects
    • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker

To apply, please e-mail your CV, research statement, and contact information of three references to: sherry.jenkins@mssm.edu

Bioinformatician II

Posted November 2025

A full-time position is available in the Ma’ayan Laboratory of Computational Systems Biology and the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai in New York.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on projects related to bioinformatics, big data science, and systems biology including developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. You will work on various aspects of research and infrastructure projects. Your work will include:

  • Developing novel dynamic data visualizations
  • Applying machine learning to identify patterns in large and complex datasets
  • Harmonizing and abstracting data from a variety of sources
  • Developing novel statistical mining strategies and algorithms
  • Developing websites, databases, APIs and other data exchange protocols

What you’ll bring:

  • Master’s degree in Computer Science, Informatics, Mathematics, Statistics, Physics, Engineering or Biological Sciences and a strong interest in working on data-intensive biomedical problems.
  • Experience with machine learning, multithread programming, and cloud computing
  • Experience developing and deploying web-based and mobile apps
  • Experience with bioinformatics research projects
  • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker
  • Knowledge of molecular and cell biology

To apply, please e-mail your CV, research statement, and contact information of three references to: sherry.jenkins@mssm.edu

Bioinformatician I

Posted November 2025

A full-time position is available in the Ma’ayan Laboratory of Computational Systems Biology and the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai in New York.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on projects related to bioinformatics, big data science, and systems biology including developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. You will work on various aspects of research and infrastructure projects. Your work will include:

  • Developing novel dynamic data visualizations
  • Applying machine learning to identify patterns in large and complex datasets
  • Harmonizing and abstracting data from a variety of sources
  • Developing novel statistical mining strategies and algorithms
  • Developing websites, databases, APIs and other data exchange protocols

What you’ll bring:

  • Bachelor’s degree in Computer Science, Informatics, Mathematics, Statistics, Physics, Engineering, or Biological Sciences and a strong interest in working on data-intensive biomedical problems.
  • Experience with machine learning, multithread programming, and cloud computing
  • Experience developing and deploying web-based and mobile apps
  • Experience with bioinformatics research projects
  • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker
  • Knowledge of molecular and cell biology

To apply, please e-mail your CV, research statement, and contact information of three references to: sherry.jenkins@mssm.edu

Current Graduate Students

Posted July 2025

If you are interested in joining the lab as a graduate student, please email Dr. Ma’ayan at avi.maayan@mssm.edu. The Ma’ayan Laboratory accepts rotation students from all Multidisciplinary Training Areas (MTAs) within the ISMMS Graduate School of Biomedical Sciences.

Prospective Graduate Students

Posted July 2025

Prospective graduate students should apply to one of the programs at the ISMMS Graduate School of Biomedical Sciences.

News

Featured News

Mount Sinai Showcases Innovative Cancer Research at 2025 AACR Annual Meeting in Chicago
Mount Sinai Press Release

Powerful New Software Platform Could Reshape Biomedical Research by Making Data Analysis More Accessible
Mount Sinai Press Release

Mount Sinai Researchers Mined 800,000 Gene Sets by Scanning Supporting Materials of 6.4 Million Research Publications
Mount Sinai Physician’s Channel

Researchers Characterize the Immune Landscape in Cancer
Mount Sinai Press Release

Icahn School of Medicine at Mount Sinai and the University of California San Diego Receive $8.5 Million Award to Establish a Data Integration Hub for NIH Common Fund Supported Program
Mount Sinai Press Release

AI Spotlight: Mapping Out Links Between Drugs and Birth Defects
Research Feature in Mount Sinai Today

Researchers Develop AI Model to Better Predict Which Drugs May Cause Birth Defects
Mount Sinai Press Release

Genes to Potentially Diagnose Long-Term Lyme Disease Identified
Mount Sinai Press Release

Mount Sinai Designated as National Cancer Institute Proteogenomics Data Analysis Center
Mount Sinai Press Release

Mount Sinai Lab Creates Shared Database to Help Scientists Find Drugs That Can Be Used to Treat COVID-19
Mount Sinai Today Newsletter

2020 Presentation Session Featuring the Research Projects of the Summer Fellows
Ma’ayan Lab and Mount Sinai Center for Bioinformatics

FAIR Your Data
Nature Methods

Ten Renowned Mount Sinai Faculty Members Honored at Convocation
Inside Mount Sinai

Mount Sinai Researchers Develop Software to Measure the Findability, Accessibility, Interoperability, and Reusability of Biomedical Digital Research Objects
Mount Sinai Press Release

2020 Summer Research Training Program in Biomedical Big Data Science
Ma’ayan Lab and Mount Sinai Center for Bioinformatics

Smoke Signals – Study Shows Path Linking Nicotine Addiction to Increased Risk for Diabetes
Nature via Twitter

2019 Presentation Session Featuring the Research Projects of the BD2K-LINCS Fellows
BD2K-LINCS Data Coordination and Integration Center

Mount Sinai Researchers Develop Tool that Analyzes Biomedical Data within Minutes
Mount Sinai Press Release

2018 Presentation Session Featuring the Research Projects of the BD2K-LINCS Fellows
BD2K-LINCS Data Coordination and Integration Center

Big Data, Networks Identify Cell Signaling Pathways in Lung Cancer
Medical Press

Mount Sinai Researchers Receive NIH Grant to Develop New Ways to Share and Reuse Research Data
Mount Sinai Press Release

Students Harness Big Data to Help Solve Medical Challenges
ISMMS Fall 2017 Dean’s Report

BD2K Centers Open Doors to Discovery
Biomedical Computation Review

Gene Expression’s Big Rethink
GEN

Crowdsourcing for Scientific Discovery: Mount Sinai Researchers Find Novel Ways to Analyze Data for Drug and Target Discovery
Mount Sinai Press Release

twoXAR Collaborates with Researchers at Mount Sinai to Advance New Medicines for Diabetic Nephropathy
Business Wire

Back on the Road with Coursera
ASBMB Today

Genetics: Big Hopes for Big Data
Nature | Outlook

Center to Seek New Therapeutics by Integrating Gene, Protein Databases
Mount Sinai Press Release

Systems Pharmacology Approaches for Drug and Cancer Research
Podcast

Society of Toxicology 2013 Annual Meeting
Drug Discovery News

New Computational Method to Help Organize Scientific Data
News-Medical.net

Mount Sinai Researchers Develop New Computational Method to Find Novel Connections from Gene to Gene, Drug to Drug and Between Scientists
Science Daily

Mutations in 3 Genes Linked to Autism Spectrum Disorders
Newswise.com

HIPK2 Regulator Protein Plays a Crucial Role in Kidney Fibrosis
News-Medical.net

Researchers Discover Drug Target for Kidney Failure
Mount Sinai Press Release

Recovering Protein-Protein and Domain-Domain Interactions from Aggregation of IP-MS Proteomics of Coregulator Complexes
Mount Sinai Press Release

Expression2Kinases: mRNA Profiling Linked to Multiple Upstream Regulatory Layers
Mount Sinai Press Release

Mount Sinai Researchers Develop New Computational Method to Aid Analysis of Gene Expression Experiments
Mount Sinai Press Release

Systematic Tracking of Cell Fate Changes
Nature Biotechnology

Courses

Big Data MOOCs on Coursera

Avi Ma’ayan PhD is the course director for two massive open online courses (MOOCs) on the Coursera platform. As of March 2023, these two MOOCs have had over 267,600 unique visitors and a combined total of over 25,000 enrolled students.

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
The BD2K-LINCS Data Coordination and Integration Center (DCIC) was commissioned to organize, analyze, visualize and integrate LINCS data with other publicly available relevant resources. In this course, we introduce the various Centers that collect data for LINCS, describing the experimental data procedures and the various data types. We will then cover the design and collection of metadata and how metadata is linked to ontologies. Additionally, basic data processing and data normalization methods to clean and harmonize LINCS data will be presented. This will be followed by a discussion of how the data is served through RESTful APIs and JSON, and for this we will cover concepts from client-server computing. Most importantly, the course will focus on various bioinformatics methods of analysis, including unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from molecular biomedicine.

Network Analysis in Systems Biology
An introduction to data integration and statistical methods used in contemporary Systems Biology, Bioinformatics and Systems Pharmacology research. The course covers methods to process raw data from genome-wide mRNA expression studies (microarrays and RNA-seq) including data normalization, differential expression, clustering, enrichment analysis and network construction. The course contains practical tutorials for using tools and setting up pipelines, but it also covers the mathematics behind the methods applied within the tools. The course is mostly appropriate for beginning graduate students and advanced undergraduates majoring in fields such as biology, math, physics, chemistry, computer science, biomedical and electrical engineering. The course should be useful for researchers who encounter large datasets in their own research. The course presents software, apps and tools developed by the Ma’ayan Laboratory, but also other freely available data analysis and visualization tools. The ultimate aim of the course is to enable participants to utilize the methods presented in this course for analyzing their own data for their own projects. For those participants who do not work in the field, the course introduces the current research challenges faced in the field of computational systems biology.

Big Data Courses at the Icahn School of Medicine at Mount Sinai

Avi Ma’ayan PhD is the course director for two graduate courses at the Icahn School of Medicine at Mount Sinai. The courses are delivered once in the Fall and once in the Spring. The Fall course is focused on data mining and the Spring course on computer programming.

BSR 6806: Programming for Big Data Biomedicine
The course covers computational methodologies applied to analyze data in the broad fields of bioinformatics and big data science. Topics covered include RNA-seq and proteomics data analysis, Machine Learning, Deep Learning, Text Mining, Python and Jupyter Notebooks, Appyters, cloud computing, data visualization, network analysis, version control, and Knowledge Graphs. Students are required to complete small programming assignments throughout the course. The course uses Jupyter Notebooks and Appyters to run most tutorials. [YouTube playlist of course lectures: Spring 2024, Spring 2023]

Data Mining and Network Analysis
This course covers machine learning applications in systems biology, including unsupervised clustering and supervised learning; analysis of the topology of biological regulatory networks; and a survey of how these approaches are applied to study biological molecular networks. Papers that combine computational predictions with experimental validation are highlighted. The course also covers the use of software tools to analyze the proteomics, genomics, and expression data collected for LINCS.

Research

The Ma’ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.

NIH-funded Centers

Largest and Most Diverse Collection of Annotated Gene Sets

Gene set enrichment analysis is central to many biological and biomedical projects that measure mRNA and protein expression at the whole-genome scale. Gene set enrichment analysis is typically limited to a few literature-based background knowledge libraries such as those created from the Gene Ontology and from pathway databases such as KEGG, WikiPathways, and Reactome. We have demonstrated that enrichment analysis can be expanded to use data from many other biological domains. For developing the tools Enrichr, Enrichr-KG, Rummagene, Rummageo, kinase enrichment analysis (KEA), ChIP-seq enrichment analysis (ChEA), and Harmonizome, we have integrated data from many key biomedical resources into useful gene set libraries. These libraries better inform enrichment analyses from omics studies. So far, over 2 million unique users have used these bioinformatics software applications, with a current rate of ~4,000 unique users per day.
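
As a rough illustration of the enrichment analysis described above, the sketch below scores the overlap between an input gene list and a single library gene set with a right-tailed hypergeometric test. This is a generic textbook formulation, not the implementation used by Enrichr or the other tools named here. The function name, the toy gene symbols, and the background size of 20,000 genes are assumptions made for the example.

```python
from math import comb


def enrichment_p_value(input_genes, library_genes, background_size):
    """Return (overlap, p) where p = P(X >= overlap) under a hypergeometric null.

    The null model draws len(input_genes) genes at random, without
    replacement, from a background of `background_size` genes, of which
    len(library_genes) belong to the library gene set.
    """
    input_set = set(input_genes)
    library_set = set(library_genes)
    n = len(input_set)        # number of genes drawn
    K = len(library_set)      # number of "success" genes in the background
    N = background_size
    if N < max(n, K):
        raise ValueError("background_size must cover both gene sets")

    k = len(input_set & library_set)  # observed overlap
    total = comb(N, n)
    # Sum the exact probabilities of seeing k or more overlapping genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total


# Toy inputs for illustration only; the background size is an assumption.
input_list = ["TP53", "MDM2", "CDKN1A", "BAX", "GAPDH"]
library_set = ["TP53", "MDM2", "CDKN1A", "BAX", "PUMA", "ATM"]
overlap, p_value = enrichment_p_value(input_list, library_set, background_size=20000)
print(overlap, p_value)
```

In practice this test is repeated for every gene set in a library, so the resulting p-values would normally be corrected for multiple testing before terms are ranked.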

Original Methods to Identify Differentially Expressed Genes, Perform Gene Set Enrichment Analyses, and Benchmark these Data Analysis Methods

One of the key statistical tasks in the field of transcriptomics is the identification of differentially expressed genes. We developed a multivariate method called the Characteristic Direction to better identify the “correct” differentially expressed genes. The Characteristic Direction method was extended to also perform improved enrichment analysis using a similar concept. Using a unique benchmarking strategy, we can objectively evaluate the Characteristic Direction method and many other leading methods for differential expression and enrichment analyses such as limma, GSEA and DESeq.

Translational Computational Research in Cancer and Kidney Disease

In collaboration with other experimental and computational biology laboratories, we have made great strides in the past several years in studying kidney disease, diabetes, HIV, and cancer. We have developed unique computational methods that led to the identification of potential targets and drugs for attenuating kidney fibrosis, diabetic kidney disease, and HIVAN. Our collaborative work also proposed treatment combinations for early-stage kidney disease intervention. These advances were made possible by applying unique algorithms that we developed, including Expression2Kinases, SigCom LINCS, and TargetRanger.

Innovative Bioinformatics Software Infrastructure

To lower the barrier to entry for bioinformaticians and to streamline the development of bioinformatics software applications, we developed Appyters. With Appyters, bioinformaticians can rapidly develop full-stack web-based bioinformatics applications from their Jupyter Notebooks. Currently, over 100 Appyters are available from the Appyters Catalog. For a CFDE Partnership project, our team developed the Playbook Workflow Builder, a platform that facilitates the visual dynamic construction of bioinformatics workflows. Alongside these efforts, we also created FAIRshake, a flexible framework for performing manual and automated evaluation of digital objects for adherence to defined community-established standards.