Alumni

 

Former Ma’ayan Lab PhD students, postdoctoral fellows, bioinformaticians, and computational systems biologists have authored or co-authored publications in leading journals, including Nature, Science, PNAS, Nature Genetics, Nature Medicine, Bioinformatics and Science Signaling. Ma’ayan Lab alumni have gone on to leadership positions in industry and have also been recruited to academic positions. Several of our bioinformatics and computational systems biology research trainees have entered PhD programs at top-tier institutions, including Princeton, MIT, Columbia, Duke and Cornell.

PhD Students

Zichen Wang, PhD
Training Period in Lab: 2013-2016
Thesis: Methods for Collecting, Improving and Exploring Mammalian Gene Expression Signatures
Current Position: Assistant Professor, Icahn School of Medicine at Mount Sinai


Qiaonan Duan, PhD
Training Period in Lab: 2012-2016
Thesis: Harnessing the LINCS L1000 Data for Drug Discovery and Drug Response Analysis
Current Position: Computational Biology Scientist, NuMedii


Yan Kou, PhD
Training Period in Lab: 2011-2015
Thesis: Big Data Analytics for Understanding Mechanism of Human Disease
Current Position: Program Director, Insight Data Science


Huilei Xu, PhD
Training Period in Lab: 2009-2012
Thesis: Analysis of Transcriptional Networks in mESCs
Current Position: Senior Bioinformatics Scientist, Genocea Biosciences


Postdoctoral Fellows


Nicolas Fernandez, PhD
Training Period in Lab: 2013-2017
Current Position: Computational Scientist, Icahn School of Medicine at Mount Sinai

Andrew Rouillard, PhD
Training Period in Lab: 2013-2016
Summary of Research: Inferring Cell Signaling Pathways from LINCS
Current Position: Bioinformatics Scientist, GlaxoSmithKline


Neil Clark, PhD
Training Period in Lab: 2011-2015
Summary of Research: Statistical Methods for Network Analysis in Systems Biology
Current Position: Research Fellow, University of Edinburgh (UK)


Ben MacArthur, PhD
Training Period in Lab: 2008-2011
Summary of Research: Systems Biology of mESCs
Current Position: Associate Professor, University of Southampton (UK)


Amin Mazloom, PhD
Training Period in Lab: 2008-2011
Summary of Research: Network Analysis of NURSA HT IP-MS Proteomics
Current Position: Vice President, Bioinformatics and Software, ProdermIQ, Inc.


Bioinformaticians and Software / Database Developers


Shai Magidi, MEng
Training Period in Lab: 2016-2017
Current Position: PhD Student, Tulane University



Anders Dohlman, BA
Training Period in Lab: 2015-2017
Current Position: PhD Student, Duke University


Gregory Gundersen, MA
Training Period in Lab: 2014-2016
Current Position: PhD Student, Princeton University


Michael McDermott, BS
Training Period in Lab: 2014-2016
Current Position: Software Engineer, Lifion


Matthew Jones, BS
Training Period in Lab: 2014-2015
Current Position: Software Engineer, Orion Health (Auckland, New Zealand)


Edward Chen, MS
Training Period in Lab: 2010-2013
Summary of Research: Enrichr: Interactive and Collaborative HTML5 Gene List Enrichment Analysis Tool
Current Position: Software Engineer, Spotify


Ruth Dannenfelser, MSE
Training Period in Lab: 2010-2012
Summary of Research: Software Tools for Systems Biology
Current Position: PhD Student, Princeton University


Simon Gordonov, MS
Training Period in Lab: 2011-2012
Summary of Research: Software Tools for Systems Biology
Current Position: PhD Student, MIT


Christopher Tan, BS
Training Period in Lab: 2011-2012
Summary of Research: Software Tools for Systems Biology
Current Position: Medical Student, Saint Louis University School of Medicine


Caroline Baroukh, PhD
Training Period in Lab: 2010-2011
Summary of Research: Embryonic Stem Cells Atlas of Pluripotency Evidence (ESCAPE) Database
Current Position: Postdoctoral Fellow, INRIA (France)


Undergraduate and Post-bac Research Trainees


Patrycja Krawczuk
Training Period in Lab: Summer 2017
Summary of Research: Automated Indexing of Bioinformatics Tools
Current Position: Undergraduate Student (Mathematics and Computer Science), Hunter College



Marina Latif
Training Period in Lab: Summer 2017
Summary of Research: Mining the Human Kinome
Current Position: Undergraduate Student (Ecology and Evolutionary Biology, Pre-Med), Princeton University



Joyce (Hyojin) Lee
Training Period in Lab: Summer 2017
Summary of Research: Predicting Gene Function and PPIs with Co-Expression Data
Current Position: Undergraduate Student (Mathematics), Princeton University



Ariel Leong
Training Period in Lab: Summer 2017
Summary of Research: ChEA3 – Improving Transcription Factor Enrichment Analysis by Data Integration
Current Position: Undergraduate Student (Biomedical Computation), Stanford University



Damon Pham
Training Period in Lab: Summer 2017
Summary of Research: Developing Novel Gene Set Enrichment Analysis Algorithms
Current Position: Undergraduate Student (Statistics), Indiana University



Christopher Tseng
Training Period in Lab: Summer 2017
Summary of Research: Analysis and Visualization of MCF10A Data for the LINCS Common Project
Current Position: Undergraduate Student (Biology and Computer Science), Emory University



Charlotte Zuber
Training Period in Lab: Summer 2017
Summary of Research: Visualization of the Space of 200,000 Annotated Gene Sets
Current Position: Undergraduate Student (Physics and Computer Science), Rutgers University


Troy Goff, BS
Training Period in Lab: 2015-2017
Current Position: Master’s Student (Data Science), Columbia University


Esther Chen
Training Period in Lab: Summer 2016
Summary of Research: Cite-D-Lite: Chrome Extension for Data and Paper Citations with Text Importance Highlighting
Current Position: Undergraduate Student (Biomedical Engineering), Cornell University



Axel Feldmann
Training Period in Lab: Summer 2016
Summary of Research: X2K-Web: an Updated Web-based Version of the Expression2Kinases Pipeline
Current Position: Undergraduate Student (Computer Science), Carnegie Mellon University


Daniel Clarke
Training Period in Lab: Summer 2016
Summary of Research: Project 1) Adhesome 2016: An Updated Adhesome Site with Predictions of New Member Components; Project 2) Genes2WordCloud: A Biology Oriented Interactive Word Cloud Generator
Current Position: Bioinformatician II, Ma’ayan Laboratory, Icahn School of Medicine at Mount Sinai


Jennifer Lin
Training Period in Lab: Summer 2016
Summary of Research: Predicting Potential Drugs for Diabetic Nephropathy using L1000 Data
Current Position: High School Student, Oceanside High School


Katie Lin, BS, MS
Training Period in Lab: Summer 2016
Summary of Research: Visualization of the Multi-Layered Data from the LINCS MCF10A Dense Cube Project
Current Position: Graduate Student (Computer Science), Columbia University


Colette Malyack, BS, MS
Training Period in Lab: Summer 2016
Summary of Research: Predicting Experimental Platforms by Examining Gene-Set Content
Current Position: PhD Student (Data Science), New Jersey Institute of Technology


Kevin Sani
Training Period in Lab: Summer 2016
Summary of Research: Dr. Gene Budger: Web App to Predict Drugs to Modulate the Expression of a Specific Gene
Current Position: Undergraduate Student (Chemistry/Economics), Harvard University


Benjamin Kaplan
Training Period in Lab: Summer 2015
Summary of Research: Integrative Analysis and Visualization of Gene Expression Signatures toward the Repurposing of FDA Approved Drugs as Antiviral Medications
Current Position: Undergraduate Student (Computer Science), Carnegie Mellon University


Emily Kuang
Training Period in Lab: Summer 2015
Summary of Research: Assessing the Dimensionality of MCF7 Cells Response to Perturbations
Current Position: Undergraduate Student (Biomedical Informatics), New York City College of Technology


Azu Lee, MS
Training Period in Lab: Summer 2015
Summary of Research: Interactive Mobile App Game for Deconvolution of Gene Set Modules from Gene Set Enrichment Analyses
Current Position: Software Development Engineer II, Amazon


Aditi Dandapani, BS
Training Period in Lab: 2009-2010
Summary of Research: Dynamical Model of Viral DI Particles
Current Position: PhD Student (Applied Mathematics), Columbia University


Michael Komosinski, BS
Training Period in Lab: Summer 2011
Summary of Research: Integrating, Predicting, and Visualizing Mammalian Protein-Protein Interaction Networks
Current Position: Software Developer, Google


John Zhuang, BS
Training Period in Lab: 2010
Summary of Research: Regulatory Network Created from Loss of Function and Gain of Function Studies of Mouse Embryonic Stem Cells
Current Position: MD/PhD Student, Cornell


Mariola Szenk, BS
Training Period in Lab: 2009
Summary of Research: PathwayGenerator2: Automated Visualization of Signaling Pathways using Flash and ActionScript 3
Current Position: PhD Student, Stony Brook University


Visiting High School Students

Mounica Kamesam
Training Period in Lab: Summer 2014 and 2015
Summary of Research: Automated Data Integration and Data Mining to Improve Breast Cancer Classification
Current Position: Undergraduate Student (Computer Science), Northeastern University


Axel Feldmann
Training Period in Lab: Summer 2014 and 2016
Summary of Research: Hepatocellular Carcinoma Patient Classification with Enrichment Vectors
Current Position: Undergraduate Student (Computer Science), Carnegie Mellon University


Jayanath Krishnan
Training Period in Lab: Summer 2010
Summary of Research: Regulatory Signatures of Cancer Cell Lines Inferred from Expression Data
Current Position: Undergraduate Student (Biology), Yale University


Visiting Medical Students

Alexander Barash, MD
Training Period in Lab: 2007-2008
Summary of Research: Systems Pharmacology
Current Position: Vitreoretinal Fellow, New York Eye and Ear Infirmary of Mount Sinai


Ryan Logan Webb, MD
Training Period in Lab: 2007-2008
Summary of Research: Software Tools for Systems Biology
Current Position: Radiology Resident, Staten Island University Hospital

Positions Available

Summer Research Training Program in Biomedical Big Data Science

Posted March 2018

Our BD2K-LINCS DCIC Summer Research Training Program in Biomedical Big Data Science is a research-intensive, ten-week training program for undergraduate and graduate students. Summer fellows training in the Ma’ayan Laboratory conduct faculty-mentored independent research projects in the following areas: dynamic data visualization, machine learning and data harmonization.

Data Scientist – Bioinformatics Web Developer II

Posted March 2018

A full-time position is available in the Ma’ayan Laboratory of Computational Systems Biology and the BD2K-LINCS Data Coordination and Integration Center at the Icahn School of Medicine at Mount Sinai in New York. The Ma’ayan Laboratory conducts multidisciplinary, NIH-funded research that uses big data analytics to better understand drug action in human cells, build molecular regulatory networks from high-content genome-wide data, and predict optimized therapeutics for individual patients across several complex diseases.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on projects related to bioinformatics, big data science, and systems biology including developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. You will work on various aspects of research and infrastructure projects. Your work will include:

  • Developing novel dynamic data visualizations
  • Applying machine learning to identify patterns in large and complex datasets
  • Harmonizing and abstracting data from a variety of sources
  • Developing novel statistical mining strategies and algorithms
  • Developing websites, databases, APIs and other data exchange protocols

What you’ll bring:

  • Master’s or PhD (PhD preferred) in Computer Science, Informatics, Mathematics, Statistics, Physics, Engineering or Biological Sciences and a strong interest in working on data-intensive biomedical problems.
  • Experience with machine learning, multithreaded programming, and cloud computing
  • Experience developing and deploying web-based and mobile apps
  • Experience with bioinformatics research projects
  • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker

To apply, please e-mail your CV, research statement, and the names and contact information of three references to: sherry.jenkins@mssm.edu

Data Scientist – Bioinformatics Web Developer I

Posted March 2018

A full-time position is available in the Ma’ayan Laboratory of Computational Systems Biology and the BD2K-LINCS Data Coordination and Integration Center at the Icahn School of Medicine at Mount Sinai in New York. The Ma’ayan Laboratory conducts multidisciplinary, NIH-funded research that uses big data analytics to better understand drug action in human cells, build molecular regulatory networks from high-content genome-wide data, and predict optimized therapeutics for individual patients across several complex diseases.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on projects related to bioinformatics, big data science, and systems biology including developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. You will work on various aspects of research and infrastructure projects. Your work will include:

  • Developing novel dynamic data visualizations
  • Applying machine learning to identify patterns in large and complex datasets
  • Harmonizing and abstracting data from a variety of sources
  • Developing novel statistical mining strategies and algorithms
  • Developing websites, databases, APIs and other data exchange protocols

What you’ll bring:

  • Bachelor’s degree in Computer Science, Informatics, Mathematics, Statistics, Physics, Engineering, or Biological Sciences and a strong interest in working on data-intensive biomedical problems.
  • Experience with machine learning, multithreaded programming, and cloud computing
  • Experience developing and deploying web-based and mobile apps
  • Experience with bioinformatics research projects
  • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker
  • Knowledge of molecular and cell biology

To apply, please e-mail your CV, research statement, and the names and contact information of three references to: sherry.jenkins@mssm.edu

Current Graduate Students

Posted March 2018

If you are interested in joining the lab as a graduate student, please email Dr. Ma’ayan at avi.maayan@mssm.edu. The Ma’ayan Laboratory accepts rotation students from all Multidisciplinary Training Areas (MTAs) within the ISMMS Graduate School of Biomedical Sciences.

Prospective Graduate Students

Posted October 2017

Prospective graduate students should apply to one of the programs at the ISMMS Graduate School of Biomedical Sciences.

Mount Sinai Health System is an equal opportunity/affirmative action employer. We recognize the power and importance of a diverse employee population and strongly encourage applicants with various experiences and backgrounds. Mount Sinai Health System – An EEO/AA-D/V Employer

Courses


Big Data MOOCs on Coursera

Avi Ma’ayan, PhD, is the course director for two massive open online courses (MOOCs) on the Coursera platform. As of March 2016, over 33,000 students had registered for these two courses and 195,000 video lectures had been viewed.

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
The BD2K-LINCS Data Coordination and Integration Center (DCIC) is commissioned to organize, analyze, visualize and integrate LINCS data with other publicly available relevant resources. In this course we will introduce the various Centers that collect data for LINCS, describing the experimental data procedures and the various data types. We will then cover the design and collection of metadata and how metadata is linked to ontologies. Additionally, basic data processing and data normalization methods to clean and harmonize LINCS data will be presented. This will be followed by a discussion of how the data is served as JSON through RESTful APIs, and for this we will cover concepts from client-server computing. Most importantly, the course will focus on various bioinformatics methods of analysis, including unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from molecular biomedicine.

Network Analysis in Systems Biology
An introduction to data integration and statistical methods used in contemporary Systems Biology, Bioinformatics and Systems Pharmacology research. The course covers methods to process raw data from genome-wide mRNA expression studies (microarrays and RNA-seq) including data normalization, differential expression, clustering, enrichment analysis and network construction. The course contains practical tutorials for using tools and setting up pipelines, but it also covers the mathematics behind the methods applied within the tools. The course is mostly appropriate for beginning graduate students and advanced undergraduates majoring in fields such as biology, math, physics, chemistry, computer science, biomedical and electrical engineering. The course should be useful for researchers who encounter large datasets in their own research. The course presents software, apps and tools developed by the Ma’ayan Laboratory, but also other freely available data analysis and visualization tools. The ultimate aim of the course is to enable participants to utilize the methods presented in this course for analyzing their own data for their own projects. For those participants who do not work in the field, the course introduces the current research challenges faced in the field of computational systems biology.

Big Data Courses at the Icahn School of Medicine at Mount Sinai

Avi Ma’ayan, PhD, is the course director for two graduate courses at the Icahn School of Medicine at Mount Sinai. One course is delivered in the Fall and the other in the Spring. The Fall course focuses on data mining and the Spring course on computer programming.

Programming for Big Data Biomedicine
This course covers computer programming methodologies applied to the processing and analysis of data in the broad fields of Bioinformatics and Systems Biology. Topics covered include an overview of data structures and algorithms, Python scripting for processing text files, computational platforms such as Jupyter Notebooks, and database technologies such as MySQL. Students are required to complete small programming assignments throughout the course. Spring 2018 Course Dates

BD2K-LINCS: Data Mining and Network Analysis
This course covers machine learning applications in systems biology, including unsupervised clustering and supervised learning; analysis of the topology of biological regulatory networks; a survey of how these approaches are applied to study biological molecular networks, highlighting papers that combine computational predictions with experimental validation; and the use of software tools to analyze the proteomics and genomics data collected for LINCS. Fall 2016 Course Dates

Resources

The Ma’ayan Laboratory has developed an open-source bioinformatics pipeline to extract knowledge from typical RNA-Seq studies and generate interactive principal component analysis (PCA) plots.

Recently Released Resources

Datasets2Tools

Repository and Search Engine for Bioinformatics Datasets, Tools and Canned Analyses
Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The repository also indexes 4,901 published bioinformatics software tools and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles.
doi: 10.1038/sdata.2018.23

L1000FWD

Large-scale Visualization of Drug-induced Transcriptomic Signatures
L1000 fireworks display (L1000FWD) is a web application that provides interactive visualization of over 16,000 drug and small-molecule induced gene expression signatures. L1000FWD enables coloring of signatures by different attributes such as cell type, time point and concentration, as well as by drug attributes such as MOA and clinical phase. A signature similarity search finds signatures that mimic or oppose an input pair of up and down gene sets. Each point on the L1000FWD interactive map is linked to a signature landing page, which provides multifaceted knowledge from various sources about the signature and the drug. Notably, this information includes the most frequent diagnoses, co-prescribed drugs and age distribution of prescriptions as extracted from the Mount Sinai Health System electronic medical records (EMR). Overall, L1000FWD serves as a platform for identifying functions for novel small molecules using unsupervised clustering, as well as for exploring drug MOA.
doi: 10.1093/bioinformatics/bty060

ARCHS4

All RNA-seq and ChIP-seq Signature Search Space
ARCHS4 provides access to gene counts from HiSeq 2000 and HiSeq 2500 platforms for human and mouse experiments from GEO and SRA. The website enables downloading of the data in H5 format for programmatic access as well as a 3-dimensional view of the sample and gene spaces. Search features allow browsing the data by metadata annotation, submitting your own up and down gene sets, and exploring matching samples enriched for annotated gene sets. Selected sample sets can be downloaded into a tab-separated text file through auto-generated R scripts for further analysis. Reads are aligned with Kallisto using a custom cloud computing platform. Human samples are aligned against the GRCh38 human reference genome, and mouse samples against the GRCm38 mouse reference genome.
bioRxiv 189092

LJP-BCNB

LINCS Joint Project – Breast Cancer Network Browser
LJP-BCNB visualizes thousands of signatures from six breast cancer cell lines treated with ~100 single molecule perturbations, mostly kinase inhibitors. These perturbations were applied in different concentrations while gene expression was measured at different time points using the L1000 technology. Under the same conditions, the cells were imaged for cell viability. The distance between nodes represents response similarity computed using the cosine distance between the Characteristic Direction vectors of perturbations compared with their appropriate controls.
PMID: 29084964
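
As a rough illustration of the cosine distance mentioned above, the following Python sketch computes it for two signature vectors. The function name and the toy vectors are hypothetical; the actual Characteristic Direction vectors and the LJP-BCNB implementation are not reproduced here.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle) between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical per-gene signature vectors for two perturbations.
signature_a = [0.8, -0.1, 0.3, 0.0]
signature_b = [0.7, 0.0, 0.4, -0.1]
print(cosine_distance(signature_a, signature_b))
```

A distance near 0 means the two perturbations pushed expression in nearly the same direction, a distance near 2 means opposite directions, and 1 means the directions are orthogonal.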


Explore More Ma’ayan Lab Resources…

To organize, visualize, analyze and model data from high-throughput molecular profiling experiments, we develop computational approaches that can assist experimental systems biologists in forming rational hypotheses for further experimentation. We analyze high-dimensional data collected for projects integrating results from multiple layers of regulation (genomics, transcriptomics and proteomics). Algorithms and datasets are delivered as software so that our methodologies can reach and impact the interested systems biology research community. Below are some of the software tools we developed:

Enrichr


Gene-List Enrichment Analysis Tool
An integrative web-based and mobile gene-list enrichment analysis tool that includes over 100 gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library Data-Driven Documents (D3). Enrichr is open source and freely available online. The software can also be embedded easily into any tools that perform gene list analysis.
PMID: 23586463
PMID: 27141961

GEO2Enrichr


Browser Extension for Extracting Differentially Expressed Gene Sets from GEO
A web application and two browser extensions (one for Chrome and another for Firefox) designed to facilitate the extraction of signatures from studies posted on the Gene Expression Omnibus (GEO) database. These signatures are then submitted to Enrichr for downstream functional analysis.
PMID: 25971742

CREEDS


Crowd Extracted Expression of Differential Signatures
Collections of processed gene, drug and disease signatures from GEO.
doi:10.1038/ncomms12846

GEN3VA


Gene Expression and Enrichment Vector Analyzer
A web-based system that enables the integrative analysis of aggregated collections of tagged gene expression signatures identified and extracted from GEO. Each tagged collection of signatures is presented in a report that consists of heatmaps of the differentially expressed genes; principal component analysis of all signatures; enrichment analysis with several gene set libraries across all signatures, which we term enrichment vector analysis; and global mapping of small molecules that are predicted to reverse or mimic each signature in the aggregate.
PMID: 27846806

L1000CDS2


L1000 Characteristic Direction Signature Search Engine
Finds consensus signatures that match users’ input gene lists or input signatures. The underlying dataset is the LINCS L1000 small molecule expression profiles generated at the Broad Institute by the Connectivity Map team. The differentially expressed (DE) genes of these profiles were calculated using the Characteristic Direction method.
doi:10.1038/npjsba.2016.15

Harmonizome


Biological Knowledge Engine
A biological knowledge engine built on top of information about genes and proteins from 114 datasets. To create the Harmonizome, we distilled information from original datasets into attribute tables that define significant associations between genes and attributes, where attributes could be genes, proteins, cell lines, tissues, experimental perturbations, diseases, phenotypes, or drugs, depending on the dataset. Gene and protein identifiers were mapped to NCBI Entrez Gene Symbols and attributes were mapped to appropriate ontologies. We also computed gene-gene and attribute-attribute similarity networks from the attribute tables. These attribute tables and similarity networks can be integrated to perform many types of computational analyses for knowledge discovery and hypothesis generation.
Harmonizome mobile app
PMID: 27374120

SEP L1000


Side Effect Prediction Based on L1000 Data
Serves the predicted adverse drug reactions (ADRs) for the drugs and small-molecule compounds profiled in the LINCS L1000 project. A network of predictive ADRs was constructed based on their drug similarity and visualized using a stacked bubble chart. Each drug and ADR has a dedicated page with a list of the relevant predictions and external links.
PMID: 27153606

PAEA

Principal Angle Enrichment Analysis

Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool
Uses the geometrical concept of the principal angle to quantify gene-set enrichment. We find that PAEA outperforms a selection of commonly used gene set enrichment methods including GSEA. To benchmark PAEA with other enrichment methods we use real data. We examined the ranking of transcription factors by performing enrichment analysis on gene expression signatures from many studies that knocked-down, knocked-out or over-expressed transcription factors, and performed the enrichment analysis with a library of gene sets created from ChIP-Seq data profiling the same transcription factors.
PMID: 26848405

Slicr


LINCS L1000 Slicr [GSE70138 data only]
A metadata search engine that searches for LINCS L1000 gene expression profiles and signatures matching users’ input parameters. It features download of selected search results as csv files in a zipped folder and visualization of selected results in a 3D scatter plot using PCA or MDS. Slicr consists of three views: the search view, the checkout view and the 3D scatter view.


LINCS Canvas Browser

LINCS L1000 Clustering, Visualization and Enrichment Analysis Tool
A web-based tool that enables users to explore thousands of genome-wide gene expression experiments applied to breast cancer cell lines. The browser visualizes results from L1000 experiments where drugs or endogenous ligands were applied to six human breast cancer cell lines in different concentrations and where expression was measured at different time points. The visualization of the results is organized by cell-line and batch where perturbations that induced similar responses are clustered together on a canvas.
PMID: 24906883

Drug/Cell-line Browser


Data Visualization Tool
An interactive HTML5 data visualization tool for interacting with three of the recently published datasets of cancer cell lines/drug-viability studies. DCB uses clustering and canvas visualization of the drugs and the cell lines, as well as a bar graph that summarizes drug effectiveness for the tissue of origin or the cancer subtypes for single or multiple drugs. DCB can help in understanding drug response patterns and prioritizing drug/cancer cell line interactions by tissue of origin or cancer subtype.
PMID: 25100688

Cite-D-Lite

Chrome Extension for Data and Paper Citations with Text Importance Highlighting
Functions on specific pages of GEO, PubMed, and DataMed. It has two functions: (1) to create downloadable citations for GEO data and PubMed articles and (2) to highlight the most important sentences in PubMed abstracts in a graded manner (based on the TextRank algorithm).

Drug Pair Seeker


Predict and Prioritize Pairs of Drugs
A Java program that attempts to predict and prioritize pairs of drugs using the Connectivity Map dataset. Users can enter lists of up and down differentially expressed genes from their experiments to receive a ranked list of drug combinations that are predicted to either reverse or augment the gene expression state of the cells or tissue of interest using a simple formula.
PMID: 23559582


Network2Canvas

Perform and Visualize Gene-set and Drug-set Enrichment Analyses
A web application that provides an alternative way to view networks. N2C visualizes networks by placing nodes on a square toroidal canvas. The network nodes are clustered on the canvas using simulated annealing to maximize local connections where a node’s brightness is made proportional to its local fitness. The interactive canvas is implemented in HyperText Markup Language (HTML)5 with the JavaScript library Data-Driven Documents (D3). We applied N2C to visualize 30 canvases made from human and mouse gene-set libraries and 6 canvases made from the Food and Drug Administration (FDA)-approved drug-set libraries.
PMID: 23749960
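
The canvas layout idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the N2C implementation: the 8-cell neighborhood, the swap move, the geometric cooling schedule, and all function and parameter names are hypothetical choices made for the example.

```python
import math
import random


def neighbors(pos, side):
    """Indices of the 8 cells around pos on a side x side grid that wraps at its edges."""
    row, col = divmod(pos, side)
    return [((row + dr) % side) * side + (col + dc) % side
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def total_fitness(layout, weights, side):
    """Sum of connection weights between every node and the nodes in adjacent cells."""
    total = 0.0
    for pos, node in enumerate(layout):
        for nb in neighbors(pos, side):
            total += weights[node][layout[nb]]
    return total


def anneal(weights, side, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Swap-based simulated annealing; returns the best layout seen and its fitness."""
    if side < 3:
        raise ValueError("side must be at least 3 so that neighbor cells are distinct")
    rng = random.Random(seed)
    layout = list(range(side * side))  # layout[cell] = node placed in that cell
    rng.shuffle(layout)
    current = total_fitness(layout, weights, side)
    best, best_layout = current, layout[:]
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(layout)), 2)
        layout[i], layout[j] = layout[j], layout[i]
        candidate = total_fitness(layout, weights, side)
        worse_by = current - candidate
        # Always accept improvements; accept worse layouts with Metropolis probability.
        if worse_by <= 0 or rng.random() < math.exp(-worse_by / temperature):
            current = candidate
            if current > best:
                best, best_layout = current, layout[:]
        else:
            layout[i], layout[j] = layout[j], layout[i]  # undo the rejected swap
    return best_layout, best


# Toy network: 16 nodes in two groups, connected only within their own group.
side = 4
n = side * side
weights = [[1.0 if a != b and a // 8 == b // 8 else 0.0 for b in range(n)]
           for a in range(n)]
layout, score = anneal(weights, side)
print(score)
```

In this sketch a node's local fitness would be the inner sum over its neighboring cells, which is the quantity a display could map to brightness.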

ChEA


ChIP-X Enrichment Analysis
The database contains manually extracted datasets of transcription-factor/target-gene interactions from over 100 experiments such as ChIP-chip, ChIP-seq and ChIP-PET applied to mammalian cells. We use the database as the prior-knowledge gene-list library when performing gene-list enrichment analysis of mRNA expression data. The system is delivered as web-based interactive software. With this software, users can input lists of mammalian genes for which the program computes over-representation of transcription factor targets from the ChEA database.
PMID: 20709693
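Over-representation of transcription factor targets in an input gene list is commonly scored with a one-sided hypergeometric (Fisher exact) test. The sketch below shows that calculation on a toy library. It is an illustration only: the library contents, the background size, and the function names are assumptions made here, and the exact statistic used by the ChEA software may differ.

```python
from math import comb


def hypergeom_pvalue(overlap, list_size, set_size, background):
    """P(X >= overlap) when list_size genes are drawn from a background
    that contains set_size targets of the transcription factor."""
    upper = min(list_size, set_size)
    favorable = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(background, list_size)


def enrich(gene_list, library, background_size):
    """Rank transcription factors by over-representation of their targets."""
    genes = set(gene_list)
    results = []
    for factor, targets in library.items():
        hits = genes & targets
        p_value = hypergeom_pvalue(
            len(hits), len(genes), len(targets), background_size
        )
        results.append((factor, len(hits), p_value))
    return sorted(results, key=lambda row: row[2])


# Toy example with hypothetical factor names and gene symbols.
toy_library = {
    "TF_A": {"G1", "G2", "G3", "G4"},
    "TF_B": {"G1", "G9", "G10", "G11"},
}
ranked = enrich(["G1", "G2", "G3"], toy_library, background_size=20)
```

In practice the p-values would also be corrected for testing many transcription factors at once.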

KEA

Kinase Enrichment Analysis
A web-based tool with an underlying database providing users with the ability to link lists of mammalian proteins/genes with the kinases that phosphorylate them. The system draws from several available kinase-substrate databases to compute kinase enrichment probability based on the distribution of kinase-substrate proportions in the background kinase-substrate database compared with kinases found to be associated with an input list of genes/proteins.
PMID: 19176546

GATE

Grid Analysis of Time-series Expression
A computational software platform for integrated visualization and analysis of expression time-series. Given a high-dimensional time-series dataset, GATE employs a clustering algorithm that creates movies of expression dynamics by assigning individual genes/proteins to hexagons on a hexagonal array and dynamically coloring each hexagon according to the expression level of the molecular species with which it is associated. Additionally, in order to infer potential regulatory control mechanisms from patterns of time-series correlations, GATE allows interactive interrogation of the movies with a wide variety of background knowledge datasets.
PMID: 19892805

Genes2FANs

Utilizes FANs and a PPI Network to Build Subnetworks that Connect Lists of Human and Mouse Genes
A web-based tool and a database that utilizes 14 carefully constructed functional association networks (FANs) and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect input lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols to produce a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term and then the system automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user’s PubMed query.
PMID: 22748121

Sets2Networks

Network Inference from Repeated Observations of Sets
A general method for network inference from repeated observations of sets of related entities. Given experimental observations of sets of related entities, S2N infers the underlying network of binary interactions between these entities by generating an ensemble of networks consistent with the data; the frequency of occurrence of a given interaction throughout this ensemble is interpreted as the probability that the interaction is present in the underlying real network.
PMID: 22824380

Expression2Kinases

Gene Expression Data Analysis
A method to identify upstream regulators likely responsible for observed patterns in genome-wide gene expression. By integrating ChIP-seq/chip and position weight matrix (PWM) data, protein-protein interactions, and kinase-substrate phosphorylation reactions, X2K can better identify regulatory mechanisms upstream of genome-wide differences in gene expression. X2K first infers the most likely transcription factors that regulate the differences in gene expression, then uses protein-protein interactions to connect the identified transcription factors using additional proteins for building transcriptional regulatory subnetworks centered on these factors, and finally uses kinase-substrate protein phosphorylation reactions to identify and rank candidate protein kinases that most likely regulate the formation of the identified transcriptional complexes.
PMID: 22080467

Lists2Networks

Integrated Analysis of Gene/Protein Lists
A web-based system that allows users to upload and analyze lists of mammalian gene-sets in a client-server software application. Within their workspace users can examine the overlap among the lists they upload, manipulate lists with different set operations, expand lists using existing mammalian networks of protein-protein interactions, co-expression correlations, or background knowledge annotation correlations, and apply simple gene-set enrichment analyses on many gene lists at once against a plethora of prior knowledge datasets.
PMID: 20152038

Genes2Networks

Tool for Creating Subnetworks from Lists of Mammalian Genes or Proteins
A software tool that can be used to place lists of mammalian genes in the context of background mammalian signalome and interactome networks. The input to the program is a list of human Entrez Gene symbols and background networks in SIG format, while the output includes: (a) all identified interactions for the genes/proteins, (b) a subnetwork connecting the genes/proteins through intermediate components, and (c) a ranking of the specificity of the intermediate components for interacting with the list of genes/proteins.
PMID: 17916244
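The idea of connecting a list of seed genes through intermediate components of a background network can be sketched with a bounded breadth-first search. This is a minimal sketch, assuming an undirected background network given as a plain edge list and a hypothetical max_edges path-length limit; the function names are illustrative, and it does not reproduce the tool's SIG parsing or its specificity ranking.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target, max_edges):
    """Breadth-first search for a shortest path of at most max_edges edges."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_edges:
            continue  # do not extend paths beyond the length limit
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def connect_seeds(edges, seeds, max_edges=2):
    """Return (subnetwork edges, intermediate nodes) linking the seeds."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    sub_edges = set()
    for source, target in combinations(sorted(seeds), 2):
        path = shortest_path(graph, source, target, max_edges)
        if path:
            sub_edges.update(frozenset(edge) for edge in zip(path, path[1:]))
    nodes = set().union(*sub_edges)
    return sub_edges, nodes - set(seeds)
```

For example, with edges A-X and X-B and seeds A and B, the sketch returns both edges and reports X as the single intermediate.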

Flash-based Network Viewer

Network Visualization
Visualization of small to moderately sized biological networks and pathways. FNV can also be used to embed pathways inside PDF files for the communication of pathways in soft publication materials.
PMID: 21349871

Genes2WordCloud

Identify Biological Themes from Gene Lists
A word-cloud generator and a word-cloud viewer based on WordCram and implemented using Java, Processing, AJAX, MySQL, and PHP. Text is fetched from several sources and then processed to extract the most relevant terms with their computed weights based on word frequency.
PMID: 21995939
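Frequency-based term weighting of the kind described above can be sketched in a few lines. This is an illustration only: the stop-word list, the tokenizer, the normalization to the most frequent term, and the function name are assumptions made here rather than details of the tool.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the tool's list).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}


def term_weights(texts, top_n=5):
    """Return the top_n terms with weights scaled to the most frequent term."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z][a-z0-9-]+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    if not counts:
        return []
    top = counts.most_common(top_n)
    max_count = top[0][1]
    return [(word, count / max_count) for word, count in top]
```

The resulting weights could be mapped to font sizes when the cloud is drawn.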

Sig2BioPAX

Tool for Converting Flat Files to BioPAX
A command-line Java program that can be used to convert structured text files describing molecular interactions into the BioPAX Level 3 standard format.
PMID: 21418653

SNAVI

Desktop Application for Analysis and Visualization of Large-Scale Cell Signaling Networks
A Windows-based desktop application that implements standard network analysis methods to compute clustering and connectivity distributions and to detect network motifs, and that provides means to visualize networks and network motifs. SNAVI can generate linked web pages from network datasets loaded in text format and can also create networks from lists of gene or protein names. SNAVI is a useful tool for analyzing, visualizing and sharing cell signaling data. SNAVI is free, open-source software.
PMID: 19154595

AJAX Viewer for Signaling Networks

Web-based Viewer of Interactive Cell Signaling Networks
A visualization tool for viewing and sharing intracellular signaling, gene regulation and protein interaction networks. AVIS is implemented as an AJAX-enabled syndicated Google gadget. It allows any webpage to render an image from a text file representation of signaling, gene regulatory, or protein interaction networks.
PMID: 17855420

PubMed Alert Me!

PubMed SDI Software Application
A software utility that allows users to enter a list of PubMed queries. Once a list of queries is configured, the program runs either daily or weekly. It searches PubMed and if it finds new matching published papers, the program sends an e-mail notification with a list of links to the new articles.
PMID: 18402930

ESCAPE

Database for Integrating High-content Data Collected from Human and Mouse Embryonic Stem Cells
A mammalian embryonic stem cell (ESC)-specific database created by collecting and integrating data reporting results from various published studies that profiled human and mouse ESCs including: protein-DNA binding interactions extracted from ChIP-seq/chip experiments, gene regulatory interactions from loss/gain-of-function studies followed by genome-wide mRNA expression profiling, protein interactions from immunoprecipitation followed by mass-spectrometry proteomics, a list of potential pluripotency regulators from RNA interference screens, ESC-specific proteins and phosphoproteins with specified phosphosites from proteomics and phosphoproteomics studies, time-course genome-wide mRNA microarray datasets from differentiating mouse ESCs, and histone modification status from genome-wide studies.
PMID: 23794736

Adhesome

Literature-based Protein-Protein Interaction Network
A comprehensive literature-derived biochemical network developed in collaboration with Benny Geiger’s Lab. The network is made of known interactions and cellular components composing the focal adhesion complex in mammalian cells. The Adhesome website provides a reference to and supporting materials for the analysis published in Nature Cell Biology.
PMID: 17671451

Neuronal Signalome

Model Representing Signaling Pathways and Cellular Machines in the Hippocampal CA1 Neuron
Consists of cell signaling interactions extracted from literature describing components and interactions in mammalian neurons. This network integrates cell signaling pathways specific to mammalian neurons.
PMID: 16099987

Presynaptome

PRE Literature-based Protein-Protein Interaction Network
Consists of literature-based protein-protein interactions extracted from low-throughput experimental studies reporting interactions in mammalian presynaptic nerve terminals.
PMID: 19562802

Network Datasets for Download: in SIG file format

Research

The Ma’ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.

We lead two NIH-funded centers: the BD2K-LINCS Data Coordination and Integration Center (DCIC), and the Knowledge Management Center for the Illuminating the Druggable Genome program.

Assemble the Largest Novel Collection of Gene Set Libraries

Gene Set Enrichment Analysis and Gene Ontology analysis are central to all biological investigations that measure gene and protein expression at the global scale. Such analyses were limited until recently to pathway enrichment and/or Gene Ontology enrichment. We showed that enrichment analysis can be expanded to use data from many biological domains. By developing the tools Kinase Enrichment Analysis (KEA), ChIP-X Enrichment Analysis (ChEA), Lists2Networks and Enrichr, we demonstrated that many resources can be converted to useful gene set libraries and that these can better inform analyses of genome-wide expression studies. So far, over 100,000 unique users have used the enrichment analysis software tools we developed.

Develop Novel Methods to Identify Differentially Expressed Genes, Perform Gene Set Enrichment Analysis and Set Up Benchmarks for Such Methods

A key statistical task in systems biology and genomics is identifying differentially expressed genes and performing gene set enrichment analysis to extract biological themes from gene expression data. We develop multivariate methods to better identify the more “correct” differentially expressed genes from genomics studies, and to better perform enrichment analysis. Using a novel benchmarking strategy that we developed, we can fairly compare the performance of methods such as limma, SAM and DESeq for differential expression, and GSEA for gene set enrichment analysis.

Understand the Structure and Dynamics of Cellular Regulatory Networks

Analysis of the gene sets and networks we have collected and analyzed can uncover design principles and modules that make up the complex organizational structure of mammalian cells. Analysis of Big Data in the field also points to the existence of various experimental biases and computational limitations. We aim to develop new theories and new algorithms to better extract knowledge from such complex data.  Many of the theoretical observations we extracted from the topologies of biological networks and gene sets are manifestations of general design principles observed in many complex systems, not just in biological networks, and we are interested in understanding how such principles emerge and are related.

About Us

 

Summary of Research Interests

The Ma’ayan Laboratory develops computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems. Below are some of the software tools we designed and developed recently:

Enrichr

Gene-List Enrichment Analysis Tool
An integrative web-based and mobile gene list enrichment analysis tool providing various types of visualization summaries of collective functions of gene lists.
PMID: 27141961

GEO2Enrichr

Browser Extension for Extracting Differentially Expressed Gene Sets from GEO
A browser extension and web application to extract gene sets from GEO and analyze these lists for common biological functions.
PMID: 25971742

CREEDS

Crowd Extracted Expression of Differential Signatures
Collections of processed gene, drug and disease signatures from GEO.
PMID: 27667448

GEN3VA

Gene Expression and Enrichment Vector Analyzer
Aggregates and analyzes gene expression signatures extracted from GEO by the crowd using GEO2Enrichr.
PMID: 27846806

L1000CDS2

L1000 Characteristic Direction Signature Search Engine
Queries gene expression signatures against the LINCS L1000 dataset to identify and prioritize small molecules that can reverse or mimic the observed input expression pattern.
doi:10.1038/npjsba.2016.15

Harmonizome

Biological Knowledge Engine
Built on top of information about genes and proteins from 114 datasets, the Harmonizome is a knowledge engine for a diverse set of integrated resources.
PMID: 27374120

SEP L1000

Side Effect Prediction Based on L1000 Data
Web portal for searching and browsing predictive small-molecule/ADR connections.
PMID: 27153606

For a complete list of our software tools, databases and datasets please visit our Resources page. We apply these and other computational methods in a variety of collaborative projects. The results from our analyses produce concrete suggestions and predictions for further functional experiments. The predictions are tested by our collaborators, and our analysis methods are delivered as software tools and databases for the systems biology research community.