Positions Available

Data Scientist – Bioinformatics Web Developer II

Posted August 2017

A full-time position is available in the Ma’ayan Laboratory of Computational Systems Biology and the BD2K-LINCS Data Coordination and Integration Center at the Icahn School of Medicine at Mount Sinai in New York. The Ma’ayan Laboratory conducts multidisciplinary, NIH-funded research that uses big data analytics to develop a better understanding of drug action in human cells, build molecular regulatory networks from high-content genome-wide data, and predict optimized therapeutics for individual patients across several complex diseases.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on projects related to bioinformatics, big data science, and systems biology including developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. You will work on various aspects of research and infrastructure projects. Your work will include:

  • Developing novel dynamic data visualizations
  • Applying machine learning to identify patterns in large and complex datasets
  • Harmonizing and abstracting data from a variety of sources
  • Developing novel statistical mining strategies and algorithms
  • Developing websites, databases, APIs and other data exchange protocols

What you’ll bring:

  • Master’s or PhD (PhD preferred) in Computer Science, Informatics, Mathematics, Statistics, Physics, Engineering or Biological Sciences and a strong interest in working on data-intensive biomedical problems.
  • Experience with machine learning, multithreaded programming, and cloud computing
  • Experience developing and deploying web-based and mobile apps
  • Experience with bioinformatics research projects
  • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker

To apply, please e-mail your CV, research statement, and the names and contact information of three references to: sherry.jenkins@mssm.edu

Data Scientist – Bioinformatics Web Developer I

Posted August 2017

A full-time position is available in the Ma’ayan Laboratory of Computational Systems Biology and the BD2K-LINCS Data Coordination and Integration Center at the Icahn School of Medicine at Mount Sinai in New York. The Ma’ayan Laboratory conducts multidisciplinary, NIH-funded research that uses big data analytics to develop a better understanding of drug action in human cells, build molecular regulatory networks from high-content genome-wide data, and predict optimized therapeutics for individual patients across several complex diseases.

What you’ll do:

The successful candidate will collaborate with an interdisciplinary team on projects related to bioinformatics, big data science, and systems biology including developing, implementing, documenting and maintaining web-based software applications used by the larger scientific community. You will work on various aspects of research and infrastructure projects. Your work will include:

  • Developing novel dynamic data visualizations
  • Applying machine learning to identify patterns in large and complex datasets
  • Harmonizing and abstracting data from a variety of sources
  • Developing novel statistical mining strategies and algorithms
  • Developing websites, databases, APIs and other data exchange protocols

What you’ll bring:

  • Bachelor’s degree in Computer Science, Informatics, Mathematics, Statistics, Physics, Engineering, or Biological Sciences and a strong interest in working on data-intensive biomedical problems.
  • Experience with machine learning, multithreaded programming, and cloud computing
  • Experience developing and deploying web-based and mobile apps
  • Experience with bioinformatics research projects
  • Knowledge of Python, R, Java, JavaScript, Node.js, MongoDB, MySQL, Docker
  • Knowledge of molecular and cell biology

To apply, please e-mail your CV, research statement, and the names and contact information of three references to: sherry.jenkins@mssm.edu

Current Graduate Students

Posted August 2017

If you are interested in joining the lab as a graduate student, please email Dr. Ma’ayan at avi.maayan@mssm.edu. The Ma’ayan Laboratory accepts rotation students from all Multidisciplinary Training Areas (MTAs) within the ISMMS Graduate School of Biomedical Sciences.

Prospective Graduate Students

Posted August 2017

Prospective graduate students should apply to one of the programs at the ISMMS Graduate School of Biomedical Sciences.

Mount Sinai Health System is an equal opportunity/affirmative action employer. We recognize the power and importance of a diverse employee population and strongly encourage applicants with various experiences and backgrounds. Mount Sinai Health System – An EEO/AA-D/V Employer

Courses

Big Data MOOCs on Coursera

Avi Ma’ayan, PhD, is the course director for two massive open online courses (MOOCs) on the Coursera platform. As of March 2016, over 33,000 students had registered for these two courses and 195,000 video lectures had been viewed.

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
The BD2K-LINCS Data Coordination and Integration Center (DCIC) is commissioned to organize, analyze, visualize and integrate LINCS data with other relevant publicly available resources. In this course we will introduce the various Centers that collect data for LINCS, describing the experimental data procedures and the various data types. We will then cover the design and collection of metadata and how metadata is linked to ontologies. Additionally, basic data processing and data normalization methods to clean and harmonize LINCS data will be presented. This will be followed by a discussion of how the data is served through RESTful APIs and JSON, and for this we will cover concepts from client-server computing. Most importantly, the course will focus on various bioinformatics methods of analysis, including unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning applications to LINCS data and other relevant Big Data from molecular biomedicine.

Network Analysis in Systems Biology
An introduction to data integration and statistical methods used in contemporary Systems Biology, Bioinformatics and Systems Pharmacology research. The course covers methods to process raw data from genome-wide mRNA expression studies (microarrays and RNA-seq), including data normalization, differential expression, clustering, enrichment analysis and network construction. The course contains practical tutorials for using tools and setting up pipelines, but it also covers the mathematics behind the methods applied within the tools. The course is most appropriate for beginning graduate students and advanced undergraduates majoring in fields such as biology, math, physics, chemistry, computer science, and biomedical and electrical engineering. The course should also be useful for researchers who encounter large datasets in their own research. The course presents software, apps and tools developed by the Ma’ayan Laboratory, as well as other freely available data analysis and visualization tools. The ultimate aim of the course is to enable participants to apply the methods presented to analyzing their own data for their own projects. For participants who do not work in the field, the course introduces the current research challenges in computational systems biology.

Big Data Courses at the Icahn School of Medicine at Mount Sinai

Avi Ma’ayan, PhD, is the course director for two graduate courses at the Icahn School of Medicine at Mount Sinai. One course is delivered in the Fall and the other in the Spring. The Fall course focuses on data mining and the Spring course on computer programming.

BD2K-LINCS: Data Mining and Network Analysis
This course covers machine learning applications in systems biology, including unsupervised clustering and supervised learning; analysis of the topology of biological regulatory networks; a survey of how these approaches are applied to study biological molecular networks, highlighting papers that combine computational predictions with experimental validation; and the use of software tools to analyze proteomics and genomics expression data collected by LINCS.

Programming for Big Data Biomedicine
This course covers computer programming methodologies applied in the broad fields of Bioinformatics, Systems Biology and Complex Systems Theory. Topics covered include scripting, processing text files, converting data to figures with MATLAB and R, building Agent-Based Models, as well as learning how to use web technologies such as HTML, JavaScript, PHP, Bootstrap and D3. Students will be required to complete small programming assignments throughout the course.

Research

The Ma’ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems.

We lead two NIH-funded Centers: the BD2K-LINCS Data Coordination and Integration Center (DCIC), and the Knowledge Management Center for the Illuminating the Druggable Genome (IDG) program.

Assemble the Largest Novel Collection of Gene Set Libraries

Gene Set Enrichment Analysis and Gene Ontology analysis are central to all biological investigations that measure gene and protein expression at the global scale. Until recently, such analyses were limited to pathway enrichment and/or gene ontology enrichment. We showed that enrichment analysis can be expanded to use data from many biological domains. By developing the tools Kinase Enrichment Analysis (KEA), ChIP-X Enrichment Analysis (ChEA), Lists2Networks and Enrichr, we demonstrated that many resources can be converted to useful gene set libraries and that these can better inform analyses of genome-wide expression studies. So far, over 100,000 unique users have used the enrichment analysis software tools we developed.
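
To make the idea of a gene set library concrete, the sketch below scores the overlap between an input gene list and each term in a small library using a one-sided hypergeometric test, a common choice for overlap-based enrichment. It is an illustration only and is not presented as the scoring used by KEA, ChEA, Lists2Networks or Enrichr. The function names, the toy library, the gene identifiers and the background size of 20 are all hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, list_size, set_size, background_size):
    """Probability of seeing at least `overlap` shared genes by chance.

    Models drawing `list_size` genes without replacement from a background
    of `background_size` genes, `set_size` of which belong to the gene set.
    """
    max_overlap = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(gene_list, library, background_size):
    """Rank the terms of a gene set library by overlap p-value (smallest first)."""
    genes = set(gene_list)
    results = []
    for term, members in library.items():
        members = set(members)
        shared = genes & members
        p = hypergeom_pvalue(len(shared), len(genes), len(members), background_size)
        results.append((term, sorted(shared), p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy library: each term maps to the genes annotated with it.
toy_library = {
    "kinase_A_substrates": ["G1", "G2", "G3", "G4"],
    "tf_B_targets": ["G5", "G6", "G7", "G8"],
}
toy_results = enrich(["G1", "G2", "G3", "G9"], toy_library, background_size=20)
for term, shared, p in toy_results:
    print(f"{term}\toverlap={shared}\tp={p:.4f}")
```

Any resource that can be expressed as a mapping from terms to gene sets (kinase substrates, transcription factor targets, pathways) can be plugged in as `library` without changing the test. A real analysis would also correct the resulting p-values for multiple testing.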

Develop Novel Methods to Identify Differentially Expressed Genes, Perform Gene Set Enrichment Analysis and Set Up Benchmarks for Such Methods

Two of the key statistical tasks in systems biology and genomics are identifying differentially expressed genes and performing gene set enrichment analysis to identify biological themes in gene expression data. We develop multivariate methods to better identify the more “correct” differentially expressed genes from genomics studies, and to better perform enrichment analysis. Using a novel benchmarking strategy that we developed, we show that we can fairly compare the performance of methods such as limma, SAM and DESeq for differential expression, and GSEA for gene set enrichment analysis.

Understand the Structure and Dynamics of Cellular Regulatory Networks

Analysis of the gene sets and networks we have collected can uncover design principles and modules that make up the complex organizational structure of mammalian cells. Analysis of Big Data in the field also points to the existence of various experimental biases and computational limitations. We aim to develop new theories and new algorithms to better extract knowledge from such complex data. Many of the theoretical observations we extracted from the topologies of biological networks and gene sets are manifestations of general design principles observed in many complex systems, not just in biological networks, and we are interested in understanding how such principles emerge and are related.

About Us

Summary of Research Interests

The Ma’ayan Laboratory develops computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply machine learning and other statistical mining techniques to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, dedifferentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, while aiming to better understand the structure and function of regulatory networks in mammalian cellular and multi-cellular systems. Below are some of the software tools we designed and developed recently:

Enrichr

Gene-List Enrichment Analysis Tool
An integrative web-based and mobile gene list enrichment analysis tool providing various types of visualization summaries of collective functions of gene lists.
PMID: 27141961

GEO2Enrichr

Browser Extension for Extracting Differentially Expressed Gene Sets from GEO
A browser extension and web application to extract gene sets from GEO and analyze these lists for common biological functions.
PMID: 25971742

CREEDS

Crowd Extracted Expression of Differential Signatures
Collections of processed gene, drug and disease signatures from GEO.
PMID: 27667448

GEN3VA

Gene Expression and Enrichment Vector Analyzer
Aggregates and analyzes gene expression signatures extracted from GEO by the crowd using GEO2Enrichr.
PMID: 27846806

L1000CDS2

L1000 Characteristic Direction Signature Search Engine
Queries gene expression signatures against the LINCS L1000 to identify and prioritize small molecules that can reverse or mimic the observed input expression pattern.
doi:10.1038/npjsba.2016.15

Harmonizome

Biological Knowledge Engine
Built on top of information about genes and proteins from 114 datasets, the Harmonizome is a knowledge engine for a diverse set of integrated resources.
PMID: 27374120

SEP L1000

Side Effect Prediction Based on L1000 Data
Web portal for searching and browsing predictive small-molecule/ADR connections.
PMID: 27153606

For a complete list of our software tools, databases and datasets, please visit our Resources page. We apply these and other computational methods to a variety of collaborative projects. Our analyses produce concrete suggestions and predictions for further functional experiments. The predictions are tested by our collaborators, and our analysis methods are delivered as software tools and databases for the systems biology research community.