Big Data MOOCs on Coursera

Avi Ma’ayan, PhD, is the course director for two massive open online courses (MOOCs) on the Coursera platform. As of March 2023, the two MOOCs have drawn over 267,600 unique visitors and a combined enrollment of over 25,000 students.

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
The BD2K-LINCS Data Coordination and Integration Center (DCIC) was commissioned to organize, analyze, visualize and integrate LINCS data with other relevant publicly available resources. In this course, we introduce the various Centers that collect data for LINCS and describe their experimental procedures and the data types they produce. We then cover the design and collection of metadata and how metadata is linked to ontologies. Basic data processing and normalization methods used to clean and harmonize LINCS data are presented next. This is followed by a discussion of how the data is served as JSON through RESTful APIs, for which we cover concepts from client-server computing. Most importantly, the course focuses on bioinformatics methods of analysis, including unsupervised clustering, gene-set enrichment analyses, Bayesian integration, network visualization, and supervised machine learning, applied to LINCS data and other relevant Big Data from molecular biomedicine.
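As a rough illustration of the client-server idea mentioned above, the sketch below starts a small local HTTP server that returns a JSON record, then fetches and decodes it from a client. It uses only the Python standard library. The record, its field names, the `/signatures/` path, and the `fetch_signature` helper are all hypothetical and do not reflect the actual LINCS APIs or schema.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical records; the field names are illustrative, not the LINCS schema.
SIGNATURES = {"sig-1": {"cell_line": "MCF7", "perturbagen": "drug-A"}}


class SignatureHandler(BaseHTTPRequestHandler):
    """Server side: answer GET /signatures/<id> with a JSON document."""

    def do_GET(self):
        sig_id = self.path.rstrip("/").split("/")[-1]
        record = SIGNATURES.get(sig_id)
        status = 200 if record is not None else 404
        payload = record if record is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def fetch_signature(sig_id):
    """Client side: request one record from a throwaway local server."""
    server = HTTPServer(("127.0.0.1", 0), SignatureHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", f"/signatures/{sig_id}")
        response = conn.getresponse()
        data = json.loads(response.read().decode("utf-8"))
        conn.close()
        return data
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_signature("sig-1"))
```

The server identifies a resource by its URL path and returns its state as JSON, and the client only needs to know the URL and how to decode the response.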

Network Analysis in Systems Biology
This course is an introduction to data integration and statistical methods used in contemporary Systems Biology, Bioinformatics and Systems Pharmacology research. The course covers methods to process raw data from genome-wide mRNA expression studies (microarrays and RNA-seq), including data normalization, differential expression, clustering, enrichment analysis and network construction. The course contains practical tutorials for using tools and setting up pipelines, and it also covers the mathematics behind the methods that these tools apply. The course is most appropriate for beginning graduate students and advanced undergraduates majoring in fields such as biology, math, physics, chemistry, computer science, and biomedical and electrical engineering. It should also be useful for researchers who encounter large datasets in their own research. The course presents software, apps and tools developed by the Ma’ayan Laboratory, as well as other freely available data analysis and visualization tools. The ultimate aim of the course is to enable participants to apply the methods presented to their own data and projects. For participants who do not work in the field, the course introduces the current research challenges in computational systems biology.
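To give a flavor of the mathematics behind one of the listed methods, the sketch below computes an enrichment p-value with a one-sided hypergeometric test, one common way to test whether a gene list overlaps an annotated gene set more than expected by chance. This is an illustrative assumption about how enrichment can be computed, not necessarily the method taught in the course or used by any particular tool. The function name, the gene lists, and the background size of 20,000 genes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, input_size, set_size, background_size):
    """Return P(X >= overlap) under the hypergeometric distribution.

    X is the number of annotated genes seen when drawing `input_size` genes
    without replacement from `background_size` genes, of which `set_size`
    belong to the annotated gene set.
    """
    max_overlap = min(input_size, set_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, input_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical input list and annotated gene set.
    input_genes = {"TP53", "MDM2", "CDKN1A", "BAX", "GAPDH"}
    gene_set = {"TP53", "MDM2", "CDKN1A", "BAX", "PUMA", "NOXA"}
    overlap = len(input_genes & gene_set)
    p_value = hypergeom_enrichment_p(
        overlap, len(input_genes), len(gene_set), background_size=20000
    )
    print(f"overlap={overlap}, p={p_value:.3g}")
```

A small p-value suggests the overlap is unlikely under random sampling. In practice, such tests are run across many gene sets, so the p-values are usually corrected for multiple testing.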

Big Data Courses at the Icahn School of Medicine at Mount Sinai

Avi Ma’ayan, PhD, is the course director for two graduate courses at the Icahn School of Medicine at Mount Sinai. One course is offered in the Fall and the other in the Spring. The Fall course focuses on data mining and the Spring course on computer programming.

BSR 6806: Programming for Big Data Biomedicine
The course covers computational methods used to analyze data in the broad fields of bioinformatics and big data science. Topics include RNA-seq and proteomics data analysis, machine learning, deep learning, text mining, Python and Jupyter Notebooks, Appyters, cloud computing, data visualization, network analysis, version control, and knowledge graphs. Students are required to complete small programming assignments throughout the course. Most tutorials are run in Jupyter Notebooks and Appyters. [YouTube playlist of course lectures: Spring 2024, Spring 2023]

Data Mining and Network Analysis
This course covers machine learning applications in systems biology, including unsupervised clustering and supervised learning; analysis of the topology of biological regulatory networks; and a survey of how these approaches are applied to study biological molecular networks. Papers that combine computational predictions with experimental validation are highlighted. The course also covers the use of software tools to analyze proteomics and genomics data, such as the experimental expression data collected for LINCS.