Tools

* We are in the process of migrating our older software to GitHub. Contact us if you need help.

RIMBANet (Reconstructing Integrative Molecular Bayesian Network): RIMBANet is a software package for reconstructing integrative molecular Bayesian networks. Multiple sources of perturbation (e.g., genetic mutations, copy number variations, and methylation) may contribute to the aberrant behavior of biological systems such as cancer cells. Cells employ multiple levels of regulation that enable them to respond to genetic and environmental perturbations. At the transcriptional level, abundance of mRNA can be affected by the rate of transcription, a complex process regulated by transcription factors and enhancers, and by the rate of degradation of transcripts, a process regulated by RNA binding proteins and, in many organisms, microRNAs. Protein abundances are determined by protein degradation and protein synthesis rates, where protein synthesis can be regulated by translation initiation factors and microRNAs. Protein activity depends on a number of factors in addition to protein abundance, including protein localization, phosphorylation states and other post-translational modifications, and protein-protein interactions. In addition to transcript and protein levels, the abundance of small-molecule metabolites is also tuned in response to changes in a cell’s physiological state.

One of the major goals of systems biology is to understand how these genetic and environmental variations drive transcriptional networks, protein-protein interaction networks, metabolite networks, and other molecular networks to give rise to complex phenotypes. Integrating genetic variation and intermediate observations such as mRNA variations into probabilistic causal models can dissect genetic pathways and provide mechanisms connecting DNA to clinical outcomes. We developed a computational framework centered on Bayesian networks and implemented it in RIMBANet (BN4Distribution.tgz), which is freely available for download. We have previously used RIMBANet to discover causal relationships in complex human diseases such as diabetes and obesity, and in a yeast model.

We applied RIMBANet to investigate how genetic variations regulate transcriptional and metabolite level changes in yeast. The full data set used in the study is available here (Yeast_4_Distribution.tgz).

For questions related to the RIMBANet package or the yeast data set, please contact Dr. Jun Zhu. Some compiled tips can be found under "helps".

 

MODMatcher (Multi-Omics Data Matcher): Errors in sample annotation or labeling occur frequently in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. Identifying and correcting these errors is critical for integrative genomic studies. Different types of genetic and genomic data are inter-connected by cis-regulations. Based on these cis-regulations among different types of data, we developed a computational approach, named Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, so that the corrected data can be used in further integrative analysis. Our results indicate that inspecting sample annotation and labeling errors is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher greatly increased the number of statistically significant genetic associations and genomic correlations, a more than two-fold improvement. A simulation study shows that MODMatcher using three types of omics data is more robust than MODMatcher using two types of omics data. Details are described in Yoo et al. (PLoS Comp. Biol., 2014).
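The matching idea can be illustrated with a minimal sketch. It is not the MODMatcher implementation: it assumes two data types measured on the same samples, with row k of each matrix holding the two members of a positively correlated cis pair, and it uses a simple reciprocal-best-match rule. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each feature (row) across samples (columns)."""
    out = []
    for row in matrix:
        m, s = mean(row), pstdev(row)
        out.append([(v - m) / s if s > 0 else 0.0 for v in row])
    return out


def sample_similarity(data_a, data_b):
    """sim[i][j]: agreement between sample i of type A and sample j of type B,
    averaged over cis pairs (row k of A is paired with row k of B)."""
    za, zb = zscore_rows(data_a), zscore_rows(data_b)
    n_pairs = len(za)
    return [[sum(za[k][i] * zb[k][j] for k in range(n_pairs)) / n_pairs
             for j in range(len(zb[0]))]
            for i in range(len(za[0]))]


def reciprocal_best_matches(sim):
    """Keep (i, j) only when j is the best match for i and i is the best for j."""
    best_for_a = [max(range(len(row)), key=row.__getitem__) for row in sim]
    best_for_b = [max(range(len(sim)), key=lambda i: sim[i][j])
                  for j in range(len(sim[0]))]
    return {i: j for i, j in enumerate(best_for_a) if best_for_b[j] == i}


def mislabeled_samples(matches):
    """Samples whose best-matching profile sits under a different label index."""
    return sorted(i for i, j in matches.items() if i != j)


# Toy data: 5 cis pairs (rows) x 4 samples (columns).
type_a = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [2.0, 4.0, 1.0, 3.0],
    [3.0, 1.0, 4.0, 2.0],
    [1.0, 4.0, 2.0, 3.0],
]
# Type B carries the same signal, but the labels of samples 1 and 2 are swapped.
type_b = [[row[0], row[2], row[1], row[3]] for row in type_a]

matches = reciprocal_best_matches(sample_similarity(type_a, type_b))
print(matches)
print(mislabeled_samples(matches))
```

In this toy case the swapped pair is recovered because each profile agrees best with its true counterpart. Real data would need noise-aware thresholds before any label is changed.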

 

ActMiR (Activity of miRNAs): MicroRNAs post-transcriptionally regulate a large number of mRNAs and play a key role in regulating cell growth, differentiation, and apoptosis. However, a miRNA's expression level is not equivalent to its functional activity (Mullokandov et al., 2012). We developed a computational approach to explicitly infer the activity of miRNAs based on the change in the expression levels of their target genes. We showed in multiple cancer types (such as breast cancer, ovarian cancer, and GBM) that our estimated miRNA activities were consistently associated with clinical data in multiple independent data sets, while the associations based on miRNA expression level alone could not be replicated. The results are published in Lee et al. (Bioinformatics, 2015).
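As a rough illustration of inferring activity from target-gene changes, the sketch below scores a miRNA by how much its targets are down-regulated relative to non-target genes. This is a deliberately simplified stand-in, not the ActMiR model. The function name, the dictionary inputs, and the use of a single baseline profile are all assumptions made for the example.

```python
from statistics import mean


def mirna_activity_score(sample_expr, baseline_expr, targets):
    """Hypothetical activity score for one miRNA in one sample.

    sample_expr, baseline_expr: dicts mapping gene -> log-scale expression.
    targets: set of genes assumed to be targets of the miRNA.
    A positive score means targets dropped more than the background genes,
    which is the direction expected when the miRNA is active.
    """
    changes = {g: sample_expr[g] - baseline_expr[g] for g in sample_expr}
    target_changes = [c for g, c in changes.items() if g in targets]
    background_changes = [c for g, c in changes.items() if g not in targets]
    return mean(background_changes) - mean(target_changes)


baseline = {"geneA": 5.0, "geneB": 6.0, "geneC": 7.0, "geneD": 8.0}
sample = {"geneA": 4.0, "geneB": 5.0, "geneC": 7.0, "geneD": 8.0}
score = mirna_activity_score(sample, baseline, targets={"geneA", "geneB"})
print(score)
```

Because the score depends only on target-gene behavior, it can differ from the measured miRNA expression level, which is the distinction the paragraph above draws.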

 

DDSClassifier (Deconvoluted Disease-Specific Classifier): Diagnostic and prognostic models based on peripheral blood gene expression have been reported for various types of disease. However, whole blood gene expression represents a mixture of hematopoietic cells and is greatly influenced by cell type frequencies. Multiple common pathological and physiological changes result in similar changes in blood cell type frequency, which reduces the specificity of blood-based biomarkers. To address these issues, we carried out a meta-analysis of 46 whole blood gene expression datasets covering a wide range of diseases or physiological conditions. Our analysis shows a striking overlap of signature genes shared by multiple diseases, which is driven by underlying common patterns of cell component change. These observations suggest the need to develop disease-specific classifiers that can distinguish different disease types as well as normal controls. To build such models, we developed a new classification strategy that takes into consideration both cell component changes and cell molecular state changes. Specifically, we deconvoluted the original gene expression profile into a cell component profile and a residual expression profile for each sample, and built classifiers based on these deconvoluted features. Testing on independent datasets, we show that classifiers incorporating cell component profiles and residual expression profiles performed significantly better than those without. Both the assembled datasets and the algorithms developed can be found in the R package. A detailed document can be found here. The results are published in Wang et al. (Scientific Reports, 2016).
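The split into a cell component profile and a residual expression profile can be sketched as follows. This is only an illustration under stated assumptions, and the R package may do this differently. It assumes a reference signature matrix of cell-type-specific expression is available and uses ordinary least squares. The names `solve_linear` and `deconvolute` and the toy numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def deconvolute(signature, expression):
    """Split one sample's profile into cell fractions and a residual.

    signature: genes x cell types reference matrix (assumed known).
    expression: one value per gene for the sample.
    Returns (fractions, residual) from an unconstrained least-squares fit.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Normal equations: (S^T S) f = S^T y
    sts = [[sum(signature[g][p] * signature[g][q] for g in range(n_genes))
            for q in range(n_types)] for p in range(n_types)]
    sty = [sum(signature[g][p] * expression[g] for g in range(n_genes))
           for p in range(n_types)]
    fractions = solve_linear(sts, sty)
    fitted = [sum(signature[g][p] * fractions[p] for p in range(n_types))
              for g in range(n_genes)]
    residual = [expression[g] - fitted[g] for g in range(n_genes)]
    return fractions, residual


# Toy reference: 4 genes x 2 cell types.
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0], [2.0, 8.0]]
# A 30%/70% mixture plus an expression shift not explained by cell composition.
expression = [2.0, 6.0, 7.0, 6.2]

fractions, residual = deconvolute(signature, expression)
features = fractions + residual  # combined feature vector for a downstream classifier
print(fractions, residual)
```

In this toy case the shift is orthogonal to the signature columns, so the fit recovers the mixing fractions and leaves the shift in the residual. A practical version would typically constrain the fractions to be non-negative.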

 

DeClust: A reference-free deconvolution method to infer cancer cell-intrinsic subtypes and tumor-type-specific stromal profiles

 

HBVIntegrationPipeline: A robust data analysis pipeline for identifying HBV integration sites in both DNA and RNA sequencing data, developed by modifying several key steps in VirusFinder2. A detailed description of the pipeline and the results of a proof-of-concept study are published in Yoo S, Wang W, et al. (BMC Medicine, 2017).

 

Multi-polynomial Temporal Genetic Association (MPTGA) and Temporal Genetic Causality Test (TGCT)
Methods that leverage both temporal and genetic information in association and causality tests.

 

HBVIntegrationPipeline_single_cell: A dedicated pipeline for single-cell HBV-HCC data.

 

proMODMatcher (probabilistic Multi-omics data matching method): Data errors, including sample swapping and mislabeling, are inevitable in large-scale omics data generation. These errors need to be identified and corrected before integrative data analyses in which different types of data are merged based on the annotated labels. We developed a robust probabilistic multi-omics data matching procedure, proMODMatcher, to curate data and to identify and correct data annotation errors in large databases. proMODMatcher can be used to check for potential labeling errors in profiles where the number of cis-relationships is small, such as miRNA and RPPA profiles.