Featured Publication

Predictive modelling of gene expression from transcriptional regulatory elements

Predictive modelling is a powerful in silico framework for exploring the regulatory interactions between the epigenetic and transcriptomic layers. Unlike network-based analyses that model genes as variables in an underdetermined system, these models treat genes as observations, so that statistically significant insights can be gained into the complementary roles of histone modifications, transcription factors, DNA methylation and other key regulators. This study provides the first critical review of several epigenetic feature extraction and predictive modelling techniques across multiple cell lines and organisms.
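As a concrete illustration of feature extraction, the minimal sketch below scores each gene-TF/HM pair by summing ChIP-seq peak intensities, each down-weighted exponentially with its distance from the gene's transcription start site (TSS). The distance-weighted scheme, the function names and the default decay constant are assumptions for illustration, not necessarily one of the methods compared in the paper. The resulting matrix has one row per gene and one column per TF/HM, reflecting the genes-as-observations view described above.

```python
import math

# Hypothetical decay constant d0 (bp); chosen for illustration only.
DEFAULT_DECAY_BP = 5000.0


def distance_weighted_score(tss, peaks, decay=DEFAULT_DECAY_BP):
    """Score one gene-TF/HM pair.

    tss:   transcription start site coordinate (bp) of the gene
    peaks: list of (summit_position, intensity) tuples for one TF or HM,
           assumed to lie on the same chromosome as the gene
    Each peak contributes its intensity scaled by exp(-distance / decay).
    """
    return sum(intensity * math.exp(-abs(pos - tss) / decay)
               for pos, intensity in peaks)


def build_feature_matrix(gene_tss, tf_peaks, decay=DEFAULT_DECAY_BP):
    """Return (feature_names, matrix) with one row per gene and one column per TF/HM."""
    names = sorted(tf_peaks)
    matrix = [[distance_weighted_score(tss, tf_peaks[name], decay) for name in names]
              for tss in gene_tss]
    return names, matrix


# Toy usage with made-up coordinates and intensities.
names, X = build_feature_matrix(
    gene_tss=[0, 10_000],
    tf_peaks={"Pou5f1": [(0, 2.0)], "H3K4me3": [(10_000, 4.0)]},
)
print(names)  # ['H3K4me3', 'Pou5f1']
```

A smaller decay constant makes the score behave more like a promoter-only count; a larger one lets distal binding events contribute.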

Budden, D. M., Hurley, D. G., & Crampin, E. J. (2014). Predictive modelling of gene expression from transcriptional regulatory elements. *Briefings in Bioinformatics*, bbu034. doi:10.1093/bib/bbu034

#### Abstract

Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ε-support vector regression (ε-SVR) models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq-derived TF-binding and in silico TF-binding derived from position weight matrices (PWMs). The two algorithms are evaluated both on their prediction accuracy and on their ability to identify the established regulatory roles of individual TFs and HMs.
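A log-linear model regresses log-transformed expression on log-transformed TF/HM features. The minimal sketch below fits one plausible form, log(y + c) = b0 + Σ b_j log(x_j + c), by ordinary least squares in pure Python. The pseudocount c, the helper names and the exact model form are assumptions, not the paper's implementation. The ε-SVR counterpart is not sketched here; it is typically fitted with an existing library (for example scikit-learn's `SVR`, which exposes an `epsilon` parameter).

```python
import math

PSEUDOCOUNT = 0.1  # hypothetical offset so that zero signals can be log-transformed


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_linear(features, expression, pseudocount=PSEUDOCOUNT):
    """Least-squares fit of log(y + c) = b0 + sum_j b_j * log(x_j + c).

    features:   one row per gene (observation), one column per TF/HM
    expression: measured transcript abundance per gene
    Returns [b0, b1, ..., bp] via the normal equations (X^T X) b = X^T y.
    """
    X = [[1.0] + [math.log(v + pseudocount) for v in row] for row in features]
    y = [math.log(v + pseudocount) for v in expression]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict_log_linear(coefs, row, pseudocount=PSEUDOCOUNT):
    """Predicted log(expression + c) for one gene's feature row."""
    return coefs[0] + sum(b * math.log(v + pseudocount) for b, v in zip(coefs[1:], row))


# Toy check with synthetic data generated from log(y) = 0.5 + 2 log(x) (pseudocount 0).
xs = [1.0, 2.0, 4.0, 8.0]
coefs = fit_log_linear([[x] for x in xs], [math.exp(0.5) * x ** 2 for x in xs], pseudocount=0.0)
print([round(c, 6) for c in coefs])  # [0.5, 2.0]
```

In a model of this form, the sign of each fitted b_j gives a rough indication of whether a TF or HM is associated with higher or lower expression, which is one way such models can be checked against established regulatory roles.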

Figure 1

Our results demonstrate that TF-binding and HMs are highly predictive of gene expression, as measured by mRNA transcript abundance, irrespective of the choice of algorithm or cell type, and for both ChIP-seq- and PWM-derived TF-binding. To encourage other researchers to explore and build on these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment.
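Prediction accuracy of this kind is commonly summarised as the correlation between predicted and measured expression on genes held out from training. The minimal sketch below assumes Pearson's r and shuffled k-fold cross-validation; the fold count, seed and function names are illustrative assumptions rather than the paper's exact protocol. `fit` and `predict` can wrap any model, for example a log-linear or ε-SVR regressor.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(order[start:start + size])
        start += size
    return folds


def cross_validated_r(features, targets, fit, predict, k=10, seed=0):
    """Predict every gene from a model trained without it, then correlate with the measurements.

    fit(train_features, train_targets) -> model
    predict(model, feature_row) -> predicted target
    """
    preds = [0.0] * len(targets)
    for test_idx in k_fold_indices(len(targets), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(targets)) if i not in held_out]
        model = fit([features[i] for i in train], [targets[i] for i in train])
        for i in test_idx:
            preds[i] = predict(model, features[i])
    return pearson_r(preds, targets)
```

Scoring only out-of-fold predictions guards against rewarding a model for memorising the genes it was trained on.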

#### How can I reproduce your results?

We suggest you use the Virtual Reference Environment for the project, built using Vagrant. To do this, first install Vagrant and VirtualBox, then enter the following terminal commands:

```shell
git clone https://github.com/uomsystemsbiology/budden2014predictive.git
cd budden2014predictive
vagrant up
```

This will download the code for this project and build a Virtual Reference Environment containing all the scripts necessary to reproduce our computational results and figures.