Featured Publication

Modelling the conditional regulatory activity of methylated and bivalent promoters

This project builds upon our previous work modelling gene expression from epigenetic data, extending it to model 'conditional' regulatory activity. We also provide the first genome-wide integration of DNA methylation bisulfite-sequencing data within such a model, substantially improving prediction accuracy and the reliability of downstream biological inference.
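Integrating bisulfite-sequencing data into a gene-level model first requires collapsing CpG-level read counts into one methylation value per gene. This page does not say which summaries the study compared, so the sketch below shows just one plausible choice: the pooled fraction of methylated reads over CpGs in a window around the transcription start site (TSS). The `CpG` record, the `promoter_methylation` function, the window sizes and the coverage filter are all hypothetical, chosen for illustration rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CpG:
    chrom: str
    pos: int          # genomic coordinate of the CpG (assumed same convention as the TSS)
    methylated: int   # bisulfite reads reporting a methylated C at this site
    total: int        # all reads covering this site


def promoter_methylation(
    cpgs: Iterable[CpG],
    chrom: str,
    tss: int,
    strand: str,
    upstream: int = 1000,   # hypothetical window size
    downstream: int = 500,  # hypothetical window size
    min_depth: int = 5,     # hypothetical coverage filter
) -> Optional[float]:
    """Pooled methylation level (methylated / total reads) over CpGs near a TSS.

    Returns None when no sufficiently covered CpG falls inside the window.
    """
    # Orient the window relative to the direction of transcription
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        start, end = tss - downstream, tss + upstream
    meth = total = 0
    for c in cpgs:
        if c.chrom == chrom and start <= c.pos <= end and c.total >= min_depth:
            meth += c.methylated
            total += c.total
    return meth / total if total else None


cpgs = [
    CpG("chr1", 9500, 9, 10),
    CpG("chr1", 10100, 1, 10),
    CpG("chr1", 10200, 0, 2),    # dropped: coverage below min_depth
    CpG("chr1", 20000, 10, 10),  # dropped: outside the window
]
print(promoter_methylation(cpgs, "chr1", 10000, "+"))  # 10 / 20 -> 0.5
```

Pooling reads weights well-covered CpGs more heavily. Averaging per-CpG fractions instead would treat every site equally, which is one of the trade-offs any gene-level summary has to make.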

Budden DM, Hurley DG, Crampin EJ. Modelling the conditional regulatory activity of methylated and bivalent promoters. Epigenetics Chromatin. 2015;8(1):21. doi:10.1186/s13072-015-0013-9

#### Abstract ####

Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional interactions that emerge when genes are subject to different regulatory mechanisms. Although chromatin immunoprecipitation-based histone modification data are often used as proxies for chromatin accessibility, the association between these variables and expression often depends upon the presence of other epigenetic markers (e.g. DNA methylation or histone variants). These conditional interactions are poorly handled by previous predictive models and reduce the reliability of downstream biological inference.
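To see why a single predictive model struggles with conditional interactions, consider a toy simulation in which a promoter histone mark predicts expression only when the promoter is unmethylated. All data here are synthetic, and the effect sizes, the 30% methylation rate and the helper names are assumptions. The methylation state is observed in this example, whereas the study's point is that such classes are usually latent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
h3k4me3 = rng.uniform(0, 5, n)        # hypothetical promoter histone-mark signal
methylated = rng.random(n) < 0.3      # hypothetical binary promoter methylation status
noise = rng.normal(0, 0.5, n)
# Conditional relationship: the mark predicts expression only for unmethylated promoters
expression = np.where(methylated, 0.2, 2.0 * h3k4me3) + noise


def fit_predict(x, y):
    """Ordinary least squares with an intercept; returns in-sample predictions."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def mse(a, b):
    return float(np.mean((a - b) ** 2))


# One model for all genes vs. one model per methylation state
pooled = fit_predict(h3k4me3, expression)
stratified = np.empty_like(expression)
for state in (False, True):
    idx = methylated == state
    stratified[idx] = fit_predict(h3k4me3[idx], expression[idx])

print(f"pooled MSE:     {mse(expression, pooled):.3f}")
print(f"stratified MSE: {mse(expression, stratified):.3f}")
```

The pooled fit averages two incompatible relationships, so its error stays well above the noise floor that the stratified models reach.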

#### Results ####

We have previously demonstrated that integrating both transcription factor and histone modification data within a single predictive model is rendered ineffective by their statistical redundancy. In this study, we evaluate four proposed methods for quantifying gene-level DNA methylation levels and demonstrate that inclusion of these data in predictive modelling frameworks is also subject to this critical limitation in data integration. Based on the hypothesis that statistical redundancy in epigenetic data is caused by conditional regulatory interactions within a dynamic chromatin context, we construct a new gene expression model which is the first to improve prediction accuracy by unsupervised identification of latent regulatory classes. We show that DNA methylation and H2A.Z histone variant data can be interpreted in this way to identify and explore the signatures of silenced and bivalent promoters, substantially improving genome-wide predictions of mRNA transcript abundance and downstream biological inference across multiple cell lines.
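The page does not spell out the form of the new model. As a generic illustration of "unsupervised identification of latent regulatory classes", the sketch below fits a mixture of linear regressions by expectation-maximisation (EM). Each gene receives a posterior probability of belonging to each class, and each class has its own regression of expression on the epigenetic predictors. This is a swapped-in textbook technique, not necessarily the authors' model. The function name, the residual-quantile initialisation and the simulated data are assumptions.

```python
import numpy as np


def fit_mixture_of_regressions(X, y, k=2, n_iter=200):
    """EM for a k-component (k >= 2) mixture of linear regressions with Gaussian noise.

    X: (n, p) design matrix including an intercept column. Returns
    (weights, coefs, variances, resp), where resp[i, j] is the posterior
    probability that gene i belongs to latent class j.
    """
    n, p = X.shape
    # Initialise: split genes into k groups by quantile of their pooled OLS residual
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    r0 = y - X @ beta0
    edges = np.quantile(r0, np.linspace(0, 1, k + 1)[1:-1])
    resp = np.full((n, k), 0.1 / (k - 1))
    resp[np.arange(n), np.searchsorted(edges, r0)] = 0.9
    for _ in range(n_iter):
        # M-step: mixing weights, then weighted least squares per class
        weights = resp.mean(axis=0)
        coefs = np.empty((k, p))
        variances = np.empty(k)
        for j in range(k):
            w = resp[:, j]
            Xw = X * w[:, None]
            coefs[j] = np.linalg.solve(X.T @ Xw + 1e-8 * np.eye(p), Xw.T @ y)
            r = y - X @ coefs[j]
            variances[j] = max(np.sum(w * r**2) / (w.sum() + 1e-12), 1e-6)
        # E-step: Gaussian log-likelihood of each gene under each class
        resid = y[:, None] - X @ coefs.T
        log_lik = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
                   - 0.5 * resid**2 / variances)
        log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, coefs, variances, resp


# Synthetic example: a hidden "silenced" class in which the mark has no effect
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0, 5, n)
silenced = rng.random(n) < 0.3  # never shown to the model
y = np.where(silenced, 0.2, 2.0 * x) + rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x])
weights, coefs, variances, resp = fit_mixture_of_regressions(X, y, k=2)
print(np.round(coefs, 2))  # rows are (intercept, slope) for each latent class
```

Without being told which genes are silenced, the fit should recover one class whose slope is close to zero and another whose slope is close to 2. The argmax of `resp` then gives a data-driven class label for each gene.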

Figure 1

#### Conclusions ####

Previous models of gene expression have been applied successfully to several important problems in molecular biology, including the discovery of transcription factor roles, identification of regulatory elements responsible for differential expression patterns and comparative analysis of the transcriptome across distant species. Our analysis supports our hypothesis that statistical redundancy in epigenetic data is partially due to conditional relationships between these regulators and gene expression levels. This analysis provides insight into the heterogeneous roles of H3K4me3 and H3K27me3 in the presence of the H2A.Z histone variant (implicated in cancer progression) and how these signatures change during lineage commitment and carcinogenesis.

#### How can I reproduce your results? ####

We suggest you use the Virtual Reference Environment for the project, built using Vagrant. To do this, first install Vagrant and VirtualBox, then enter the following terminal commands:

```shell
git clone https://github.com/uomsystemsbiology/budden2015treeome.git
cd budden2015treeome
vagrant up
```

This will download the code for this project and build a Virtual Reference Environment containing all the scripts necessary to reproduce our computational results and figures.