Learning on Weighted Hypergraphs to Integrate Protein Interactions and Gene Expressions for Cancer Outcome Prediction

 

TAEHYUN HWANG1, ZE TIAN1, JEAN-PIERRE KOCHER2, AND RUI KUANG1

 

1Department of Computer Science and Engineering, University of Minnesota Twin Cities

2Bioinformatics Core, Mayo Clinic College of Medicine


Abstract

 

Building reliable models from multiple complementary genomic data sources is a crucial step towards successful cancer prognosis and a full understanding of the underlying biological principles. To tackle this challenging data integration problem, we propose a hypergraph-based machine learning algorithm called HyperGene that integrates microarray gene expressions and protein-protein interactions for cancer outcome prediction and biomarker identification. HyperGene is a robust two-step iterative method that alternately finds the optimal outcome prediction and the optimal weighting of the marker genes, guided by a protein-protein interaction network. Under the hypothesis that cancer-related genes tend to interact with each other, the HyperGene algorithm uses a protein-protein interaction network as prior knowledge by imposing a consistent weighting of interacting genes. Our experimental results on two large-scale breast cancer gene expression datasets show that HyperGene, when using a curated protein-protein interaction network, achieves significantly improved cancer outcome prediction. Moreover, HyperGene retrieves many known cancer genes as highly weighted marker genes.
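
As a rough illustration of the alternating scheme described in the abstract, the Python sketch below iterates between scoring samples on a weighted hypergraph and re-weighting the gene hyperedges. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy data, the binary sample-by-gene incidence matrix H, the use of the normalized hypergraph operator common in hypergraph learning, and in particular the weight update, which is a simple heuristic stand-in (low within-gene score variance, then averaging with interaction neighbors) rather than the optimization HyperGene actually solves.

```python
import numpy as np

def propagate_labels(H, w, y, alpha=0.5):
    """Step 1 (assumed form): score samples on the weighted hypergraph.

    H: binary incidence matrix, samples x genes (each gene is a hyperedge).
    w: positive hyperedge (gene) weights; y: labels (+1, -1, 0 = unknown).
    Assumes every sample lies in at least one hyperedge and no hyperedge is empty.
    """
    dv = H @ w                    # weighted vertex degrees
    de = H.sum(axis=0)            # hyperedge sizes
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ dv_inv_sqrt
    # Closed-form solution of the regularized propagation f = (I - alpha*Theta)^-1 y
    return np.linalg.solve(np.eye(len(y)) - alpha * theta, y)

def update_weights(H, f, A, beta=0.5):
    """Step 2 (heuristic stand-in, not the paper's update).

    Genes whose member samples receive similar scores get larger weights;
    weights are then averaged with those of interacting genes (adjacency A)
    so that interacting genes end up with similar weights.
    """
    de = H.sum(axis=0)
    mean = (H.T @ f) / de
    var = np.maximum((H.T @ f**2) / de - mean**2, 0.0)  # score variance per gene
    raw = np.exp(-var)
    deg = A.sum(axis=1)
    P = A / np.maximum(deg, 1.0)[:, None]               # row-normalized interactions
    neighbor_avg = np.where(deg > 0, P @ raw, raw)      # isolated genes keep raw value
    w = (1.0 - beta) * raw + beta * neighbor_avg
    return w / w.sum()

def hypergene_sketch(H, A, y, n_iter=5):
    """Alternate the two steps for a fixed number of iterations."""
    w = np.full(H.shape[1], 1.0 / H.shape[1])           # start from uniform weights
    for _ in range(n_iter):
        f = propagate_labels(H, w, y)
        w = update_weights(H, f, A)
    return f, w

# Toy example: 4 samples, 3 genes; genes 0 and 1 interact.
H = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 1, 1]], dtype=float)
A = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
y = np.array([1.0, -1.0, 0.0, 0.0])                     # two labeled, two unlabeled
f, w = hypergene_sketch(H, A, y)
```

Here f holds one prediction score per sample and w one weight per gene. A fixed iteration count is used only to keep the sketch short; a convergence check on w would be the more natural stopping rule.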


Full Paper [PDF]

Supplementary Information and Source Code
