Learning on Weighted Hypergraphs to Integrate Protein Interactions and Gene Expressions for Cancer Outcome Prediction
TAEHYUN HWANG1, ZE TIAN1, JEAN-PIERRE KOCHER2, AND RUI KUANG1
1Department of Computer Science and Engineering, University of Minnesota Twin Cities
2Bioinformatics Core, Mayo Clinic College of Medicine
Abstract
Building reliable models from multiple complementary genomic data sources is a crucial step towards successful cancer prognosis and a full understanding of the underlying biological principles. To tackle this challenging data integration problem, we propose HyperGene, a hypergraph-based machine learning algorithm that integrates microarray gene expressions and protein-protein interactions for cancer outcome prediction and biomarker identification. HyperGene is a robust two-step iterative method that alternates between finding the optimal outcome prediction and finding the optimal weighting of the marker genes, guided by a protein-protein interaction network. Under the hypothesis that cancer-related genes tend to interact with each other, HyperGene uses the protein-protein interaction network as prior knowledge by imposing consistent weights on interacting genes. Our experimental results on two large-scale breast cancer gene expression datasets show that HyperGene, when using a curated protein-protein interaction network, achieves significantly improved cancer outcome prediction. Moreover, HyperGene retrieves many known cancer genes as highly weighted marker genes.
Supplementary Information and Source Code