Large-scale integrative network-based analysis identifies common
pathways disrupted by copy number alterations across cancers
TaeHyun Hwang†1, Gowtham Atluri2, Rui Kuang2, Vipin Kumar2, Timothy Starr1,
Peter M Haverty3, Zemin Zhang3, Jinfeng Liu†3
1Masonic Cancer Center, 2Department of Computer Science and Engineering,
University of Minnesota - Twin Cities, Minneapolis, Minnesota, United States of America;
3Department of Bioinformatics and Computational Biology, Genentech Inc.,
South San Francisco, California, United States of America
† To whom correspondence should be addressed.
E-mail: hwang071@umn.edu, liu.jinfeng@gene.com
Identification of pathways that are essential to the development and
progression of human cancers is one of the key challenges in cancer genomics.
Many large-scale cancer studies using high-throughput genomic data have
reported altered activities in cancer-related pathways in specific types of
cancer, but these studies have not been extended to provide a comprehensive
analysis of pathways disrupted by copy number alterations across different
human cancers. To address this problem, we propose a network-based method that
integrates copy number alteration data with human protein-protein interaction
networks and pathway databases to identify pathways disrupted by copy number
alterations in 2172 patients across 16 different types of cancer. We discovered
a set of common pathways (e.g. telomerase, TGF-beta signaling, and NTRK1
signaling pathways) that are disrupted in most cancer types and are likely
essential in tumor formation; these pathways cannot be readily identified by
conventional overrepresentation-based and pathway-based methods. In addition,
we identified pathways disrupted in only one or a few types of cancer,
suggesting molecular heterogeneity among human cancers. Additional analysis of
independent microarray gene expression datasets demonstrates that commonly
disrupted pathways can be used to identify patient subgroups with significantly
different survival outcomes, suggesting their potential as a guide to targeted
therapy in a subgroup of patients. We also provide network views of disrupted
pathways to explain how copy number alterations affect pathways that regulate
cell growth, the cell cycle, and differentiation in tumorigenesis.
Supplementary Information and Source Code
Source Code
· Matlab Source Code [download]
This zip file includes the Matlab source code for NetPathID and the datasets
needed to reproduce the results reported in the paper, stored as .mat files:
copy number alterations in each type of cancer, q-values and frequencies of
amplifications and deletions across 16 types of cancer, protein-protein
interaction networks, pathways (Biocarta, KEGG, and Reactome), and the
conserved subnetwork database*.
Disclaimer
This software is free only for non-commercial use. It must not be distributed
without prior permission of the author. The author is not responsible for
implications from the use of this software.
*The conserved subnetwork dataset was obtained from S. Suthram et al.,
“Network-Based Elucidation of Human Disease Similarities Reveals Common
Functional Modules Enriched for Pluripotent Drug Targets”, PLoS
Computational Biology 6: e1000662. To use the conserved subnetwork
dataset, please contact the authors of that paper.
Note:
NetPathID is currently available only for Matlab; we plan to provide an R implementation soon.
Additional Dataset
· GISTIC results with default setting using Copy Number Alteration
Dataset in 16 Cancer Types [download]
This dataset includes GISTIC results containing q-values and frequencies of
amplifications and deletions across 16 types of cancer.
We downloaded the segmented copy number alteration dataset from the Broad
Institute Tumorscape website (http://www.broadinstitute.org/tumorscape/pages/portalHome.jsf)
and applied GISTIC (http://www.broadinstitute.org/igv/GISTIC)
with default settings to find frequently altered copy number regions.
· GISTIC results with cutoff amplification > 0.3 &
deletion < -0.3 using Copy Number Alteration Dataset in 16 Cancer Types [download]
· GISTIC results with cutoff amplification > 0.5 &
deletion < -0.5 using Copy Number Alteration Dataset in 16 Cancer Types [download]
· GISTIC results using Copy Number Alteration Dataset in
Pooled Analysis [download]
This dataset includes GISTIC results from the pooled analysis, in which all 16
cancer types are combined into one dataset and GISTIC is run on it to find
frequently altered copy number regions.
· A List of Cancer-related Genes in Cancer Gene Census from
the Sanger Institute (Sept. 2010 version) [download]
This xls file includes the 427 known cancer-related genes from the Cancer Gene
Census database of the Sanger Institute that were used in the cancer-related
gene enrichment analysis.
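The amplification and deletion cutoffs named in the GISTIC result sets above (> 0.3 / < -0.3 and > 0.5 / < -0.5) can be illustrated with a small sketch. This is neither GISTIC nor part of NetPathID. It only shows how a cutoff changes the per-gene frequency of amplifications and deletions. The function name `alteration_frequencies`, the toy matrix, and the assumption that the values are log2 copy number ratios arranged as genes by samples are all hypothetical and are not taken from the downloadable files.

```python
import numpy as np

def alteration_frequencies(log2_ratios, amp_cutoff=0.3, del_cutoff=-0.3):
    """Return per-gene amplification and deletion frequencies.

    log2_ratios: 2-D array, rows = genes (or markers), columns = samples.
    A value counts as amplified if it is > amp_cutoff and as deleted
    if it is < del_cutoff (strict inequalities, as in the cutoffs above).
    """
    values = np.asarray(log2_ratios, dtype=float)
    amp_freq = (values > amp_cutoff).mean(axis=1)  # fraction of samples amplified
    del_freq = (values < del_cutoff).mean(axis=1)  # fraction of samples deleted
    return amp_freq, del_freq

# Toy data (made up for illustration): 3 genes x 4 samples
toy = np.array([
    [0.45, 0.10, 0.60, 0.35],
    [-0.40, -0.10, 0.00, -0.55],
    [0.05, -0.20, 0.25, 0.00],
])

amp_03, del_03 = alteration_frequencies(toy, amp_cutoff=0.3, del_cutoff=-0.3)
amp_05, del_05 = alteration_frequencies(toy, amp_cutoff=0.5, del_cutoff=-0.5)
```

With the stricter 0.5 cutoff fewer samples count as altered, so the frequencies in `amp_05` and `del_05` can only be equal to or lower than those in `amp_03` and `del_03`.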
Detailed Results
· Biocarta Pathway Activity View [download]
This pdf file includes activity views of 217 Biocarta pathways
across 16 types of cancer.
· A List of Top 20 Disrupted Pathways from Biocarta,
KEGG, and Reactome datasets in 16 types of
cancers [download]
This xls file includes the top 20 disrupted pathways, with p-values adjusted
using the BH (Benjamini-Hochberg) method, in 16 types of cancer.
· A List of Member Genes in Commonly Disrupted Subnetworks [download]
This xls file includes the 332 member genes in 42 commonly disrupted
subnetworks. These are the member genes used in patient subgroup
identification, functional gene set enrichment analysis, and cancer-related
gene enrichment analysis.
· Network View of Commonly Disrupted Subnetworks [download]
This zip file includes network views, as pdf and cys (Cytoscape file format)
files, of four commonly disrupted subnetworks that rank within the top 2% in
each cancer study and are present in more than 10 types of cancer studies.
· HotNet results [download]
This excel file includes the subnetwork identified by the HotNet method.
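The list of top 20 disrupted pathways above reports p-values adjusted with the BH method, read here as the Benjamini-Hochberg procedure. The sketch below is a minimal, generic version of that adjustment for readers who want to check the values in the spreadsheet. It is not the authors' Matlab code, and the function name `bh_adjust` and the toy p-values are hypothetical.

```python
import numpy as np

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i for rank i
    # Enforce monotonicity: each adjusted value is the minimum of the
    # scaled values at its own rank and at all higher ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)  # map back to input order
    return adjusted

# Toy p-values (made up for illustration), one per pathway
toy_p = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(toy_p)

# Indices of pathways ordered from most to least significant
ranking = np.argsort(adj, kind="stable")
```

Taking the first 20 entries of such a ranking for each cancer type would give a list shaped like the one in the spreadsheet, assuming the pathways there are ordered by adjusted p-value.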
(Last update 07/02/2012)