Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers

TaeHyun Hwang†1, Gowtham Atluri2, Rui Kuang2, Vipin Kumar2, Timothy Starr1, Peter M Haverty3, Zemin Zhang3, Jinfeng Liu3

1Masonic Cancer Center, 2Department of Computer Science and Engineering, University of Minnesota - Twin Cities, Minneapolis, Minnesota, United States of America; 3Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, California, United States of America

To whom correspondence should be addressed. E-mail: (hwang071@umn.edu, liu.jinfeng@gene.com)

Abstract

Identifying pathways that are essential to the development and progression of human cancers is one of the key challenges in cancer genomics. Many large-scale cancer studies using high-throughput genomic data have reported altered activities in cancer-related pathways in specific types of cancer, but these studies have not been extended to provide a comprehensive analysis of pathways disrupted by copy number alterations across different human cancers. To address this problem, we propose a network-based method that integrates copy number alteration data with human protein-protein interaction networks and pathway databases to identify pathways disrupted by copy number alterations in 2172 patients across 16 different types of cancer. We discovered a set of common pathways (e.g. telomerase, TGF-beta signaling, and NTRK1 signaling pathways) that are disrupted in most cancer types and are likely essential to tumor formation; these pathways cannot be readily identified by conventional overrepresentation-based and pathway-based methods. In addition, we identified pathways disrupted only in particular types of cancer, suggesting molecular heterogeneity among human cancers. Additional analysis of independent microarray gene expression datasets demonstrates that commonly disrupted pathways can be used to identify patient subgroups with significantly different survival outcomes, and thus may help guide targeted therapy in a subgroup of patients. We also provide network views of disrupted pathways to explain how copy number alterations affect pathways that regulate cell growth, cell cycle, and differentiation during tumorigenesis.

 

Supplementary Information and Source Code

 

Source Code

·      MATLAB Source Code [download]

This zip file includes the MATLAB source code for NetPathID and the datasets needed to reproduce the results reported in the paper, stored as .mat files. The datasets include copy number alterations in each type of cancer, q-values and frequencies of amplifications and deletions across 16 types of cancer, protein-protein interaction networks, pathways (Biocarta, KEGG, and Reactome), and the conserved subnetwork database*.

Disclaimer

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

*The conserved subnetwork dataset was obtained from S. Suthram et al., “Network-Based Elucidation of Human Disease Similarities Reveals Common Functional Modules Enriched for Pluripotent Drug Targets”, PLoS Computational Biology 6: e1000662. To use the conserved subnetwork dataset, please contact the authors of that paper.

 

Note: Currently, NetPathID is only available for MATLAB, but we will provide an R implementation soon.

 

Additional Dataset

·      GISTIC results with default settings using Copy Number Alteration Dataset in 16 Cancer Types [download]

This dataset includes GISTIC results containing the q-values and frequencies of amplifications and deletions across 16 types of cancer.

We downloaded the segmented copy number alteration dataset from the Broad Institute Tumorscape website (http://www.broadinstitute.org/tumorscape/pages/portalHome.jsf) and applied GISTIC (http://www.broadinstitute.org/igv/GISTIC) with default settings to find frequently altered copy number regions.

 

·      GISTIC results with cutoff amplification > 0.3 & deletion < -0.3 using Copy Number Alteration Dataset in 16 Cancer Types [download]

 

·      GISTIC results with cutoff amplification > 0.5 & deletion < -0.5 using Copy Number Alteration Dataset in 16 Cancer Types [download]
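As a rough illustration of what the two stricter cutoffs mean, the sketch below labels copy number segments with a symmetric threshold. It assumes the cutoff is applied to a per-segment log2 copy-number ratio; the function name `call_segment` and the segment values are hypothetical and are not taken from the downloadable files or from GISTIC itself.

```python
from typing import List, Tuple

# Hypothetical segments: (chromosome, start, end, log2 copy-number ratio).
Segment = Tuple[str, int, int, float]


def call_segment(log2_ratio: float, cutoff: float = 0.3) -> str:
    """Label a segment using a symmetric cutoff (an assumption of this sketch)."""
    if log2_ratio > cutoff:
        return "amplification"
    if log2_ratio < -cutoff:
        return "deletion"
    return "neutral"


segments: List[Segment] = [
    ("chr8", 127_000_000, 129_000_000, 0.82),
    ("chr9", 21_000_000, 22_500_000, -0.45),
    ("chr1", 1_000_000, 5_000_000, 0.35),
    ("chr3", 10_000_000, 12_000_000, 0.05),
]

# The same segments called under the two cutoffs listed above.
calls_03 = [call_segment(seg[3], cutoff=0.3) for seg in segments]
calls_05 = [call_segment(seg[3], cutoff=0.5) for seg in segments]

for seg, c03, c05 in zip(segments, calls_03, calls_05):
    print(seg[0], seg[3], c03, c05)
```

Raising the cutoff from 0.3 to 0.5 keeps only the stronger events, so the second and third toy segments are no longer called.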

 

·      GISTIC results using Copy Number Alteration Dataset in Pooled Analysis [download]

This dataset includes GISTIC results from the pooled analysis (i.e. all 16 cancer types were combined into one large dataset, and GISTIC was run on it to find frequently altered copy number regions).

 

·      A List of Cancer-related Genes in Cancer Gene Census from the Sanger Institute (Sept. 2010 version) [download]

This xls file includes the 427 known cancer-related genes in the Cancer Gene Census database from the Sanger Institute that were used in the cancer-related gene enrichment analysis.
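A gene enrichment analysis of this kind is commonly carried out with a one-sided hypergeometric test. The sketch below shows that test in plain Python; it is not necessarily the procedure used in the paper. The 427 census genes and the 332 subnetwork member genes come from this page, while the background size of 20000 genes and the overlap of 20 genes are made-up values for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_cancer: int,
                           n_selected: int, n_overlap: int) -> float:
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_universe: number of background genes (assumed value in this sketch)
    n_cancer:   number of known cancer-related genes in the background
    n_selected: number of genes in the list being tested
    n_overlap:  number of selected genes that are cancer-related
    """
    max_overlap = min(n_cancer, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_cancer, k) * comb(n_universe - n_cancer, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    # Python integer division handles the very large binomials exactly.
    return tail / total


# 427 and 332 are from this page; 20000 and 20 are hypothetical.
p_value = hypergeom_enrichment_p(n_universe=20000, n_cancer=427,
                                 n_selected=332, n_overlap=20)
print(p_value)
```

Because the result depends strongly on the chosen background, the background gene set should match the genes that could actually have been selected (for example, the genes present in the interaction network).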

 

Detailed Results

·      Biocarta Pathway Activity View  [download]

This pdf file includes activity views of 217 Biocarta pathways across 16 types of cancer.

 

·      A List of Top 20 Disrupted Pathways from Biocarta, KEGG, and Reactome datasets in 16 types of cancers  [download]

This xls file includes the top 20 disrupted pathways in each of the 16 types of cancer, with p-values adjusted using the Benjamini-Hochberg (BH) method.
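For readers unfamiliar with the BH adjustment, the following is a minimal sketch of the standard step-up procedure followed by a top-20 selection. The pathway names and raw p-values are placeholders, not values from the xls file, and the helper name `bh_adjust` is our own.

```python
from typing import Dict, List


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Placeholder pathways and raw p-values for one cancer type.
pathway_p: Dict[str, float] = {
    "pathway_A": 0.001,
    "pathway_B": 0.02,
    "pathway_C": 0.03,
    "pathway_D": 0.5,
}

names = list(pathway_p)
adj = bh_adjust([pathway_p[n] for n in names])

# Keep at most the 20 pathways with the smallest adjusted p-values.
top20 = sorted(zip(names, adj), key=lambda pair: pair[1])[:20]
for name, q in top20:
    print(name, round(q, 4))
```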

 

·      A List of Member Genes in Commonly Disrupted Subnetworks [download]

This xls file includes the 332 member genes in the 42 commonly disrupted subnetworks. These 332 genes were used in patient subgroup identification, functional gene set enrichment analysis, and cancer-related gene enrichment analysis.

 

·      Network View of Commonly Disrupted Subnetworks [download]

This zip file includes network views, in pdf and cys (Cytoscape file format) files, of four commonly disrupted subnetworks that are ranked within the top 2% in each cancer study and are present in more than 10 cancer studies.
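The selection rule described above can be sketched as a simple filter. This is one reading of the criterion (a subnetwork counts for a cancer type if it falls within the top 2% of that type's ranking, and it is kept if this happens in more than 10 cancer types); the function name, data layout, and toy rankings are all hypothetical.

```python
from collections import Counter
from math import ceil
from typing import Dict, List


def commonly_disrupted(rankings: Dict[str, List[str]],
                       top_fraction: float = 0.02,
                       min_cancer_types: int = 10) -> List[str]:
    """Return subnetworks in the top fraction of more than min_cancer_types rankings.

    rankings maps a cancer type to its subnetwork IDs, best-ranked first.
    """
    counts: Counter = Counter()
    for ranked in rankings.values():
        n_top = ceil(top_fraction * len(ranked))
        counts.update(set(ranked[:n_top]))
    return sorted(s for s, c in counts.items() if c > min_cancer_types)


# Toy data: 12 cancer types with 100 ranked subnetworks each.
rankings: Dict[str, List[str]] = {}
for i in range(12):
    background = [f"bg{i}_{j}" for j in range(98)]
    # sub_A is top-ranked everywhere; sub_B only in the first 5 cancer types.
    top = ["sub_A", "sub_B"] if i < 5 else ["sub_A", f"bg{i}_top"]
    rankings[f"cancer_{i}"] = top + background

common = commonly_disrupted(rankings)
print(common)
```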

 

·      HotNet results [download]

This Excel file includes the sub-network identified by the HotNet method.

 

(Last updated 07/02/2012)