Transfer Learning Across Cancers on DNA Copy Number Variation Analysis

Huanan Zhang¹, Ze Tian² and Rui Kuang¹

1. Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, MN, 55455
2. Microsoft Corporation, Redmond, WA, 98052

ABSTRACT

DNA copy number variations (CNVs) are prevalent in all types of tumors. It remains challenging to determine how CNVs drive tumorigenic mechanisms that are either universal across cancer types or specific to individual ones. To address this problem, we introduce a transfer learning framework that discovers both the CNVs shared across tumor types and the CNVs specific to each tumor type from genome-wide CNV data measured by array CGH and SNP genotyping arrays. The proposed model, Transfer Learning with Fused Lasso (TLFL), detects latent CNV components in multiple CNV datasets of different tumor types and distinguishes the CNVs common to all datasets from those specific to each dataset. Both common and type-specific CNVs are detected as latent components in a matrix factorization coupled with a fused lasso penalty on adjacent CNV probe features. TLFL transfers knowledge across tumor types through the common latent components that underlie all the datasets. In simulations and experiments on real cancer CNV datasets, the latent components detected by TLFL, when used as features, improved the classification of patient samples in each individual dataset compared with the same model without knowledge transfer. In a cross-dataset analysis of bladder cancer and a cross-domain analysis of breast and ovarian cancer, TLFL also learned latent CNV components that are both predictive of tumor stage and correlated with known cancer genes.
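The abstract describes the model only at a high level. As a hypothetical sketch (not the authors' TLFL implementation), the Python code below writes down one plausible form of such an objective. Each dataset X_k (samples by probes, with probes in genomic order) is approximated by sample weights W_k times a basis that stacks shared components B_common on top of type-specific components B_k. Every basis row carries a fused lasso penalty, lam1 * sum|b_j| + lam2 * sum|b_j - b_(j-1)|, so components are sparse and piecewise constant along adjacent probes. The function names, the squared-error loss, and the penalty weights lam1/lam2 are assumptions. Only the closed-form weight update is shown, because updating the bases requires a dedicated fused lasso solver.

```python
import numpy as np


def fused_lasso_penalty(B, lam1, lam2):
    """Sparsity plus smoothness across adjacent probes (hypothetical helper).

    B has shape (n_components, n_probes); each row is one latent CNV
    component laid out along the genome, so adjacent columns are
    adjacent probes.
    """
    sparsity = lam1 * np.abs(B).sum()
    fusion = lam2 * np.abs(np.diff(B, axis=1)).sum()
    return sparsity + fusion


def transfer_objective(X_list, W_list, B_common, B_specific, lam1, lam2):
    """Assumed TLFL-style loss summed over tumor-type datasets.

    X_k (n_k samples x p probes) is approximated by W_k @ [B_common; B_k].
    B_common is shared by every dataset, which is where knowledge
    transfer enters; each B_k belongs to one tumor type only.
    """
    # The shared basis is penalized once, not once per dataset.
    total = fused_lasso_penalty(B_common, lam1, lam2)
    for X, W, B_k in zip(X_list, W_list, B_specific):
        B = np.vstack([B_common, B_k])
        total += 0.5 * np.sum((X - W @ B) ** 2)
        total += fused_lasso_penalty(B_k, lam1, lam2)
    return total


def update_weights(X, B_common, B_k, ridge=1e-6):
    """Closed-form update of W_k with the bases held fixed.

    Solves min_W ||X - W B||^2 + ridge * ||W||^2, whose solution is
    W = X B^T (B B^T + ridge * I)^(-1); the small ridge term only
    guards against a singular Gram matrix.
    """
    B = np.vstack([B_common, B_k])
    G = B @ B.T + ridge * np.eye(B.shape[0])
    return np.linalg.solve(G, B @ X.T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 50
    B_common = np.zeros((1, p))
    B_common[0, 10:20] = 1.0  # toy gain shared by both tumor types
    B_specific = [np.zeros((1, p)), np.zeros((1, p))]
    B_specific[0][0, 30:35] = -1.0  # toy loss specific to type 0
    B_specific[1][0, 40:45] = 1.0  # toy gain specific to type 1
    X_list = [rng.normal(size=(20, 2)) @ np.vstack([B_common, B_k])
              for B_k in B_specific]
    W_list = [update_weights(X, B_common, B_k)
              for X, B_k in zip(X_list, B_specific)]
    print(transfer_objective(X_list, W_list, B_common, B_specific, 0.0, 0.0))
```

With the penalties switched off, the printed objective is close to zero because the synthetic data are generated exactly from the given bases. Sharing B_common across datasets is the design choice that lets evidence for a common CNV segment accumulate from every tumor type, while each B_k is free to capture type-specific segments.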

Availability: coming soon

Contact: kuang@cs.umn.edu

Funding: NSF-III1117153: Small: Network Learning for Integrative Cancer Genomics; NSF-IIS1149697.

Full paper: 10 pages