Background: DNA copy-number variations (CNVs) are genome aberrations that can disrupt
normal biological functions and lead to abnormal cell growth and tumorigenesis.
Identifying causal CNVs in cancer is an important
step in understanding the molecular mechanisms of cancer and developing effective
treatments. Existing CNV discovery methods are statistical approaches
that compute a positional summary statistic of the copy-number variation across
all patients in the dataset, and thus tend to miss large aberrant CNV
regions that occur only in patient subsets. Little previous work has focused on the targeted identification of
CNVs that occur only in subsets of patients.
Results: We introduce a tool for mining CNV subspace patterns
(SubPatCNV), which identifies all aberrant CNV regions
specific to arbitrary patient subsets whose size exceeds a support threshold. SubPatCNV is an approximate association
pattern mining algorithm under a spatial constraint on the positional CNV probe
features. In experiments on a large-scale bladder cancer dataset,
SubPatCNV discovered many large aberrant CNV events in patient subgroups
and reported CNV regions that were highly specific
to clinical variables such as tumor grade or stage and that were enriched with more known oncogenes
than the regions reported by existing CNV discovery methods.
Conclusions: Identifying the causal CNVs
that drive cancer development is a difficult problem.
SubPatCNV is an easy-to-use, open-source software tool that provides the flexibility to identify aberrant CNV
regions specific to patient subgroups of different sizes.
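
As a rough illustration of the subspace-pattern idea described in the Results, the Python sketch below enumerates contiguous probe regions in which at least a minimum number of patients share the same aberration. It is not the SubPatCNV implementation: it is an exact brute-force search rather than the approximate algorithm, and the function name `mine_contiguous_patterns`, the call encoding (1 = gain, 0 = neutral, -1 = loss), and the toy matrix are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

# (first probe index, last probe index, ids of patients sharing the aberration)
Pattern = Tuple[int, int, FrozenSet[int]]


def mine_contiguous_patterns(
    calls: Sequence[Sequence[int]], state: int, min_support: int
) -> List[Pattern]:
    """Exhaustively list closed contiguous regions with enough supporting patients.

    calls[i][p] is the discretized copy-number call of patient i at probe p
    (hypothetical encoding: 1 = gain, 0 = neutral, -1 = loss).
    A region is reported when at least `min_support` patients carry `state`
    at every probe in it and it cannot be extended by one probe on either
    side without losing a supporting patient.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    n_probes = len(calls[0]) if calls else 0

    # carriers[p] = set of patients whose call at probe p equals `state`
    carriers = [
        frozenset(i for i, row in enumerate(calls) if row[p] == state)
        for p in range(n_probes)
    ]

    patterns: List[Pattern] = []
    for start in range(n_probes):
        support = carriers[start]
        end = start
        # Spatial constraint: only grow the region over adjacent probes.
        while len(support) >= min_support:
            right = support & carriers[end + 1] if end + 1 < n_probes else frozenset()
            left = support & carriers[start - 1] if start > 0 else frozenset()
            # Closed pattern: extending either way would drop a patient.
            if right != support and left != support:
                patterns.append((start, end, support))
            support = right
            end += 1
    return patterns


# Toy matrix (made up for illustration): 4 patients x 6 probes.
toy_calls = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, -1],
    [0, 0, 0, 0, 0, -1],
]

gain_patterns = mine_contiguous_patterns(toy_calls, state=1, min_support=2)
loss_patterns = mine_contiguous_patterns(toy_calls, state=-1, min_support=2)

for first, last, patients in gain_patterns:
    print(f"gain at probes {first}-{last} in patients {sorted(patients)}")
```

In this toy example, a position-wise summary over all four patients would dilute the gain spanning probes 0 to 2, which is carried by only two of them, whereas the subset-aware search reports it together with the patients that support it. Lowering or raising `min_support` changes the minimum subgroup size considered.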