Publications – Kuang Lab

2017

Zhang, Wei; Chien, Jeremy; Yong, Jeongsik; Kuang, Rui

Network-based Machine Learning and Graph Theory Algorithms for Precision Oncology Journal Article

In: NPJ Precision Oncology, no. 25, 2017.

Tags: Cancer Genomics, Network-based Learning, Phenome-genome Association, Protein-Protein Interaction Network, Semi-supervised Learning

2015

Zhang, Wei; Chang, Jae-Woong; Lin, Lilong; Minn, Kay; Wu, Baolin; Chien, Jeremy; Yong, Jeongsik; Zheng, Hui; Kuang, Rui

Network-based Isoform Quantification with RNA-Seq Data for Cancer Transcriptome Analysis Journal Article

In: PLoS Computational Biology, vol. 11, pp. e1004465, 2015.

Tags: Cancer Genomics, Isoform Quantification, Network-based Learning

Chien, Jeremy; Sicotte, Hugues; Fan, Jian-Bing; Humphray, Sean; Cunningham, Julie M; Kalli, Kimberly R; Oberg, Ann L; Hart, Steven N; Li, Ying; Davila, Jaime I; et al.

TP53 mutations, tetraploidy and homologous recombination repair defects in early stage high-grade serous ovarian cancer Journal Article

In: Nucleic acids research, pp. gkv111, 2015.

Tags: Cancer Genomics

Johnson, Nicholas; Zhang, Huanan; Fang, Gang; Kumar, Vipin; Kuang, Rui

SubPatCNV: approximate subspace pattern mining for mapping copy-number variations Journal Article

In: BMC bioinformatics, vol. 16, no. 1, pp. 1, 2015, ISSN: 1471-2105.

Tags: Cancer Genomics, DNA Copy Number Variation

@article{johnson2015subpatcnv,

title = {SubPatCNV: approximate subspace pattern mining for mapping copy-number variations},

author = {Nicholas Johnson and Huanan Zhang and Gang Fang and Vipin Kumar and Rui Kuang},

url = {https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-014-0426-7},

doi = {10.1186/s12859-014-0426-7},

issn = {1471-2105},

year  = {2015},

date = {2015-01-16},

journal = {BMC bioinformatics},

volume = {16},

number = {1},

pages = {1},

publisher = {BioMed Central},

abstract = {Background 

Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often only common in a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that only exhibit in subsets of samples with advanced data mining techniques to reliably answer questions such as “Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?”. 

 

Results 

We introduce a tool for mining CNV subspace patterns, namely SubPatCNV, which is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV implements a variation of an approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In a benchmark test, SubPatCNV was applied to identify population-specific germline CNVs from four populations of HapMap samples. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups, and reported regions enriched with cancer-relevant genes. In both HapMap data and TCGA data, it was observed that SubPatCNV employs approximate pattern mining to more effectively identify CNV subspace patterns that are consistent within a subgroup from high-density array data. 

 

Conclusions 

SubPatCNV, available through http://sourceforge.net/projects/subpatcnv/ , is a unique scalable open-source software tool that provides the flexibility of identifying CNV regions specific to sample subgroups of different sizes from high-density CNV array data.},

keywords = {Cancer Genomics, DNA Copy Number Variation},

pubstate = {published},

tppubtype = {article}

}

2013

Zhang, Huanan; Tian, Ze; Kuang, Rui

Transfer learning across cancers on DNA copy number variation analysis Proceedings Article

In: 2013 IEEE 13th International Conference on Data Mining, pp. 1283–1288, IEEE, 2013, ISBN: 978-0-7695-5108-1.

Tags: Cancer Genomics, DNA Copy Number Variation, Transfer Learning

@inproceedings{zhang2013transfer,

title = {Transfer learning across cancers on DNA copy number variation analysis},

author = {Huanan Zhang and Ze Tian and Rui Kuang},

url = {http://compbio.cs.umn.edu/wp-content/uploads/2017/10/TLFL-10Page.pdf},

doi = {10.1109/ICDM.2013.58},

isbn = {978-0-7695-5108-1},

year  = {2013},

date = {2013-12-07},

booktitle = {2013 IEEE 13th International Conference on Data Mining},

pages = {1283--1288},

publisher = {IEEE},

organization = {IEEE},

abstract = {DNA copy number variations (CNVs) are prevalent in all types of tumors. It is still a challenge to study how CNVs play a role in driving tumorigenic mechanisms that are either universal or specific in different cancer types. To address the problem, we introduce a transfer learning framework to discover common CNVs shared across different tumor types as well as CNVs specific to each tumor type from genome-wide CNV data measured by array CGH and SNP genotyping array. The proposed model, namely Transfer Learning with Fused LASSO (TLFL), detects latent CNV components from multiple CNV datasets of different tumor types to distinguish the CNVs that are common across the datasets and those that are specific in each dataset. Both the common and type-specific CNVs are detected as latent components in matrix factorization coupled with fused LASSO on adjacent CNV probe features. TLFL considers the common latent components underlying the multiple datasets to transfer knowledge across different tumor types. In simulations and experiments on real cancer CNV datasets, TLFL detected better latent components that can be used as features to improve classification of patient samples in each individual dataset compared with the model without the knowledge transfer. In cross-dataset analysis on bladder cancer and cross-domain analysis on breast cancer and ovarian cancer, TLFL also learned latent CNV components that are both predictive of tumor stages and correlate with known cancer genes.},

keywords = {Cancer Genomics, DNA Copy Number Variation, Transfer Learning},

pubstate = {published},

tppubtype = {inproceedings}

}

Chien, Jeremy; Kuang, Rui; Landen, Charles; Shridhar, Viji

Platinum-sensitive recurrence in ovarian cancer: the role of tumor microenvironment Journal Article

In: Frontiers in oncology, vol. 3, pp. 251, 2013.

Tags: Cancer Genomics

Hwang, TaeHyun; Atluri, Gowtham; Kuang, Rui; Kumar, Vipin; Starr, Timothy; Silverstein, Kevin AT; Haverty, Peter M; Zhang, Zemin; Liu, Jinfeng

Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers Journal Article

In: BMC genomics, vol. 14, no. 1, pp. 440, 2013.

Tags: Cancer Genomics, DNA Copy Number Variation, Network-based Learning

Zhang, Wei; Ota, Takayo; Shridhar, Viji; Chien, Jeremy; Wu, Baolin; Kuang, Rui

Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment Journal Article

In: PLoS Comput Biol, vol. 9, no. 3, pp. e1002975, 2013.

Tags: Cancer Genomics, Network-based Learning, Survival Analysis, Transcriptome

@article{zhang2013network,

title = {Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment},

author = {Wei Zhang and Takayo Ota and Viji Shridhar and Jeremy Chien and Baolin Wu and Rui Kuang},

url = {http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002975},

doi = {10.1371/journal.pcbi.1002975},

year  = {2013},

date = {2013-03-21},

journal = {PLoS Comput Biol},

volume = {9},

number = {3},

pages = {e1002975},

publisher = {Public Library of Science},

abstract = {Cox regression is commonly used to predict the outcome by the time to an event of interest and in addition, identify relevant features for survival analysis in cancer genomics. Due to the high-dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and applied Net-Cox for a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into the Cox's proportional hazard model to explore the co-expression or functional relation among high-dimensional gene expression features in the gene network. Net-Cox was applied to analyze three independent gene expression datasets including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Net-Cox with the network information from gene co-expression or functional relations identified highly consistent signature genes across the three datasets, and because of the better generalization across the datasets, Net-Cox also consistently improved the accuracy of survival prediction over the Cox models regularized by L1-norm or L2-norm. This study focused on analyzing the death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can more reliably predict the events. The signature genes comprise dense protein-protein interaction subnetworks, enriched by extracellular matrix receptors and modulators or by nuclear signaling components downstream of extracellular signal-regulated kinases. 
In the laboratory validation of the signature genes, a tumor array experiment by protein staining on an independent patient cohort from Mayo Clinic showed that the protein expression of the signature gene FBN1 is a biomarker significantly associated with the early recurrence after 12 months of the treatment in the ovarian cancer patients who are initially sensitive to chemotherapy. Net-Cox toolbox is available at http://localhost/~raphaelpetegrosso/wpcb/Net-Cox/.},

keywords = {Cancer Genomics, Network-based Learning, Survival Analysis, Transcriptome},

pubstate = {published},

tppubtype = {article}

}

2010

Zhang, Wei; Hwang, Baryun; Wu, Baolin; Kuang, Rui

Network propagation models for gene selection Proceedings Article

In: 2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS), IEEE, 2010, ISBN: 978-1-61284-791-7.

Tags: Cancer Genomics, Gene Expression, Semi-supervised Learning

2009

Gupta, Rohit; Agrawal, Smita; Rao, Navneet; Tian, Ze; Kuang, Rui; Kumar, Vipin

Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining Proceedings Article

In: Citeseer, 2009.

Tags: Cancer Genomics, Gene Expression

2008

Hwang, TaeHyun; Tian, Ze; Kuang, Rui; Kocher, Jean-Pierre

Learning on weighted hypergraphs to integrate protein interactions and gene expressions for cancer outcome prediction Proceedings Article

In: 2008 Eighth IEEE International Conference on Data Mining, pp. 293–302, IEEE, 2008, ISBN: 978-0-7695-3502-9.

Tags: Cancer Genomics, Gene Expression, Protein-Protein Interaction Network, Semi-supervised Learning

Hwang, TaeHyun; Kuang, Rui

A Comparative Study of Breast Cancer Microarray Gene Expression Profiles using Label Propagation Proceedings Article

In: Proceedings of the Workshop on Data Mining for Biomedical Informatics, held in conjunction with SIAM International Conference on Data Mining (SDM), 2008.

Tags: Cancer Genomics, Semi-supervised Learning

Hwang, TaeHyun; Sicotte, Hugues; Tian, Ze; Wu, Baolin; Kocher, Jean-Pierre; Wigle, Dennis A; Kumar, Vipin; Kuang, Rui

Robust and efficient identification of biomarkers by classifying features on graphs Journal Article

In: Bioinformatics, vol. 24, no. 18, pp. 2023–2029, 2008, ISSN: 1460-2059.

Tags: Cancer Genomics, Gene Expression, Semi-supervised Learning

@article{hwang2008robustb,

title = {Robust and efficient identification of biomarkers by classifying features on graphs},

author = {Hwang, TaeHyun and Sicotte, Hugues and Tian, Ze and Wu, Baolin and Kocher, Jean-Pierre and Wigle, Dennis A and Kumar, Vipin and Kuang, Rui},

url = {http://bioinformatics.oxfordjournals.org/content/24/18/2023.short},

doi = {10.1093/bioinformatics/btn383},

issn = {1460-2059},

year  = {2008},

date = {2008-01-01},

journal = {Bioinformatics},

volume = {24},

number = {18},

pages = {2023--2029},

publisher = {Oxford Univ Press},

abstract = {Motivation: A central problem in biomarker discovery from large-scale gene expression or single nucleotide polymorphism (SNP) data is the computational challenge of taking into account the dependence among all the features. Methods that ignore the dependence usually identify non-reproducible biomarkers across independent datasets. We introduce a new graph-based semi-supervised feature classification algorithm to identify discriminative disease markers by learning on bipartite graphs. Our algorithm directly classifies the feature nodes in a bipartite graph as positive, negative or neutral with network propagation to capture the dependence among both samples and features (clinical and genetic variables) by exploring bi-cluster structures in a graph. Two features of our algorithm are: (1) our algorithm can find a global optimal labeling to capture the dependence among all the features and thus, generates highly reproducible results across independent microarray or other high-throughput datasets, (2) our algorithm is capable of handling hundreds of thousands of features and thus, is particularly useful for biomarker identification from high-throughput gene expression and SNP data. In addition, although designed for classifying features, our algorithm can also simultaneously classify test samples for disease prognosis/diagnosis. 

 

Results: We applied the network propagation algorithm to study three large-scale breast cancer datasets. Our algorithm achieved competitive classification performance compared with SVMs and other baseline methods, and identified several markers with clinical or biological relevance with the disease. More importantly, our algorithm also identified highly reproducible marker genes and enriched functions from the independent datasets. 

 

Availability: Supplementary results and source code are available at http://localhost/~raphaelpetegrosso/wpcb/Feature_Class.},

keywords = {Cancer Genomics, Gene Expression, Semi-supervised Learning},

pubstate = {published},

tppubtype = {article}

}