Sandeep Chandana¹, Henry Leung¹ and Kiril Trpkov²
¹Department of Electrical and Computer Engineering, University of Calgary, ICT-402, 2500 University Drive NW, Calgary, Alberta, T2N 1N4 Canada.
²Department of Pathology and Laboratory Medicine, Calgary Laboratory Services, Calgary, Alberta T2V 1P9 Canada.
A novel technique for automatically selecting the best pairs of features and sampling techniques to predict the stage of prostate cancer is proposed in this study. The problem of class imbalance, which is prominent in most medical data sets, is also addressed. Three feature subsets, obtained using principal components analysis (PCA), a genetic algorithm (GA) and a rough sets (RS) based approach, were used in the study. Under-sampling, the synthetic minority over-sampling technique (SMOTE) and a combination of the two were investigated, and the performance of the resulting models was compared. To combine the classifier outputs, we used Dempster-Shafer (DS) theory, whereas the actual choice of combined models was made using a GA. We found that the best performance for the overall system resulted from under-sampled data combined with rough sets based features, modeled with a support vector machine (SVM).
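The abstract does not say how classifier outputs are converted to belief masses, so the following is only a minimal sketch of Dempster's rule of combination, the core operation in DS theory, applied to two classifiers. The stage labels (`stage_a`, `stage_b`), the classifier names and all mass values are hypothetical placeholders, not values taken from the study.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    with the masses of one function summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize so the combined masses sum to 1.
    return {h: w / norm for h, w in combined.items()}


# Hypothetical two-class frame of discernment (labels are placeholders).
A = frozenset({"stage_a"})
B = frozenset({"stage_b"})
THETA = A | B  # mass on the whole frame represents ignorance

# Hypothetical masses from two classifiers (illustrative values only).
m_svm = {A: 0.6, B: 0.1, THETA: 0.3}
m_other = {A: 0.5, B: 0.2, THETA: 0.3}

fused = dempster_combine(m_svm, m_other)
decision = max((A, B), key=lambda h: fused.get(h, 0.0))
```

In this sketch the mass left on the whole frame lets a classifier express uncertainty instead of being forced to commit to one stage, which is one common reason to prefer DS combination over simple probability averaging.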