This study describes a model that applies image processing and traditional machine learning techniques, specifically Support Vector Machines (SVM), Naïve Bayes, and k-Nearest Neighbors (k-NN), to identify whether a given breast histopathological image shows Invasive Ductal Carcinoma (IDC). The dataset consisted of 54,811 breast cancer image patches of size 50 × 50 pixels, of which 39,148 were IDC negative and 15,663 were IDC positive. Features were extracted using Oriented FAST and Rotated BRIEF (ORB) descriptors. Feature scaling was performed using min-max normalization, while K-Means clustering on the ORB descriptors was used to generate the visual codebook. Hyperparameters were tuned automatically using grid search with cross-validation, although the system also accepts user-supplied hyperparameter values for the SVM, Naïve Bayes, and k-NN models should the user want to experiment. Because the dataset is imbalanced, the area under the precision-recall curve (AUPRC) and the Matthews correlation coefficient (MCC) were reported in addition to accuracy. The results showed that SVM had the best overall performance, with accuracy = 0.7490, AUPRC = 0.5536, and MCC = 0.2924.
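To illustrate why accuracy alone is a weak indicator on this dataset, the following minimal sketch computes the scores of a hypothetical baseline that labels every patch as IDC negative, using only the class counts stated above. The baseline, the function names, and the convention of returning MCC = 0 when the denominator vanishes are assumptions made for illustration and are not part of the study; the reported metrics were presumably computed on a held-out split whose class ratio may differ slightly from the full dataset.

```python
import math

# Class counts taken from the abstract.
N_NEG = 39_148  # IDC-negative patches
N_POS = 15_663  # IDC-positive patches


def accuracy(tp, tn, fp, fn):
    """Fraction of patches classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here), since the ratio is otherwise undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical baseline: predict "IDC negative" for every patch.
baseline = dict(tp=0, tn=N_NEG, fp=0, fn=N_POS)
baseline_accuracy = accuracy(**baseline)
baseline_mcc = mcc(**baseline)

# A classifier that scores patches at random has an expected AUPRC
# close to the positive-class prevalence.
positive_prevalence = N_POS / (N_POS + N_NEG)

print(f"baseline accuracy   = {baseline_accuracy:.4f}")
print(f"baseline MCC        = {baseline_mcc:.4f}")
print(f"positive prevalence = {positive_prevalence:.4f}")
```

Under these assumptions the all-negative baseline already reaches an accuracy of roughly 0.71, not far below the 0.7490 reported for SVM, while its MCC is 0. This is the motivation for reading the accuracy figure alongside AUPRC and MCC.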
IOS Press, Inc.