Combining Clustering and Classification Ensembles: A Novel Pipeline to Identify Breast Cancer Profiles

Agrawal, U., Soria, D., Wagner, C., Garibaldi, J.M., Ellis, I.O., Bartlett, J.M.S., Cameron, D., Rakha, E.A. and Green, A.R. 2019. Combining Clustering and Classification Ensembles: A Novel Pipeline to Identify Breast Cancer Profiles. Artificial Intelligence in Medicine. 97, pp. 27-37. https://doi.org/10.1016/j.artmed.2019.05.002

Title: Combining Clustering and Classification Ensembles: A Novel Pipeline to Identify Breast Cancer Profiles
Type: Journal article
Authors: Agrawal, U.; Soria, D.; Wagner, C.; Garibaldi, J.M.; Ellis, I.O.; Bartlett, J.M.S.; Cameron, D.; Rakha, E.A.; Green, A.R.
Abstract

Breast cancer is one of the most common causes of cancer death in women and is a very complex disease with varied molecular alterations. To assist breast cancer prognosis, the classification of patients into biological groups is of great significance for treatment strategies. Recent studies have used an ensemble of multiple clustering algorithms to elucidate the most characteristic biological groups of breast cancer. However, the combination of various clustering methods resulted in a number of patients remaining unclustered. A framework is therefore still needed that can assign as many unclustered (i.e. biologically diverse) patients as possible to one of the identified groups in order to improve classification. In this paper we develop a novel classification framework which introduces a new ensemble classification stage after the ensemble clustering stage to target the unclustered patients. The resulting step-by-step pipeline couples ensemble clustering with ensemble classification to identify core groups, characterise the data distribution within them and improve the final classification results by targeting the unclustered data. The proposed pipeline is applied to a novel real-world breast cancer dataset, and its robustness and stability are subsequently examined by testing it on standard datasets. The results show that the presented framework yields an improved classification. Finally, the results have been verified using statistical tests, visualisation techniques, cluster quality assessment and interpretation from clinical experts.
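
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cluster labels produced by the different clustering algorithms have already been aligned, it treats a patient as "core" only when every clustering agrees, and it uses a plain majority vote over three toy classifiers (nearest centroid, 1-NN, 3-NN) as a stand-in for the paper's ensemble classification stage. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def consensus_labels(partitions):
    """Keep a label only where every clustering agrees; otherwise mark None (unclustered)."""
    return [votes[0] if len(set(votes)) == 1 else None for votes in zip(*partitions)]


def nearest_centroid(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    groups = {}
    for features, label in zip(train_x, train_y):
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


def knn(k):
    """Build a k-nearest-neighbour predictor with the same call signature as above."""
    def predict(train_x, train_y, x):
        nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def assign_unclustered(data, partitions, classifiers):
    """Stage 1: find core groups by consensus. Stage 2: vote unclustered samples into a group."""
    core = consensus_labels(partitions)
    train_x = [x for x, y in zip(data, core) if y is not None]
    train_y = [y for y in core if y is not None]
    final = []
    for x, y in zip(data, core):
        if y is None:
            # Majority vote across classifiers trained on the core samples only.
            votes = Counter(clf(train_x, train_y, x) for clf in classifiers)
            y = votes.most_common(1)[0][0]
        final.append(y)
    return core, final


# Toy data (hypothetical): two well-separated groups plus one ambiguous sample.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (1.0, 1.0)]
partitions = [
    [0, 0, 0, 1, 1, 1, 0],  # labels from clustering algorithm A
    [0, 0, 0, 1, 1, 1, 1],  # labels from clustering algorithm B (disagrees on the last sample)
    [0, 0, 0, 1, 1, 1, 0],  # labels from clustering algorithm C
]
core, final = assign_unclustered(data, partitions, [nearest_centroid, knn(1), knn(3)])
print(core)
print(final)
```

In this sketch the last sample is left unclustered after the consensus step because the three clusterings disagree, and it is then assigned to a group by the classifier vote. A real pipeline would also need label alignment across clusterings and a validated choice of classifiers, neither of which is shown here.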

Keywords: Ensemble Clustering; Ensemble Classification; Class level fusion; Refining cluster results; Breast Cancer; Pipeline
Journal: Artificial Intelligence in Medicine
Journal citation: 97, pp. 27-37
ISSN: 0933-3657
Year: 2019
Publisher: Elsevier
Accepted author manuscript
Digital Object Identifier (DOI): https://doi.org/10.1016/j.artmed.2019.05.002
Publication dates
Published online: 15 May 2019
Published in print: Jun 2019
License: CC BY-NC-ND 4.0

Related outputs

A systematic review of the applications of Expert Systems (ES) and machine learning (ML) in clinical urology.
Salem, H., Soria, D., Lund, J. and Awwad, A. 2021. A systematic review of the applications of Expert Systems (ES) and machine learning (ML) in clinical urology. BMC Medical Informatics and Decision Making. 21 (1) 223. https://doi.org/10.1186/s12911-021-01585-9

Machine Learning Prediction of Susceptibility to Visceral Fat Associated Diseases
Aldraimli, M., Soria, D., Parkinson, J., Thomas, E.L., Bell, J.D., Dwek, M. and Chaussalet, T.J. 2020. Machine Learning Prediction of Susceptibility to Visceral Fat Associated Diseases. Health and Technology. 10, pp. 925-944. https://doi.org/10.1007/s12553-020-00446-1

Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases
Aldraimli, M., Soria, D., Parkinson, J., Whitcher, B., Thomas, E.L., Bell, J.D., Chaussalet, T.J. and Dwek, M. 2019. Machine Learning Classification of Females Susceptibility to Visceral Fat Associated Diseases. MEDICON 2019: XV Mediterranean Conference on Medical and Biological Engineering and Computing. Coimbra, Portugal 26 - 28 Sep 2019 Springer. https://doi.org/10.1007/978-3-030-31635-8_81

Fuzzy Integral Driven Ensemble Classification using A Priori Fuzzy Measures
Agrawal, U., Wagner, C., Garibaldi, J.M. and Soria, D. 2019. Fuzzy Integral Driven Ensemble Classification using A Priori Fuzzy Measures. International Conference on Fuzzy Systems (FUZZ-IEEE 2019). 23 - 26 Jun 2019 IEEE. https://doi.org/10.1109/FUZZ-IEEE.2019.8858821

The combined expression of solute carriers is associated with a poor prognosis in highly proliferative ER+ breast cancer
El Ansari, R., Craze, M.L., Alfarsi, L., Soria, D., Diez-Rodriguez, M., Nolan, C.C., Ellis, I.O., Rakha, E.A. and Green, A.R. 2019. The combined expression of solute carriers is associated with a poor prognosis in highly proliferative ER+ breast cancer. Breast Cancer Research and Treatment. 175 (1), pp. 27-38. https://doi.org/10.1007/s10549-018-05111-w

Identifying Heavy Goods Vehicle Driving Styles in the United Kingdom
Figueredo, G.P., Agrawal, U., Mase, J.M.M., Mesgarpour, M., Wagner, C., Soria, D., Garibaldi, J.M., Siebers, P.O. and John, R.I. 2019. Identifying Heavy Goods Vehicle Driving Styles in the United Kingdom. IEEE Transactions on Intelligent Transportation Systems. 20 (9), pp. 3324-3336. https://doi.org/10.1109/TITS.2018.2875343

An End-to-End Deep Learning Histochemical Scoring System for Breast Cancer TMA
Liu, J., Xu, B., Zheng, C., Gong, Y., Garibaldi, J.M., Soria, D., Green, A., Ellis, I.O., Zou, W. and Qiu, G. 2019. An End-to-End Deep Learning Histochemical Scoring System for Breast Cancer TMA. IEEE Transactions on Medical Imaging. 38 (2), pp. 617-628. https://doi.org/10.1109/TMI.2018.2868333

Interpretability and Complexity of Design in the Creation of Fuzzy Logic Systems - A User Study
Razak, T.R., Garibaldi, J.M., Wagner, C., Pourabdollah, A. and Soria, D. 2018. Interpretability and Complexity of Design in the Creation of Fuzzy Logic Systems - A User Study. 2018 IEEE Symposium Series on Computational Intelligence. Bengaluru, India 18 - 21 Nov 2018 IEEE. https://doi.org/10.1109/SSCI.2018.8628924

Comparison of Fuzzy Integral-Fuzzy Measure based Ensemble Algorithms with the State-of-the-art Ensemble Algorithms
Agrawal, U., Pinar, A.J., Wagner, C., Havens, T.C., Soria, D. and Garibaldi, J.M. 2018. Comparison of Fuzzy Integral-Fuzzy Measure based Ensemble Algorithms with the State-of-the-art Ensemble Algorithms. Medina, J., Ojeda-Aciego, M., Verdegay, J.L., Perfilieva, I., Bouchon-Meunier, B. and Yager, R.R. (ed.) 17th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems. Cadiz, Spain 11 - 15 Jun 2018 Springer. https://doi.org/10.1007/978-3-319-91479-4

Development of a pre-operative scoring system for predicting risk of post-operative paediatric cerebellar mutism syndrome
Liu, J.-F., Dineen, R.A., Avula, S., Chambers, T., Dutta, M., Jaspan, T., MacArthur, D.C., Howarth, S., Soria, D., Quinlan, P., Harave, S., Ong, C.C., Mallucci, C.L., Kumar, R., Pizer, B. and Walker, D.A. 2018. Development of a pre-operative scoring system for predicting risk of post-operative paediatric cerebellar mutism syndrome. British Journal of Neurosurgery. 32 (1), pp. 18-27. https://doi.org/10.1080/02688697.2018.1431204

A multicentre integration of a computer-led follow-up of prostate cancer is valid and safe
Salem, H., Caddeo, G., McFarlane, J., Patel, K., Cochrane, L., Soria, D., Henley, M. and Lund, J. 2018. A multicentre integration of a computer-led follow-up of prostate cancer is valid and safe. BJU International. 122 (3), pp. 418-426 BJU14157. https://doi.org/10.1111/bju.14157

MYC regulation of Glutamine-Proline regulatory axis is key in Luminal B breast cancer
Craze, M.L., Cheung, H., Jewa, N., Coimbra, N.D.M., Soria, D., El-Ansari, R., Aleskandarany, M.A., Cheng, K.W., Diez-Rodriguez, M., Nolan, C.C., Ellis, I.O., Rakha, E. and Green, A.R. 2018. MYC regulation of Glutamine-Proline regulatory axis is key in Luminal B breast cancer. British Journal of Cancer. 118 (2), pp. 258-265. https://doi.org/10.1038/bjc.2017.387

Interpretability indices for hierarchical fuzzy systems
Razak, T.R., Garibaldi, J.M., Wagner, C., Pourabdollah, A. and Soria, D. 2017. Interpretability indices for hierarchical fuzzy systems. IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2017). Naples, Italy 09 - 12 Jul 2017 IEEE. https://doi.org/10.1109/FUZZ-IEEE.2017.8015616

Validation of a quantifier-based fuzzy classification system for breast cancer patients on external independent cohorts
Soria, D. and Garibaldi, J.M. 2016. Validation of a quantifier-based fuzzy classification system for breast cancer patients on external independent cohorts. IEEE International Conference on Machine Learning and Applications (ICMLA2016). Anaheim, California, USA 18 - 20 Dec 2016 IEEE. https://doi.org/10.1109/ICMLA.2016.0101

Nottingham Prognostic Index Plus: Validation of a clinical decision making tool in breast cancer in an independent series
Green, A.R., Soria, D., Stephen, J., Powe, D.G., Nolan, C.C., Kunkler, I., Thomas, J., Kerr, G.R., Jack, W., Cameron, D., Piper, T., Ball, G.R., Garibaldi, J.M., Rakha, E.A., Bartlett, J.M.S. and Ellis, I.O. 2016. Nottingham Prognostic Index Plus: Validation of a clinical decision making tool in breast cancer in an independent series. The Journal of Pathology: Clinical Research. 1 (2), pp. 32-40. https://doi.org/10.1002/cjp2.32

Illness Beliefs Predict Mortality in Patients with Diabetic Foot Ulcers
Vedhara, K., Dawe, K., Miles, J.N.V., Wetherell, M.A., Cullum, N., Dayan, C., Drake, N., Price, P., Tarlton, J., Weinman, J., Day, A., Campbell, R., Reps, J. and Soria, D. 2016. Illness Beliefs Predict Mortality in Patients with Diabetic Foot Ulcers. PLoS ONE. 11 (4) e0153315. https://doi.org/10.1371/journal.pone.0153315

Nottingham prognostic index plus (NPI+) predicts risk of distant metastases in primary breast cancer
Green, A.R., Soria, D., Powe, G., Nolan, C.C., Aleskandarany, N.M., Szász, M.A., Tőkés, A.M., Ball, G.R., Garibaldi, J.M., Rakha, E.A., Kulka, J. and Ellis, I.O. 2016. Nottingham prognostic index plus (NPI+) predicts risk of distant metastases in primary breast cancer. Breast Cancer Research and Treatment. 157 (1), pp. 65-75. https://doi.org/10.1007/s10549-016-3804-1

KI67 and DLX2 predict increased risk of metastasis formation in prostate cancer–a targeted molecular approach
Green, W.J.F., Ball, G., Hulman, G., Johnson, C., Van Schalwyk, G., Ratan, H.L., Soria, D., Garibaldi, J.M., Parkinson, R., Hulman, J., Rees, R. and Powe, D.G. 2016. KI67 and DLX2 predict increased risk of metastasis formation in prostate cancer–a targeted molecular approach. British Journal of Cancer. 115 (2), pp. 236-242. https://doi.org/10.1038/bjc.2016.169

Cancer subtype identification pipeline: A classifusion approach
Agrawal, U., Soria, D. and Wagner, C. 2016. Cancer subtype identification pipeline: A classifusion approach. Evolutionary Computation (CEC), 2016 IEEE Congress on. 24 - 29 Jul 2016 IEEE. https://doi.org/10.1109/CEC.2016.7744150

Markers of Progression in Early-Stage Invasive Breast Cancer: a Predictive Immunohistochemical Panel Algorithm for Distant Recurrence Risk Stratification
Aleskandarany, M.A., Soria, D., Green, A.R., Nolan, C., Diez-Rodriguez, M., Ellis, I.O. and Rakha, E.A. 2015. Markers of Progression in Early-Stage Invasive Breast Cancer: a Predictive Immunohistochemical Panel Algorithm for Distant Recurrence Risk Stratification. Breast Cancer Research and Treatment. 151 (2), pp. 325-333. https://doi.org/10.1007/s10549-015-3406-3

Practical detection of a definitive biomarker panel for Alzheimer’s Disease; comparisons between plasma and cerebrospinal fluid
Richens, J.L., Vere, K.-A., Light, R.A., Soria, D., Garibaldi, J.M., Smith, A.D., Warden, D., Wilcock, G., Bajaj, N., Morgan, K. and O’Shea, P. 2014. Practical detection of a definitive biomarker panel for Alzheimer’s Disease; comparisons between plasma and cerebrospinal fluid. International Journal of Molecular Epidemiology and Genetics. 5 (2), pp. 53-70.

Signalling Paediatric Side Effects using an Ensemble of Simple Study Designs
Reps, J.M., Aickelin, U., Garibaldi, J.M., Soria, D., Gibson, J.E. and Hubbard, R.B. 2014. Signalling Paediatric Side Effects using an Ensemble of Simple Study Designs. Drug Safety. 37 (3), pp. 163-170. https://doi.org/10.1007/s40264-014-0137-z

A Novel Semisupervised Algorithm for Rare Prescription Side Effect Discovery
Reps, J.M., Garibaldi, J.M., Aickelin, U., Soria, D., Gibson, J.E. and Hubbard, R.B. 2014. A Novel Semisupervised Algorithm for Rare Prescription Side Effect Discovery. IEEE Journal of Biomedical and Health Informatics. 18 (2), pp. 537-547. https://doi.org/10.1109/JBHI.2013.2281505

Guest Editorial: Data Mining in Bioinformatics
Reps, J.M., Garibaldi, J.M., Aickelin, U., Soria, D., Gibson, J.E. and Hubbard, R.B. 2014. Guest Editorial: Data Mining in Bioinformatics. IEEE Journal of Biomedical and Health Informatics. 18 (2), p. 483. https://doi.org/10.1109/JBHI.2014.2306988

Nottingham Prognostic Index Plus (NPI+): A Modern Clinical Decision Making Tool in Breast Cancer
Rakha, E., Soria, D., Green, A.R., Lemetre, C., Powe, D.G., Nolan, C.C., Garibaldi, J.M., Ball, G.R. and Ellis, I.O. 2014. Nottingham Prognostic Index Plus (NPI+): A Modern Clinical Decision Making Tool in Breast Cancer. British Journal of Cancer. 110 (7), pp. 1688-1697. https://doi.org/10.1038/bjc.2014.120

A methodology for automatic classification of breast cancer immunohistochemical data using semi-supervised Fuzzy c-Means
Lai, D.T.C., Garibaldi, J.M., Soria, D. and Roadknight, C.M. 2014. A methodology for automatic classification of breast cancer immunohistochemical data using semi-supervised Fuzzy c-Means. Central European Journal of Operations Research. 22 (3), pp. 475-499. https://doi.org/10.1007/s10100-013-0318-3

Permalink - https://westminsterresearch.westminster.ac.uk/item/qv3y1/combining-clustering-and-classification-ensembles-a-novel-pipeline-to-identify-breast-cancer-profiles

