Abstract | Machine learning techniques in medicine have been at the forefront of addressing challenges such as diagnosis, prognosis prediction, or precision medicine. In this field, data are sometimes abundant but come from different sources or lack assigned labels. Manually labeling these data to assemble a curated dataset for supervised classification can be costly. Semisupervised classification offers a wide range of methods for leveraging unlabeled data when learning prediction models. However, these classifiers are commonly deep or ensemble learning structures that often result in black boxes. The requirement of interpretable models for medical settings led us to propose the self-labeling gray box classifier, which outperforms other semisupervised classifiers on benchmark datasets while providing interpretability. In this chapter, we illustrate applications of the self-labeling gray box to omics and clinical datasets from The Cancer Genome Atlas. We show that the self-labeling gray box accurately predicts the stages of rare cancers by leveraging unlabeled instances from more common cancer types. We discuss the resulting insights, the features influencing prediction, and a global representation of the learned knowledge through decision trees or rule lists, which can aid clinicians and researchers. |
---|