Abstract | The problem of classifying subjects into risk categories is a common challenge in medical research. Machine Learning (ML) methods are widely used in the areas of risk prediction and classification. The primary objective of these algorithms is to predict dichotomous responses (e.g. healthy/at risk) based on several features. Similarly to statistical inference models, also ML models are subject to the common problem of class imbalance. Therefore, they are affected by the majority class increasing the false-negative rate. In this paper, we built and evaluated eighteen ML models classifying approximately 4300 female participants from the UK Biobank into three categorical risk statuses based on responses for the discretised visceral adipose tissue values from magnetic resonance imaging. We also examined the effect of sampling techniques on classification modelling when dealing with class imbalance. Results showed that the use of sampling techniques had a significant impact. They not only drove an improvement in predicting patients risk status but also facilitated an increase in the information contained within each variable. Based on domain experts criteria, the three best models for classification were finally identified. These encouraging results will guide further developments of classification models for predicting visceral adipose tissue without the need for a costly scan. |
---|