Leveraging large language models for medical text classification: a hospital readmission prediction case

Nazyrova, N., Chahed, S., Chaussalet, T. and Dwek, M. 2024. Leveraging large language models for medical text classification: a hospital readmission prediction case. The IEEE 14th International Conference on Pattern Recognition Systems. London, United Kingdom, 15-18 Jul 2024. IEEE.

Title: Leveraging large language models for medical text classification: a hospital readmission prediction case
Authors: Nazyrova, N., Chahed, S., Chaussalet, T. and Dwek, M.
Type: Conference paper
Abstract

In recent years, the intersection of natural language processing (NLP) and healthcare informatics has witnessed a revolutionary transformation. One of the most groundbreaking developments in this realm is the advent of large language models (LLMs), which have demonstrated remarkable capabilities in analysing clinical data. This paper explores the potential of large language models in medical text classification, shedding light on their ability to discern subtle patterns, grasp domain-specific terminology, and adapt to the dynamic nature of medical information. The research focuses on applying transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), to hospital discharge summaries to predict 30-day readmissions among older adults. In particular, we explore the role of transfer learning in medical text classification and compare domain-specific transformer models, such as SciBERT, BioBERT and ClinicalBERT. We also analyse how data preprocessing techniques affect the performance of language models. Our comparative analysis shows that removing parts of the text with a large proportion of out-of-vocabulary words improves the classification results. We also investigate how the input sequence length affects model performance, varying it from 128 to 512 for BERT-based models and using a sequence length of 4096 for the Longformer models. The results show that, among the compared models, SciBERT yields the best performance when applied in the medical domain, improving current hospital readmission predictions using clinical notes on MIMIC data from 0.714 to 0.735 AUROC. Our next step is to pretrain a model on a large corpus of clinical notes, which may improve the adaptability of a language model to the medical domain and yield better results in downstream tasks.
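The preprocessing step mentioned in the abstract, removing parts of the text with a large proportion of out-of-vocabulary words, could look roughly like the sketch below. This is an illustration only, not the authors' implementation: the abstract does not say how text is segmented, which vocabulary is used, or what threshold applies. Here the segments are lines, the tokenizer is a simple regular expression standing in for a model's subword tokenizer, and `TOY_VOCAB`, `NOTE`, `oov_ratio`, `drop_high_oov_segments` and the `max_ratio=0.5` cut-off are all hypothetical.

```python
import re


def oov_ratio(segment, vocab):
    """Fraction of word tokens in `segment` that are absent from `vocab`."""
    # Assumption: lowercase alphanumeric runs stand in for real tokenization.
    tokens = re.findall(r"[a-z0-9]+", segment.lower())
    if not tokens:
        return 1.0  # treat empty segments as fully out-of-vocabulary
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


def drop_high_oov_segments(text, vocab, max_ratio=0.5):
    """Keep only the non-empty lines whose OOV ratio is at most `max_ratio`."""
    kept = [
        line
        for line in text.splitlines()
        if line.strip() and oov_ratio(line, vocab) <= max_ratio
    ]
    return "\n".join(kept)


# Hypothetical toy vocabulary and note, invented for illustration.
TOY_VOCAB = {
    "patient", "was", "admitted", "with", "chest", "pain", "and",
    "discharged", "home", "in", "stable", "condition",
}
NOTE = (
    "Patient was admitted with chest pain\n"
    "Na 140 K 4.1 Cl 102 HCO3 24 BUN 18\n"
    "Discharged home in stable condition"
)

cleaned = drop_high_oov_segments(NOTE, TOY_VOCAB)
print(cleaned)
```

In this toy note the line of laboratory values consists entirely of tokens outside the vocabulary, so it is dropped while the two narrative lines are kept. With a real model, the vocabulary would come from that model's tokenizer, and the threshold would need to be tuned on validation data.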

Keywords: hospital readmission prediction, domain-specific transformer models, BERT, ClinicalBERT, SciBERT, BioBERT, large language models
Year: 2024
Conference: The IEEE 14th International Conference on Pattern Recognition Systems
Publisher: IEEE
Accepted author manuscript
License: CC BY 4.0
File Access Level: Open (open metadata and files)


Permalink - https://westminsterresearch.westminster.ac.uk/item/w9w7v/leveraging-large-language-models-for-medical-text-classification-a-hospital-readmission-prediction-case

