The application of Natural Language Processing (NLP) to medical data has revolutionized several aspects of health care. Its benefits extend to many areas, including chatbots that can provide medical assistance remotely. Every application of NLP depends on one first main step: the preprocessing of the retrieved corpus. The raw data must be prepared so that it can be used efficiently in further analysis. Considerable progress has been made in this direction for the English language, but for other languages, such as Italian, the state of the art is not equally advanced, especially for texts containing technical medical terms. The aim of this work is to identify and develop a preprocessing pipeline suitable for medical data written in Italian. The pipeline was developed in a Python environment, employing the Enchant and NLTK modules and Hugging Face’s BERT- and BART-based models. It was then tested on real conversations typed between patients and physicians regarding medical questions. The algorithm was developed within the MULTI-SITA project of the Italian Society of Anti-Infective Therapy (SITA), but it has a flexible structure that can adapt to a large variety of data.
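To make the idea of a preprocessing pipeline concrete, the following is a minimal sketch of the kind of steps such a pipeline typically chains together: normalization, tokenization, spelling correction and stop-word removal. It is not the authors' implementation. It uses only the Python standard library, with `difflib` standing in for a dictionary-based spell checker such as Enchant and a hand-written word set standing in for an NLTK stop-word list. The word lists, the function names and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the BERT- and BART-based steps are omitted entirely.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical toy resources: a real pipeline would load a full Italian
# stop-word list and a medical vocabulary instead of these small samples.
STOPWORDS = {"il", "lo", "la", "l", "i", "gli", "le", "un", "una",
             "di", "da", "e", "ho", "per", "che", "mi"}
VOCABULARY = sorted({"febbre", "antibiotico", "dolore", "gola",
                     "giorni", "tre", "prendere", "devo"})


def normalize(text):
    """Apply Unicode NFC normalization and lowercase the text."""
    return unicodedata.normalize("NFC", text).lower()


def tokenize(text):
    """Split into runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text)


def correct(token, cutoff=0.8):
    """Replace an unknown token with its closest vocabulary entry, if any."""
    if token in STOPWORDS or token in VOCABULARY:
        return token
    matches = get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def preprocess(text):
    """Normalize, tokenize, spell-correct, then drop stop words."""
    tokens = [correct(tok) for tok in tokenize(normalize(text))]
    return [tok for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    message = "Ho la febre da tre giorni, devo prendere l'antibiotco?"
    print(preprocess(message))
```

In this sketch, spelling correction runs before stop-word removal so that a misspelled content word is repaired rather than passed through unchanged. Whether that ordering suits real patient messages is a design choice that would need to be checked against the data.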