Ebook: Syntactic-Semantic Tagging of Medical Texts: The Multi-TALE Project
This book summarises the achievements of the Multi-TALE project, which aims at developing a syntactic-semantic tagger-lemmatiser that assists lexicologists in a multilingual environment in automatically classifying specialised terms from a standardisation perspective. Its intended audience consists mainly of two distinct groups: on the one hand, computational linguists who seek an appropriate application involving real-life data for their implemented prototypes, and on the other hand, medical E.D.P. managers (doctors or computer scientists) who are looking for methods that would facilitate their information retrieval and knowledge processing tasks. The book contains a concise introduction to the relevant linguistic and medical topics required for an overall comprehension.
The present initiative of the Multi-TALE consortium - to have their work printed in a high-standard publication - is an important event and we should recognise it as such. Indeed, not many books have been published that are mainly about Natural Language Processing (NLP) in the medical domain. This is due to several reasons related to the history of NLP over the past few decades.
The first computers spread into universities at the beginning of the sixties and, despite their relatively low power, they were already expected to play a role in text analysis and understanding. Computational linguistics, as well as artificial intelligence, was advocated very early for its potential to contribute to human research and business. In the seventies, the promises were so great that many people “sold the moon”: natural language understanding was claimed to be ready for practical applications such as automatic translation.
These great expectations were followed by an abrupt “return to earth”. After what may be called the Chomsky decade came the eighties, with a move from a somewhat generalist point of view – grammar-based approaches were largely seen as domain-independent and therefore universal – to a more specific one, in which domain orientation took a more important place. The need to complement the existing tools with semantically oriented techniques was increasingly recognised by computational linguists.
As a consequence, instead of pursuing a universal solution for NLP, there was a shift to domain-dependent solutions in which the specific semantics of the domain played a major role. The medical domain was soon recognised as particularly fruitful for investigation. Among the reasons were:
• that medical language is a scientific language;
• that there are strong constraints in this domain to document all actions in written texts;
• that care providers face a bottleneck of activities and need computer tools to alleviate their tasks;
• that the medical domain is universal and encompasses some fifty per cent of human language;
• that a multilingual solution is the only one likely to be widely accepted.
The present decade is one of searching for practical desktop solutions. The necessary computing power is now available - if not for the final solution, which may require much more, then at least to run good and beneficial applications today or in the near future. In the medical domain, many signs point to the coming presence of NLP tools. Different authors have shown us the way. In the short review below we restrict ourselves to the groups working specifically on the processing of medical texts (and we apologise in advance to all the others not mentioned).
The first to be presented in this list is Naomi Sager, whom we wish to thank warmly as “the mother of medical linguistics” for her pioneering work, started in the seventies. In the US, leading developments on controlled medical vocabularies are now conducted by J. Cimino, C. Friedmann and S. Johnson at Columbia University in New York. F. Massarie and R. Miller worked on the representation of clinical findings. The aspects of indexing and retrieval have been covered by J. Evans and by M. Tuttle. In Europe, early experiments came from P. Zweigenbaum as well as from R. Baud, A-M. Rassinoux and J. Wagner, working on the analysis of medical texts, generation and knowledge representation using conceptual graphs. C. Lovis explored the morpho-semantic decomposition of words in order to improve their knowledge representation. W. Ceusters brings tools and techniques from computational linguistics into the medical domain and is exploring automatic knowledge acquisition from annotated corpora. All of them played, and still play, a dominant role. The list of references at the end of this book gives fuller credit to more authors than this short introduction has space for.
This book appears at the right moment. It exemplifies the benefit that emerges when computational linguists and medical experts meet. The former bring the fundamentals of linguistics; the latter are the experts of the domain. They depend on each other and are compelled to work together. If they succeed, they will inevitably be ahead of other groups, because jointly they are in a position to solve more problems with fewer resources. The Multi-TALE project is a perfect example of this fact.
The Multi-TALE approach is unrestricted in the sense that the goal is to deal with natural language as used and produced by clinicians in their daily practice, including medical “jargon” and non-scholarly texts. This is a fact of life in our European hospitals and software techniques have to cope with it.
We appreciate the clarity of the second chapter. In particular, its third section presents a clear picture of, and distinction between, syntax, semantics and pragmatics. The dedicated chapter on the CEN standards illustrates that standards are possible in this matter: this is not a frequent situation, and the authors of Multi-TALE are right to remind us that it has been fruitful in this project. The presentation of the implementation in two distinct languages is a typically European contribution: by its very culture, the European continent is multilingual, and our software tools have to be adapted accordingly.
Though Multi-TALE is a fully independent project, its connection with the Galen model is evident in multiple places. This is no surprise, because some of the actors have also been involved in the Galen developments. This is particularly apparent in the chapter on the CEN standards for surgical procedures and in the last chapter on further work and perspectives. Multi-TALE has shown its capability to help “feed” a large and deep semantic model like Galen using independent textual knowledge sources. In return, Galen is in a position to provide a reference typology and other basic information thought to be helpful when analysing new medical texts.
If the reader finds the time to read all of this book, or parts of it, he or she will not be disappointed. The structure of the book follows the principle of making the reader's life easy. The first chapters are self-explanatory and have their own stand-alone value. Nevertheless, when more detail on a specific aspect is required, the reader is advised to continue with the more specific and deeper chapters. Throughout this volume, the abundance of simple and good examples also gives evidence of the solid ground on which this project was built. The richness and especially the pertinence of the references are an added value. May you all enjoy reading this very valuable scientific contribution.
Chairman of EFMI WG8 on Natural Language Processing