

Genomic testing is becoming increasingly used within the UK National Health Service (NHS) in an effort to deliver precision medicine that can directly benefit patient management and care. Typically, in hospital settings clinical scientists interpret genomic results and store the output in reports (unstructured format). Recent advances in the field of Natural Language Processing (NLP) techniques have significant advantage over the common time-consuming and laborious manual extraction of the relevant information from the reports. There are a plethora of open-source NLP models available, but limited evidence of their performance on real-world healthcare tasks, specifically for paediatric data. In this paper, we describe the development of an automated pipeline that uses a hybrid approach of combining rules and pretrained NLP models to extract gene variants, related information of these variants and any gene panel information present within the genomic test reports. We evaluated the performance of the pipeline against a manually-curated, expert-annotated, in-house data containing 372 reports. Our results and evaluation highlights the advantages and limitations of using existing pretrained models in a real-world setting, and in particular, when there are computation and resource constraints within the hospital setting.