Up to 80% of medical information is documented as unstructured data, such as clinical reports written in natural language. Such data is called unstructured because the information it contains cannot be retrieved automatically as straightforwardly as from structured data. However, we assume that this flexible kind of documentation will remain a substantial part of a patient’s medical record, so clinical information systems have to deal with it appropriately. At the same time, there are efforts to achieve semantic interoperability between clinical application systems through information modelling concepts such as HL7 FHIR or openEHR. Considering this, we propose an approach to transform information documented in unstructured form into openEHR archetypes. Furthermore, we aim to support the field of clinical text mining by recognizing and publishing the connections between openEHR archetypes and heterogeneous phrasings. We have evaluated our method by extracting the values for three openEHR archetypes from unstructured documents in English and German.