Verifying Meaning Equivalence in Bilingual International Treaties

Tang, Linyuan; Kageura, Kyo

doi:10.3233/FAIA190311

Abstract

This paper examines to what extent distributional approaches to inducing bilingual lexica can capture correspondences between bilingual terms in international treaties. Recent developments in bilingual distributional representation learning have improved performance in bilingual text processing, and these methods are increasingly applied to specialised texts and technical terms, including in the legal domain. Here we face at least two issues. First, whether technical terms follow the distributional hypothesis is a critical concern, both theoretically and practically. Theoretically, corresponding technical terms in different languages are labels of the same concept, so their equivalence is independent of textual context. From this point of view, the distributional hypothesis holds only when the terms fully bind the context. This leads to the second issue: verifying the extent to which word embedding models trained on texts with different levels of specialisation are useful for capturing cross-lingual equivalences of terms. We examine these issues through experiments in which models trained on texts with different degrees of specialisation are evaluated against three sets of equivalent bilingual pairs in the legal domain: legal terms, sub-technical terms, and general words. The results show that models learned on large-scale general texts fall far behind models learned on specialised texts in representing equivalent bilingual terms, whereas the former perform better than the latter on sub-technical terms and general words.
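The abstract does not state how a model is scored against a set of equivalent bilingual pairs. One common way to run this kind of evaluation in bilingual lexicon induction, assumed here and not necessarily the authors' protocol, is precision@k: for each gold source term, rank all target-language words by cosine similarity in a shared embedding space and count a hit when the gold equivalent lands in the top k. The function names, the toy 2-D vectors, and the word labels below are all hypothetical illustrations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def precision_at_k(src_emb, tgt_emb, gold_pairs, k=1):
    """Hypothetical evaluation: share of gold (source, target) pairs whose target
    word is among the k target words nearest to the source word. Assumes both
    embedding dicts already live in one shared cross-lingual space."""
    hits = 0
    evaluated = 0
    for src, tgt in gold_pairs:
        if src not in src_emb or tgt not in tgt_emb:
            continue  # skip pairs with out-of-vocabulary words
        evaluated += 1
        ranked = sorted(tgt_emb,
                        key=lambda w: cosine(src_emb[src], tgt_emb[w]),
                        reverse=True)
        if tgt in ranked[:k]:
            hits += 1
    return hits / evaluated if evaluated else 0.0

# Toy shared space with hypothetical labels (s_ = source language, t_ = target language).
src = {"s_treaty": [1.0, 0.1], "s_party": [0.1, 1.0]}
tgt = {"t_treaty": [0.9, 0.2], "t_party": [0.2, 0.9], "t_state": [0.6, 0.6]}
gold = [("s_treaty", "t_treaty"), ("s_party", "t_party"),
        ("s_party", "t_state"), ("s_missing", "t_treaty")]

print(precision_at_k(src, tgt, gold, k=1))  # 2 of 3 in-vocabulary pairs hit
print(precision_at_k(src, tgt, gold, k=2))  # all 3 hit once k widens
```

Under this assumed setup, comparing models trained on general versus specialised corpora amounts to computing the same score separately for each of the three pair sets (legal terms, sub-technical terms, general words).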
