This paper examines the extent to which distributional approaches to inducing bilingual lexica can capture correspondences between bilingual terms in international treaties. Recent developments in bilingual distributional representation learning have improved the performance of bilingual text processing, and these methods are increasingly applied to specialised texts and technical terms, including in the legal domain. Here we face at least two issues. First, whether technical terms follow the distributional hypothesis is a critical concern, both theoretically and practically. Theoretically, corresponding technical terms in different languages are labels of the same concept, and thus their equivalence is independent of the textual context. From this point of view, the distributional hypothesis holds only when the terms totally bind the context. This leads to the second issue, namely the extent to which word embedding models trained on texts with different levels of specialisation are useful in capturing cross-lingual equivalences of terms. This paper addresses these issues through experiments in which models trained on texts with different degrees of specialisation are evaluated against three sets of equivalent bilingual pairs in the legal domain: legal terms, sub-technical terms and general words. The results show that models learned on large-scale general texts fall far behind models learned on specialised texts in representing equivalent bilingual terms, while the former perform better than the latter on sub-technical terms and general words.
IOS Press, Inc.