

Word representations such as GloVe and Word2Vec encapsulate semantic and syntactic attributes and constitute a fundamental building block of diverse Natural Language Processing (NLP) applications. Such vector embeddings are typically stored in float32 format, and for a large vocabulary they impose considerable memory and computational demands due to costly float32 operations. Representing words with binary embeddings has therefore emerged as a promising but challenging solution.
In this paper, we introduce BRECS, an autoencoder-based Siamese framework for generating enhanced binary word embeddings from the original real-valued embeddings. We propose a novel Binary Cosine Similarity (BCS) regularisation for BRECS, which enables it to learn the semantics and structure of the vector space spanned by the original word embeddings, leading to better binary representations. We further show that equipping the various components of our framework with independent parameters provides it with better learning capability. Extensive experiments across multiple datasets and tasks demonstrate the effectiveness of BRECS compared to existing baselines for static and contextual binary word embedding generation. The source code is available at https://github.com/rajbsk/brecs.
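For intuition, the sketch below shows one plausible form of a binary cosine similarity regulariser: it pushes the pairwise cosine similarities of binarised codes toward those of the original real-valued embeddings. The function names, the straight-through estimator, and the exact loss form are illustrative assumptions only; BRECS's actual formulation is defined in the paper.

```python
import torch
import torch.nn.functional as F

def binarize_ste(z: torch.Tensor) -> torch.Tensor:
    """Map real-valued codes to {-1, +1}, passing gradients straight through (assumed helper)."""
    b = torch.sign(z)
    # Forward pass uses the binary sign; backward pass uses the identity gradient.
    return z + (b - z).detach()

def bcs_regulariser(original: torch.Tensor, codes: torch.Tensor) -> torch.Tensor:
    """Illustrative BCS-style loss: match pairwise cosine similarities of the
    binary codes to those of the original embeddings (an assumption, not the
    paper's exact objective)."""
    b = binarize_ste(codes)
    sim_orig = F.normalize(original, dim=1) @ F.normalize(original, dim=1).T
    sim_bin = F.normalize(b, dim=1) @ F.normalize(b, dim=1).T
    return F.mse_loss(sim_bin, sim_orig)

# Usage sketch: add the regulariser to an autoencoder's reconstruction loss.
x = torch.randn(32, 300)                              # batch of original GloVe/Word2Vec vectors
codes = torch.randn(32, 256, requires_grad=True)      # hypothetical encoder outputs (pre-binarisation)
loss = bcs_regulariser(x, codes)
loss.backward()
```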