

One of the most remarkable findings in the literature on sentence embeddings has been that simple word vector averaging can compete with state-of-the-art models in many tasks. While counter-intuitive, this finding has received a convincing explanation from Arora et al., who showed that the bag-of-words representation of a sentence can be recovered from its word vector average with almost perfect accuracy. Beyond word vector averaging, however, most sentence embedding models are essentially black boxes: while there is abundant empirical evidence about their strengths and weaknesses, it is not clear why and how different embedding strategies are able to capture particular properties of sentences.

In this paper, we focus in particular on how sentence embedding models are able to capture word order. For instance, it seems intuitively puzzling that simple LSTM autoencoders are able to learn sentence vectors from which the original sentence can be reconstructed almost perfectly. With the aim of elucidating this phenomenon, we show that to capture word order, it is in fact sufficient to supplement standard word vector averages with averages of bigram and trigram vectors. To this end, we first study the problem of reconstructing bags-of-bigrams, focusing on how suitable bigram vectors should be encoded. We then show that LSTMs are capable, in principle, of learning our proposed sentence embeddings. Empirically, we find that our embeddings outperform those learned by LSTM autoencoders on the task of sentence reconstruction, while requiring almost no training data.
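
To make this construction concrete, the following minimal sketch shows one way such an embedding could be assembled in Python/NumPy: the sentence vector is the concatenation of averaged unigram, bigram, and trigram vectors. The hashing-based n-gram encoder `rand_vec`, the dimensionality, and the use of concatenation rather than summation are illustrative assumptions of this sketch, not the encodings analysed in the paper.

```python
import hashlib

import numpy as np

DIM = 300  # illustrative embedding dimensionality (an assumption, not taken from the paper)


def rand_vec(ngram, dim=DIM):
    """Map a unigram/bigram/trigram (a tuple of tokens) to a fixed pseudo-random
    Gaussian vector. This hashing-based encoding is only a placeholder; which
    bigram encodings are actually suitable is precisely what the paper studies."""
    seed = int.from_bytes(hashlib.md5(" ".join(ngram).encode()).digest()[:4], "little")
    return np.random.default_rng(seed).standard_normal(dim)


def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def avg(vectors, dim=DIM):
    """Average a list of vectors; zero vector if the list is empty."""
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)


def sentence_embedding(tokens):
    """Concatenate averaged unigram, bigram, and trigram vectors."""
    parts = [avg([rand_vec(g) for g in ngrams(tokens, n)]) for n in (1, 2, 3)]
    return np.concatenate(parts)


emb = sentence_embedding("the cat sat on the mat".split())
print(emb.shape)  # (900,)
```

Concatenation keeps the unigram, bigram, and trigram averages in separate blocks, so the word-order information carried by the bigram and trigram averages remains directly accessible; the question of which bigram encodings allow the bag-of-bigrams to be reconstructed from such an average is taken up in the following sections.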