Most people need textual or visual interfaces to make sense of Semantic Web data. In this thesis, we investigate the problem of using deep neural networks to generate natural language summaries for structured data encoded as triples.
We propose an end-to-end trainable architecture that encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on this encoded vector. To both train and evaluate our approach, we explore different methodologies for building the required data-to-text corpora. We first focus on the generation of biographies. Using both automatic and human evaluation, we demonstrate that our technique scales to domains with challenging vocabulary sizes of over 400k words.
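To illustrate the general conditioning idea, the sketch below shows one minimal way such an architecture could be wired: learned embeddings of the subject, predicate, and object positions of each triple are aggregated into a fixed-size vector, which then initialises a recurrent decoder over the target vocabulary. The module names, dimensions, and toy inputs are illustrative assumptions, not the exact model developed in this thesis.

```python
# Minimal sketch (not the thesis implementation): a triple-set encoder that
# averages learned embeddings of (subject, predicate, object) triples into a
# fixed-size vector, and a GRU decoder conditioned on that vector.
# All names, sizes, and the toy inputs below are illustrative assumptions.
import torch
import torch.nn as nn


class TripleSetEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One triple = concatenated subject/predicate/object embeddings.
        self.proj = nn.Linear(3 * emb_dim, hidden_dim)

    def forward(self, triples):          # triples: (batch, n_triples, 3)
        e = self.embed(triples)          # (batch, n_triples, 3, emb_dim)
        e = e.flatten(2)                 # (batch, n_triples, 3*emb_dim)
        h = torch.tanh(self.proj(e))     # encode each triple
        return h.mean(dim=1)             # fixed-size vector per triple set


class SummaryDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, enc_vector):
        # Condition generation on the encoded triple set via the initial state.
        h0 = enc_vector.unsqueeze(0)     # (1, batch, hidden_dim)
        x = self.embed(prev_tokens)      # (batch, seq_len, emb_dim)
        out, _ = self.gru(x, h0)
        return self.out(out)             # logits over the target vocabulary


# Toy forward pass with random ids standing in for KB entities and words.
enc, dec = TripleSetEncoder(1000), SummaryDecoder(1000)
triples = torch.randint(0, 1000, (2, 5, 3))   # 2 sets of 5 triples
tokens = torch.randint(0, 1000, (2, 10))      # partial summaries
logits = dec(tokens, enc(triples))
print(logits.shape)                           # torch.Size([2, 10, 1000])
```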
Given the promising results of our approach on biographies, we explore its applicability to the generation of open-domain Wikipedia summaries in two under-resourced languages, Arabic and Esperanto. We propose an adaptation of our original encoder-decoder architecture that outperforms a diverse set of strong baselines. Furthermore, we conduct a set of community studies to measure the usability of the generated content for Wikipedia readers and editors. The targeted communities rank our generated text close to the expected standards of Wikipedia. In addition, we find that editors are likely to reuse a large portion of the generated summaries, underlining the usefulness of our approach to the involved communities.
Finally, we extend the original model with a pointer mechanism that enables it to jointly learn to verbalise the content of the triples in a number of different ways while retaining the ability to generate regular words from a fixed target vocabulary. We evaluate its performance on a dataset encompassing the entirety of English Wikipedia. Results from both automatic and human evaluation highlight the superiority of this extended model over our original encoder-decoder architecture and a set of competitive baselines.
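The copy-versus-generate behaviour can be sketched with a pointer-generator style mixture: at each decoding step, a soft gate interpolates between a softmax over the fixed target vocabulary and an attention distribution over the tokens of the input triples. The code below is an illustrative sketch under these assumptions and does not reproduce the exact extension described in this thesis; all names and sizes are hypothetical.

```python
# Illustrative pointer-style copy mechanism (in the spirit of pointer-generator
# networks), not the exact thesis model: the decoder mixes a distribution over
# the fixed vocabulary with an attention-based copy distribution over the
# source (triple) tokens. Names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PointerGenerator(nn.Module):
    def __init__(self, vocab_size, hidden_dim=256):
        super().__init__()
        self.vocab_out = nn.Linear(hidden_dim, vocab_size)
        self.attn = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.p_gen = nn.Linear(2 * hidden_dim, 1)

    def forward(self, dec_state, src_states, src_token_ids, vocab_size_ext):
        # dec_state: (batch, hidden); src_states: (batch, src_len, hidden)
        # src_token_ids: ids of the source tokens in the extended vocabulary.
        scores = torch.bmm(src_states, self.attn(dec_state).unsqueeze(2)).squeeze(2)
        attn = F.softmax(scores, dim=1)                       # copy distribution
        context = torch.bmm(attn.unsqueeze(1), src_states).squeeze(1)

        gen_dist = F.softmax(self.vocab_out(dec_state), dim=1)
        p_gen = torch.sigmoid(self.p_gen(torch.cat([dec_state, context], dim=1)))

        # Mix: generate a vocabulary word or copy a token from the triples.
        final = torch.zeros(dec_state.size(0), vocab_size_ext)
        final[:, :gen_dist.size(1)] = p_gen * gen_dist
        final.scatter_add_(1, src_token_ids, (1 - p_gen) * attn)
        return final                                          # (batch, vocab_ext)


# Toy usage: 2 examples, 6 source tokens, fixed vocab of 1000 plus 20 copy slots.
model = PointerGenerator(1000)
dec_state = torch.randn(2, 256)
src_states = torch.randn(2, 6, 256)
src_ids = torch.randint(1000, 1020, (2, 6))
probs = model(dec_state, src_states, src_ids, vocab_size_ext=1020)
print(probs.shape, probs.sum(dim=1))   # each row sums to ~1
```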