

This research explores the development of a knowledge graph statistical question answering system for Statistics Netherlands. The system aims to efficiently retrieve single statistical values from the agency's extensive database, which encompasses over a billion values across more than 4,000 tables. We propose a three-component framework consisting of: (1) a data augmentation method to generate synthetic data, (2) an entity retrieval system that leverages various encoder networks along with different hard negative mining techniques for the effective retrieval of tables, measures, and dimensions, and (3) a large language model-based query generator. A central innovation of our research is a dynamic prompting technique for query generation, which creates prompts tailored to the current phase of token generation. This ensures that, at each phase, the model is supplied with the information relevant for generating the corresponding tokens of a symbolic query. The resulting system can help find relevant information in official statistics and similar systems, which is vital for governmental decision making and for all fields of research that rely on these statistics.
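The idea of phase-specific prompting can be illustrated with a minimal sketch. It is not the implementation described in this work: the phase order, the prompt layout, the function names (`build_prompt`, `generate_query`), and the `choose` callable that stands in for the language model are all assumptions made for illustration, and the candidate identifiers in the usage note are invented.

```python
from typing import Callable

# Assumed phase order for a symbolic query. The abstract names tables, measures
# and dimensions as the retrieved entity types; the ordering is hypothetical.
PHASES = ("table", "measure", "dimension")


def build_prompt(question: str, phase: str,
                 candidates: dict[str, list[str]],
                 partial_query: list[str]) -> str:
    """Build a prompt that lists only the candidates relevant to `phase`."""
    lines = [
        f"Question: {question}",
        f"Query so far: {' '.join(partial_query) or '(empty)'}",
        f"Candidate {phase}s:",
    ]
    lines += [f"- {option}" for option in candidates[phase]]
    lines.append(f"Next {phase}:")
    return "\n".join(lines)


def generate_query(question: str,
                   candidates: dict[str, list[str]],
                   choose: Callable[[str, list[str]], str]) -> list[str]:
    """Fill the query slot by slot, rebuilding the prompt for every phase.

    `choose` is a placeholder for the language model: it receives the
    phase-specific prompt and the allowed options and returns one option.
    """
    partial_query: list[str] = []
    for phase in PHASES:
        prompt = build_prompt(question, phase, candidates, partial_query)
        partial_query.append(choose(prompt, candidates[phase]))
    return partial_query
```

As a usage illustration, calling `generate_query("How many people lived in the Netherlands in 2020?", {"table": ["tbl_population"], "measure": ["msr_total"], "dimension": ["dim_2020"]}, lambda prompt, options: options[0])` walks through the three phases with a trivial chooser that always picks the first option. In a real system the chooser would be a language model, possibly constrained to the listed options.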