

Diverse opinion summarization aims to generate a summary that captures the multiple opinions expressed in a collection of texts. Although large language models (LLMs) have become the main choice for this task, their performance is highly dependent on the prompt. In this paper, we propose a self-evaluation-based prompt calibration framework that guides an LLM to generate high-quality summaries. It adopts a reinforcement learning mechanism to calibrate prompts so as to maximize the reward of the generated summary. The framework contains three parts. In the prompt construction part, we design a prompt that contains a topic, a task instruction, and a key opinion reference: the topic indicates the main focus of the documents, the instruction describes the task in natural language, and the key opinion reference is an explicit constraint on the expected opinions. In the reward part, each summary is scored by a coverage score and a diversity score, which represent its semantic coverage of the source documents and the inter-opinion differences, respectively. The prompt calibration part selects sentences from the generated summaries to calibrate the prompts for the next iteration. With this framework, an LLM with 7B parameters generates summaries that outperform the much larger GPT-4 and multiple strong baselines. Ablation studies confirm the effectiveness of the iterative calibration process. We analyze opinion differences in terms of the tendencies of sentences in the summaries and use a Natural Language Inference (NLI)-based method to evaluate the faithfulness of the summaries. Experimental results show that our method generates summaries with high opinion difference and faithfulness.
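The calibration loop described above can be sketched as follows. This is a minimal illustration only: the LLM call, the coverage and diversity scorers, and the reward weighting `alpha` are hypothetical stand-ins, not the paper's actual implementation.

```python
def coverage_score(summary, documents):
    """Crude stand-in for semantic coverage: fraction of source-document
    words that also appear in the summary."""
    source_words = set(w for d in documents for w in d.lower().split())
    summary_words = set(summary.lower().split())
    return len(source_words & summary_words) / max(len(source_words), 1)

def diversity_score(summary):
    """Crude stand-in for inter-opinion difference: fraction of unique
    words in the summary."""
    words = summary.lower().replace(".", "").split()
    return len(set(words)) / max(len(words), 1)

def reward(summary, documents, alpha=0.5):
    """Combined reward: weighted sum of coverage and diversity."""
    return alpha * coverage_score(summary, documents) + \
        (1 - alpha) * diversity_score(summary)

def calibrate(prompt, documents, generate, n_iters=3):
    """Iteratively generate a summary with the current prompt, score it,
    and feed the best summary back into the prompt as the key opinion
    reference for the next iteration."""
    best_summary, best_reward = "", float("-inf")
    for _ in range(n_iters):
        summary = generate(prompt, documents)   # LLM call (stand-in)
        r = reward(summary, documents)
        if r > best_reward:
            best_summary, best_reward = summary, r
            # Calibration step: append the highest-reward summary as an
            # explicit key-opinion constraint for the next round.
            prompt = prompt + " Key opinions: " + summary
    return best_summary, best_reward
```

In practice the coverage and diversity scores would be computed with embedding-based semantic similarity rather than word overlap, and the calibration step would select individual high-reward sentences rather than the whole summary; the control flow, however, follows the three-part framework above.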