Sub-SA: Strengthen In-Context Learning via Submodular Selective Annotation

Qian, Jian; Sun, Miao; Zhou, Sifan; Zhao, Ziyu; Hun, Ruizhi; Chiang, Patrick

doi:10.3233/FAIA240720

Abstract

In-context learning (ICL) leverages in-context examples as prompts for the predictions of Large Language Models (LLMs). These prompts play a crucial role in achieving strong performance. However, selecting suitable prompts from a large pool of labeled examples often entails significant annotation costs. To address this challenge, we propose Sub-SA (Submodular Selective Annotation), a submodular-based selective annotation method. The aim of Sub-SA is to reduce annotation costs while improving the quality of in-context examples and minimizing the time consumption of the selection process. In Sub-SA, we design a submodular function that facilitates effective subset selection for annotation and show, from a theoretical perspective, that it satisfies monotonicity and submodularity. Specifically, we propose RPR (Reward and Penalty Regularization) to better balance the diversity and representativeness of the unlabeled dataset, attributed to a reward term and a penalty term, respectively. Consequently, the selection for annotation can be addressed with a simple yet effective greedy search algorithm based on the submodular function. Finally, we apply similarity-based prompt retrieval to obtain the examples for ICL. Compared to existing selective annotation approaches, Sub-SA offers two main advantages. (1) Sub-SA operates in an end-to-end, unsupervised manner and significantly reduces the time consumption of the selection process (from hours to milliseconds). (2) Sub-SA enables a better balance between data diversity and representativeness and obtains state-of-the-art performance. Meanwhile, the theoretical support guarantees its reliability and scalability in practical scenarios. Extensive experiments conducted on diverse models and datasets demonstrate the superiority of Sub-SA over previous methods, achieving millisecond-level selection time and remarkable performance gains. The efficiency and effectiveness of Sub-SA make it highly suitable for real-world ICL scenarios. Our code is available at https://github.com/JamesQian11/SubSA
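
The two-stage pipeline described above (greedy selection of examples to annotate, then similarity-based retrieval of prompts) can be illustrated with a minimal sketch. The abstract does not give the exact form of the Sub-SA objective or of RPR, so the reward-minus-penalty score below is an assumed stand-in, not the authors' function, and no claim is made that it is monotone for every input. The names `greedy_select`, `retrieve_examples`, and the weight `lam` are hypothetical.

```python
import numpy as np


def _normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def greedy_select(embeddings, budget, lam=1.0):
    """Greedily pick `budget` indices from an unlabeled pool.

    Assumed objective (illustrative only): each candidate earns a reward
    equal to its total similarity to the pool and pays a penalty equal to
    `lam` times its total similarity to the items already selected.
    """
    sim = _normalize(embeddings) @ _normalize(embeddings).T
    n = sim.shape[0]
    reward = sim.sum(axis=1)            # similarity to the whole pool
    penalty = np.zeros(n)               # similarity to the selected set so far
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(min(budget, n)):
        gain = reward - lam * penalty   # marginal gain of adding each item
        gain[~available] = -np.inf      # never pick the same item twice
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        penalty += sim[best]            # update the penalty incrementally
    return selected


def retrieve_examples(query, annotated_embeddings, k):
    """Return positions of the k annotated examples most similar to `query`."""
    q = query / np.linalg.norm(query)
    scores = _normalize(annotated_embeddings) @ q
    return np.argsort(-scores)[:k].tolist()


# Toy pool with two clusters: items 0-1 and items 2-3.
pool = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
selected = greedy_select(pool, budget=2)
nearest = retrieve_examples(np.array([1.0, 0.0]), pool[selected], k=1)
```

Because the penalty vector is updated in place after each pick, every greedy step costs one pass over the pool once the similarity matrix is built. In this toy pool the second pick falls in the cluster that the first pick did not cover.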

IOS Press Copyright 2025

Contact

IOS Press Copyright 2025

This website uses cookies

This website uses cookies