Scalable Semi-Supervised Support Vector Machine Based on Adaptive Sampling

Zhao, XinYue; Sun, Xiaoyang; Zhang, Jing; Song, Yunsheng

doi:10.3233/FAIA231021

Abstract

Semi-supervised support vector machine (S3VM) algorithms are effective for problems with only a few labeled instances and a large number of unlabeled instances. However, solving existing S3VM formulations requires a variety of optimization strategies, because all the training data enter the iterative optimization as parameters, which makes it difficult to process large-scale data efficiently. Simple random sampling is an effective way to make modeling efficient at the data preprocessing stage, but it requires the sample size to be fixed in advance, and a suitable size is hard to choose because of sampling randomness and differences between samples. To fully characterize the original unlabeled data and ensure the robustness of the model, we propose an adaptive sampling method that trains the model on the labeled set and a sampled unlabeled set. Fixed-size batches of unlabeled instances are repeatedly sampled from the original unlabeled set until a proposed statistic computed on the obtained sample meets the stopping condition, where both the statistic and the stopping condition are derived from density estimation. This method removes the need to choose the sample size subjectively in advance, and the robustness of the proposed algorithm is proved using probably approximately correct (PAC) learning theory.
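
The sampling loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the statistic or the stopping condition, so everything concrete here is an assumption made for illustration: the data are one-dimensional, the density is a hand-written Gaussian kernel estimate, and sampling stops when the largest change in that estimate at a fixed grid of probe points falls below a tolerance. The names `kde`, `adaptive_sample`, `bandwidth`, `tol`, and `n_probes` are hypothetical and are not taken from the paper.

```python
import math
import random


def kde(sample, x, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at point x."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def adaptive_sample(unlabeled, batch_size=50, bandwidth=0.3, tol=0.01,
                    n_probes=20, seed=0):
    """Draw fixed-size batches until the density estimate stabilises.

    Assumed stopping rule (not the paper's): stop when the largest change in
    the KDE at the probe points, between consecutive rounds, is below `tol`.
    """
    rng = random.Random(seed)
    pool = list(unlabeled)
    rng.shuffle(pool)  # consuming a shuffled pool samples without replacement
    lo, hi = min(pool), max(pool)
    probes = [lo + (hi - lo) * i / (n_probes - 1) for i in range(n_probes)]

    sample, previous = [], None
    while pool:
        # Move the next fixed-size batch from the pool into the sample.
        sample.extend(pool[:batch_size])
        del pool[:batch_size]
        current = [kde(sample, p, bandwidth) for p in probes]
        if previous is not None:
            change = max(abs(c - q) for c, q in zip(current, previous))
            if change < tol:
                break
        previous = current
    return sample


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic two-cluster unlabeled data, used only to exercise the loop.
    data = ([rng.gauss(-2.0, 0.5) for _ in range(5000)]
            + [rng.gauss(2.0, 0.5) for _ in range(5000)])
    chosen = adaptive_sample(data)
    print(len(chosen), "of", len(data), "unlabeled instances selected")
```

In a full pipeline the returned subset, together with the labeled set, would be passed to an S3VM solver. That step is omitted here because the abstract does not describe the solver.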
