

Recently, the Non-AutoRegressive (NAR) decoding mechanism, which effectively reduces the inference latency of text generation, has been applied to Sign Language Translation (SLT). In particular, the current best NAR SLT model, which uses a Curriculum-based Non-autoregressive Decoder (CND), outperforms AutoRegressive (AR) baselines in both speed and translation quality. Although AutoRegressive Pre-trained Language Models (AR-PLMs) have been shown to further boost the performance of AR SLT models, combining NAR Pre-trained Language Models (NAR-PLMs) with NAR SLT models remains challenging because (1) existing NAR-PLMs cannot model token dependencies between decoder layers, which is crucial for NAR SLT models using CND; and (2) there is a modality gap between the decoder inputs of NAR-PLMs and those of NAR SLT models. To address these issues, we propose a Random Ordering Progressive Prediction pre-training task for NAR SLT models using CND, which enables the decoder to predict target sequences in diverse orderings and strengthens the modeling of target token dependencies between layers. Moreover, we propose a CTC-enhanced Soft Copy method that incorporates target-side information into the decoder's inputs, alleviating the modality gap. Experimental results on PHOENIX-2014T and CSL-Daily demonstrate that our model consistently outperforms all strong baselines and achieves performance competitive with AR SLT models equipped with AR-PLMs.
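
As background for the CTC-enhanced Soft Copy idea, the sketch below shows a plain distance-based Soft Copy (a known NAR decoder-input initialization) together with one hypothetical way CTC predictions over the encoder could supply a target length and target-side token embeddings. The function names, the greedy CTC collapsing step, and the additive mixing are illustrative assumptions under these definitions, not the exact method of the paper.

```python
# Minimal sketch of Soft Copy for NAR decoder inputs, plus a hypothetical
# CTC enhancement; details beyond distance-based copying are assumptions.
import torch
import torch.nn.functional as F


def soft_copy(enc_states: torch.Tensor, tgt_len: int, tau: float = 0.3) -> torch.Tensor:
    """Copy encoder states into `tgt_len` decoder input slots.

    enc_states: (src_len, d_model) encoder outputs for one sentence.
    Each target position j attends to source positions i with weights
    softmax_i(-|j - i| / tau), so nearby source states dominate.
    """
    src_len = enc_states.size(0)
    src_pos = torch.arange(src_len, dtype=torch.float)        # (src_len,)
    tgt_pos = torch.arange(tgt_len, dtype=torch.float)        # (tgt_len,)
    dist = (tgt_pos[:, None] - src_pos[None, :]).abs()        # (tgt_len, src_len)
    weights = F.softmax(-dist / tau, dim=-1)
    return weights @ enc_states                               # (tgt_len, d_model)


def ctc_enhanced_inputs(enc_states, ctc_logits, tgt_embed, blank_id=0, tau=0.3):
    """Hypothetical CTC enhancement: greedy-decode a CTC head over the
    encoder, collapse repeats/blanks into pseudo target tokens, then mix
    their embeddings with the soft-copied encoder states."""
    pred = ctc_logits.argmax(-1)                              # (src_len,)
    collapsed = torch.unique_consecutive(pred)
    pseudo_tgt = collapsed[collapsed != blank_id]             # pseudo target tokens
    if pseudo_tgt.numel() == 0:                               # degenerate case: all blanks
        pseudo_tgt = pred[:1]
    tgt_len = pseudo_tgt.numel()
    copied = soft_copy(enc_states, tgt_len, tau)              # target-length states
    return copied + tgt_embed(pseudo_tgt)                     # inject target-side info
```

In this reading, the CTC branch provides both the predicted target length and token-level target-side signals, which the decoder inputs would otherwise lack.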