Data Augmentation for Pipeline-Based Speech Translation

Alves, Diego; Salimbajevs, Askars; Pinnis, Mārcis

doi:10.3233/FAIA200605

Pages

73 - 79

Category

Research Article

Series

Frontiers in Artificial Intelligence and Applications

Ebook

Volume 328: Human Language Technologies – The Baltic Perspective

Abstract

Pipeline-based speech translation methods can suffer from errors in the output of the speech recognition system. It is therefore crucial that the machine translation systems used in such pipelines are trained to be robust against this noise. In this paper, we propose two methods for augmenting parallel data when developing pipeline-based speech translation systems. The first method utilises a speech processing workflow to introduce errors, and the second generates commonly found suffix errors using a rule-based method. We show that combining the two methods significantly improves speech translation quality, by 1.87 BLEU points over a baseline system.
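The abstract does not list the actual suffix rules, so the following is only a minimal sketch of the general idea behind rule-based suffix-error augmentation, not the authors' implementation. It assumes a hypothetical table of suffix confusions (English suffixes are used purely for readability), and it assumes that only the source side of each parallel pair is corrupted while the target translation is kept clean, so the model learns to map noisy input to correct output. The names `SUFFIX_RULES`, `corrupt_suffixes` and `augment` are illustrative.

```python
import random

# Hypothetical suffix-confusion table: a word ending in a key may have that
# suffix replaced by one of the listed alternatives. These pairs are
# illustrative only and are NOT the rules used in the paper.
SUFFIX_RULES: dict[str, list[str]] = {
    "ing": ["ed", "s"],
    "ed": ["ing", "s"],
    "s": ["", "ed"],
}


def corrupt_suffixes(sentence: str, rules: dict[str, list[str]],
                     prob: float, rng: random.Random) -> str:
    """Return a copy of `sentence` in which each word whose ending matches a
    rule gets, with probability `prob`, an alternative suffix."""
    # Try longer suffixes first so "ing" is matched before shorter endings.
    ordered = sorted(rules, key=len, reverse=True)
    out = []
    for word in sentence.split():
        for suffix in ordered:
            # Require a non-empty stem so a whole word is never replaced.
            if word.endswith(suffix) and len(word) > len(suffix):
                if rng.random() < prob:
                    stem = word[: len(word) - len(suffix)]
                    word = stem + rng.choice(rules[suffix])
                break  # apply at most one rule per word
        out.append(word)
    return " ".join(out)


def augment(pairs: list[tuple[str, str]], rules: dict[str, list[str]],
            prob: float, seed: int = 0) -> list[tuple[str, str]]:
    """Keep every original (source, target) pair and append a copy whose
    source has injected suffix errors; the target is left untouched."""
    rng = random.Random(seed)
    augmented = list(pairs)
    for src, tgt in pairs:
        noisy = corrupt_suffixes(src, rules, prob, rng)
        if noisy != src:  # skip copies that did not actually change
            augmented.append((noisy, tgt))
    return augmented
```

For example, `augment([("she is walking home", "<reference translation>")], SUFFIX_RULES, prob=1.0)` returns the original pair plus one pair whose source contains altered word endings (such as `walked` or `walks` for `walking`) and whose target is unchanged. In practice the corruption probability would be kept low so that the augmented source resembles realistic recognition errors rather than uniformly garbled text.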