The amount of training data in statistical machine translation critically affects translation quality. In this paper, we demonstrate how to increase translation quality for one language pair by introducing parallel data from a closely related language. Specifically, we improve English→Slovak translation using a large Czech-English parallel corpus and a shallow MT system for Czech→Slovak translation. Several options are explored to identify the best possible configuration.
We also present our two contributions to available data resources, namely the English-Slovak parallel corpus and the Slovak variant of the WMT 2011 test set.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com