In this paper, we investigate the use of different types of neural networks for age and gender identification from children's speech, based on the Corpus of Estonian Adolescent Speech. Feed-forward deep neural networks using i-vectors as input are compared with recurrent neural networks using MFCCs as input. Results show that feed-forward neural networks outperform recurrent neural networks for gender classification, while a model that combines i-vectors and MFCCs via feed-forward and recurrent branches achieves the best performance for age group classification. We also show that for age group classification, it is beneficial to first identify gender and then use a gender-specific age identification model. Experiments with human listeners show that the neural network models outperform humans on both tasks by a large margin.
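The two-stage scheme described above (identify gender first, then apply a gender-specific age model) can be illustrated with a minimal sketch. This is not the implementation used in the paper: the names `make_cascade`, `toy_gender_model`, `toy_age_model_female`, and `toy_age_model_male` are hypothetical, the labels "younger" and "older" are placeholders rather than the corpus's actual age groups, and the threshold rules merely stand in for trained neural networks.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


def make_cascade(
    gender_model: Classifier,
    age_models: Dict[str, Classifier],
) -> Callable[[Features], Tuple[str, str]]:
    """Build a predictor that routes each utterance to the age model
    matching its predicted gender."""

    def predict(features: Features) -> Tuple[str, str]:
        # Stage 1: predict gender from the utterance-level features.
        gender = gender_model(features)
        # Stage 2: apply the age model trained for that gender.
        age_group = age_models[gender](features)
        return gender, age_group

    return predict


# Hypothetical stand-ins for trained networks; thresholds are arbitrary.
def toy_gender_model(x: Features) -> str:
    return "female" if x[0] > 0.0 else "male"


def toy_age_model_female(x: Features) -> str:
    return "older" if x[1] > 0.5 else "younger"


def toy_age_model_male(x: Features) -> str:
    return "older" if x[1] > -0.5 else "younger"


cascade = make_cascade(
    toy_gender_model,
    {"female": toy_age_model_female, "male": toy_age_model_male},
)
```

In this sketch the same feature vector can receive a different age label depending on the predicted gender, which is the point of the cascade: each age model only has to separate age groups within one gender. A gender error in the first stage also propagates to the second.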