Bilingual Speaker Recognition for Afaan Oromo and Amharic Language Speakers
Publisher
ASTU
Abstract
Speaker recognition answers the question “who is talking?”; it lets a machine identify who is
speaking. Identifying a speaker makes interaction between humans and
computers easier. It also makes that interaction more secure, since speech can serve as a biometric identity
verification tool. This task is easy for humans, but it is not easy for machines
unless they are aided with deep learning models. Various studies have been done in this area
and achieved promising results. Yet speaker recognition depends heavily on the
language spoken and on other factors, so a model that handles different
languages is relevant. Few studies have addressed Amharic speaker recognition, and those
studies cover only one language. This study prepared a deep learning bilingual
speaker recognition model for two Ethiopian languages that have a vast number of speakers.
The study was motivated by the current growth of voice assistant and call systems, where
many security issues have been raised but remain unresolved, by the scarcity of research in the field,
and by the interest of the researcher. In this study, an open-set, text-independent bilingual Afaan
Oromo and Amharic speaker recognition model was developed, using MFCC features extracted with Librosa
as speaker feature vectors together with CNN, LSTM, and MLP models. The Keras model API was used to
build the neural networks. To train and test the model, we prepared a new bilingual dataset
that has a total of 960 speech samples, built by taking 30-second utterances of 16 different Afaan
Oromo and Amharic speakers. The speech was collected carefully from YouTube, keeping
each speaker's conditions as similar as possible across both languages to reduce the effect of
recording device variation, speaker mood, and speaker health status. In the end, after
training the model on the two languages, an average accuracy of 95.93% was achieved with MLP.
The proposed model achieved its best accuracy of 96.59% when trained and tested only on
Amharic, and 95.45% when trained and tested only on Afaan Oromo. The model showed a
61.86% performance drop in the cross-lingual test. Therefore, this study provided a novel
approach to modeling bilingual speaker recognition and gave clear insight into the effect
of language mismatch on the speaker recognition model and into the necessity of bilingual
speaker recognition.
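
The abstract describes the model as open set but does not say how unknown speakers are rejected. One common approach, shown here purely as an illustrative assumption and not as the method used in the study, is to threshold the highest softmax score of the classifier. The function names, the speaker IDs, and the 0.7 threshold are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identify_speaker(logits, speaker_ids, threshold=0.7):
    """Return (speaker_id, confidence), or (None, confidence) for an unknown speaker.

    Assumption: the classifier emits one score per enrolled speaker, and any
    utterance whose top probability falls below `threshold` is treated as
    coming from a speaker outside the enrolled set.
    """
    if len(logits) != len(speaker_ids):
        raise ValueError("need exactly one score per enrolled speaker")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]  # open-set rejection
    return speaker_ids[best], probs[best]


# Hypothetical enrolled speakers and classifier outputs.
enrolled = ["spk01", "spk02", "spk03"]
confident_match = identify_speaker([5.0, 0.0, 0.0], enrolled)
uncertain_match = identify_speaker([1.0, 1.0, 1.0], enrolled)
```

With these inputs, the first call accepts `spk01` because one score clearly dominates, while the second call returns `None` because the probability is spread evenly across the enrolled speakers. In practice the threshold would be tuned on held-out data that includes speakers the model has never seen.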
