Bilingual Speaker Recognition for Afaan Oromo and Amharic Language Speakers

Publisher

ASTU

Abstract

Speaker recognition answers the question “who is talking?”: it lets a machine identify the person who is speaking. Identifying the speaker makes interaction between humans and computers easier, and also more secure, since speech can serve as a biometric for identity verification. The task is easy for humans but hard for machines unless they are aided by deep learning models. Various studies in this area have achieved promising results, yet speaker recognition depends heavily on the language spoken, among other factors, so a model that can handle more than one language is needed. The few studies on Amharic speaker recognition each cover only one language. This study developed a deep learning bilingual speaker recognition model for two Ethiopian languages with large numbers of speakers. It was motivated by the growth of voice assistants and call systems, where many security issues have been raised but remain unresolved, by the scarcity of research in the field, and by the researcher's own interest. An open-set, text-independent bilingual speaker recognition model for Afaan Oromo and Amharic was developed, using MFCC features extracted with Librosa as speaker feature vectors and CNN, LSTM, and MLP networks as classifiers. The neural networks were built with the Keras model API. To train and test the model, we prepared a new bilingual dataset with a total of 960 speech samples, taken from 30-second utterances of 16 different Afaan Oromo and Amharic speakers. The speech was collected from YouTube, keeping each speaker's conditions as similar as possible across the two languages to reduce the effects of variation in recording device, speaker mood, and speaker health. After training on both languages, the MLP achieved an average accuracy of 95.93%. The proposed model reached its best accuracy, 96.59%, when trained and tested on Amharic only, and 95.45% when trained and tested on Afaan Oromo only. In the cross-lingual test, performance dropped by 61.86%. The study thus provides a novel approach to bilingual speaker recognition and gives clear insight into the effect of language mismatch on speaker recognition models and into the need for bilingual speaker recognition.
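
The pipeline named in the abstract (MFCC features from Librosa fed to an MLP built with Keras) might look roughly like the sketch below. It is illustrative only and is not the study's implementation. The sampling rate, the number of coefficients, the mean/standard-deviation pooling, and the layer sizes are all assumptions, and the function names are hypothetical. The softmax output also makes this a closed-set classifier; the open-set rejection described in the abstract is not covered here.

```python
import numpy as np


def summarize_mfcc(mfcc: np.ndarray) -> np.ndarray:
    """Collapse a (n_mfcc, n_frames) matrix into one fixed-length vector.

    Pooling by per-coefficient mean and standard deviation over time is an
    assumed choice; the abstract does not say how frames were aggregated.
    """
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def extract_features(path: str, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    """Load one audio clip and return its pooled MFCC vector."""
    import librosa  # imported lazily: only needed when reading audio

    y, sr = librosa.load(path, sr=sr)  # mono waveform, resampled to `sr`
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return summarize_mfcc(mfcc)


def build_mlp(input_dim: int, n_speakers: int):
    """Small softmax MLP; the layer sizes are illustrative, not from the study."""
    from tensorflow import keras  # imported lazily: only needed for training

    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(n_speakers, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer speaker labels
        metrics=["accuracy"],
    )
    return model
```

In use, one would stack `extract_features(p)` over a list of clip paths into a matrix `X`, pair it with integer speaker labels, and call `build_mlp(X.shape[1], n_speakers).fit(...)`. To probe language mismatch, the training clips would come from one language and the test clips from the other.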
