Bilingual Speaker Recognition for Afaan Oromo and Amharic Language Speakers

Million, Sime


Files

Million Sime.pdf (2.27 MB)

Date

2022-09

Authors

Million, Sime

Publisher

ASTU

Abstract

Speaker recognition is the task of identifying "who is talking?": it enables machines to determine who is speaking. Identifying the speaker makes interaction between humans and computers easier. It also makes that interaction more secure, since speech can serve as a biometric for identity verification. Although this task is easy for humans, it is difficult for computers unless they are aided by deep learning models. Various studies in this area have achieved promising results, yet speaker recognition remains highly dependent on the language spoken, among other factors. Preparing a model that handles different languages is therefore relevant: the few studies done on Amharic speaker recognition have each covered only one language. This study developed a deep learning bilingual speaker recognition model for two Ethiopian languages with very large numbers of speakers. The study was motivated by the current growth of voice assistants and call systems, where many security issues have been raised but remain unresolved; by the scarcity of research in the field; and by the researcher's own interest. An open-set, text-independent bilingual Afaan Oromo and Amharic speaker recognition model was developed, using MFCCs extracted with Librosa as speaker feature vectors and CNN, LSTM, and MLP classifiers. The Keras model API was used to build the neural networks. To train and test the model, we prepared a new bilingual dataset with a total of 960 speech samples, built from 30-second utterances of 16 different Afaan Oromo and Amharic speakers. The speech was carefully collected from YouTube, keeping each speaker's conditions as similar as possible across both languages to reduce the effects of recording-device variation, speaker mood, and speaker health. After training on both languages, the MLP achieved an average accuracy of 95.93%.
The proposed model achieved its best accuracy, 96.59%, when trained and tested on Amharic, and 95.45% when trained and tested only on Afaan Oromo. In the cross-lingual test, performance dropped by 61.86%. This study therefore provides a novel approach to bilingual speaker recognition modeling and a clear insight into the effect of language mismatch on speaker recognition models and the need for bilingual speaker recognition.

Keywords

Text-independent Speaker Recognition, CNN, MFCC, Feature Extraction, Keras Model API.

URI

http://10.240.1.28:4000/handle/123456789/1557

Collections

Thesis
