Developing Afaan Oromoo Text Reading Model for Visually Impaired People using Deep Learning Approach


Publisher

ASTU

Abstract

This study develops an Afaan Oromoo text-reading model for visually impaired individuals by integrating the Tesseract Optical Character Recognition (OCR) engine with the recurrent-neural-network-based Tacotron2 text-to-speech model. The research addresses the difficulty visually impaired people face in accessing written text and proposes a system that renders Afaan Oromoo text as synthesized speech. Existing Afaan Oromoo text-to-speech studies are limited in intelligibility and naturalness: most rely on traditional techniques such as concatenative synthesis, and even the most recent uses the Griffin-Lim algorithm as a vocoder, which is dated compared with modern neural vocoders. This study therefore addresses the need for improved text-to-speech synthesis for the Afaan Oromoo language. To extract Afaan Oromoo text from images, we employ the Tesseract OCR engine, which demonstrated accurate text extraction. To convert the extracted text into speech, we used the high-quality Tacotron2 text-to-speech model, paired with the WaveGlow and High-Fidelity Generative Adversarial Network (HiFi-GAN) vocoders to improve the quality and naturalness of the synthesized speech. The training dataset of 9,000 sentences with transcribed audio was collected from various sources, including news, religious books, and a benchmark dataset. The model was evaluated subjectively with the mean opinion score (MOS) and objectively with Mel Cepstral Distortion (MCD) computed over the dynamic range of the root mean square error (RMSE). Tacotron2 with the WaveGlow vocoder achieved MOS scores of 4.11 for intelligibility and 3.87 for naturalness, with an MCD of 5.29%.
With the HiFi-GAN vocoder, Tacotron2 achieved MOS scores of 4.39 for intelligibility and 4.35 for naturalness, with an MCD of 3.16%. These results indicate high-quality Afaan Oromoo speech synthesis when Tacotron2 is paired with a neural vocoder, whether flow-based (WaveGlow) or GAN-based (HiFi-GAN). The study contributes to the field of assistive technology by integrating Tesseract OCR, the Tacotron2 model, and the HiFi-GAN vocoder, enabling visually impaired people, as well as individuals with reading difficulties, to access written Afaan Oromoo text through synthesized speech.
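The image-to-speech flow described above can be sketched as a small composition: an OCR stage produces text, and a TTS stage produces a waveform. The sketch below is illustrative only; the function name `read_aloud` and the stand-in lambdas are hypothetical, and the real components in the thesis would be Tesseract (e.g. via `pytesseract.image_to_string` with an Afaan Oromoo traineddata file) and a trained Tacotron2 model with a HiFi-GAN vocoder, which are not reproduced here.

```python
from typing import Callable
import numpy as np

def read_aloud(image_path: str,
               ocr: Callable[[str], str],
               tts: Callable[[str], np.ndarray]) -> np.ndarray:
    """Extract text from an image and synthesize speech from it.

    `ocr` and `tts` are passed in as callables so the pipeline shape
    can be shown without model weights or an installed Tesseract
    binary; in the actual system they would wrap Tesseract OCR and
    Tacotron2 + HiFi-GAN respectively.
    """
    text = ocr(image_path)      # image -> Afaan Oromoo text
    if not text.strip():
        raise ValueError("OCR produced no text")
    return tts(text)            # text -> waveform samples

# Demo with stand-in components (no real OCR/TTS models):
if __name__ == "__main__":
    fake_ocr = lambda path: "Akkam jirta"           # pretend OCR output
    fake_tts = lambda text: np.zeros(22050)         # pretend 1 s of audio
    wave = read_aloud("page.png", fake_ocr, fake_tts)
    print(wave.shape)
```

Separating the two stages behind plain callables also mirrors how the thesis swaps vocoders (WaveGlow vs. HiFi-GAN) without changing the rest of the pipeline.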
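For the objective evaluation, one common formulation of Mel Cepstral Distortion averages a per-frame Euclidean distance between mel-cepstral coefficient vectors of the reference and synthesized audio. The sketch below assumes that formulation and that the two sequences are already time-aligned (e.g. by dynamic time warping); it is not necessarily the exact variant used in the thesis, which reports MCD over the dynamic range of the RMSE.

```python
import numpy as np

def mel_cepstral_distortion(ref: np.ndarray, syn: np.ndarray) -> float:
    """Frame-averaged MCD in dB between two aligned mel-cepstral
    sequences of shape (frames, coeffs).

    Uses MCD = (10 / ln 10) * mean_t sqrt(2 * sum_d (ref_td - syn_td)^2),
    excluding the 0th (energy) coefficient, as is conventional.
    """
    assert ref.shape == syn.shape, "sequences must be time-aligned"
    diff = ref[:, 1:] - syn[:, 1:]                     # drop energy coeff
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))
```

Identical inputs yield an MCD of 0, and lower values mean the synthesized spectrum is closer to the reference, which is why the HiFi-GAN result (3.16) beats the WaveGlow result (5.29) in the evaluation above.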
