Amharic Print Documents Transcription To Audio Using Machine Learning Techniques

Sinishaw, Getachew

Date

2022-09

Authors

Sinishaw, Getachew

Publisher

ASTU

Abstract

Printed document recognition remains a largely unsolved problem in the field of pattern recognition. This thesis examines how modern deep learning techniques can improve printed document recognition for the Amharic language. Although Amharic has long been a literary language in Ethiopia, it is underrepresented in research on document image recognition and analysis. Therefore, a printed document recognition system based on real-world, large-scale digitization scenarios is proposed. Its architecture consists of four stages: pre-processing (binarization and skew estimation), page layout analysis, recognition, and post-processing. A test setup was prepared for each task. In the binarization task, four methods (Otsu's global method, Otsu's local method, Adaptive_Thresh_Mean_Method, and Adaptive_Thresh_Gaussian method) were compared using the FM, p-FM, PSNR, and DRD evaluation measures; Adaptive_Thresh_Mean_Method outperformed the other methods on all four measures. In the document image skew estimation task, a method based on the Hough transform was evaluated on the dataset. Using the evaluation measures CE, AED, and TOP80, values of 86.00, 0.4215, and 0.068, respectively, were obtained. The Leptonica open-source C library was evaluated for page layout analysis, and it proved highly successful at the region and line level across a variety of Amharic page layouts. The final test setup was designed to build a recognition model using the Tesseract OCR engine. Because it was difficult to prepare a large, accurate training set from real printed documents, a calibration method based on the Amharic language was proposed and implemented. A total of 286 images of 200 lines of text collected from 15 different pages were processed, and a recognition model with a character error rate of 2.632% was obtained.
Overall, the experiments conducted with the prototype yielded encouraging results, so full development of an OCR system for Amharic printed document recognition is feasible and valuable for many purposes, especially education.
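
The abstract reports a character error rate (CER) but does not say how it was computed. As a minimal sketch, assuming the common definition (total character-level edit distance divided by the total number of reference characters, as a percentage), it could be calculated as follows. The function names and the sample text lines are illustrative assumptions and are not taken from the thesis.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses) -> float:
    """CER in percent over paired text lines (assumed definition:
    total edits / total reference characters * 100)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


# Hypothetical ground-truth lines and OCR outputs (not from the thesis data).
ground_truth = ["ሰላም ለዓለም", "አዲስ አበባ"]
ocr_output = ["ሰላም ለዓለም", "አዲስ አባባ"]
cer = character_error_rate(ground_truth, ocr_output)
print(f"CER: {cer:.3f}%")
```

Each Ethiopic syllable is a single Unicode code point, so comparing Python strings character by character matches the usual notion of a character error for Amharic text.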