Design and Development of Automatic Unlexicalized Statistical Constituency Parser for Tigrigna
Publisher
ASTU
Abstract
Syntactic parsing is an intermediate step for higher-level Natural Language Processing components such as machine translation. The design and development of an automatic unlexicalized statistical constituency parser for the Tigrigna language has not previously been attempted. In this research, therefore, a treebank of 250 syntactically parsed sentences was built with the help of a professional linguist. A Viterbi-based, bottom-up probabilistic context-free parsing tool is used for parsing, with a model that induces the probabilistic context-free grammar automatically. Maximum likelihood estimation is used to extract and learn rule probabilities automatically from the context-free grammar repositories. A Tigrigna word and affix segmenter, a transliteration system, and Trigrams'n'Tags (TnT), an efficient language-independent tagger, are integrated to supply input to the parser. The segmenter splits morphological affixes and complex words into their representative base forms. The primary role of the parser is thus structural disambiguation, whereas the TnT tagger, together with the word and morphological segmenter, handles lexical (word category) disambiguation. Finally, the parser is evaluated with the standard parser evaluation scoring tool Evaluate Bracketing (EVALB). The parser achieved state-of-the-art accuracy, with an F-score of 95.12% on a 75-25 percentage split.
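The maximum likelihood step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy treebank, the nested-tuple tree encoding, the placeholder words `w1` to `w3`, and the function names `extract_rules` and `induce_pcfg` are all assumptions made for illustration. Each rule probability is estimated as the count of the rule divided by the count of its left-hand-side category.

```python
from collections import Counter

# Hypothetical toy treebank (not from the thesis). A tree is a tuple
# (label, child, child, ...); a leaf is a plain string placeholder.
TOY_TREEBANK = [
    ("S", ("NP", ("N", "w1")), ("VP", ("V", "w2"))),
    ("S", ("NP", ("N", "w1")), ("VP", ("NP", ("N", "w3")), ("V", "w2"))),
]


def extract_rules(tree):
    """Yield one (lhs, rhs) pair for every internal node of the tree."""
    label, children = tree[0], tree[1:]
    # The right-hand side is the sequence of child labels (or leaf strings).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from extract_rules(child)


def induce_pcfg(treebank):
    """Estimate P(A -> beta) = count(A -> beta) / count(A) over the treebank."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in extract_rules(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


pcfg = induce_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.3f}]")
```

By construction, the probabilities of all rules sharing a left-hand side sum to one, which is the property a Viterbi-style parser relies on when it compares the scores of competing derivations.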
