Text Information Retrieval System For Silt’e Language
Abstract
The main aim of an information retrieval system is to extract relevant information from a large collection of data based on a user’s need. When a user submits a query, the system tries to generate a list of related documents ranked by their degree of relevance. Most current information retrieval systems use weighting functions to assign numeric scores to documents and rank them by those scores. The BM25 weighting function has been one of the most effective and widely used information retrieval weighting models of the past three decades. Its term weighting is based on three principles: inverse document frequency, term frequency, and document length normalization. The volume of unstructured digital Silt’e text documents keeps growing, and this growth makes it difficult to find and access the right information. Developing an information retrieval system for the Silt’e language therefore allows users to search for and retrieve relevant documents that satisfy their information needs. In this research, we design a probabilistic information retrieval system for the Silt’e language. The system has an indexing module and a searching module. Both modules include text operations such as tokenization, stemming, stop word removal, and synonym handling. For the experiments, we used 134 Silt’e text documents and 10 queries to test the system. The data were collected from sources in different government organizations in the Silt’e zone. We used Apache Solr for prototype development, and the schema similarity was set to weight documents with the probabilistic weighting function BM25. In the experiments, the system achieved 88% mean average precision after stemming and synonym handling were applied. The result is promising for the development of Silt’e text information retrieval systems and Silt’e search engines.
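To make the three BM25 principles named above concrete, the following is a minimal Python sketch of BM25 scoring over tokenized documents. It is an illustration only, not the implementation used in this work, which relied on Apache Solr. The function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the particular non-negative IDF variant, and the placeholder tokens are all assumptions introduced here.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    `docs` is a list of token lists. `k1` and `b` are assumed,
    commonly used values, not values reported in the abstract.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Document length normalization: longer-than-average documents
        # are penalized, shorter ones are boosted.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Inverse document frequency (a non-negative variant, assumed here).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency with saturation controlled by k1.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


# Placeholder tokens, not real Silt'e words.
docs = [["a", "b", "a"], ["b", "c"], ["c", "d", "e", "f"]]
scores = bm25_scores(["a"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy collection only the first document contains the query term, so it receives the only non-zero score and is ranked first.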
The main challenge in this research was the lack of a standardized, well-prepared Silt’e text corpus and set of test queries, both of which are required to evaluate the proposed system experimentally. Building such resources is a direction for future research that would contribute to improving the system.
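The evaluation measure reported above, mean average precision, depends directly on such test queries and their relevance judgments. The sketch below shows the standard definition of the measure in Python. The abstract does not describe how the measure was computed in this work, so the function names, the input layout, and the document identifiers are assumptions made for illustration. Each ranked list is assumed to contain no duplicate identifiers.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query.

    `ranked_ids` is the system's ranked result list and
    `relevant_ids` holds the documents judged relevant.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical identifiers and judgments, not data from the study.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
map_score = mean_average_precision(runs)
```

With only 10 queries, each query contributes a tenth of the final score, which is one reason a larger standardized query set would make the evaluation more reliable.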
