Discovering Knowledge From Complex Data: The Case of Ethiopian Revenue and Customs Authority
Abstract
This thesis applies data mining to the revenue and customs sector. We apply machine learning techniques to a data set of imported customs items. The objective of the research is to evaluate models trained with several machine learning algorithms and to compare their results, with the aim of making data analysis more efficient.
We collected 100,310 imported-item records, each described by 15 attributes, from the Ethiopian Revenue and Customs Authority (ERCA); 10% of this data set, drawn by random sampling, was used in the experiments. The data pre-processing and resampling techniques used to improve the performance of the trained models are explained. Three typical models, Ordinary Linear Regression (OLR), SVM, and Random Forest (RF), were implemented on the given large data sets using different packages in R.
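The 10% random sample described above can be illustrated with a short sketch. It is written in Python rather than the R used in the thesis, and it assumes simple random sampling without replacement; the function name `sample_fraction`, the seed, and the use of row indices in place of the actual ERCA records are all hypothetical choices made for illustration.

```python
import random


def sample_fraction(n_rows, fraction=0.10, seed=42):
    """Return sorted row indices for a simple random sample without replacement.

    Assumption: the thesis only says "random sampling"; sampling without
    replacement and the fixed seed are illustrative choices.
    """
    k = round(n_rows * fraction)      # number of rows to keep
    rng = random.Random(seed)         # local generator, so runs are repeatable
    return sorted(rng.sample(range(n_rows), k))


# 100,310 is the record count reported in the abstract.
indices = sample_fraction(100_310)
print(len(indices))
```

Fixing the seed makes the subset reproducible, which matters when several models are later compared on the same sample.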
The experimental results show multiple R² and adjusted R² values of almost 1 for Ordinary Linear Regression, indicating that the fitted model accounts for nearly 100% of the variation in total import costs. Under 10-fold cross-validation on the test set, the OLR model achieves a minimum RMSE of 2952689 and a maximum R² of 0.8113806. By comparison, the RF model achieves a minimum RMSE of 464212.8 and a maximum R² of 0.9928060, a better performance than OLR.
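As a reminder of what the reported metrics measure, the following is a minimal sketch of RMSE, R², and a 10-fold cross-validation loop. It is in Python rather than the R packages used in the thesis, it runs on synthetic data rather than the ERCA records, and it fits a one-variable least-squares line as a stand-in for the actual models. R² is computed here as 1 - SS_res/SS_tot, which is an assumption; some tools report the squared correlation between predictions and observations instead. All function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=10, seed=0):
    """Return one (RMSE, R^2) pair per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in held_out]
        y_pred = [slope * xs[i] + intercept for i in held_out]
        scores.append((rmse(y_true, y_pred), r_squared(y_true, y_pred)))
    return scores


# Synthetic data (not the ERCA data set): a noisy straight line.
rng = random.Random(1)
xs = [float(i) for i in range(200)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 2.0) for x in xs]

scores = cross_validate(xs, ys)
best_rmse = min(s[0] for s in scores)   # "minimum RMSE" across folds
best_r2 = max(s[1] for s in scores)     # "maximum R^2" across folds
print(len(scores), best_rmse, best_r2)
```

RMSE is expressed in the units of the target variable, so its magnitude depends on the scale of the import costs, while R² is unitless; this is why the two are usually reported together when comparing models.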
The quantitative and visual results of our machine learning implementation show that the random forest algorithm is feasible for large data sets. The results of this work reveal new opportunities for applying data mining methods based on machine learning in the revenue and customs domain.
