Chickpea Genotype Recommendation And Yield Prediction Using Machine Learning
Publisher
ASTU
Abstract
Chickpea (Cicer arietinum L.) is a multi-purpose legume crop that ranks third in the world
after soybean and bean. To maintain food security, farmers want to use improved crops and new
technology-based farming systems. The International Center for Agricultural Research in the
Dry Areas (ICARDA) established a center for chickpea crop improvement through breeding, which
provides materials to national agricultural centers, including those in Ethiopia. In Ethiopia,
ICARDA and the Debre Zeit Agricultural Research Center (DZARC), with funding from the United
States Agency for International Development (USAID), began experimental chickpea production.
At present, DZARC improves chickpea crops using conventional breeding, which depends on
phenotypic traits such as plant height, pods per plant, and seeds per pod. As a result, the
current chickpea genotype improvement process is based on phenotypic observation, and yield
prediction is based on farmers' experience. To address these problems, many researchers have
used phenotypic trait data and
machine learning techniques, but the techniques used did not prove promising for the
existing problem. In this research, a dataset of 11,990 rows with 14 features
was used from which four clusters were generated using the PSO-based K-means algorithm.
Because the resulting clusters were imbalanced, the synthetic minority oversampling technique
(SMOTE) was applied to the training dataset, producing 17,268 samples to train a Random
Forest (RF) classification algorithm for assigning a single unseen genotype to a group. The RF
algorithm predicted the class of new genotypes with an accuracy of 76%. A deep neural network
(DNN) with a
structure of (9-146-146-146-146-1), with six hyperparameters selected by the Bayesian
optimization technique, provided the best yield prediction result, with a mean
absolute error (MAE) of 0.018. Finally, the predicted yield can be compared with other
existing yields, and genotypes with a better predicted yield can be passed to
the grouping algorithm. The four clusters found by PSO-based K-means are
recommended for a breeding program on the basis of K-means inter-cluster distance mapping and
descriptive statistical analysis.
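
The abstract states that SMOTE was applied to the training dataset to balance the clusters before training the classifier. As an illustration only, the following minimal pure-Python sketch shows the general idea behind SMOTE: each synthetic sample is placed on the line segment between a real minority-class sample and one of its nearest same-class neighbours. The function name `smote_oversample`, the neighbour count `k`, and the toy data are hypothetical and are not taken from the thesis, which does not say which implementation was used. In practice a library implementation, such as the `SMOTE` class in imbalanced-learn, would normally be preferred.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=3, seed=0):
    """Oversample every minority class up to the majority-class size.

    Each synthetic point lies on the segment between a real minority
    sample and one of its k nearest same-class neighbours.
    (Illustrative sketch; not the implementation used in the thesis.)
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # majority class, or too few points to interpolate
        for _ in range(target - n):
            base = rng.choice(members)
            # k nearest same-class neighbours of `base`, excluding itself
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append([b + gap * (v - b) for b, v in zip(base, nb)])
            y_out.append(label)
    return X_out, y_out


# Toy, made-up training data: 6 samples in cluster 0, 2 in cluster 1.
X_train = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [1.3, 2.0], [1.0, 1.8],
           [5.0, 6.0], [5.5, 6.5]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = smote_oversample(X_train, y_train)
print(Counter(y_bal))
```

Only the training split is oversampled here, matching the abstract's description; the test data would be left untouched so that reported accuracy reflects the original class distribution.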
