Software Design Pattern Recommendation Using Text Classification Techniques
Publisher
ASTU
Abstract
Software design is a challenging task in the agile software development life cycle, where the fundamental structure of software artifacts is highly prone to change over time as new features are added or existing functionality is modified. This evolution affects system-level quality attributes such as reusability, maintainability, and understandability. Software design patterns offer a common, practical approach to improving the design quality of software. However, the classification scheme and the semantic correlation between patterns depend on the experience and knowledge of experts in the corresponding domain. Consequently, a novice developer needs considerable knowledge and effort to understand the classification scheme, the semantic correlation between patterns, and the consequences of each pattern. Text classification-based approaches have shown strong results among software design pattern recommendation methods. However, inconsistency in text classification schemes and the lack of a semantically illustrative feature set for organizing design patterns are the main obstacles to using existing machine learning models to find a candidate design pattern class and suggest the most appropriate pattern(s). Thus, in this study, we select feature extraction methods and conduct an experiment using a text classification-based approach with supervised learning techniques: support vector machine (SVM), naive Bayes (NB), k-nearest neighbors (KNN), and random forest (RF). These classifiers are used to build models that organize similar design patterns and recommend the right design pattern group. To recommend the right design pattern within the suggested class, we use cosine similarity to compute the degree of closeness between a design problem and the patterns of that class. The models are trained using TF-IDF, Word2Vec, and Word2Vec weighted by TF-IDF feature extraction methods.
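The second recommendation step, ranking the patterns of a suggested class by cosine similarity to a design problem, can be illustrated with a minimal sketch. It assumes plain TF-IDF features and a deliberately simple tokenizer; the function names, the smoothed IDF formula, the example problem statement, and the choice of three behavioral patterns are illustrative assumptions, not the study's actual pipeline or data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately simple tokenizer)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed variant) so every known term gets a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_patterns(problem, patterns):
    """Rank pattern descriptions by cosine similarity to the problem text."""
    names = list(patterns)
    vectors, idf = tfidf_vectors([patterns[name] for name in names])
    # Terms unseen in the pattern descriptions are ignored in the query vector.
    counts = Counter(tokenize(problem))
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(name, cosine(query, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate class: three behavioral patterns with short intent texts.
BEHAVIORAL_PATTERNS = {
    "Observer": "Define a one-to-many dependency between objects so that when one "
                "object changes state, all its dependents are notified and updated "
                "automatically.",
    "Strategy": "Define a family of algorithms, encapsulate each one, and make "
                "them interchangeable.",
    "Command": "Encapsulate a request as an object, letting you parameterize "
               "clients with different requests, queue or log requests, and "
               "support undoable operations.",
}

# Hypothetical design problem, written for this illustration only.
PROBLEM = ("When one object changes state, all dependent objects must be "
           "notified and updated automatically.")

ranking = rank_patterns(PROBLEM, BEHAVIORAL_PATTERNS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the Observer description shares the most weighted terms with the problem statement and ranks first. Swapping the TF-IDF vectors for Word2Vec or TF-IDF-weighted Word2Vec document vectors would leave the ranking step unchanged.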
To assess the importance of the proposed method, the study conducted a comparative experiment to determine the best combination of feature extraction method and machine learning algorithm. A case study was conducted on the GoF design pattern collection together with 38 real software design problem scenarios to evaluate classification performance. According to the experimental results, the SVM classifier based on Word2Vec weighted by TF-IDF achieves the best performance with an F1-score of 81.6%, followed by NB with an F1-score of 79.1%. The RF model performs worst with all feature extraction methods.
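For readers unfamiliar with the reported metric, the following sketch shows how an F1-score can be computed from true and predicted class labels. The abstract does not state which averaging scheme was used, so macro averaging is an assumption here; the labels (the three GoF categories) and the toy predictions are made up for illustration and are unrelated to the study's 38 scenarios.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class label to its F1-score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only.
y_true = ["creational", "creational", "structural",
          "structural", "behavioral", "behavioral"]
y_pred = ["creational", "structural", "structural",
          "structural", "behavioral", "behavioral"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_class)
print(round(overall, 3))
```

With a weighted average instead, each class's F1 would be scaled by its number of true instances before summing.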
