A tree-based machine learning methodology to automatically classify software vulnerabilities

Authors

G. Aivatoglou, M. Anastasiadis, G. Spanos, A. Voulgaridis, K. Votis and D. Tzovaras

Published in

2021 IEEE International Conference on Cyber Security and Resilience (CSR), Rhodes, Greece

Keywords

Software Vulnerability Categorization, Cyber-security, Machine Learning, Decision Trees, Random Forests, Gradient Boosting

Open Access

Abstract

Software vulnerabilities have become a major problem for security analysts, since the number of new vulnerabilities is constantly growing. A categorization system was therefore needed to group and handle these vulnerabilities more efficiently. To that end, the MITRE Corporation introduced the Common Weakness Enumeration (CWE), a list of the most common software and hardware vulnerabilities. However, manually understanding and analyzing new vulnerabilities is a very slow and exhausting process for security experts. For this reason, this paper introduces a new automated classification methodology based on the textual vulnerability descriptions from the National Vulnerability Database (NVD). The proposed methodology combines textual analysis and tree-based machine learning techniques to classify vulnerabilities automatically. In the experiments, the proposed methodology achieved an overall accuracy close to 80%.
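The general idea of pairing textual analysis with a tree-based classifier can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes simple word-presence features, a hand-picked stopword list, a single decision tree grown with Gini impurity, and a handful of invented descriptions with CWE labels (they are not NVD records). All names (`TRAIN`, `STOPWORDS`, `tokenize`, `build_tree`, `classify`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy data: invented descriptions with CWE labels, not NVD records.
TRAIN = [
    ("SQL injection in login form allows remote attackers to execute arbitrary SQL commands", "CWE-89"),
    ("SQL injection via the id parameter lets attackers read database contents", "CWE-89"),
    ("Cross-site scripting allows remote attackers to inject arbitrary web script or HTML", "CWE-79"),
    ("Stored cross-site scripting in comment field allows injection of web script", "CWE-79"),
    ("Buffer overflow in the parser allows attackers to cause a crash via a long string", "CWE-119"),
    ("Heap-based buffer overflow allows code execution via a crafted file", "CWE-119"),
]

# Assumed stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "or", "via", "and", "by"}


def tokenize(text):
    """Textual analysis step: lowercase, split into words, drop stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows):
    """Grow a decision tree over (token_set, label) rows.

    A leaf is a label string; an internal node is (word, if_present, if_absent).
    """
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:
        return labels[0]
    vocab = sorted(set().union(*(tokens for tokens, _ in rows)))
    best = None
    for word in vocab:
        present = [row for row in rows if word in row[0]]
        absent = [row for row in rows if word not in row[0]]
        if not present or not absent:
            continue  # this word does not split the rows
        # Weighted Gini impurity of the two children; lower is better.
        score = (
            len(present) * gini([label for _, label in present])
            + len(absent) * gini([label for _, label in absent])
        ) / len(rows)
        if best is None or score < best[0]:
            best = (score, word, present, absent)
    if best is None:
        # No word separates the rows: fall back to the majority label.
        return Counter(labels).most_common(1)[0][0]
    _, word, present, absent = best
    return (word, build_tree(present), build_tree(absent))


def classify(tree, text):
    """Walk the tree using the words present in a new description."""
    tokens = tokenize(text)
    while isinstance(tree, tuple):
        word, if_present, if_absent = tree
        tree = if_present if word in tokens else if_absent
    return tree


TREE = build_tree([(tokenize(text), label) for text, label in TRAIN])

if __name__ == "__main__":
    print(classify(TREE, "Reflected cross-site scripting in search page"))
```

A single tree on six examples is only meant to show the mechanics. For the ensemble methods named in the keywords (Random Forests, Gradient Boosting), one would more plausibly rely on an established library, for example scikit-learn's `TfidfVectorizer` combined with `RandomForestClassifier`, and evaluate on held-out data.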

Source

https://ieeexplore.ieee.org/document/9527965