Research Article

Detecting unknown vulnerabilities in smart contracts using opcode sequences

Article: 2313853 | Received 07 Nov 2023, Accepted 30 Jan 2024, Published online: 14 Feb 2024
 

Abstract

Unknown vulnerabilities, also known as zero-day vulnerabilities, are flaws in software, systems, or networks that have not yet been publicly disclosed or fixed. Once attackers discover such a vulnerability, whether deliberately or by accident, it poses a major threat to network security. The threat is especially acute in the blockchain field: smart contracts hold large amounts of funds, so when a vulnerability in one is discovered and exploited, the resulting financial losses to users can be even greater. However, current research on smart contract vulnerabilities focuses mainly on known vulnerabilities, and research on unknown vulnerabilities remains limited. To address this gap, we introduce a machine learning-based method for detecting unknown vulnerabilities in smart contracts. First, the method obtains the opcode sequences executed by smart contract transactions in the Ethereum Virtual Machine (EVM) by instrumenting Geth and replaying Ethereum transactions. Next, we employ an n-gram model and a vector weight penalty mechanism to extract features from the opcode sequences. We then use machine learning algorithms to detect unknown vulnerabilities based on the similarity principle. Finally, we evaluate the method with four machine learning models: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Logistic Regression (LR), and Decision Tree (DT). The SVM model performs best at detecting unknown vulnerabilities, with an accuracy of 96%, a precision of 91%, a recall of 100%, and an F1-score of 95%. We also discuss the main benefit of the method: timely detection of attacks that exploit unknown vulnerabilities, which reduces user losses.
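As an illustration of the feature-extraction step, the following minimal Python sketch turns opcode sequences into weighted n-gram vectors and compares them by cosine similarity. It is not the authors' implementation. The abstract does not define the vector weight penalty mechanism, so the sketch substitutes a smoothed inverse-document-frequency weight as an assumption. The function names (`ngrams`, `penalized_vectors`, `cosine`) and the sample opcode sequences are hypothetical.

```python
import math
from collections import Counter


def ngrams(opcodes, n=2):
    """Return the contiguous n-grams (as tuples) of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def penalized_vectors(sequences, n=2):
    """Build one sparse vector (dict) per sequence.

    Each entry is the relative n-gram frequency times a smoothed
    IDF-style weight. ASSUMPTION: this weight only stands in for the
    paper's penalty mechanism, which the abstract does not specify.
    """
    counts = [Counter(ngrams(seq, n)) for seq in sequences]
    # Number of sequences in which each n-gram occurs at least once.
    doc_freq = Counter(gram for c in counts for gram in c)
    total = len(sequences)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1  # avoid division by zero
        vectors.append({
            gram: (k / length)
            * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
            for gram, k in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(gram, 0.0) for gram, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical opcode traces, for illustration only.
benign_a = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO"]
benign_b = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "JUMPI"]
suspicious = ["CALL", "SSTORE", "CALL", "SSTORE", "CALL", "SSTORE"]

vec_a, vec_b, vec_s = penalized_vectors([benign_a, benign_b, suspicious])
print(cosine(vec_a, vec_b), cosine(vec_a, vec_s))
```

In this toy input, the two benign traces share most of their bigrams and score a high similarity, while the suspicious trace shares none with them. A classifier such as an SVM would take vectors of this kind as input features.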

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported in part by the National Key Research and Development Program of China (2020YFB1005804), and in part by the National Natural Science Foundation of China under Grant 62372121.