Establishment and mapping of heterogeneous anomalies in network intrusion datasets


Abstract

Anomaly detection in network security aims to identify unexpected and unique network instances, and various security operations employ such techniques to support effective threat detection. However, many systems have been designed around the absolute mapping of attacks to one of three anomaly types (i.e. point, collective, or contextual), a strategy not supported by recent findings on hybrid anomaly classifications. Given the growing use of network anomaly detection and the implications of hybrid anomalies, we propose several heterogeneous anomaly types and provide an unsupervised approach for the automated mapping of network threats. Initial findings on publicly available intrusion datasets support the existence of four unique heterogeneous anomaly types, offering insight into the next generation of network anomaly detection systems.


1. Introduction

The continued growth of network communication has led to a security crisis in recent years. With growing device diversity and security often treated as an afterthought, digital defence is a continuously expanding endeavour. By 2023, an estimated two-thirds of the world's population will be internet-connected, resulting in a nearly 60% increase in network-capable devices from 2018 (Cisco, 2020). In line with this, cybercrime is estimated to cost upwards of US$10.5 trillion in damages per year by 2025, a 250% increase over the US$3 trillion estimated for 2015 (Morgan, 2020).

Anomaly detection broadly refers to approaches that identify instances, data points, or events that fall outside the scope of what has previously been observed (Ahmed et al., 2016; Ariyaluran Habeeb et al., 2018; Bhuyan et al., 2013; Chen et al., 2020; Moustafa et al., 2019; Zhou & Guo, 2018; Zoppi et al., 2020). Heavily leveraged by various Intrusion Detection Systems (IDS), Network Anomaly Detection Systems (NADS) have seen wide adoption, offering improved system robustness and unsupervised operation (Bovenzi et al., 2020; Dahiya & Srivastava, 2018; Fernandes et al., 2019; Guarino et al., 2022; Kiani et al., 2020; Mirsky et al., 2018; Zoppi et al., 2020). Similarly, due to their focus on abnormality, NADS can also discover previously undocumented attacks (Bovenzi et al., 2020; Mirsky et al., 2018; Zhou & Guo, 2018; Zoppi et al., 2021, 2020), a characteristic that has received specific attention from industry in tackling the growing challenge of zero-day vulnerabilities (Ahmed, 2019; Bovenzi et al., 2020; Fernandes et al., 2019; Zhou & Guo, 2018; Zoppi et al., 2021).
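To make the idea of flagging what falls outside the previously observed more concrete, the sketch below fits a deliberately simple range-based detector on observed traffic records and flags any new record with a feature outside the fitted per-feature range. It is a minimal illustration under stated assumptions, not how any cited NADS works: the class name `RangeNoveltyDetector`, the dictionary record format, and the feature names `duration` and `packets` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RangeNoveltyDetector:
    """Hypothetical detector: flags records outside the per-feature range seen in fit()."""
    low: dict = field(default_factory=dict)
    high: dict = field(default_factory=dict)

    def fit(self, records):
        # Record the smallest and largest value observed for every feature.
        for rec in records:
            for name, value in rec.items():
                self.low[name] = min(value, self.low.get(name, value))
                self.high[name] = max(value, self.high.get(name, value))
        return self

    def is_anomalous(self, record):
        # A feature never seen during fitting is treated as anomalous,
        # since nothing about it was previously observed.
        return any(
            name not in self.low or not (self.low[name] <= value <= self.high[name])
            for name, value in record.items()
        )


# Hypothetical "previously observed" records.
observed = [
    {"duration": 1.0, "packets": 10},
    {"duration": 2.5, "packets": 14},
    {"duration": 1.8, "packets": 12},
]
det = RangeNoveltyDetector().fit(observed)

print(det.is_anomalous({"duration": 2.0, "packets": 11}))  # False
print(det.is_anomalous({"duration": 2.0, "packets": 90}))  # True
```

A min/max envelope is chosen only for readability; it is brittle to noise in the observed data, and practical detectors typically learn much richer models of normality.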

However, NADS have been noted to struggle with identifying obfuscated threats (e.g. malware, backdoors) (Zoppi et al., 2021), a shortcoming highlighted by the discovery of Heartbleed, a critical vulnerability initially missed by numerous high-level security firms (Chen et al., 2021; Lee et al., 2014). In response, researchers have attempted to map network threats to specific anomaly definitions based on their underlying qualities (Ahmed, 2019; Fernandes et al., 2019; Kendall, 1999), improving attack understanding and guiding NADS development.

Early works (Ahmed et al., 2016; Kendall, 1999) focused on mapping entire attack vectors to a respective anomaly type based on their fundamental characteristics. While undoubtedly useful, these original mappings have grown outdated as attack variation has increased, and only recent mapping endeavours have focused on classifying individual attack types (e.g. Heartbleed) (Zoppi et al., 2020).

Further adding complexity is the proposition of anomaly hybridisation, whereby a threat can simultaneously express qualities of two or more distinct anomaly types (Araya et al., 2016; Jiang et al., 2014). Initial research by Jiang et al. (2014) proposed a system capable of detecting contextual collective anomalies in data streams in real time, using the term contextual collective to describe an identified cross-over of qualities. While these findings spurred the development of several approaches (Araya et al., 2016; Dou et al., 2019; Hu et al., 2021), their implementations are often outside the scope of network security.
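One way to represent hybridisation in code is to treat the three base anomaly types as combinable flags, so that a single threat can carry several at once. The sketch below uses Python's standard `enum.Flag`; the names `AnomalyType` and `is_hybrid` are hypothetical and illustrate the concept only, not the representation used by Jiang et al. (2014) or in this paper.

```python
from enum import Flag, auto


class AnomalyType(Flag):
    """Base anomaly types; members can be OR-ed together to describe hybrids."""
    POINT = auto()
    COLLECTIVE = auto()
    CONTEXTUAL = auto()


def is_hybrid(anomaly: AnomalyType) -> bool:
    """Return True when two or more base types are expressed simultaneously."""
    # Each base type occupies one bit, so count the set bits.
    return bin(anomaly.value).count("1") >= 2


# A contextual collective anomaly expressed as a combination of two base flags.
contextual_collective = AnomalyType.CONTEXTUAL | AnomalyType.COLLECTIVE

print(is_hybrid(AnomalyType.POINT))                     # False
print(is_hybrid(contextual_collective))                 # True
print(AnomalyType.COLLECTIVE in contextual_collective)  # True
```

Using flags rather than a single-valued label is what allows a threat to be mapped to more than one traditional type without forcing an absolute choice.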

Additionally, Zoppi et al. (2020) found that Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks can present both point/collective and collective/contextual qualities. The authors utilised a semi-supervised approach to map the anomaly characteristics of attacks from several network intrusion datasets. Further, the research demonstrated that models trained on one attack could detect different attacks, provided they shared identical anomaly classifications. However, the mapping procedure relied on human interpretation to identify contextual associations, an inherent shortcoming given the known difficulties of contextual detection.
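The transfer finding can be read as a grouping rule: if two attacks share an identical set of anomaly classifications, a detector trained on one is a candidate for detecting the other. The sketch below encodes only that rule; it trains no models, and the function name `transfer_candidates`, the attack names, and the label sets are hypothetical placeholders rather than mappings reported by Zoppi et al. (2020).

```python
from itertools import combinations


def transfer_candidates(mapping):
    """Return sorted attack pairs whose anomaly classification sets are identical."""
    return sorted(
        (a, b)
        for a, b in combinations(sorted(mapping), 2)
        if mapping[a] == mapping[b]
    )


# Hypothetical labels for illustration only, not published results.
mapping = {
    "attack_a": frozenset({"point", "collective"}),
    "attack_b": frozenset({"point", "collective"}),
    "attack_c": frozenset({"collective", "contextual"}),
}

print(transfer_candidates(mapping))  # [('attack_a', 'attack_b')]
```

Comparing whole label sets (rather than any single shared type) mirrors the condition of identical classifications stated above.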

Further, research by Kiani et al. (2020) identified that anomaly types often showcase thin class boundaries, with enough segregation within them to prompt additional classification categories. Ultimately, this led the authors to suggest the existence of two further anomaly types: collective normal and collective point. However, as with Jiang et al. (2014), the datasets utilised were outside the scope of network security.

Given the recent findings of anomaly hybrids (Jiang et al., 2014; Kiani et al., 2020; Zoppi et al., 2020) and the attack variety challenges currently facing NADS development (Araya et al., 2016; Bovenzi et al., 2020; Dou et al., 2019; Guarino et al., 2022; Jiang et al., 2014; Mirsky et al., 2018), a clear area of exploratory research emerges: the heterogeneous anomaly potential of network attacks. Similarly, developing an unsupervised approach for the automated mapping of threats aims to expedite anomaly research and reduce interpretation bias, helping the next generation of NADS overcome the volume and security issues facing the 21st century (Bovenzi et al., 2020; Dahiya & Srivastava, 2018; Guarino et al., 2022; Mirsky et al., 2018).

1.1. Contribution

The primary contributions of this paper include the establishment of several unique heterogeneous anomalies, which together explain the hybridised potentials of Zoppi et al. (2020), the thin class boundaries of Kiani et al. (2020), and the previously documented contextual collective type of Jiang et al. (2014). The paper also provides a methodology adapted from Zoppi et al. (2020) to classify network threats in an unsupervised and automated manner. Further, this method is applied to several network intrusion datasets, demonstrating the existence of all theorised heterogeneous anomaly types. Finally, the paper details the implications of these discoveries, highlighting their impact on future NADS development and, more broadly, how these findings alter the current network anomaly landscape.

1.2. Structure

The paper is structured as follows: Section 2 gives an overview of the pre-existing anomaly types, providing essential context to the topic. Section 3 then lays the foundations for our hypothesised heterogeneous anomaly types, detailing their theoretical underpinnings. Section 4 then describes an unsupervised approach for mapping attacks to anomaly types, the datasets utilised, and the overall experimental procedure. Section 5 presents the experimental results and associated mappings, with Section 6 providing relevant discussion on our results and their implications. Finally, Section 7 concludes the paper and highlights the direction of future work.

2. Traditional anomalies

Fundamentally, three distinct categories have been used to classify network anomalies: point, collective, and contextual (Ahmed et al., 2016; Ariyaluran Habeeb et al., 2018; Bhuyan et al., 2013; Chandola et al., 2009; Chen et al., 2020; Zhou & Guo, 2018; Zoppi et al., 2020), the details of which are presented throughout this section. While the various definitions of anomalies are critical to multiple fields, it is essential to remember that anomaly detection is the isolation of “uniqueness” amongst a proverbial “sea of variables” (Ariyaluran Habeeb et al., 2018; Chen et al., 2020).

2.1. Point anomalies

Point anomalies are classical data outliers (Ariyaluran Habeeb et al., 2018; Bhuyan et al., 2013; Chatterjee & Ahmed, 2022; Fernandes et al., 2019; Moustafa et al., 2019; Zoppi et al., 2020), defined as instances that explicitly fall outside the expected scope of normality (Ariyaluran Habeeb et al., 2018; Bhuyan et al., 2013; Chatterjee & Ahmed, 2022; Zoppi et al., 2020). Characterised by distinct separation, they are often the simplest to detect (Zhou & Guo, 2018). However, their detection relies on suitable data features being monitored while they occur.

Table 1 shows an example of a point anomaly, whereby instance eight falls outside the scope of normality, specifically regarding the “Duration” feature. In this case, instance eight would associate strongly with a point anomaly classification due to its outright deviation from the global norm.
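Because the table itself is not reproduced here, the sketch below uses hypothetical values to show how a point anomaly such as instance eight could be isolated on a single feature. It swaps in the modified z-score (median and median absolute deviation, with the commonly used 0.6745 scale factor and 3.5 threshold) as an assumed detector; it is not the identification procedure used in this paper. A robust statistic is used because, with only eight instances, an ordinary mean/standard-deviation z-score can never reach 3 however extreme the outlier, since the outlier inflates the standard deviation. The function name `point_anomalies` and the `src_bytes` feature are hypothetical.

```python
from statistics import median


def point_anomalies(values, threshold=3.5):
    """Return 1-based instance numbers whose modified z-score exceeds `threshold`."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        # Degenerate case: most values are identical, so flag anything that differs.
        return [i + 1 for i, d in enumerate(deviations) if d > 0]
    return [i + 1 for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]


# Hypothetical records (not the paper's data): instance 8 has an extreme duration
# but an unremarkable byte count.
duration = [1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 250.0]
src_bytes = [500, 520, 480, 510, 495, 505, 490, 500]

print(point_anomalies(duration))   # [8]
print(point_anomalies(src_bytes))  # []
```

The second call illustrates the dependence on monitored features noted above: the same instance is invisible to a detector that only watches a feature on which it does not deviate.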

