Seeing Through NAT to Detect Shadow IT: A Machine Learning Approach

Nassar, Reem

dc.contributor.advisor	Kayssi, Ayman
dc.contributor.author	Nassar, Reem
dc.date.accessioned	2022-12-21T11:54:02Z
dc.date.available	2022-12-21T11:54:02Z
dc.date.issued	12/21/2022
dc.date.submitted	12/21/2022
dc.identifier.uri	http://hdl.handle.net/10938/23782
dc.description.abstract	Network Address Translation (NAT) is present in many routers and Customer Premises Equipment (CPE) devices, where it is used to share internet access among several local hosts. Most NAT devices implement Port Address Translation (PAT), which maps multiple private IP addresses to a single public IP address. The private network behind a NAT device is therefore hidden from the public internet, and only a single outward-facing IP address is visible to Internet Service Providers (ISPs). With the proliferation of unauthorized wired and wireless NAT routers, internet subscribers can redistribute an internet connection or deploy hidden devices, causing a problem known as shadow IT. It is therefore in ISPs’ interest to know how their services are used. This study proposes a method to detect NAT devices and to estimate the size of the network (number of hosts) hidden behind them. A supervised Machine Learning (ML) algorithm that uses aggregated network traffic flow features is proposed to detect NAT devices. Traffic features are aggregated over multiple window sizes to study the effect of feature aggregation on NAT detection. Host counting is likewise performed with a machine learning approach on features of real network traffic. This research demonstrates that eXtreme Gradient Boosting (XGBoost) performs best in NAT detection and hidden network size detection, whereas the Random Forest (RF) classifier predicts the exact number of hidden hosts better than any other algorithm evaluated. The XGBoost NAT detection model detects NAT devices with a 97.09% F1 score, which significantly outperforms many state-of-the-art methods. The exact host counting model achieved a 65.53% F1 score, which increased to 90.63% after the problem was transformed into a binary one. Most previous methods focused on achieving a high detection rate on given datasets rather than on the model’s generalizability.
This thesis, however, focuses on the performance of the detection algorithms when the network data is subjected to intentional obfuscation or when the network environment changes. The performance of the detection models dropped below 70% when they were tested in a new network environment. The thesis also focuses on interpreting the behavior of the complex algorithm in order to enhance trust in the results, understand generalizability, and explain the importance of feature aggregation for NAT detection. Two eXplainable Artificial Intelligence (XAI) methods are used to analyze how well a given feature set generalizes to different network environments and how it holds up after obfuscation techniques are applied. These methods are also used to study the sensitivity of the detection algorithms to the extracted aggregated feature set. Finally, this study uses transfer learning to build an optimized model that can cope with feature changes in the network traffic data.
dc.language.iso	en
dc.subject	Network Address Translation
dc.subject	NAT
dc.subject	Network Security
dc.subject	Passive Detection
dc.subject	Client Counting
dc.subject	Machine Learning
dc.subject	NAT Detection
dc.subject	Host Identification
dc.subject	User Anonymity
dc.title	Seeing Through NAT to Detect Shadow IT: A Machine Learning Approach
dc.type	Thesis
dc.contributor.department	Department of Electrical and Computer Engineering
dc.contributor.faculty	Maroun Semaan Faculty of Engineering and Architecture
dc.contributor.institution	American University of Beirut
dc.contributor.commembers	Elhajj, Imad
dc.contributor.commembers	Hajj, Hazem
dc.contributor.degree	Master of Engineering (ME)
dc.contributor.AUBidnumber	202123028
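
The abstract mentions aggregating traffic flow features within multiple window sizes. As a rough illustration only, the sketch below groups flow records by source address and fixed time window and computes a few summary features. The record keys (`ts`, `src`, `dst`, `dport`, `bytes`), the chosen features, and the function name `aggregate_flows` are assumptions made for this example; the record does not state the thesis's actual feature set or window sizes.

```python
from collections import defaultdict
from statistics import mean


def aggregate_flows(flows, window_seconds):
    """Group flows by (source IP, window index) and summarise each group.

    `flows` is an iterable of dicts with the hypothetical keys
    'ts' (seconds), 'src', 'dst', 'dport', and 'bytes'.
    """
    groups = defaultdict(list)
    for flow in flows:
        # Flows whose timestamps fall in the same fixed-size window share an index.
        window_index = int(flow["ts"] // window_seconds)
        groups[(flow["src"], window_index)].append(flow)

    features = {}
    for key, group in groups.items():
        features[key] = {
            "flow_count": len(group),
            "distinct_dsts": len({f["dst"] for f in group}),
            "distinct_dports": len({f["dport"] for f in group}),
            "mean_bytes": mean([f["bytes"] for f in group]),
        }
    return features


# Toy records invented for illustration; addresses come from documentation ranges.
flows = [
    {"ts": 1.0, "src": "203.0.113.5", "dst": "198.51.100.1", "dport": 443, "bytes": 1200},
    {"ts": 12.0, "src": "203.0.113.5", "dst": "198.51.100.2", "dport": 53, "bytes": 80},
    {"ts": 25.0, "src": "203.0.113.5", "dst": "198.51.100.1", "dport": 443, "bytes": 400},
    {"ts": 70.0, "src": "203.0.113.5", "dst": "198.51.100.3", "dport": 80, "bytes": 600},
    {"ts": 5.0, "src": "203.0.113.9", "dst": "198.51.100.1", "dport": 443, "bytes": 900},
]

# The same traffic summarised at two window sizes yields different feature rows.
for window in (10, 60):
    rows = aggregate_flows(flows, window)
    print(window, len(rows))
```

Feature rows of this kind, produced at each window size, could then be fed to a supervised classifier to compare how the window size affects detection; which classifier inputs the thesis actually used is not specified in this record.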

Files in this item

Name: NassarReem_2022.pdf

Size: 1.989Mb

Format: PDF

Description: Thesis Document for ...

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects [12709]

Copyright Statement

All materials included in the institutional repository are protected by copyright laws and are the property of their respective copyright holders. Materials may be used for non-commercial, educational, or research purposes only, and must be cited or attributed to the original source. Permission for any other use must be obtained from the copyright holder(s) directly. The American University of Beirut Libraries does not assume responsibility for any infringement of copyright laws that may occur as a result of the use of materials in the repository. If you believe that your copyright has been infringed upon in the repository, please contact the AUB Libraries immediately.

For further information, please contact us at scholarworks@aub.edu.lb