Generalized Machine Learning Based Network Traffic Classification

Chaiban, Jean Paul


dc.contributor.advisor	Elhajj, Imad
dc.contributor.author	Chaiban, Jean Paul
dc.date.accessioned	2023-01-10T11:53:55Z
dc.date.available	2023-01-10T11:53:55Z
dc.date.issued	1/10/2023
dc.date.submitted	1/10/2023
dc.identifier.uri	http://hdl.handle.net/10938/23854
dc.description.abstract	With the exponential rise in online activity, Internet Service Providers (ISPs) have prioritized network traffic classification so that they can dynamically adapt their networks to best serve their customers while increasing their gains. While most work on machine learning based classification has studied different models and the best techniques to solve the issue, none has studied the effect of the traffic capture location on the model, or whether a model can be generalized to work effectively with different flow capture directions. The aim of this work is to find the best approach to creating network traffic classification models that are, on the one hand, capable of generalizing to different environments while targeting narrower classes of internet traffic and, on the other hand, adaptive, scalable and performant in different production environments. While most previous work attempted to separate general classes such as SSH traffic, VPN traffic and HTTP/HTTPS, we attempt to separate very similar gaming-related classes that use common protocols and backends, with the added complexity of background noise traffic. Another contribution of this work is tackling the traffic direction problem, which is directly related to the traffic capture location. Since no multi-location dataset was available, this work is limited in this regard. We addressed the problem by training and testing our models on each flow direction separately, followed by a comparison against the full flow. Our approach is thus two-fold. On the one hand, we tackle the loss of generalizability with respect to traffic capture direction by creating several models and testing their generalizability. On the other hand, we tackle a second generalizability issue: whether the machine learning models used in previous work remain applicable when classifying narrower classes.
Using the Gaming Network Traffic Dataset, we attempt to classify gaming network traffic with much narrower user activity classes than previous work. We create several models. The first, a random forest (the state-of-the-art algorithm) with pre-engineered features such as interarrival times, packet length and other flow statistics, serves as a baseline and obtained a testing accuracy of 44.14%. The second, a Convolutional Neural Network (CNN) based deep learning model also built on previous work, takes as input raw network traffic converted into either a grayscale or an RGB image; the optimal bi-flow grayscale model reached a testing accuracy of 47.24%. The third, a deeper CNN-Long Short Term Memory (LSTM) based model that takes into consideration the temporal dimension of consecutive flows, obtained a testing accuracy of 52.27%, surpassing both the random forest and CNN state-of-the-art models. This model also consistently showed a significant increase in accuracy on the client-server side traffic, where traffic categories are harder to separate. Finally, our proposed semi-supervised stacked CNN-random forest model obtained a testing accuracy of 53.4%. We then analyze and compare the results of the proposed simple CNN model and the proposed semi-supervised CNN-random forest architecture on different datasets. The proposed algorithm performed best on the Gaming Network Traffic Dataset specifically, surpassing both the CNN state-of-the-art algorithm and our previous LSTM-based model. This result was, however, bound to the dataset: the model showed only very slight improvement on timeseries-based datasets and worse results on regular image classification tasks, where it also fell behind models mentioned in previous work. Model generalizability will have a huge impact on future model development, large-scale deployment and adaptation to different networks.
Future work will study further model optimization for the Gaming Network Dataset, the collection of a multi-location dataset with a larger number of samples on internet service provider premises, improved generalizability, and the application of, on the one hand, other possibly more complex recurrent network structures and, on the other hand, unsupervised clustering algorithms for choosing the initial model class subgroups.
dc.language.iso	en
dc.subject	machine learning
dc.subject	deep learning
dc.subject	CNN
dc.subject	LSTM
dc.subject	network traffic
dc.subject	classification
dc.subject	unsupervised learning
dc.subject	DBSCAN
dc.subject	cybersecurity
dc.title	Generalized Machine Learning Based Network Traffic Classification
dc.type	Thesis
dc.contributor.department	Department of Electrical and Computer Engineering
dc.contributor.faculty	Maroun Semaan Faculty of Engineering and Architecture
dc.contributor.institution	American University of Beirut
dc.contributor.commembers	Kayssi, Ayman
dc.contributor.commembers	Hajj, Hazem
dc.contributor.degree	ME
dc.contributor.AUBidnumber	202124523

Files in this item

Name: ChaibanJeanPaul_2 ...

Size: 2.906Mb

Format: PDF

This item appears in the following Collection(s)

AUB Students' Theses, Dissertations, and Projects [12714]

