KerMinSVM for imbalanced datasets with a case study on Arabic comics classification

dc.contributor.author: Nayal, Ammar
dc.contributor.author: Jomaa, Hadi Samer
dc.contributor.author: Awad, Mariette
dc.contributor.department: Department of Electrical and Computer Engineering
dc.contributor.faculty: Maroun Semaan Faculty of Engineering and Architecture (MSFEA)
dc.contributor.institution: American University of Beirut
dc.date.accessioned: 2025-01-24T11:29:26Z
dc.date.available: 2025-01-24T11:29:26Z
dc.date.issued: 2017
dc.description.abstract: Many studies have classified large text documents using a range of classifiers, from simple distance classifiers such as K-Nearest-Neighbor (KNN) to more advanced classifiers such as Support Vector Machines. These traditional approaches fail on short texts because the limited number of words makes the feature representation sparse. Another common problem in text classification is class imbalance (CI), which occurs when one class contains most of the samples while the other contains only a few. Standard classifiers applied to imbalanced data yield high accuracy on the majority class and low accuracy on the minority class. Motivated by the need to classify the content of Arabic comics, we propose KerMinSVM, a kernel extension of our previously proposed MinSVM, coupled with a new feature dimensionality reduction scheme based on word root frequency ratios (WRFR). KerMinSVM was tested on multiple imbalanced benchmark datasets, and the results were verified using three measures: accuracy, F-measure, and statistical analysis. WRFR was applied to a manually constructed Arabic comic text dataset to detect strong content in children's comic books. Test results revealed that the proposed framework outperformed most methods for imbalanced datasets and short text classification. © 2017 Elsevier Ltd
dc.identifier.doi: https://doi.org/10.1016/j.engappai.2017.01.001
dc.identifier.eid: 2-s2.0-85008627332
dc.identifier.uri: http://hdl.handle.net/10938/27220
dc.language.iso: en
dc.publisher: Elsevier Ltd
dc.relation.ispartof: Engineering Applications of Artificial Intelligence
dc.source: Scopus
dc.subject: Arabic comics analysis
dc.subject: Imbalance datasets
dc.subject: Natural language processing
dc.subject: Supervised classification
dc.subject: Support vector machines
dc.subject: Natural language processing systems
dc.subject: Nearest neighbor search
dc.subject: Text processing
dc.subject: Distance classifiers
dc.subject: K nearest neighbor (knn)
dc.subject: Short text classifications
dc.subject: Traditional approaches
dc.subject: Classification (of information)
dc.title: KerMinSVM for imbalanced datasets with a case study on Arabic comics classification
dc.type: Article
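
The abstract's point about class imbalance (high accuracy overall while the minority class is missed) can be illustrated with a minimal sketch. It is not the paper's KerMinSVM or WRFR method. The toy labels, the 95/5 split, and the helper names `accuracy` and `f_measure` are assumptions made up for this example, which only shows why accuracy is usually reported alongside F-measure on imbalanced data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced toy data: 95 majority samples (0), 5 minority (1).
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
f1_minority = f_measure(y_true, y_pred, positive=1)

print(f"accuracy = {acc:.2f}")
print(f"minority-class F-measure = {f1_minority:.2f}")
```

In this toy setting the always-majority classifier reaches high accuracy while its F-measure on the minority class is zero, which is the failure mode the abstract attributes to standard classifiers on imbalanced data.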

Files

Original bundle

Name: 2017-9044.pdf
Size: 1.56 MB
Format: Adobe Portable Document Format