XAI for Detecting Packer Signatures
Abstract
Software packing is the practice of compressing or encrypting an executable and bundling it with a self-extracting stub. It is one of the most prevalent techniques used to conceal malware from static analysis tools. Detecting whether a file has been packed, and identifying which packer was used, is therefore a critical first step in any serious malware triage pipeline. Yet existing approaches either rely on brittle hand-built signatures or produce predictions that give analysts no insight into why a file was flagged. This thesis presents an end-to-end framework that addresses both problems together. A raw-byte deep learning model is trained to detect packing and classify packer families directly from executable bytes, without manual feature engineering. Its decisions are then explained using byte-level attribution methods anchored to the Portable Executable (PE) file structure, and those explanations are validated causally to confirm that they reflect genuine evidence rather than superficial patterns. A final generative AI (GenAI) layer translates the validated structural evidence into analyst-readable verdicts and candidate packer signatures. The proposed framework achieves a test Area Under the Receiver Operating Characteristic curve (AUROC) of 0.9939 and an Average Precision of 0.9949 for binary packed-versus-unpacked detection, and 96.51% test accuracy with 96.40% macro-F1 for multiclass packer-family classification. The explainability results show that the model relies on compact, structurally meaningful, and causally validated evidence regions. In the GenAI stage, enriched evidence yields 85.80% packer-family inference accuracy and 88.54% macro-F1, while the semantic output setting produces highly grounded candidate signatures, including a 93.49% signature grounding rate.
Overall, the results show that software packing can be analyzed effectively through a closed-loop pipeline in which a raw-byte classifier predicts which class is present, PE-aware XAI localizes where the decisive evidence lies, and GenAI explains why that evidence is meaningful.
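As a rough illustration of what anchoring byte-level attributions to the PE structure could look like, the sketch below sums absolute per-byte attribution scores inside each section's raw-data range and reports each region's share of the total. It is not the thesis implementation: the function name `attribution_by_region`, the `(name, offset, size)` section tuples, the `<headers/other>` bucket, and the toy scores are all assumptions made for this example, and the section names are only illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical section description: (name, raw file offset, raw size in bytes).
Section = Tuple[str, int, int]


def attribution_by_region(scores: List[float], sections: List[Section]) -> Dict[str, float]:
    """Return each region's share of the total absolute attribution.

    Assumes sections do not overlap. Bytes outside every section
    (headers, overlay, padding) are grouped under "<headers/other>".
    """
    totals: Dict[str, float] = {}
    covered = [False] * len(scores)
    for name, offset, size in sections:
        end = min(offset + size, len(scores))  # clip to the scored bytes
        totals[name] = sum(abs(s) for s in scores[offset:end])
        for i in range(offset, end):
            covered[i] = True
    totals["<headers/other>"] = sum(abs(s) for s, c in zip(scores, covered) if not c)
    grand_total = sum(totals.values())
    return {k: (v / grand_total if grand_total else 0.0) for k, v in totals.items()}


# Toy input: 16 scored bytes, 4 header bytes followed by two sections.
toy_scores = [0.25] * 4 + [0.0] * 4 + [0.375] * 8
toy_sections = [("UPX0", 4, 4), ("UPX1", 8, 8)]
shares = attribution_by_region(toy_scores, toy_sections)
```

With this toy input, three quarters of the attribution mass falls in the second section and the remainder in the header bytes. A region-level summary of this kind is the sort of compact structural evidence that could then be tested causally or passed on to a later stage.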