|
Optimization of Heuristic Analysis of Executable Files
Wiglasz, Michal ; Křoustek, Jakub (referee) ; Hruška, Tomáš (advisor)
This BSc Thesis was performed during a study stay at the Universita della Svizzera italiana, Swiss. This thesis describes the implementation of a classification tool for detection of unknown malware based on their behaviour which could replace current solution, based on manually chosen attributes'scores and a threshold. The database used for training and testing was provided by AVG Technologies company, which specializes in antivirus and security systems. Five different classifiers were compared in order to find the best one for implementation: Naive Bayes, a decision tree, RandomForrest, a neural net and a support vector machine. After series of experiments, the Naive Bayes classifier was selected. The implemented application covers all necessary steps: attribute extraction, training, estimation of the performance and classification of unknown samples. Because the company is willing to tolerate false positive rate of only 1% or less, the accuracy of the implemented classifier is only 61.7%, which is less than 1% better than the currently used approach. However it provides automation of the learning process and allows quick re-training (in average around 12 seconds for 90 thousand training samples).
|
| |
| |
| |
|
Optimization of Heuristic Analysis of Executable Files
Wiglasz, Michal ; Křoustek, Jakub (referee) ; Hruška, Tomáš (advisor)
This BSc Thesis was performed during a study stay at the Universita della Svizzera italiana, Swiss. This thesis describes the implementation of a classification tool for detection of unknown malware based on their behaviour which could replace current solution, based on manually chosen attributes'scores and a threshold. The database used for training and testing was provided by AVG Technologies company, which specializes in antivirus and security systems. Five different classifiers were compared in order to find the best one for implementation: Naive Bayes, a decision tree, RandomForrest, a neural net and a support vector machine. After series of experiments, the Naive Bayes classifier was selected. The implemented application covers all necessary steps: attribute extraction, training, estimation of the performance and classification of unknown samples. Because the company is willing to tolerate false positive rate of only 1% or less, the accuracy of the implemented classifier is only 61.7%, which is less than 1% better than the currently used approach. However it provides automation of the learning process and allows quick re-training (in average around 12 seconds for 90 thousand training samples).
|
| |