Article Detail

Keywords:

Malware
Android malware
feature selection
random forest

Abstract

This study presents a machine learning workflow for detecting and classifying Android-based ransomware using the TUANDROMD dataset. The workflow is an integrated process that begins with automated retrieval and extraction of the data. During preprocessing, the consistency of the target variable was ensured, and the dataset was split into training (80%) and testing (20%) subsets while maintaining class balance. To prevent data leakage and improve reproducibility, the model was built as a pipeline that applies feature scaling, feature selection, and classification in sequence. To improve the model's generalization, hyperparameters such as the number of trees (n_estimators) and the maximum depth (max_depth) were tuned by grid search with three-fold cross-validation. The best-performing model was then evaluated on the held-out test set using a classification report and a confusion matrix. Finally, the resulting model was serialized and saved in .pkl format for future use.
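
The workflow summarized in the abstract could be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the author's code. It makes several assumptions: a synthetic dataset from make_classification stands in for TUANDROMD, StandardScaler and SelectKBest (with k=10) stand in for the unspecified scaling and feature selection methods, the grid values are arbitrary, and the output file name best_model.pkl is hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for the TUANDROMD features and labels.
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=8,
    weights=[0.3, 0.7], random_state=42,
)

# 80/20 split; stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling and selection live inside the pipeline, so they are fitted only on
# the training folds during cross-validation (this is what prevents leakage).
pipeline = Pipeline([
    ("scale", StandardScaler()),                          # assumed scaler
    ("select", SelectKBest(score_func=f_classif, k=10)),  # assumed selector
    ("clf", RandomForestClassifier(random_state=42)),
])

# Assumption: illustrative grid values; the abstract does not list them.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 5, 10],
}

# Grid search with three-fold cross-validation on the training subset only.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the reserved test set.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(search.best_params_)
print(classification_report(y_test, y_pred))
print(cm)

# Serialize the whole fitted pipeline to a .pkl file (hypothetical path).
model_path = os.path.join(tempfile.gettempdir(), "best_model.pkl")
with open(model_path, "wb") as fh:
    pickle.dump(best_model, fh)
```

Because the fitted pipeline is pickled as a single object, loading the file later restores the scaling, selection, and classification steps together, so new samples can be passed to predict without repeating the preprocessing by hand.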

Article Summary

ISSN: 3023-7343

Volume 2, Issue 4

DOI: 10.5281/zenodo.18075972

Submission Date: 2025-09-09

Accepted Date: 2025-10-23

Available Online: 2025-12-28

Publication Date: 2025-12-29

How to Cite

Cite as:

Doğan, N. (2025). Identifying the Dominant Attribute in Android-Based Malware Detection. Tethys Environmental Science, 2(4), 199-205. doi: 10.5281/zenodo.18075972