Malware
Android malware
feature selection
random forest
This study presents a machine learning workflow for detecting and classifying Android malware using the TUANDROMD dataset. The workflow is an integrated process that begins with automated retrieval and extraction of the data. During preprocessing, the target variable was checked for consistency, and the dataset was partitioned into training (80%) and testing (20%) subsets using a stratified split that preserves the class distribution. To prevent data leakage and improve reproducibility, the model was built as a single pipeline that applies feature scaling, feature selection, and classification in sequence, so that the scaling and selection steps are fitted only on training data. To improve generalization, hyperparameters such as the number of trees (n_estimators) and the maximum tree depth (max_depth) were tuned by grid search with three-fold cross-validation. The best-performing model was then evaluated on the held-out test set using a classification report and a confusion matrix. Finally, the resulting model was serialized and stored in .pkl format for future use.
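The workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual implementation: it assumes scikit-learn and joblib, and it uses synthetic data from `make_classification` in place of TUANDROMD. The specific scaler (`StandardScaler`), selector (`SelectKBest` with `f_classif`), `k=10`, the grid values, and the file name `model.pkl` are illustrative choices that the abstract does not specify.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: numeric features,
# binary target with an imbalanced class distribution).
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=8,
    weights=[0.8, 0.2], random_state=42,
)

# Stratified 80/20 split: both subsets keep the original class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling and selection live inside the pipeline, so during cross-validation
# they are re-fitted on each training fold only (no leakage from held-out data).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),  # k is illustrative
    ("clf", RandomForestClassifier(random_state=42)),
])

# Illustrative grid over the two hyperparameters named in the abstract.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 10],
}

# Grid search with three-fold cross-validation on the training subset.
search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=1)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out test set.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Serialize the whole fitted pipeline to a .pkl file (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(best_model, model_path)
```

Saving the entire pipeline rather than the classifier alone means that a model reloaded with `joblib.load(model_path)` applies the same scaling and feature selection to new samples before predicting.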