Volume 8 (2018): Issue 1 (January 2018)
Journal Details
Format: Journal
First Published: 30 Dec 2014
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Classifiers Accuracy Improvement Based on Missing Data Imputation

Published Online: 01 Nov 2017
Page range: 31 - 48
Received: 14 Feb 2017
Accepted: 28 Mar 2017

In this paper we extend our previous work on radar signal identification and classification, based on a data set comprising continuous, discrete and categorical data that represent radar pulse train characteristics such as signal frequencies, pulse repetition, type of modulation, intervals, scan period, scanning type, etc. Like most real-world datasets, it contains a high percentage of missing values, and to deal with this problem we investigate three imputation techniques: Multiple Imputation (MI), K-Nearest Neighbour Imputation (KNNI), and Bagged Tree Imputation (BTI). We apply these methods to data samples with up to 60% missingness, thereby doubling the number of instances with complete values in the resulting dataset. The performance of the imputation models is assessed with Wilcoxon’s test for statistical significance and Cohen’s effect size metrics. To solve the classification task, we employ three intelligent approaches: Neural Networks (NN), Support Vector Machines (SVM), and Random Forests (RF). Subsequently, we critically analyse which imputation method most influences the classifiers’ performance, using a multiclass classification accuracy metric based on the area under the ROC curves. We consider two superclasses (‘military’ and ‘civil’), each containing several ‘subclasses’, and propose two new metrics, inner class accuracy (IA) and outer class accuracy (OA), in addition to the overall classification accuracy (OCA) metric. We conclude that they can be used as complements to the OCA when choosing the best classifier for the problem at hand.
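To make the imputation and assessment steps more concrete, the following is a minimal sketch of K-nearest-neighbour imputation followed by a Cohen's d effect size calculation. It is not the authors' implementation: the abstract does not specify the value of k, the distance measure, or how categorical attributes are handled. The function names `knn_impute` and `cohens_d`, the choice of k = 2, the Euclidean distance averaged over shared features, and the toy data are all assumptions made for illustration, and only numeric features are supported.

```python
import math
from statistics import mean, stdev


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption for this sketch: distance is Euclidean over the features
    observed in both rows, averaged over the number of shared features.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no common features: treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = mean(nearest)
    return imputed


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# Hypothetical toy data: two numeric features, one missing value (None).
rows = [
    [1.0, 10.0],
    [1.2, 12.0],
    [5.0, 50.0],
    [1.1, None],
]
completed = knn_impute(rows, k=2)
print(completed[3])  # the missing entry becomes 11.0 (mean of 10.0 and 12.0)
```

In an evaluation along the lines described above, one could mask known values, impute them, and compare the imputed and true values with `cohens_d` alongside a significance test such as Wilcoxon's.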

