Volume 9 (2019): Issue 2 (April 2019)

Journal Details
Format: Journal
eISSN: 2449-6499
First Published: 30 Dec 2014
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Supposed Maximum Mutual Information for Improving Generalization and Interpretation of Multi-Layered Neural Networks

Published Online: 31 Dec 2018
Page range: 123–147
Received: 06 Feb 2018
Accepted: 13 Aug 2018
Abstract

The present paper aims to propose a new type of information-theoretic method for maximizing mutual information between inputs and outputs. The importance of mutual information in neural networks is well known, but actually implementing mutual information maximization has proven quite difficult. As a result, mutual information has not been used extensively in neural networks, and its applicability has remained limited. To overcome this shortcoming, we simplify the maximization drastically by supposing that mutual information is already maximized before learning, or at least at the beginning of learning. The method was applied to three data sets (the crab, wholesale, and human resources data sets) and examined in terms of generalization performance and connection weights. The results showed that, by disentangling connection weights, maximizing mutual information made it possible to interpret the relations between inputs and outputs explicitly.
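The quantity at stake can be illustrated with a small numerical sketch. The Python example below assumes a standard discrete formulation in which unit activations are normalized into conditional firing probabilities p(j|s) and mutual information is computed between input patterns s and units j; forcing each pattern to fire exactly one unit then realizes a "supposed maximum" state of the kind the abstract describes. The function names, the toy data, and this particular formulation are illustrative assumptions, not the paper's implementation.

# A minimal sketch (NumPy) of mutual information between input
# patterns s and the outputs of a layer of units j.  All names and
# data here are illustrative assumptions.

import numpy as np

def mutual_information(activations):
    # activations: (S, J) array of non-negative unit outputs,
    # one row per input pattern.
    eps = 1e-12
    # Normalize each row into conditional firing probabilities p(j|s).
    p_j_given_s = activations / (activations.sum(axis=1, keepdims=True) + eps)
    # Marginal firing probability p(j), with uniform p(s) = 1/S.
    p_j = p_j_given_s.mean(axis=0)
    p_s = 1.0 / activations.shape[0]
    # I(S;J) = sum_s sum_j p(s) p(j|s) log( p(j|s) / p(j) )
    return float(np.sum(p_s * p_j_given_s * np.log((p_j_given_s + eps) / (p_j + eps))))

def supposed_maximum(activations):
    # Replace each pattern's response with a one-hot "winner" vector.
    # When every pattern fires exactly one unit and the winners are
    # spread evenly over units, I(S;J) reaches its maximum log(J);
    # assuming this state before (or at the start of) learning is the
    # simplification sketched here.
    winners = activations.argmax(axis=1)
    one_hot = np.zeros_like(activations)
    one_hot[np.arange(activations.shape[0]), winners] = 1.0
    return one_hot

rng = np.random.default_rng(0)
acts = rng.random((100, 10))  # 100 toy input patterns, 10 units
print(mutual_information(acts))                    # low: unselective responses
print(mutual_information(supposed_maximum(acts)))  # close to log(10) ~ 2.30

With the seed above, the first value is close to zero while the second approaches log 10, matching the intuition that maximum mutual information corresponds to maximally selective, and hence more interpretable, unit responses.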

Keywords

