Volume 32 (2022): Issue 2 (June 2022)
Towards Self-Healing Systems through Diagnostics, Fault-Tolerance and Design (Special section, pp. 171-269), Marcin Witczak and Ralf Stetter (Eds.)
Journal Details
Format: Journal
eISSN: 2083-8492
First published: 05 Apr 2007
Publication frequency: 4 times per year
Languages: English
Access type: Open Access

Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data

Published online: 04 Jul 2022
Volume & Issue: Volume 32 (2022) - Issue 2 (June 2022), Towards Self-Healing Systems through Diagnostics, Fault-Tolerance and Design (Special section, pp. 171-269), Marcin Witczak and Ralf Stetter (Eds.)
Pages: 299 - 309
Received: 05 Nov 2021
Accepted: 10 Feb 2022
Abstract

Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.
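
To make the convex-plus-concave structure concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common PU model P(s=1 | x) = c * sigmoid(x'beta), where s is the observed label indicator and c the label frequency, and it assumes the reparametrization c = sigmoid(gamma); neither is spelled out in the abstract. Under these assumptions the negative log-likelihood equals g - h, with g = sum_i log(1 + exp(-z_i)) + n log(1 + exp(-gamma)) and h = sum over unlabeled i of log(exp(-gamma) + exp(-z_i) + exp(-gamma - z_i)), where z_i = x_i'beta. Both g and h are log-sum-exp functions of affine terms and hence convex. The loop below is plain CCCP (linearize h, minimize the convex surrogate), not the DCCP variant; all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def nll(theta, X, s):
    """Direct negative log-likelihood for P(s=1 | x) = c * sigmoid(x @ beta)."""
    beta, gamma = theta[:-1], theta[-1]  # assumption: c = sigmoid(gamma)
    p = sigmoid(gamma) * sigmoid(X @ beta)
    return -np.sum(s * np.log(p) + (1 - s) * np.log1p(-p))


def g_part(theta, X):
    """Convex part g of the objective and its gradient."""
    beta, gamma = theta[:-1], theta[-1]
    z = X @ beta
    value = np.logaddexp(0.0, -z).sum() + len(z) * np.logaddexp(0.0, -gamma)
    grad_beta = X.T @ (sigmoid(z) - 1.0)
    grad_gamma = len(z) * (sigmoid(gamma) - 1.0)
    return value, np.append(grad_beta, grad_gamma)


def h_part(theta, X, s):
    """Convex function h (the objective is g - h) and its gradient."""
    beta, gamma = theta[:-1], theta[-1]
    z = X @ beta
    a = np.stack([np.full_like(z, -gamma), -z, -gamma - z])  # three affine terms
    m = a.max(axis=0)
    e = np.exp(a - m)                      # stabilized log-sum-exp
    value = np.sum((1 - s) * (m + np.log(e.sum(axis=0))))
    w = e / e.sum(axis=0)                  # softmax weights per observation
    grad_beta = X.T @ (-(1 - s) * (w[1] + w[2]))
    grad_gamma = np.sum(-(1 - s) * (w[0] + w[2]))
    return value, np.append(grad_beta, grad_gamma)


def cccp(X, s, n_iter=100, tol=1e-8):
    """Plain CCCP: replace -h by its tangent at the current iterate, then minimize."""
    theta = np.zeros(X.shape[1] + 1)
    history = [nll(theta, X, s)]
    for _ in range(n_iter):
        _, grad_h = h_part(theta, X, s)    # linearize h at the current iterate

        def surrogate(t):
            value, grad = g_part(t, X)
            return value - grad_h @ t, grad - grad_h

        theta = minimize(surrogate, theta, jac=True, method="L-BFGS-B").x
        history.append(nll(theta, X, s))
        if history[-2] - history[-1] < tol:
            break
    return theta, np.array(history)


# Synthetic illustration only: these data are made up for the sketch.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = rng.random(500) < sigmoid(X @ np.array([0.0, 2.0, -1.0]))
s = (y & (rng.random(500) < 0.6)).astype(float)  # positives labeled with prob. 0.6
theta_hat, history = cccp(X, s)
print("estimated label frequency c:", sigmoid(theta_hat[-1]))
print("objective:", history[0], "->", history[-1])
```

Each surrogate is an upper bound on the objective that touches it at the current iterate, so the objective values in `history` should not increase from one iteration to the next. The sketch makes no claim about how accurately c is recovered; the label frequency and the intercept are only weakly identified in this model, which is one reason careful optimization matters here.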

