Volume 32 (2022): Issue 2 (June 2022)
    Towards Self-Healing Systems through Diagnostics, Fault-Tolerance and Design (Special section, pp. 171-269), Marcin Witczak and Ralf Stetter (Eds.)
Journal details
Format: Journal
eISSN: 2083-8492
First published: 05 Apr 2007
Publication frequency: 4 times per year
Languages: English
Access type: Open access

Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data

Published online: 04 Jul 2022
Volume & Issue: Volume 32 (2022) - Issue 2 (June 2022), Towards Self-Healing Systems through Diagnostics, Fault-Tolerance and Design (Special section, pp. 171-269), Marcin Witczak and Ralf Stetter (Eds.)
Pages: 299 - 309
Received: 05 Nov 2021
Accepted: 10 Feb 2022
Abstract

Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.
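
To make the convex-plus-concave structure concrete, the following is a minimal sketch of a CCCP-style fit, not the authors' implementation. It assumes the common PU model in which a positive example is labeled with a constant frequency c, so that P(s=1|x) = c * sigmoid(x'beta), and it uses the reparametrization a = log(1 - c), which is an assumption of this sketch and may differ from the decomposition given in the paper. Under that choice, the negative log-likelihood equals `convex_part` minus `subtracted_part`, both convex in (beta, a). The function names, the synthetic data generator `simulate`, and the inner solver (plain gradient descent with backtracking instead of the convex solver that DCCP would call) are all hypothetical.

```python
import math
import random


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    # 1 / (1 + exp(-t)), written via softplus to avoid overflow.
    return math.exp(-softplus(-t))


def scores(theta, X):
    # theta = [beta_0, ..., beta_{p-1}, a]; the last entry is a = log(1 - c).
    return [sum(b * xj for b, xj in zip(theta[:-1], x)) for x in X]


def nll_direct(theta, X, s):
    # Negative log-likelihood taken directly from P(s=1|x) = c * sigmoid(x'beta).
    c = -math.expm1(theta[-1])
    total = 0.0
    for z, si in zip(scores(theta, X), s):
        p = c * sigmoid(z)
        total -= math.log(p) if si else math.log(1.0 - p)
    return total


def convex_part(theta, X, s):
    a = theta[-1]
    if a >= 0.0:  # outside the domain c in (0, 1)
        return math.inf
    pen = -math.log(-math.expm1(a))  # -log(1 - e^a) = -log(c), convex in a
    return sum(si * (pen + softplus(-z)) + (1 - si) * softplus(z)
               for z, si in zip(scores(theta, X), s))


def subtracted_part(theta, X, s):
    # Convex term that enters the objective with a minus sign.
    a = theta[-1]
    return sum((1 - si) * softplus(z + a) for z, si in zip(scores(theta, X), s))


def grad_convex(theta, X, s):
    a, g = theta[-1], [0.0] * len(theta)
    for x, z, si in zip(X, scores(theta, X), s):
        r = sigmoid(z) - si
        for j, xj in enumerate(x):
            g[j] += r * xj
        g[-1] += si * math.exp(a) / (-math.expm1(a))
    return g


def grad_subtracted(theta, X, s):
    a, g = theta[-1], [0.0] * len(theta)
    for x, z, si in zip(X, scores(theta, X), s):
        w = (1 - si) * sigmoid(z + a)
        for j, xj in enumerate(x):
            g[j] += w * xj
        g[-1] += w
    return g


def cccp(X, s, outer=20, inner=20):
    theta = [0.0] * len(X[0]) + [math.log(0.5)]  # start at beta = 0, c = 0.5
    history = [convex_part(theta, X, s) - subtracted_part(theta, X, s)]
    for _ in range(outer):
        # Linearize the subtracted term at the current iterate.
        lin = grad_subtracted(theta, X, s)

        def surrogate(t):
            return convex_part(t, X, s) - sum(l * tj for l, tj in zip(lin, t))

        # Approximately minimize the convex surrogate by backtracking descent.
        for _ in range(inner):
            g = [gf - l for gf, l in zip(grad_convex(theta, X, s), lin)]
            sq, base, step = sum(gj * gj for gj in g), surrogate(theta), 1.0
            while step > 1e-12:
                cand = [tj - step * gj for tj, gj in zip(theta, g)]
                if surrogate(cand) <= base - 1e-4 * step * sq:
                    theta = cand
                    break
                step *= 0.5
        history.append(convex_part(theta, X, s) - subtracted_part(theta, X, s))
    return theta, history


def simulate(n, beta, c, seed=0):
    # Synthetic PU sample: true class y is hidden, s = 1 only for labeled positives.
    rng = random.Random(seed)
    X, s = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in beta[1:]]
        y = rng.random() < sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        s.append(1 if (y and rng.random() < c) else 0)
        X.append(x)
    return X, s
```

A call such as `theta, history = cccp(*simulate(150, [0.0, 2.0], 0.6))` returns the fitted coefficients together with `a`, from which the label frequency estimate is `1 - math.exp(theta[-1])`. Because each surrogate majorizes the objective and touches it at the current iterate, `history` should be non-increasing even when the inner problem is solved only approximately.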
