Volume 36 (2020): Issue 3 (September 2020)
    Special Issue on Nonresponse
Journal Details
First Published: 01 Oct 2013
Publication timeframe: 4 times per year
Access type: Open Access

Proxy Pattern-Mixture Analysis for a Binary Variable Subject to Nonresponse

Published Online: 24 Jul 2020
Page range: 703 - 728
Received: 01 Aug 2018
Accepted: 01 Oct 2019

Given increasing survey nonresponse, good measures of the potential impact of nonresponse on survey estimates are particularly important. Existing measures, such as the R-indicator, make the strong assumption that data are missing at random, meaning that missingness depends only on variables that are observed for both respondents and nonrespondents. We consider assessing the impact of nonresponse for a binary survey variable Y when missingness may be not at random, meaning that missingness may depend on Y itself. Our work is motivated by missing categorical income data in the 2015 Ohio Medicaid Assessment Survey (OMAS), where whether or not income is missing may be related to the income value itself, with low-income earners more reluctant to respond. We assume there is a set of covariates observed for both nonrespondents and respondents. For item nonresponse (as in OMAS) this is often a rich set of variables, but it may be limited in cases of unit nonresponse. To reduce dimensionality, and for simplicity, we reduce these available covariates to a continuous proxy variable X, available for both respondents and nonrespondents, that has the highest correlation with Y, estimated from a probit regression analysis of respondent data. We extend the previously proposed proxy pattern-mixture (PPM) analysis for continuous outcomes to the binary outcome, using a latent variable approach to model the joint distribution of Y and X. Our method does not assume that data are missing at random but includes this as a special case, thus creating a convenient framework for sensitivity analyses. Maximum likelihood, Bayesian, and multiple imputation versions of PPM analysis are described, and the robustness of these methods to model assumptions is discussed. Properties are demonstrated through simulation and with the 2015 OMAS data.
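The proxy-construction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated, the coefficient values, response probabilities, and helper names (`fit_probit`, `solve`, `dot`) are hypothetical, and the probit model is fit by a plain Fisher scoring loop written with the Python standard library only. The sketch fits the probit regression to respondents and then evaluates the fitted linear predictor for every sampled unit, which gives a proxy X for respondents and nonrespondents alike. It does not carry out the PPM analysis itself.

```python
import math
import random

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_probit(Z, y, iters=25):
    """Fisher scoring for a probit model; each row of Z includes an intercept."""
    p = len(Z[0])
    beta = [0.0] * p
    for _ in range(iters):
        info = [[0.0] * p for _ in range(p)]
        score = [0.0] * p
        for z, yi in zip(Z, y):
            eta = dot(beta, z)
            prob = min(max(norm_cdf(eta), 1e-10), 1.0 - 1e-10)
            dens = norm_pdf(eta)
            var = prob * (1.0 - prob)
            for j in range(p):
                score[j] += dens * (yi - prob) / var * z[j]
                for k in range(p):
                    info[j][k] += dens * dens / var * z[j] * z[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulated sample (all values here are assumptions for illustration).
random.seed(2020)
n = 2000
true_beta = [-0.3, 0.8, -0.5]  # hypothetical intercept and slopes
Z, y, responded = [], [], []
for _ in range(n):
    z = [1.0, random.gauss(0, 1), random.gauss(0, 1)]
    yi = 1 if dot(true_beta, z) + random.gauss(0, 1) > 0 else 0
    # Response depends on Y itself, so missingness is not at random.
    p_resp = 0.8 if yi == 1 else 0.6
    Z.append(z)
    y.append(yi)
    responded.append(random.random() < p_resp)

# Step 1: probit regression of Y on the covariates, respondents only.
Z_resp = [z for z, r in zip(Z, responded) if r]
y_resp = [yi for yi, r in zip(y, responded) if r]
beta_hat = fit_probit(Z_resp, y_resp)

# Step 2: the proxy X is the fitted linear predictor, computed for all units.
proxy = [dot(beta_hat, z) for z in Z]

def mean(values):
    return sum(values) / len(values)

x_resp = mean([x for x, r in zip(proxy, responded) if r])
x_nonresp = mean([x for x, r in zip(proxy, responded) if not r])
print("respondent mean of X:", round(x_resp, 3))
print("nonrespondent mean of X:", round(x_nonresp, 3))
```

In this simulated setting, a gap between the respondent and nonrespondent means of X signals that the two groups differ on covariates related to Y. Because response here depends on Y, the respondent-only fit describes the respondent distribution and should not be read as an estimate of the simulated `true_beta`.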


