1. bookVolume 2022 (2022): Issue 2 (April 2022)
Journal Details
License
Format
Journal
eISSN
2299-0984
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
access type Open Access

Differentially Private Simple Linear Regression

Published Online: 03 Mar 2022
Volume & Issue: Volume 2022 (2022) - Issue 2 (April 2022)
Page range: 184 - 204
Received: 31 Aug 2021
Accepted: 16 Dec 2021
Journal Details
License
Format
Journal
eISSN
2299-0984
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
Abstract

Economics and social science research often require analyzing datasets of sensitive personal information at fine granularity, with models fit to small subsets of the data. Unfortunately, such fine-grained analysis can easily reveal sensitive individual information. We study regression algorithms that satisfy differential privacy, a constraint which guarantees that an algorithm’s output reveals little about any individual input data record, even to an attacker with side information about the dataset. Motivated by the Opportunity Atlas, a high-profile, small-area analysis tool in economics research, we perform a thorough experimental evaluation of differentially private algorithms for simple linear regression on small datasets with tens to hundreds of records—a particularly challenging regime for differential privacy. In contrast, prior work on differentially private linear regression focused on multivariate linear regression on large datasets or asymptotic analysis. Through a range of experiments, we identify key factors that affect the relative performance of the algorithms. We find that algorithms based on robust estimators—in particular, the median-based estimator of Theil and Sen—perform best on small datasets (e.g., hundreds of datapoints), while algorithms based on Ordinary Least Squares or Gradient Descent perform better for large datasets. However, we also discuss regimes in which this general finding does not hold. Notably, the differentially private analogues of Theil–Sen (one of which was suggested in a theoretical work of Dwork and Lei) have not been studied in any prior experimental work on differentially private linear regression.

Keywords

[1] Jacob Abernethy, Chansoo Lee, and Ambuj Tewari. 2016. Perturbation techniques in online learning and optimization. Perturbations, Optimization, and Statistics (2016), 233. Search in Google Scholar

[2] Oguz Akbiligic, Hamparsum Bozdogan, and M. Erdal Balaban. 2013. A novel Hybrid RBF Neural Networks model as a forecaster. Statistics and Computing (2013). This dataset was collected from imkb.gov.tr and finance.yahoo.com. Search in Google Scholar

[3] Hilal Asi and John C Duchi. 2020. Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. Advances in Neural Information Processing Systems 33 (2020). Search in Google Scholar

[4] Jordan Awan and Aleksandra Slavković. 2020. Structure and sensitivity in differential privacy: Comparing k-norm mechanisms. J. Amer. Statist. Assoc. just-accepted (2020), 1–56. Search in Google Scholar

[5] Emily Badger and Quoctrung Bui. 2020. Detailed Maps Show How Neighborhoods Shape Children for Life. https://www.nytimes.com/2018/10/01/upshot/maps-neighborhoods-shape-child-poverty.html. Online; accessed 15 October 2020. Search in Google Scholar

[6] Raef Bassily, Adam D. Smith, and Abhradeep Thakurta. 2014. Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. In 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2014, Philadelphia, PA, USA, October 18-21, 2014. 464–473. Search in Google Scholar

[7] Garrett Bernstein and Daniel R Sheldon. 2019. Differentially Private Bayesian Linear Regression. In Advances in Neural Information Processing Systems 32. 523–533. Search in Google Scholar

[8] Mark Bun and Thomas Steinke. 2016. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference. Springer, 635–658. Search in Google Scholar

[9] Mark Bun and Thomas Steinke. 2019. Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation. In Advances in Neural Information Processing Systems 32. 181–191. Search in Google Scholar

[10] Tony Cai, Yichen Wang, and Linjun Zhang. 2019. The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy. CoRR abs/1902.04495 (2019). http://arxiv.org/abs/1902.04495 Search in Google Scholar

[11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. 2011. Differentially Private Empirical Risk Minimization. Journal of Machine Learning Research 12 (2011), 1069–1109. Search in Google Scholar

[12] Raj Chetty, John N Friedman, Nathaniel Hendren, Maggie R Jones, and Sonya R Porter. 2018. The opportunity atlas: Mapping the childhood roots of social mobility. Technical Report. National Bureau of Economic Research.10.3386/w25147 Search in Google Scholar

[13] Raj Chetty, Nathaniel Hendren, Patrick Kline, and Emmanuel Saez. 2014. Where is the land of opportunity? The geography of intergenerational mobility in the United States. The Quarterly Journal of Economics 129, 4 (2014), 1553–1623. Search in Google Scholar

[14] Graham Cormode. [n. d.]. Building Blocks of Privacy: Differentially Private Mechanisms. ([n. d.]), 18–19. Search in Google Scholar

[15] Simon Couch, Zeki Kazan, Kaiyan Shi, Andrew Bray, and Adam Groce. 2019. Differentially Private Nonparametric Hypothesis Testing. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS ’19). 737–751.10.1145/3319535.3339821 Search in Google Scholar

[16] Alfred DeMaris. 2004. Regression with social data : modeling continuous and limited response variables. Wiley-Interscience, Hoboken, NJ.10.1002/0471677566 Search in Google Scholar

[17] Irit Dinur and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 202–210.10.1145/773153.773173 Search in Google Scholar

[18] Cynthia Dwork and Jing Lei. 2009. Differential privacy and robust statistics.. In STOC, Vol. 9. 371–380. Search in Google Scholar

[19] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings. 265–284. Search in Google Scholar

[20] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. 2014. Analyze gauss: optimal bounds for privacy-preserving principal component analysis. In STOC. 11–20.10.1145/2591796.2591883 Search in Google Scholar

[21] Frank McSherry and Kunal Talwar. 2007. Mechanism Design via Differential Privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings. 94–103. Search in Google Scholar

[22] Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. 19–30.10.1145/1559845.1559850 Search in Google Scholar

[23] Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. 2007. Smooth sensitivity and sampling in private data analysis. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007. 75–84. Search in Google Scholar

[24] Pranab Kumar Sen. 1968. Estimates of the regression coefficient based on Kendall’s tau. Journal of the American statistical association 63, 324 (1968), 1379–1389. Search in Google Scholar

[25] Or Sheffet. 2017. Differentially Private Ordinary Least Squares. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. 3105–3114. http://proceedings.mlr.press/v70/sheffet17a.html Search in Google Scholar

[26] Adam Smith. 2011. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the forty-third annual ACM symposium on Theory of computing. 813–822.10.1145/1993636.1993743 Search in Google Scholar

[27] Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. 2013. Stochastic gradient descent with differentially private updates. In IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013, Austin, TX, USA, December 3-5, 2013. 245–248. Search in Google Scholar

[28] Henri Theil. 1950. A rank-invariant method of linear and polynomial regression analysis, 3; confidence regions for the parameters of polynomial regression equations. Indagationes Mathematicae 1, 2 (1950), 467–482. Search in Google Scholar

[29] Yu-Xiang Wang. 2018. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10, 2018. 93–103. Search in Google Scholar

[30] Xin Yan and Xiao Gang Su. 2009. Linear Regression Analysis: Theory and Computing. World Scientific Publishing Co., Inc., USA.10.1142/6986 Search in Google Scholar

[31] Jun Zhang, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, and Marianne Winslett. 2012. Functional Mechanism: Regression Analysis under Differential Privacy. Proc. VLDB Endow. 5, 11 (2012), 1364–1375.10.14778/2350229.2350253 Search in Google Scholar

Recommended articles from Trend MD

Plan your remote conference with Sciendo