1. bookVolume 2017 (2017): Issue 4 (October 2017)
Journal Details
First Published
16 Apr 2015
Publication timeframe
4 times per year
access type Open Access

Privacy-Preserving Distributed Linear Regression on High-Dimensional Data

Published Online: 10 Oct 2017
Page range: 345 - 364
Received: 28 Feb 2017
Accepted: 02 Jun 2017
Journal Details
First Published
16 Apr 2015
Publication timeframe
4 times per year

We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.

[1] G. Asharov, Y. Lindell, T. Schneider, and M. Zohner. More efficient oblivious transfer and extensions for faster secure computation. In ACM Conference on Computer and Communications Security, pages 535–548. ACM, 2013.10.1145/2508859.2516738Search in Google Scholar

[2] D. Beaver. Efficient multiparty protocols using circuit randomization. In CRYPTO, volume 576 of Lecture Notes in Computer Science, pages 420–432. Springer, 1991.10.1007/3-540-46766-1_34Search in Google Scholar

[3] M. Bellare, V. T. Hoang, S. Keelveedhi, and P. Rogaway. Efficient garbling from a fixed-key blockcipher. In IEEE Symposium on Security and Privacy, pages 478–492. IEEE Computer Society, 2013.10.1109/SP.2013.39Search in Google Scholar

[4] T. Bertin-Mahieux. YearPredictionMSD data set. https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD, 2011.Search in Google Scholar

[5] M. Blanton, A. Steele, and M. Aliasgari. Data-oblivious graph algorithms for secure computation and outsourcing. In ASIACCS, pages 207–218. ACM, 2013.10.1145/2484313.2484341Search in Google Scholar

[6] D. Bogdanov, L. Kamm, S. Laur, and V. Sokk. Rmind: a tool for cryptographically secure statistical analysis. IEEE Transactions on Dependable and Secure Computing, 2016.10.1109/TDSC.2016.2587623Open DOISearch in Google Scholar

[7] D. Bogdanov, S. Laur, and J. Willemson. Sharemind: A framework for fast privacy-preserving computations. In ESORICS, pages 192–206. Springer, 2008.10.1007/978-3-540-88313-5_13Search in Google Scholar

[8] K. Buza. Feedback prediction for blogs. In GfKl, Studies in Classification, Data Analysis, and Knowledge Organization, pages 145–152. Springer, 2012.10.1007/978-3-319-01595-8_16Search in Google Scholar

[9] K. Buza. BlogFeedback data set. https://archive.ics.uci.edu/ml/datasets/BlogFeedback, 2014.Search in Google Scholar

[10] M. D. Cock, R. Dowsley, A. C. A. Nascimento, and S. C. Newman. Fast, privacy preserving linear regression over distributed datasets based on pre-distributed data. In AISec@CCS, pages 3–14. ACM, 2015.10.1145/2808769.2808774Search in Google Scholar

[11] P. Cortez. Student performance data set. https://archive.ics.uci.edu/ml/datasets/Student+Performance, 2014.Search in Google Scholar

[12] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547–553, 2009.10.1016/j.dss.2009.05.016Open DOISearch in Google Scholar

[13] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Wine quality data set. https://archive.ics.uci.edu/ml/datasets/Wine+Quality, 2009.Search in Google Scholar

[14] P. Cortez and A. M. G. Silva. Using data mining to predict secondary school student performance. In Future Business Technology Conference, pages 5–12. EUROSIS, 2008.Search in Google Scholar

[15] I. Damgård, V. Pastro, N. P. Smart, and S. Zakarias. Multiparty computation from somewhat homomorphic encryption. In CRYPTO, pages 643–662. Springer, 2012.10.1007/978-3-642-32009-5_38Search in Google Scholar

[16] S. de Hoogh, B. Schoenmakers, and M. Veeningen. Certificate validation in secure computation and its use in verifiable linear programming. In AFRICACRYPT, volume 9646 of Lecture Notes in Computer Science, pages 265–284. Springer, 2016.10.1007/978-3-319-31517-1_14Search in Google Scholar

[17] D. Demmler, G. Dessouky, F. Koushanfar, A. Sadeghi, T. Schneider, and S. Zeitouni. Automated synthesis of optimized circuits for secure computation. In ACM Conference on Computer and Communications Security, pages 1504–1517. ACM, 2015.10.1145/2810103.2813678Search in Google Scholar

[18] D. Demmler, T. Schneider, and M. Zohner. ABY - A framework for efficient mixed-protocol secure two-party computation. In NDSS. The Internet Society, 2015.10.14722/ndss.2015.23113Search in Google Scholar

[19] W. Du and M. J. Atallah. Privacy-preserving cooperative scientific computations. In CSFW, pages 273–294. IEEE Computer Society, 2001.Search in Google Scholar

[20] W. Du and M. J. Atallah. Protocols for secure remote database access with approximate matching. In ECommerce Security and Privacy, volume 2 of Advances in Information Security, pages 87–111. Springer, 2001.10.1007/978-1-4615-1467-1_6Search in Google Scholar

[21] W. Du, Y. S. Han, and S. Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In SDM, pages 222–233. SIAM, 2004.10.1137/1.9781611972740.21Search in Google Scholar

[22] C. Dwork and K. Nissim. Privacy-preserving datamining on vertically partitioned databases. In CRYPTO, volume 3152 of Lecture Notes in Computer Science, pages 528–544. Springer, 2004.10.1007/978-3-540-28628-8_32Search in Google Scholar

[23] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.10.1561/0400000042Search in Google Scholar

[24] M. D. Ercegovac and T. Lang. Digital arithmetic. Elsevier, 2004.10.1016/B978-155860798-9/50011-7Search in Google Scholar

[25] H. Fanaee-T and J. Gama. Bike sharing dataset data set. https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset, 2013.Search in Google Scholar

[26] H. Fanaee-T and J. Gama. Event labeling combining ensemble detectors and background knowledge. Progress in AI, 2(2-3):113–127, 2014.10.1007/s13748-013-0040-3Search in Google Scholar

[27] J. Fonollosa and R. Huerta. Gas sensor array under dynamic gas mixtures data set. https://archive.ics.uci.edu/ml/datasets/Gas+sensor+array+under+dynamic+gas+mixtures, 2015.Search in Google Scholar

[28] J. Fonollosa, S. Sheik, R. Huerta, and S. Marco. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical, 215:618–629, 2015.10.1016/j.snb.2015.03.028Search in Google Scholar

[29] N. Gilboa. Two party RSA key generation. In CRYPTO, volume 1666 of Lecture Notes in Computer Science, pages 116–129. Springer, 1999.10.1007/3-540-48405-1_8Search in Google Scholar

[30] O. Goldreich. The Foundations of Cryptography - Volume 2, Basic Applications. Cambridge University Press, 2004.Search in Google Scholar

[31] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game or A completeness theorem for protocols with honest majority. In STOC, pages 218–229. ACM, 1987.10.1145/28395.28420Search in Google Scholar

[32] S. D. Gordon, J. Katz, V. Kolesnikov, F. Krell, T. Malkin, M. Raykova, and Y. Vahlis. Secure two-party computation in sublinear (amortized) time. In ACM Conference on Computer and Communications Security, pages 513–524. ACM, 2012.10.1145/2382196.2382251Search in Google Scholar

[33] T. Graepel, K. E. Lauter, and M. Naehrig. ML confidential: Machine learning on encrypted data. In ICISC, pages 1–21. Springer, 2012.10.1007/978-3-642-37682-5_1Search in Google Scholar

[34] F. Graf, H.-P. Kriegel, M. Schubert, S. Poelsterl, and A. Cavallaro. Relative location of ct slices on axial axis data set. https://archive.ics.uci.edu/ml/datasets/Relative+location+of+CT+slices+on+axial+axis, 2011.Search in Google Scholar

[35] R. Hall, S. E. Fienberg, and Y. Nardi. Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics, 27(4):669, 2011.Search in Google Scholar

[36] W. Henecka, S. Kögl, A. Sadeghi, T. Schneider, and I. Wehrenberg. TASTY: tool for automating secure two-party computations. In ACM Conference on Computer and Communications Security, pages 451–462. ACM, 2010.10.1145/1866307.1866358Search in Google Scholar

[37] Y. Huang, D. Evans, J. Katz, and L. Malka. Faster secure two-party computation using garbled circuits. In USENIX Security Symposium. USENIX Association, 2011.10.1007/978-3-642-25560-1_2Search in Google Scholar

[38] Y. Huang, C. Shen, D. Evans, J. Katz, and A. Shelat. Efficient secure computation with garbled circuits. In ICISS, volume 7093 of Lecture Notes in Computer Science, pages 28–48. Springer, 2011.10.1007/978-3-642-25560-1_2Search in Google Scholar

[39] A. Karatsuba and Y. Ofman. Multiplication of many-digital numbers by automatic computers. In Proceedings of the USSR Academy of Sciences 145, pages 293–294, 1962.Search in Google Scholar

[40] A. F. Karr, X. Lin, A. P. Sanil, and J. P. Reiter. Regression on distributed databases via secure multi-party computation. In DG.O, ACM International Conference Proceeding Series. Digital Government Research Center, 2004.Search in Google Scholar

[41] M. Keller, E. Orsini, and P. Scholl. MASCOT: faster malicious arithmetic secure computation with oblivious transfer. In ACM Conference on Computer and Communications Security, pages 830–842. ACM, 2016.10.1145/2976749.2978357Search in Google Scholar

[42] M. Keller and P. Scholl. Efficient, oblivious data structures for MPC. In ASIACRYPT (2), pages 506–525. Springer, 2014.10.1007/978-3-662-45608-8_27Search in Google Scholar

[43] D. E. Knuth. The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms. Addison-Wesley, 1997.Search in Google Scholar

[44] V. Kolesnikov and T. Schneider. Improved garbled circuit: Free XOR gates and applications. In ICALP (2), pages 486–498. Springer, 2008.10.1007/978-3-540-70583-3_40Search in Google Scholar

[45] P. Laud and M. Pettai. Secure multiparty sorting protocols with covert privacy. In NordSec, volume 10014 of Lecture Notes in Computer Science, pages 216–231, 2016.10.1007/978-3-319-47560-8_14Search in Google Scholar

[46] M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013.Search in Google Scholar

[47] Y. Lindell. How to simulate it - A tutorial on the simulation proof technique. IACR Cryptology ePrint Archive, 2016:46, 2016.Search in Google Scholar

[48] Y. Lindell and B. Pinkas. Privacy preserving data mining. J. Cryptology, 15(3):177–206, 2002.10.1007/s00145-001-0019-2Search in Google Scholar

[49] Y. Lindell and B. Pinkas. A proof of security of Yao’s protocol for two-party computation. J. Cryptology, 22(2):161–188, 2009.10.1007/s00145-008-9036-8Search in Google Scholar

[50] Y. Lindell, B. Pinkas, N. P. Smart, and A. Yanai. Efficient constant round multi-party computation combining BMR and SPDZ. In CRYPTO (2), pages 319–338. Springer, 2015.10.1007/978-3-662-48000-7_16Search in Google Scholar

[51] C. Liu, X. S. Wang, K. Nayak, Y. Huang, and E. Shi. ObliVM: A programming framework for secure computation. In IEEE Symposium on Security and Privacy, pages 359–376. IEEE Computer Society, 2015.10.1109/SP.2015.29Search in Google Scholar

[52] G. Meurant. The Lanczos and conjugate gradient algorithms: from theory to finite precision computations, volume 19. SIAM, 2006.10.1137/1.9780898718140Search in Google Scholar

[53] P. Mohassel and M. K. Franklin. Efficiency tradeoffs for malicious two-party computation. In Public Key Cryptography, volume 3958 of Lecture Notes in Computer Science, pages 458–473. Springer, 2006.10.1007/11745853_30Search in Google Scholar

[54] K. P. Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.Search in Google Scholar

[55] M. Naor and B. Pinkas. Efficient oblivious transfer protocols. In SODA, pages 448–457. ACM/SIAM, 2001.Search in Google Scholar

[56] M. Naor, B. Pinkas, and R. Sumner. Privacy preserving auctions and mechanism design. In EC, pages 129–139, 1999.10.1145/336992.337028Search in Google Scholar

[57] K. Nayak, X. S. Wang, S. Ioannidis, U. Weinsberg, N. Taft, and E. Shi. Graphsc: Parallel secure computation made easy. In IEEE Symposium on Security and Privacy, pages 377–394. IEEE Computer Society, 2015.10.1109/SP.2015.30Search in Google Scholar

[58] J. B. Nielsen, P. S. Nordholt, C. Orlandi, and S. S. Burra. A new approach to practical active-secure two-party computation. In CRYPTO, volume 7417 of Lecture Notes in Computer Science, pages 681–700. Springer, 2012.10.1007/978-3-642-32009-5_40Search in Google Scholar

[59] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft. Privacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on Security and Privacy, pages 334–348. IEEE Computer Society, 2013.10.1109/SP.2013.30Search in Google Scholar

[60] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.Search in Google Scholar

[61] P. Pullonen and S. Siim. Combining secret sharing and garbled circuits for efficient private IEEE 754 floating-point computations. In Financial Cryptography Workshops, pages 172–183. Springer, 2015.10.1007/978-3-662-48051-9_13Search in Google Scholar

[62] M. Rabin. How to Exchange Secrets by Oblivious Transfer. Technical Report TR-81, Harvard Aiken Computation Laboratory, 1981.Search in Google Scholar

[63] M. Redmond. Communities and crime data set. https://archive.ics.uci.edu/ml/datasets/Communities+and+Crime, 2009.Search in Google Scholar

[64] M. Redmond and A. Baveja. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660–678, 2002.10.1016/S0377-2217(01)00264-8Search in Google Scholar

[65] A. P. Sanil, A. F. Karr, X. Lin, and J. P. Reiter. Privacy preserving regression modelling via distributed computation. In KDD, pages 677–682. ACM, 2004.10.1145/1014052.1014139Search in Google Scholar

[66] E. M. Songhori, S. U. Hussain, A. Sadeghi, T. Schneider, and F. Koushanfar. Tinygarble: Highly compressed and scalable sequential garbled circuits. In IEEE Symposium on Security and Privacy, pages 411–428. IEEE Computer Society, 2015.10.1109/SP.2015.32Search in Google Scholar

[67] X. Wang, A. J. Malozemoff, and J. Katz. Faster secure two-party computation in the single-execution setting. In EUROCRYPT (3), volume 10212 of Lecture Notes in Computer Science, pages 399–424, 2017.10.1007/978-3-319-56617-7_14Search in Google Scholar

[68] M. H. Weik. A third survey of domestic electronic digital computing systems. Technical report, DTIC Document, 1961.10.21236/AD0253212Search in Google Scholar

[69] J. H. Wilkinson. The algebraic eigenvalue problem. Clarendon Press Oxford, 1988.Search in Google Scholar

[70] A. C. Yao. How to generate and exchange secrets (extended abstract). In FOCS, pages 162–167. IEEE Computer Society, 1986.10.1109/SFCS.1986.25Search in Google Scholar

[71] H. Yu, J. Vaidya, and X. Jiang. Privacy-preserving SVM classification on vertically partitioned data. In PAKDD, pages 647–656. Springer, 2006.10.1007/11731139_74Search in Google Scholar

[72] S. Zahur and D. Evans. Obliv-C: A language for extensible data-oblivious computation. IACR Cryptology ePrint Archive, 2015:1153, 2015.Search in Google Scholar

[73] S. Zahur, M. Rosulek, and D. Evans. Two halves make a whole - reducing data transfer in garbled circuits using half gates. In EUROCRYPT (2), pages 220–250. Springer, 2015.10.1007/978-3-662-46803-6_8Search in Google Scholar

[74] Absentminded crypto kit. https://bitbucket.org/jackdoerner/absentminded-crypto-kit.Search in Google Scholar

[75] Secure distributed linear regression. https://github.com/schoppmp/linreg-mpc.Search in Google Scholar

[76] Auto MPG data set. https://archive.ics.uci.edu/ml/datasets/Auto+MPG, 1993.Search in Google Scholar

Recommended articles from Trend MD

Plan your remote conference with Sciendo