1. bookVolume 2022 (2022): Issue 1 (January 2022)
Journal Details
License
Format
Journal
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
access type Open Access

Multiparty Reach and Frequency Histogram: Private, Secure, and Practical

Published Online: 20 Nov 2021
Page range: 373 - 395
Received: 31 May 2021
Accepted: 16 Sep 2021
Journal Details
License
Format
Journal
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
Abstract

Consider the setting where multiple parties each hold a multiset of users and the task is to estimate the reach (i.e., the number of distinct users appearing across all parties) and the frequency histogram (i.e., fraction of users appearing a given number of times across all parties). In this work we introduce a new sketch for this task, based on an exponentially distributed counting Bloom filter. We combine this sketch with a communication-efficient multi-party protocol to solve the task in the multi-worker setting. Our protocol exhibits both differential privacy and security guarantees in the honest-but-curious model and in the presence of large subsets of colluding workers; furthermore, its reach and frequency histogram estimates have a provably small error. Finally, we show the practicality of the protocol by evaluating it on internet-scale audiences.

Keywords

[1] J. M. Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018. Search in Google Scholar

[2] A. Acar, H. Aksu, A. S. Uluagac, and M. Conti. A survey on homomorphic encryption schemes: Theory and implementation. Computing Surveys, 79, 2018. Search in Google Scholar

[3] M. Alaggan, M. Cunche, and S. Gambs. Privacy-preserving wi-fi analytics. PoPETs, pages 4–26, 2018. Search in Google Scholar

[4] M. Alaggan, S. Gambs, and A.-M. Kermarrec. Blip: Noninteractive differentially-private similarity computation on Bloom filters. In Stabilization, Safety, and Security of Distributed Systems, pages 202–216, 2012. Search in Google Scholar

[5] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. JCSS, 58(1):137–147, 1999. Search in Google Scholar

[6] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017. Search in Google Scholar

[7] V. Balcer, A. Cheu, M. Joseph, and J. Mao. Connecting robust shuffle privacy and pan-privacy. In SODA, pages 2384–2403, 2021. Search in Google Scholar

[8] Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1–10, 2002. Search in Google Scholar

[9] A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. In CRYPTO, pages 451–468, 2008. Search in Google Scholar

[10] K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD, pages 199–210, 2007. Search in Google Scholar

[11] A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghu-nathan, D. Lie, M. Rudominer, U. Kode, J. Tinnés, and B. Seefeld. Prochlo: Strong privacy for analytics in the crowd. In SOSP, pages 441–459, 2017. Search in Google Scholar

[12] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM, 13(7):422–426, 1970. Search in Google Scholar

[13] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth. Practical secure aggregation for privacy-preserving machine learning. In CCS, pages 1175–1191, 2017. Search in Google Scholar

[14] D. Boneh. The decision Diffie–Hellman problem. In ANTS, pages 48–63, 1998. Search in Google Scholar

[15] J. Brody, A. Chakrabarti, R. Kondapally, D. P. Woodruff, and G. Yaroslavtsev. Beyond set disjointness: the communication complexity of finding the intersection. In PODC, pages 106–113, 2014. Search in Google Scholar

[16] T. H. Chan, E. Shi, and D. Song. Optimal lower bound for differentially private multi-party aggregation. In ESA, pages 277–288, 2012. Search in Google Scholar

[17] M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In ICALP, pages 693–703, 2002. Search in Google Scholar

[18] L. Chen, B. Ghazi, R. Kumar, and P. Manurangsi. On distributed differential privacy and counting distinct elements. In ITCS, pages 56:1–56:18, 2021. Search in Google Scholar

[19] A. Cheu, A. D. Smith, J. Ullman, D. Zeber, and M. Zhilyaev. Distributed differential privacy via shuffling. In EUROCRYPT, pages 375–403, 2019. Search in Google Scholar

[20] S. G. Choi, D. Dachman-Soled, M. Kulkarni, and A. Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. PoPETs, 3:153–174, 2020. Search in Google Scholar

[21] E. Cohen. Size-estimation framework with applications to transitive closure and reachability. JCSS, 55(3):441–453, 1997. Search in Google Scholar

[22] G. Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58–75, 2005. Search in Google Scholar

[23] G. Cormode, S. Muthukrishnan, and K. Yi. Algorithms for distributed functional monitoring. TALG, 7(2):1–20, 2011. Search in Google Scholar

[24] H. Corrigan-Gibbs and D. Boneh. Prio: Private, robust, and scalable computation of aggregate statistics. In NSDI, pages 259–282, 2017. Search in Google Scholar

[25] D. Desfontaines, A. Lochbihler, and D. Basin. Cardinality estimators do not preserve privacy. PoPETs, pages 26–46, 2019. Search in Google Scholar

[26] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In NIPS, pages 3571–3580, 2017. Search in Google Scholar

[27] M. Durand and P. Flajolet. Loglog counting of large cardinalities. In ESA, pages 605–617, 2003. Search in Google Scholar

[28] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006. Search in Google Scholar

[29] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006. Search in Google Scholar

[30] C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014. Search in Google Scholar

[31] R. Egert, M. Fischlin, D. Gens, S. Jacob, M. Senker, and J. Tillmanns. Privately computing set-union and set-intersection cardinality via Bloom filters. In Information Security and Privacy, pages 413–430, 2015. Search in Google Scholar

[32] Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019. Search in Google Scholar

[33] Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054–1067, 2014. Search in Google Scholar

[34] C. Estan and G. Varghese. New directions in traffic measurement and accounting. SIGCOMM Comput. Commun. Rev., 32(4):323–336, 2002. Search in Google Scholar

[35] C. Estan, G. Varghese, and M. Fisk. Bitmap algorithms for counting active flows on high speed links. In IMC, pages 153–166, 2003. Search in Google Scholar

[36] D. Evans, V. Kolesnikov, and M. Rosulek. A Pragmatic Introduction to Secure Multi-Party Computation. Foundations and Trends® in Privacy and Security, 2(2-3):70–246, 2018. Search in Google Scholar

[37] P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier. Hyper-LogLog: the analysis of a near-optimal cardinality estimation algorithm. In AofA: Analysis of Algorithms, pages 137–156, 2007. Search in Google Scholar

[38] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. JCSS, 31(2):182–209, 1985. Search in Google Scholar

[39] A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. SICOMP, 41(6):1673–1693, 2012. Search in Google Scholar

[40] Google. Private join and compute. https://github.com/google/private-join-and-compute. Search in Google Scholar

[41] A. Greenberg. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016. Search in Google Scholar

[42] H. Harmouch and F. Naumann. Cardinality estimation: An experimental survey. VLDB, 11(4):499–512, 2017. Search in Google Scholar

[43] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially-private histograms through consistency. VLDB, 3(1):1021–1032, 2010. Search in Google Scholar

[44] S. Heule, M. Nunkesser, and A. Hall. Hyperloglog in practice: Algorithmic engineering of a state of the art cardinality estimation algorithm. In EDBT, page 683–692, 2013. Search in Google Scholar

[45] M. Ion, B. Kreuter, A. E. Nergiz, S. Patel, S. Saxena, K. Seth, M. Raykova, D. Shanahan, and M. Yung. On deploying secure computing: Private intersection-sum-with-cardinality. In EuroS&P, pages 370–389, 2020. Search in Google Scholar

[46] Y. Ishai, E. Kushilevitz, R. Ostrovsky, and A. Sahai. Cryptography from anonymity. In FOCS, pages 239–248, 2006. Search in Google Scholar

[47] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41–52, 2010. Search in Google Scholar

[48] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SICOMP, 40(3):793–826, 2011. Search in Google Scholar

[49] D. Mir, S. Muthukrishnan, A. Nikolov, and R. N. Wright. Pan-private algorithms via statistics on sketches. In PODS, pages 37–48, 2011. Search in Google Scholar

[50] I. Mironov, O. Pandey, O. Reingold, and S. P. Vadhan. Computational differential privacy. In CRYPTO, pages 126–142, 2009. Search in Google Scholar

[51] G. W. Oehlert. A note on the delta method. The American Statistician, 46(1):27–29, 1992. Search in Google Scholar

[52] W. F. of Advertisers. Cross-media measurement initiative. https://github.com/world-federation-of-advertisers/cross_media_measurement_project_site/blob/master/public_papers/PRFE_results/PrivateReach&FrequencyEstimatorsEvaluationResults.md, 2020. Search in Google Scholar

[53] R. Pagh and N. M. Stausholm. Efficient differentially private f0 linear sketching. In ICDT, 2021. Search in Google Scholar

[54] S. Pohlig and M. Hellman. An improved algorithm for computing logarithms over GF(p) and its cryptographic significance. IEEE TOIT, 24(1):106–110, 1978. Search in Google Scholar

[55] S. Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014. Search in Google Scholar

[56] E. Skvortsov, J. Wilhelm, W. Bradbury, J. Bao, A. Ulbrich, and L. Tsang. Tracking audience statistics with hyperloglog. Google Research Tech Report, 2021. Search in Google Scholar

[57] A. T. Suresh. Differentially private anonymized histograms. In NeurIPS, pages 7971–7981, 2019. Search in Google Scholar

[58] S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz. Theory and practice of Bloom filters for distributed systems. IEEE Communications Surveys Tutorials, 14(1):131–155, 2012. Search in Google Scholar

[59] S. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer, 2017. Search in Google Scholar

[60] T. Wang, J. Blocki, N. Li, and S. Jha. Locally differentially private protocols for frequency estimation. In USENIX, pages 729–745, 2017. Search in Google Scholar

[61] D. P. Woodruff and Q. Zhang. An optimal lower bound for distinct elements in the message passing model. In SODA, pages 718–733, 2014. Search in Google Scholar

[62] Y. W. Yu and G. M. Weber. Balancing accuracy and privacy in federated queries of clinical data repositories: Algorithm development and validation. J Med Internet Res, 22(11):e18735, Nov 2020. Search in Google Scholar

Recommended articles from Trend MD

Plan your remote conference with Sciendo