Volume 2021 (2021): Issue 3 (July 2021)
Journal Details
First Published
16 Apr 2015
Publication timeframe
4 times per year
Access type
Open Access

Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons

Published Online: 27 Apr 2021
Page range: 28 - 48
Received: 30 Nov 2020
Accepted: 16 Mar 2021

Sharing genome data in a privacy-preserving way remains a major bottleneck for the scientific progress promised by the big data era in genomics. A community-driven protocol, the genomic data-sharing beacon protocol, has been widely adopted for sharing genomic data. The system aims to provide a secure, easy-to-implement, and standardized interface for data sharing by allowing only yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data-sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of a victim’s genome when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an attacker can use the inherent correlations in the genome and clustering techniques to run such an attack efficiently and accurately. We also show that even if multiple individuals are added to the beacon during the same update, it is possible to identify the victim’s genome with high confidence using traits that are easily accessible to the attacker (e.g., eye color or hair type). Moreover, we show how a genome reconstructed from a beacon that is not associated with a sensitive phenotype can be used for membership inference attacks against beacons with sensitive phenotypes (e.g., HIV+). The outcome of this work will guide beacon operators on when and how to update the content of the beacon and help them (along with the beacon participants) make informed decisions.
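The core observation behind the update-based threat can be illustrated with a toy sketch. This is not the paper's attack, which also exploits genomic correlations and clustering; it only shows why a response that flips from "no" to "yes" across an update leaks alleles of a newly added individual. The `ToyBeacon` class, the `newly_revealed` helper, and the allele string encoding are all hypothetical names introduced for illustration, and the sketch assumes the victim is the only person added in the update.

```python
from typing import Dict, FrozenSet, Iterable, Set

Allele = str  # hypothetical encoding, e.g. "chr1:100:A"


class ToyBeacon:
    """Toy stand-in for a beacon: it only answers yes/no presence queries."""

    def __init__(self, genomes: Dict[str, FrozenSet[Allele]]) -> None:
        # The beacon stores the union of all participants' alleles.
        self._alleles: Set[Allele] = set()
        for alleles in genomes.values():
            self._alleles |= alleles

    def query(self, allele: Allele) -> bool:
        """Return True if at least one participant carries the allele."""
        return allele in self._alleles


def newly_revealed(
    before: ToyBeacon, after: ToyBeacon, candidates: Iterable[Allele]
) -> Set[Allele]:
    """Alleles whose answer flipped from 'no' to 'yes' across the update."""
    return {a for a in candidates if not before.query(a) and after.query(a)}


# Hypothetical data: two existing participants and one newly added victim.
old_members = {
    "p1": frozenset({"chr1:100:A", "chr2:200:T"}),
    "p2": frozenset({"chr1:100:A", "chr3:300:G"}),
}
victim = frozenset({"chr2:200:T", "chr4:400:C", "chr5:500:G"})

before = ToyBeacon(old_members)
after = ToyBeacon({**old_members, "victim": victim})

candidates = [
    "chr1:100:A", "chr2:200:T", "chr3:300:G",
    "chr4:400:C", "chr5:500:G", "chr6:600:A",
]
leaked = newly_revealed(before, after, candidates)
print(sorted(leaked))  # ['chr4:400:C', 'chr5:500:G']
```

In this sketch the victim's allele `chr2:200:T` is not recovered by the diff alone because another participant already carried it. Alleles hidden in this way are the kind of gap that the correlation-based inference described in the abstract is meant to address.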


