Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation : WestminsterResearch

Title	Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation
Type	Journal article
Authors	Yigzaw, KY.
	Michalas, A.
	Bellika, J.G.
Abstract	Background: Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.
	Methods: We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
	Results: The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.
	Conclusions: The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
Keywords	Bloom Filter
	Data Reuse
	Deduplication
	Distributed Statistical Computation
	Data Linkage
	Duplicate Record
	Electronic Health Record
	Privacy
	Record Linkage
	Set Intersection
Article number	17
Journal	BMC Medical Informatics and Decision Making
ISSN	1472-6947
Year	2017
Publisher	BioMed Central
Publisher's version	2016-BMC - Secure_Scalable_Deduplication.pdf
Digital Object Identifier (DOI)	https://doi.org/10.1186/s12911-016-0389-x
Published	03 Jan 2017
License	CC BY 4.0

Related outputs

FaaS and Furious: Accelerating Privacy-Preserving ML with Function as a Service at the Edge
Tusa, F., Michalas, A., Bowden, J. and Kiss, T. 2025. FaaS and Furious: Accelerating Privacy-Preserving ML with Function as a Service at the Edge. ICCCN 2025 - 34th International Conference on Computer Communications and Networks. Tokyo, Japan 04 - 07 Aug 2025 IEEE.

Blind Brother: Attribute-Based Selective Video Encryption
Frimpong, E., Liu, B., Nuoskala, C. and Michalas, A. 2025. Blind Brother: Attribute-Based Selective Video Encryption. 15th ACM Conference on Data and Application Security and Privacy (CODASPY'25). Pittsburgh, PA, USA 04 - 06 Jun 2025 ACM. https://doi.org/10.1145/3714393.372651

Point Intervention: Improving ACVP Test Vector Generation Through Human Assisted Fuzzing
Gridin, I. and Michalas, A. 2024. Point Intervention: Improving ACVP Test Vector Generation Through Human Assisted Fuzzing. Point Intervention: Improving ACVP Test Vector Generation Through Human Assisted Fuzzing. Mytilene, Lesvos, Greece 26 - 28 Aug 2024 Springer.

Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training
Khan, T., Budzys, M., Nguyen, K. and Michalas, A. 2024. Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training. Privacy Enhancing Technologies Symposium 2024. Bristol, UK 15 - 20 Jul 2024 Proceedings on Privacy Enhancing Technologies. https://doi.org/10.56553/popets-2024-0072

Rainbow Over Clouds: A Lightweight Pairing-free Multi-Replica Multi-Cloud Public Auditing Scheme
Rabaninejad, R., Michalas, A. and Kanhere, S. 2024. Rainbow Over Clouds: A Lightweight Pairing-free Multi-Replica Multi-Cloud Public Auditing Scheme. The 19th International Conference on Risks and Security of Internet and Systems (CRiSIS'24). Aix-en-Provence, France 26 - 28 Nov 2024 Springer.

SPADE: Digging into Selective and PArtial DEcryption using Functional Encryption
Nuoskala, C., Abdinasibfar, H. and Michalas, A. 2024. SPADE: Digging into Selective and PArtial DEcryption using Functional Encryption. The 20th EAI International Conference on Security and Privacy in Communication Networks (SecureComm’24). Dubai, United Arab Emirates 28 - 30 Oct 2024 EAI.

Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning
Khan, T., Budzys, M. and Michalas, A. 2024. Make Split, not Hijack: Preventing Feature-Space Hijacking Attacks in Split Learning. The 29th ACM Symposium on Access Control Models and Technologies. San Antonio, TX, USA 15 - 17 May 2024 Association for Computing Machinery (ACM). https://doi.org/10.1145/3649158.3657039

FE[r]Chain: Enforcing Fairness in Blockchain Data Exchanges Through Verifiable Functional Encryption
Nuoskala, C., Rabbaninejad, R., Dimitriou, T. and Michalas, A. 2024. FE[r]Chain: Enforcing Fairness in Blockchain Data Exchanges Through Verifiable Functional Encryption. The 29th ACM Symposium on Access Control Models and Technologies. San Antonio, TX, USA 15 - 17 May 2024 Association for Computing Machinery (ACM). https://doi.org/10.1145/3649158.3657049

GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption
Frimpong, E., Nguyen, K., Budzys, M., Khan, T. and Michalas, A. 2024. GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption. Michalas, A. (ed.) 39th ACM/SIGAPP Symposium On Applied Computing (SAC'24). Avila, Spain 08 - 12 Apr 2024 Association for Computing Machinery (ACM). https://doi.org/10.1145/3605098.3635983

Symmetrical Disguise: Realizing Homomorphic Encryption Services from Symmetric Primitives
Bakas, A., Frimpong, E. and Michalas, A. 2023. Symmetrical Disguise: Realizing Homomorphic Encryption Services from Symmetric Primitives. 18th EAI International Conference on Security and Privacy in Communication Networks (SecureComm’22). Kansas City, United States 17 - 19 Oct 2022 Springer. https://doi.org/10.1007/978-3-031-25538-0_19

MetaPriv: Acting in Favor of Privacy on Social Media Platforms
Cantaragiu, R., Michalas, A., Frimpong, E. and Bakas, A. 2023. MetaPriv: Acting in Favor of Privacy on Social Media Platforms. 18th EAI International Conference on Security and Privacy in Communication Networks (SecureComm’22). Kansas City, United States 17 - 19 Oct 2022 Springer. https://doi.org/10.1007/978-3-031-25538-0_36

Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption
Khan, T., Nguyen, K., Michalas, A. and Bakas, A. 2023. Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption. 20th Annual International Conference on Privacy, Security & Trust (PST'23). Copenhagen, Denmark 21 - 23 Aug 2023 Springer.

stoRNA: Stateless Transparent Proofs of Storage-time
Rabaninejad, R., Abdolmaleki, B., Malavolta, G., Michalas, A. and Nabizadeh, A. 2023. stoRNA: Stateless Transparent Proofs of Storage-time. 28th European Symposium on Research in Computer Security (ESORICS’23). The Hague, the Netherlands 25 - 29 Sep 2023 Springer. https://doi.org/10.1007/978-3-031-51479-1_20

Split Without a Leak: Reducing Privacy Leakage in Split Learning
Nguyen, K., Khan, T. and Michalas, A. 2023. Split Without a Leak: Reducing Privacy Leakage in Split Learning. 19th EAI International Conference on Security and Privacy in Communication Networks (SecureComm’23). Hong Kong SAR, Hong Kong 19 - 21 Oct 2023 Springer.

Cryptographic Role-Based Access Control, Reconsidered
Liu, B., Michalas, A. and Warinschi, B. 2022. Cryptographic Role-Based Access Control, Reconsidered. 16th International Conference on Provable and Practical Security (ProvSec’22). Nanjing, China 11 - 12 Nov 2022 Springer. https://doi.org/10.1007/978-3-031-20917-8_19

Footsteps in the fog: Certificateless fog-based access control
Frimpong, E., Michalas, A. and Ullah, A. 2022. Footsteps in the fog: Certificateless fog-based access control. Computers and Security. 121 102866. https://doi.org/10.1016/j.cose.2022.102866

Power Range: Forward Private Multi-Client Symmetric Searchable Encryption with Range Queries Support
Bakas, A. and Michalas, A. 2020. Power Range: Forward Private Multi-Client Symmetric Searchable Encryption with Range Queries Support. The 25th IEEE International Conference on Communications (ISCC’20). Rennes, France (switched to virtual due to the pandemic) 07 - 10 Jul 2020 IEEE. https://doi.org/10.1109/ISCC50000.2020.9219739

Charlie and the CryptoFactory: Towards Secure and Trusted Manufacturing Environments
Michalas, A. and Kiss, T. 2020. Charlie and the CryptoFactory: Towards Secure and Trusted Manufacturing Environments. IEEE MELECON 2020. Palermo, Italy 16 - 18 Jun 2020 IEEE. https://doi.org/10.1109/MELECON48756.2020.9140712

Modern Family: A Revocable Hybrid Encryption Scheme Based on Attribute-Based Encryption, Symmetric Searchable Encryption and SGX
Bakas, A. and Michalas, A. 2019. Modern Family: A Revocable Hybrid Encryption Scheme Based on Attribute-Based Encryption, Symmetric Searchable Encryption and SGX. 15th EAI International Conference on Security and Privacy in Communication Networks (SecureComm’19). Orlando, United States 25 Jul - 23 Oct 2019 Springer. https://doi.org/10.1007/978-3-030-37231-6_28

The Lord of the Shares: Combining Attribute-Based Encryption and Searchable Encryption for Flexible Data Sharing
Michalas, A. 2019. The Lord of the Shares: Combining Attribute-Based Encryption and Searchable Encryption for Flexible Data Sharing. 34th ACM/SIGAPP Symposium on Applied Computing (SAC'19). Limassol, Cyprus 08 - 12 Apr 2019 ACM. https://doi.org/10.1145/3297280.3297297

Towards Secure Cloud Orchestration for Multi-Cloud Deployments
Paladi, N., Michalas, A. and Dang, H. 2018. Towards Secure Cloud Orchestration for Multi-Cloud Deployments. The 5th Workshop on CrossCloud Infrastructures & Platforms. Porto, Portugal 23 - 26 Apr 2018 ACM.

MemTri: A Memory Forensics Triage Tool using Bayesian Network and Volatility
Michalas, A. and Murray, R. 2017. MemTri: A Memory Forensics Triage Tool using Bayesian Network and Volatility. The 9th ACM CCS International Workshop on Managing Insider Security Threats (MIST’17) in Conjunction with ACM CCS 2017. Dallas, TX, USA 30 Oct - 03 Nov 2017 ACM. https://doi.org/10.1145/3139923.3139926

A Survey on Design and Implementation of Protected Searchable Data in the Cloud
Dowsley, R., Michalas, A., Nagel, M. and Paladi, N. 2017. A Survey on Design and Implementation of Protected Searchable Data in the Cloud. Computer Science Review. 26, pp. 17-30. https://doi.org/10.1016/j.cosrev.2017.08.001

Middle Man: An Efficient Two-Factor Authentication Framework
Costa, J. and Michalas, A. 2017. Middle Man: An Efficient Two-Factor Authentication Framework. 3rd IEEE International Conference On Computing, Communication, Control And Automation. Pune, India 17 - 18 Aug 2017 IEEE. https://doi.org/10.1109/ICCUBEA.2017.8463686

HealthShare: Using Attribute-Based Encryption for Secure Data Sharing Between Multiple Clouds
Michalas, A. and Weingarten, N. 2017. HealthShare: Using Attribute-Based Encryption for Secure Data Sharing Between Multiple Clouds. Proceedings of the 30th IEEE International Symposium on Computer-Based Medical Systems (CBMS’17). Thessaloniki, Greece 22 - 24 Jun 2017 IEEE. https://doi.org/10.1109/CBMS.017.30

PaaSword: A Holistic Data Privacy and Security by Design Framework for Cloud Services
Verginadis, Y., Michalas, A., Gouvas, P., Schiefer, G., Hübsch, G. and Paraskakis, I. 2017. PaaSword: A Holistic Data Privacy and Security by Design Framework for Cloud Services. Journal of Grid Computing. 15 (2), pp. 219-234. https://doi.org/10.1007/s10723-017-9394-2

Providing User Security Guarantees in Public Infrastructure Clouds
Paladi, N., Gehrmann, C. and Michalas, A. 2017. Providing User Security Guarantees in Public Infrastructure Clouds. IEEE Transactions on Cloud Computing. 5 (3), pp. 405-419. https://doi.org/10.1109/TCC.2016.2525991

Mem Tri: Memory Forensics Triage Tool
Michalas, A. and Murray, R. 2016. Mem Tri: Memory Forensics Triage Tool. Cyber Security Group, University of Westminster.

LocLess: Do You Really Care Where Your Cloud Files Are?
Michalas, A. and Yigzaw, K.Y. 2016. LocLess: Do You Really Care Where Your Cloud Files Are? Cloud Security and Data Privacy by Design (CloudSPD’16), Workshop co-located with the 9th IEEE/ACM International Conference on Utility and Cloud Computing. Luxembourg 12 - 15 Dec 2016 IEEE. https://doi.org/10.1109/CloudCom.2016.0090

Sharing in the Rain: Secure and Efficient Data Sharing for the Cloud
Michalas, A. 2016. Sharing in the Rain: Secure and Efficient Data Sharing for the Cloud. 11th International Conference for Internet Technology and Secured Transactions (ICITST-2016). Barcelona 05 - 07 Dec 2016 IEEE. https://doi.org/10.1109/ICITST.2016.7856693

Secure and Scalable Statistical Computation of Questionnaire Data in R
Yigzaw, K.Y., Michalas, A. and Bellika, J. 2016. Secure and Scalable Statistical Computation of Questionnaire Data in R. IEEE Access. 4, pp. 4635-4645. https://doi.org/10.1109/ACCESS.2016.2599851

The Data of Things: Strategies, Patterns and Practice of Cloud-based Participatory Sensing
Michalas, A. and Giannetsos, T. 2016. The Data of Things: Strategies, Patterns and Practice of Cloud-based Participatory Sensing. International Conference on Innovations in InfoBusiness and Technology (ICIIT). Colombo, Sri Lanka 04 Mar 2016

PaaSword: A Holistic Data Privacy and Security by Design Framework for Cloud Services
Verginadis, Y., Michalas, A., Gouvas, P., Schiefer, G., Hübsch, G. and Paraskakis, I. 2015. PaaSword: A Holistic Data Privacy and Security by Design Framework for Cloud Services. 5th International Conference on Cloud Computing and Services Science (CLOSER'15). Lisbon, Portugal 20 May 2015 SCITEPRESS. https://doi.org/10.5220/0005489302060213

"One of our hosts in another country": Challenges of data geolocation in cloud storage
Paladi, N. and Michalas, A. 2014. "One of our hosts in another country": Challenges of data geolocation in cloud storage. The 6th IEEE Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology (Wireless VITAE). Aalborg, Denmark 11 May 2014 IEEE. https://doi.org/10.1109/VITAE.2014.6934507

The lord of the sense: A privacy preserving reputation system for participatory sensing applications
Michalas, A. and Komninos, N. 2014. The lord of the sense: A privacy preserving reputation system for participatory sensing applications. The 19th IEEE International Conference on Communications (ISCC'2014). Madeira, Portugal 23 Jun 2014 IEEE. https://doi.org/10.1109/ISCC.2014.6912480

Domain Based Storage Protection with Secure Access Control for the Cloud
Paladi, N., Michalas, A. and Gehrmann, C. 2014. Domain Based Storage Protection with Secure Access Control for the Cloud. The 2014 International Workshop on Security in Cloud Computing, held in conjunction with the 9th ACM Symposium on Information, Computer and Communications Security (ASIACCS). Kyoto, Japan 04 Jun 2014 ACM. https://doi.org/10.1145/2600075.2600082

Security aspects of e-health systems migration to the cloud
Michalas, A., Paladi, N. and Gehrmann, C. 2014. Security aspects of e-health systems migration to the cloud. 16th IEEE International Conference on E-health Networking, Application & Services (Healthcom). Natal, Brazil 15 Oct 2014 IEEE. https://doi.org/10.1109/HealthCom.2014.7001843

Multi-party trust computation in decentralized environments in the presence of malicious adversaries
Dimitriou, T. and Michalas, A. 2013. Multi-party trust computation in decentralized environments in the presence of malicious adversaries. Ad Hoc Networks. 15 (2014), pp. 53-66. https://doi.org/10.1016/j.adhoc.2013.04.013

Multi-Party Trust Computation in Decentralized Environments
Dimitriou, T. and Michalas, A. 2012. Multi-Party Trust Computation in Decentralized Environments. International Conference on New Technologies, Mobility and Security (NTMS). Istanbul 07 - 10 May 2012 IEEE. https://doi.org/10.1109/NTMS.2012.6208686

Secure & Trusted Communication in Emergency Situations
Michalas, A., Bakopoulos, M., Komninos, N. and Prasad Neeli, R. 2012. Secure & Trusted Communication in Emergency Situations. Sarnoff Symposium (SARNOFF). Newark, NJ 21 - 22 May 2012 IEEE. https://doi.org/10.1109/SARNOF.2012.6222751

SecGOD - Google Docs: Now I Feel Safer!
Michalas, A. and Bakopoulos, M. 2012. SecGOD - Google Docs: Now I Feel Safer! The 7th IEEE International Conference for Internet Technology and Secured Transactions (ICITST-2012). London, UK 10 Dec 2012 IEEE.

Vulnerabilities of decentralized additive reputation systems regarding the privacy of individual votes
Michalas, A., Dimitriou, T., Giannetsos, T., Komninos, N. and Prasad Neeli, R. 2012. Vulnerabilities of decentralized additive reputation systems regarding the privacy of individual votes. Wireless Personal Communications. 66 (3), pp. 559-575. https://doi.org/10.1007/s11277-012-0734-z

Mitigate DoS and DDoS Attack in Ad Hoc Networks
Michalas, A., Komninos, N. and Prasad Neeli, R. 2011. Mitigate DoS and DDoS Attack in Ad Hoc Networks. International Journal of Digital Crime and Forensics (IJDCF). 3 (1), pp. 14-36. https://doi.org/10.4018/jdcf.2011010102

Permalink - https://westminsterresearch.westminster.ac.uk/item/9z6vx/secure-and-scalable-deduplication-of-horizontally-partitioned-health-data-for-privacy-preserving-distributed-statistical-computation
