«上一篇/Previous Article|本期目录/Table of Contents|下一篇/Next Article»

j.issn.1671-6833.2023.03.010]
点击复制

改进的密度峰值聚类算法的差分隐私保护方案()

分享到：

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

卷:: 44
期数:: 2023年06期

页码:: 19-24

栏目:

出版日期:: 2023-12-25

文章信息/Info

Title:: Differential Privacy Protection Scheme of Adaptive Clustering by Fast Search and Find of Density Peaks

作者:: 葛丽娜¹; 2; 3; 陈圆圆¹; 王捷¹; 2; 王哲¹; 1. 广西民族大学人工智能学院,广西南宁 530006;2. 广西民族大学网络通信工程重点实验室,广西南宁 530006;3. 广西民族大学广西混杂计算与集成电路分析设计重点实验室,广西南宁 530006

Author(s):: GE Lina¹; 2; 3; CHEN Yuanyuan¹; WANG Jie¹; 2; WANG Zhe¹; 1. School of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China; 2. Key Laboratory of Network Communication Engineering, Guangxi Minzu University, Nanning 530006, China; 3. Guangxi Key Laboratory of Hybrid Computation and IC Design Analysis, Guangxi Minzu University, Nanning 530006, China

关键词:: 密度峰值; 差分隐私; 随机噪声; 聚类算法

Keywords:: density peaks; differential privacy; random noise; clustering algorithm

分类号:: TP391

DOI:: 10.13705/j.issn.1671-6833.2023.03.010

文献标志码:: A

摘要:: 针对改进的密度峰值聚类(AdDPC) 算法在计算局部密度时产生的隐私泄露问题以及算法的一次分配策略,提出一种改进的密度峰值聚类算法的差分隐私保护方案。该方案在算法计算局部密度的过程中添加 Laplace 随机噪声,使得即使攻击者拥有最大背景知识,也无法通过添加或者删除数据集中的某一点来获取相应的信息,从而利用差分攻击获取目标数据点的信息,达到保护隐私数据的目的,并且在分配非聚类中心点时引入可达定义改进 AdDPC 算法的分配策略,避免因为一次分配策略导致数据点分配错误的问题。实验对比了 DP-rcCFSFDP 算法、 AdAPC-rDP 算法、IDP K-means 算法的 F-Measure 和 ARI,结果表明:当隐私预算大于 1. 5 时,所提算法的 F-Measure 和 ARI 优于其他算法,所提算法能够在保护敏感数据的同时保证数据的可用性。

Abstract:: In order to solve the privacy leakage problem caused by adaptive clustering by fast search and find of density peaks(AdDPC) when calculating the local density and the primary allocation strategy, a differential privacy protection scheme of an improved density peak clustering algorithm was proposed. In this scheme, the Laplace random noise was added in the process of calculating the local density of the algorithm. In this way, even if the attacker had the maximum background knowledge, it could not obtain the corresponding information by adding or deleting a point in the dataset, thereby, differential attack was used to obtain the information of the target data point, and to achieve the purpose of protecting the privacy data. In addition, the reachability definition was introduced to improve the allocation strategy of AdDPC when assigning non-clustered center points, so as to avoid the problem of data point allocation error caused by the one-time allocation strategy. The experiment compared F-Measure and ARI values of DP-rcCFSFDP, AdAPC-rDP, IDP K-means, and results showed that: when the privacy budget was greater than 1. 5, the F-Measure and ARI values of the proposed algorithm were better than those of other algorithms, and this algorithm could protect sensitive data and data availability at the same time.

参考文献/References:

[1] SARLE W S. Algorithms for clustering data[ J] . Technometrics, 1990, 32(2) : 227-229.

[2] MA S H, GUO P K, YOU H R, et al. An image matching optimization algorithm based on pixel shift clustering RANSAC[J]. Information Sciences, 2021, 562: 452-474.

[3] DU Z J, LUO H Y, LIN X D, et al. A trust-similarity analysis-based clustering method for large-scale group decision-making under a social network [ J ] . Information Fusion, 2020, 63: 13-29.

[4] HASSAN B A, RASHID T A, HAMARASHID H K. A novel cluster detection of COVID-19 patients and medical disease conditions using improved evolutionary clustering algorithm star[ J] . Computers in Biology and Medicine, 2021, 138: 104866.

[5] 熊平, 朱天清, 王晓峰. 差分隐私保护及其应用[ J] . 计算机学报, 2014, 37(1) : 101-122.

XIONG P, ZHU T Q, WANG X F. A survey on differential privacy and applications [ J ] . Chinese Journal of Computers, 2014, 37(1) : 101-122.

[6] DWORK C. Differential privacy[M]∥Automata, Languages and Programming. Berlin: Springer, 2006: 1-12.

[7] 李杨, 郝志峰, 温雯, 等. 差分隐私保护 k-means 聚类方法研究[J]. 计算机科学, 2013, 40(3): 287-290.

LI Y, HAO Z F, WEN W, et al. Research on differential privacy preserving k-means clustering [ J] . ComputerScience, 2013, 40(3) : 287-290.

[8] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[ J] . Science, 2014, 344 ( 6191) : 1492-1496.

[9] 谢娟英, 高红超, 谢维信. K 近邻优化的密度峰值快速搜索聚类算法[ J] . 中国科学: 信息科学, 2016, 46 (2) : 258-280.

XIE J Y, GAO H C, XIE W X. K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[ J] . Scientia Sinica ( Informationis) , 2016, 46(2) : 258-280.

[10] DWORK C, MCSHERRY F, NISSIM K, et al. Calibrating noise to sensitivity in private data analysis[M]∥Theory of Cryptography. Berlin: Springer, 2006: 265-284.

[11] 陈韵. 基于差分隐私密度峰值聚类算法的研究和应用 [D] . 南京: 南京邮电大学, 2020.

CHEN Y. Research of density peak clustering algorithm based on differential privacy preserving[D]. Nanjing: Nanjing University of Posts and Telecommunications, 2020.

[12] SASAKL Y. The truth of the F-Measure [ J] . Teach tutor mater, 2007, 1(5) : 1-5.

[13] HUBERT L, ARABIE P. Comparing partitions[ J] . Journal of Classification, 1985, 2(1) : 193-218.

[14] DUA D, TANISKIDOU E K. UCI Machine learning repository[EB / OL] . ( 2017 - 02 - 13) [ 2022 - 10 - 11] . https:∥archive. ics. uci. edu / ml / index. php.

[15] STREET W N, WOLBERG W H, MANGASARIAN O L. Nuclear feature extraction for breast tumor diagnosis[C]∥ Proc SPIE 1905, Biomedical Image Processing and Biomedical Visualization. San Jose: SPIE, 1993: 861-870.

[16] GIONIS A, MANNILA H, TSAPARAS P. Clustering aggregation[ J] . ACM Transactions on Knowledge Discovery from Data, 2007, 1(1) : 1-12.

更新日期/Last Update: 2023-10-22

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

文章信息/Info

参考文献/References:

常用功能

导航/Navigate

工具/Tools

统计/Statistics