Shuting Xu, and Jun Zhang
Laboratory for High Performance Scientific Computing and Computer Simulation
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0046, USA
Accurate information extracted from datasets is required for making reasonable decisions with data mining algorithms, and privacy preservation has become one of the top priorities in the design of data mining applications. In this paper, a novel data distortion strategy based on structural partition and the sparsified Singular Value Decomposition (SSVD) technique is proposed. Three schemes, object-based partition, feature-based partition, and hybrid partition, are defined to permit a tradeoff between privacy protection on centralized datasets and the accuracy of data mining techniques. Several privacy metrics are used to examine the performance of the proposed strategies, and the data utility of the three schemes is examined through binary classification with a support vector machine. The effects of the SVD rank and the SSVD threshold value on data distortion and utility are also tested. Our experimental results indicate that, in comparison with standard data distortion techniques, the proposed schemes are efficient in achieving a good tradeoff between data privacy and data utility. They offer a feasible way to protect sensitive information while maintaining high accuracy in decision making, with a significant reduction in the computational cost of SVD.
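The abstract does not spell out the algorithm, so the following is only a minimal Python/NumPy sketch under stated assumptions. Each block of the data matrix is replaced by its rank-k SVD approximation, and entries of U_k and V_k whose magnitude falls below a threshold eps are set to zero (our reading of "sparsified SVD"). Blocks are assumed to be near-equal contiguous slices of rows (object-based), columns (feature-based), or both (hybrid). The function names `ssvd_distort`, `partition_distort`, and `value_difference`, and the relative Frobenius-norm distortion measure, are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np

def ssvd_distort(block, k, eps):
    """Rank-k SVD approximation of one block, with small entries of U_k and V_k zeroed.

    Assumption: "sparsified" means |entry| < eps in the truncated factors becomes 0.
    """
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    k = min(k, len(s))                 # cannot keep more singular values than exist
    Uk = U[:, :k].copy()
    Vtk = Vt[:k, :].copy()
    Uk[np.abs(Uk) < eps] = 0.0         # sparsify left singular vectors
    Vtk[np.abs(Vtk) < eps] = 0.0       # sparsify right singular vectors
    return Uk @ np.diag(s[:k]) @ Vtk

def partition_distort(A, k, eps, row_parts=1, col_parts=1):
    """Hypothetical structural-partition wrapper: distort each block independently.

    row_parts > 1, col_parts == 1  -> object-based partition (rows = objects)
    row_parts == 1, col_parts > 1  -> feature-based partition (columns = features)
    row_parts > 1, col_parts > 1   -> hybrid partition
    """
    A_bar = np.empty_like(A, dtype=float)
    row_idx = np.array_split(np.arange(A.shape[0]), row_parts)
    col_idx = np.array_split(np.arange(A.shape[1]), col_parts)
    for r in row_idx:
        for c in col_idx:
            A_bar[np.ix_(r, c)] = ssvd_distort(A[np.ix_(r, c)], k, eps)
    return A_bar

def value_difference(A, A_bar):
    """Relative Frobenius-norm difference, one simple measure of how much A was distorted."""
    return np.linalg.norm(A - A_bar, "fro") / np.linalg.norm(A, "fro")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((100, 8))           # 100 objects, 8 features (synthetic data)
    A_bar = partition_distort(A, k=3, eps=0.05, row_parts=4)  # object-based scheme
    print(A_bar.shape, value_difference(A, A_bar))
```

Running the SVD on several small blocks rather than on the whole matrix is one plausible source of the computational savings mentioned above, since the cost of a dense SVD grows superlinearly with matrix size. Lowering k or raising eps increases distortion (more privacy) at the expense of fidelity (less utility).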
Mathematics Subject Classification:
This research was supported in part by the Kentucky New Economy Safety and Security (NESSI) Consortium and by Grant No. 02KJB63001 from the Research Project Grant program of JiangSu, China.