Data Pattern Maintenance by Matrix Approximation:
An Application to Information Security

Jie Wang and Jun Zhang
Laboratory for High Performance Scientific Computing and Computer Simulation
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0046, USA


Maintaining data mining accuracy on distorted datasets is an important issue in privacy-preserving data mining. Using matrix approximation, we propose several efficient and flexible techniques to address this issue, and use statistical metrics to analyse the change in data patterns. We use K-nearest neighbour classification to compare how well accuracy is maintained after data distortion by different methods. Because they perform better than some classical data perturbation approaches, nonnegative matrix factorization and singular value decomposition are considered promising techniques for privacy-preserving data mining. Experimental results demonstrate that mining accuracy on data distorted by these methods is almost as good as that on the original data, with the added property of privacy preservation. This indicates that our matrix factorization-based data distortion schemes perturb only confidential attributes to meet privacy requirements while preserving the general data pattern for knowledge extraction.

Key words: matrix factorization, nonnegative matrix factorization, privacy, data mining
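
As a rough illustration of the kind of comparison the abstract describes, the following minimal sketch distorts a data matrix by a rank-k truncated singular value decomposition and compares K-nearest neighbour accuracy on the original and distorted data. It is not the report's implementation: the synthetic two-class data, the rank k = 3, the 150/50 train/test split, the choice of 3 neighbours, and the helper names svd_distort, knn_predict and accuracy are all assumptions made for illustration. For simplicity it distorts the whole matrix rather than only the confidential attributes.

```python
import numpy as np

def svd_distort(A, k):
    """Return the rank-k truncated-SVD approximation of A (the distorted data)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def knn_predict(X_train, y_train, X_test, n_neighbors=3):
    """Plain K-nearest neighbour majority vote using Euclidean distance."""
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

def accuracy(X, y, n_train, n_neighbors=3):
    """Train on the first n_train rows, test on the remaining rows."""
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    pred = knn_predict(X_tr, y_tr, X_te, n_neighbors)
    return float((pred == y_te).mean())

# Hypothetical synthetic data: two Gaussian classes in 8 dimensions.
rng = np.random.default_rng(0)
n, d = 200, 8
y = rng.integers(0, 2, size=n)
centers = np.array([np.zeros(d), np.full(d, 3.0)])
A = centers[y] + rng.normal(size=(n, d))

# Distort the data, then classify on both versions.
A_bar = svd_distort(A, k=3)
acc_original = accuracy(A, y, n_train=150)
acc_distorted = accuracy(A_bar, y, n_train=150)

print("accuracy on original data: ", acc_original)
print("accuracy on distorted data:", acc_distorted)
print("relative distortion:", np.linalg.norm(A - A_bar) / np.linalg.norm(A))
```

The last line prints the relative Frobenius-norm difference between the original and distorted matrices, one simple way to quantify how much the values were perturbed. A nonnegative matrix factorization could be substituted for svd_distort in the same harness, provided the data are nonnegative.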

Mathematics Subject Classification:

Download the PDF file jiewang4.pdf.
Technical Report No. 477-07, Department of Computer Science, University of Kentucky, Lexington, KY, 2007.