Data mining techniques enable the discovery of valuable patterns and knowledge in shared data, which can increase profitability and enhance national security. However, the use of data mining also raises security and privacy threats: when data are made public, confidential knowledge risks being disclosed. Controlling the level of knowledge disclosure and protecting specific confidential patterns is a subtask of privacy-preserving data mining comparable to hiding confidential data values. We propose a technique that simultaneously hides data values and confidential patterns without the undesirable side effect of distorting nonconfidential patterns. We use nonnegative matrix factorization (NMF) to distort the original dataset while preserving its overall characteristics, and we design a factor swapping method to hide particular confidential patterns in unsupervised learning. We examine the effectiveness of this hiding technique by performing k-means clustering on a benchmark dataset. Experimental results indicate that our technique produces a single modified dataset that achieves both pattern hiding and data value hiding while maintaining the usability of the data. Under certain constraints on the NMF iterations, an optimal solution can be computed in which user-specified confidential memberships or relationships are hidden without undesirable alterations to nonconfidential patterns.
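To make the NMF-based distortion step concrete, here is a minimal sketch under stated assumptions. It factors a nonnegative data matrix V (rows are records, columns are attributes) as V ≈ WH with the standard Lee-Seung multiplicative updates for the Frobenius loss, and it assumes the released, distorted dataset is the low-rank product WH. The rank k, the iteration count, and the helper names `nmf`, `distort`, and `frob_error` are hypothetical choices for illustration, not the authors' implementation. The factor swapping method for hiding confidential patterns is not shown, because its details are not given here.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Rank-k NMF V ~ W H via Lee-Seung multiplicative updates (Frobenius loss).

    Hypothetical helper: random nonnegative init, fixed iteration count.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise, using the updated H
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def distort(V, k, iters=200):
    """Assumed release strategy: publish the nonnegative low-rank product W H."""
    W, H = nmf(V, k, iters)
    return matmul(W, H)


def frob_error(A, B):
    """Frobenius norm of A - B, i.e. how far the released values are from the originals."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)) ** 0.5


if __name__ == "__main__":
    V = [[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 0, 2]]
    D = distort(V, k=1)
    print([[round(x, 2) for x in row] for row in D])
    print("distortion:", round(frob_error(V, D), 3))
```

A smaller k discards more structure and so perturbs individual values more, while the dominant nonnegative structure that clustering methods such as k-means rely on tends to survive; choosing k trades data-value hiding against usability.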
The research work of Jun Zhang was supported in part by the U.S. National Science Foundation under grant CCF-0527967, in part by the National Institutes of Health under grant 1R01HL086644-01, in part by the Kentucky Science and Engineering Foundation under grant KSEF-148-502-06-186, and in part by the Alzheimer's Association under grant NIGR-06-25460.