Text retrieval using Latent Semantic Indexing (LSI) with the truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. Although the truncated SVD substantially reduces the rank of the term-document matrix, the matrices it produces are full (dense). To reduce memory consumption, we examine several strategies for sparsifying the truncated SVD matrices. Applying these strategies to three popular document databases, we find that some of them not only sparsify the SVD matrices but can also increase retrieval accuracy in some cases.
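As a rough illustration of the idea, the sketch below computes a rank-k truncated SVD of a small term-document matrix and then zeroes out small-magnitude entries of the factor matrices. The magnitude threshold is an assumed, illustrative sparsification rule and is not necessarily one of the strategies studied in this report; the toy matrix, the function names (`truncated_svd`, `sparsify`, `query_scores`), and the threshold value are likewise hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors (U_k, s_k, V_k^T) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def sparsify(M, threshold):
    """Assumed strategy: set entries with |value| < threshold to zero."""
    out = M.copy()
    out[np.abs(out) < threshold] = 0.0
    return out

def query_scores(Uk, sk, Vtk, q):
    """Cosine similarity between a query and each document in the latent space."""
    q_hat = Uk.T @ q                  # project the query onto k latent dimensions
    docs = sk[:, None] * Vtk          # document coordinates, shape (k, n_docs)
    num = q_hat @ docs
    denom = np.linalg.norm(q_hat) * np.linalg.norm(docs, axis=0)
    return num / np.where(denom == 0, 1.0, denom)

# Toy term-document matrix: 5 terms (rows) by 4 documents (columns).
A = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)

k = 2
Uk, sk, Vtk = truncated_svd(A, k)

# The factors are dense even though A itself is sparse.
threshold = 0.3                       # hypothetical cutoff, chosen for illustration
Uk_sparse = sparsify(Uk, threshold)
Vtk_sparse = sparsify(Vtk, threshold)

q = np.array([1, 0, 0, 0, 1], dtype=float)   # query containing terms 0 and 4
scores = query_scores(Uk_sparse, sk, Vtk_sparse, q)
```

In practice the sparsified factors would be stored in a compressed sparse format so that the zeroed entries actually save memory; the dense arrays here only keep the sketch short.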
Technical Report 368-03, Department of Computer Science, University of Kentucky, Lexington, KY, 2003.
The research work of J. Gao was supported in part by the U.S. National Science Foundation under grant CCR-0092532.
The research work of J. Zhang was supported in part by the U.S. National Science Foundation under grants CCR-9988165, CCR-0092532, and ACR-0202934, by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961, by the Kentucky Science & Engineering Foundation under grant KSEF-02-264-RED-002, by the Japanese Research Organization for Information Science & Technology, and by the University of Kentucky Research Committee.