Sparsification Strategies in Latent Semantic Indexing

Jing Gao and Jun Zhang
Laboratory for High Performance Scientific Computing and Computer Simulation
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0046, USA


Text retrieval using Latent Semantic Indexing (LSI) with the truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. The matrices produced by the truncated SVD of a term-document matrix are full (dense) matrices, although their rank is reduced substantially. To reduce memory consumption, we examine some strategies to sparsify the truncated SVD matrices. After applying the sparsification strategies to three popular document databases, we find that some of our strategies not only sparsify the SVD matrices, but may also increase the accuracy of the text retrieval in some cases.
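
The general idea can be illustrated with a minimal Python sketch. It is not the method of the paper: the abstract does not state which sparsification strategies are used, so the magnitude-threshold rule, the threshold value 0.1, the helper names (truncated_svd, sparsify, density), and the toy matrix are all assumptions made only for illustration.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k SVD factors (U_k, s_k, Vt_k) of the matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def sparsify(M, threshold):
    """Zero every entry whose magnitude is below `threshold`.

    Hypothetical rule chosen for illustration; the paper's own
    strategies are not described in the abstract.
    """
    S = M.copy()
    S[np.abs(S) < threshold] = 0.0
    return S

def density(M):
    """Fraction of entries of M that are nonzero."""
    return np.count_nonzero(M) / M.size

# Toy 5-term x 4-document matrix (made-up data, not from the paper).
A = np.array([
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
], dtype=float)

# The truncated factors are generally dense even though A is sparse.
U_k, s_k, Vt_k = truncated_svd(A, k=2)

# Drop small entries from the term and document factors.
U_sparse = sparsify(U_k, threshold=0.1)
Vt_sparse = sparsify(Vt_k, threshold=0.1)

print("density of U_k:", density(U_k), "->", density(U_sparse))
print("density of Vt_k:", density(Vt_k), "->", density(Vt_sparse))
```

In practice the memory saving only materializes if the thresholded factors are stored in a sparse format, and whether retrieval accuracy is preserved depends on the threshold and the data.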


This paper has been published in Proceedings of the 2003 Text Mining Workshop, M. W. Berry and W. M. Pottenger, Eds., pp. 93-103, San Francisco, CA, May 3, 2003.

Technical Report 368-03, Department of Computer Science, University of Kentucky, Lexington, KY, 2003.

The research work of J. Gao was supported in part by the U.S. National Science Foundation under grant CCR-0092532.

The research work of J. Zhang was supported in part by the U.S. National Science Foundation under grants CCR-9988165, CCR-0092532, and ACR-0202934, by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961, by the Kentucky Science & Engineering Foundation under grant KSEF-02-264-RED-002, by the Japanese Research Organization for Information Science & Technology, and by the University of Kentucky Research Committee.