Clustered SVD Strategies in Latent Semantic Indexing

Jing Gao and Jun Zhang
Laboratory for High Performance Scientific Computing and Computer Simulation
Department of Computer Science
University of Kentucky
Lexington, KY 40506-0046, USA

Abstract

Text retrieval using the Latent Semantic Indexing (LSI) technique with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves the accuracy of information retrieval. Recent studies indicate that the SVD is mostly useful for small homogeneous data collections. For large inhomogeneous datasets, the performance of the SVD-based text retrieval technique may deteriorate. We propose to partition a large inhomogeneous dataset into several smaller ones with clustered structure, and to apply the truncated SVD to each of them. Our experimental results show that the clustered SVD strategies may enhance the retrieval accuracy and reduce the computing and storage costs.
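
The idea of applying a truncated SVD to each cluster separately can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a partition of the documents into clusters is already given (the array labels), that the same rank k is kept in every cluster, and that documents are ranked by cosine similarity in the reduced space. The function names, the toy matrix, and the tolerance eps are hypothetical choices made only for illustration.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of the matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k], s[:k], Vt[:k, :]

def clustered_lsi_scores(A, labels, query, k, eps=1e-12):
    """Cosine scores of all documents, using one truncated SVD per cluster.

    A      : term-by-document matrix (terms x documents)
    labels : cluster id of each document column (assumed to be given)
    query  : query vector in term space
    k      : rank kept in each cluster (assumed equal for all clusters)
    eps    : tolerance below which a norm product is treated as zero
    """
    scores = np.zeros(A.shape[1])
    for c in np.unique(labels):
        cols = np.flatnonzero(labels == c)
        U, s, Vt = truncated_svd(A[:, cols], k)
        docs = (s[:, None] * Vt).T   # documents of this cluster in k dimensions
        q = U.T @ query              # query projected onto the same basis
        num = docs @ q
        denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
        # Leave the score at 0 when the query has no component in this cluster.
        scores[cols] = np.divide(num, denom, out=np.zeros_like(num),
                                 where=denom > eps)
    return scores

# Toy collection: 6 terms, 6 documents, two topics with disjoint vocabularies.
A = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 0],
    [0, 0, 0, 1, 0, 2],
], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
query = np.array([1, 1, 0, 0, 0, 0], dtype=float)

scores = clustered_lsi_scores(A, labels, query, k=2)
ranking = np.argsort(-scores)   # document indices, best match first
```

Each cluster needs only the SVD of its own, smaller submatrix, which is where the savings in computing and storage costs mentioned above would come from in a sketch of this kind.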


Key words:

Mathematics Subject Classification:


Download the compressed PostScript file gao2.ps.gz or the PDF file gao2.pdf.

This paper has been published in Information Processing and Management, Vol. 41, No. 5, pp. 1051-1063 (2005).

Also available as Technical Report 382-03, Department of Computer Science, University of Kentucky, Lexington, KY, 2003.

The research work of the authors was supported in part by the U.S. National Science Foundation under grants CCR-9988165, CCR-0092532, ACR-0202934, and ACR-0234270, and by the U.S. Department of Energy Office of Science under grant DE-FG02-02ER45961.