The performance of document clustering algorithms usually varies with the closeness of the document sets being clustered. Most document clustering algorithms perform better on independent, distant document sets than on similar, close ones. We propose an efficient method, based on the Principal Direction Divisive Partitioning (PDDP) algorithm, that refines clustering solutions according to the closeness of the document sets. Experimental results show that our method produces clustering solutions of higher quality than PDDP while reducing the time cost by about 39% on average.
The research work of S. Xu was supported in part by the U.S. National Science Foundation under grant ACR-0234270.
The research work of J. Zhang was supported in part by NSF under grants CCR-9988165, CCR-0092532, ACR-0202934, ACR-0234270, by DOE under grant DE-FG02-02ER45961, and by the University of Kentucky Research Committee.