In this paper, we present a hybrid clustering method that combines the divisive hierarchical clustering with the agglomerative hierarchical clustering. We used the bisect K-means divisive clustering algorithm in our method. First, we cluster the document collection using bisect K-means clustering algorithm with K' > K as the total number of clusters. Second, we calculate the centroids of K' clusters obtained from the previous step. Then we apply the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) agglomerative hierarchical algorithm on these centroids for the given K. After the UPGMA nds K clusters in these K' centroids, if two centroids ended up in the same cluster, then all of their documents will belong to the same cluster. We compared the goodness of clusters generated by the standard bi- sect K-means algorithm and the proposed hybrid algorithm, measured on various cluster evaluation metrics. Our experimental results show that the proposed method outperforms the standard bisect K-means algorithm. At the end, we show that picking a value for K' has no major impact in the end results by analyzing the relation between the value of K' and the quality of the clusters.
Mathematics Subject Classification: