High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing

Please use this address to cite this document: http://dlibrary.univ-boumerdes.dz:8080/handle/123456789/6695

Author(s):	Saida Ishak Boushaki, Nadjet Kamel, Omar Bendjeghaba
Keywords:	Cuckoo search optimisation; Document clustering; High-dimensional text clustering; Vector space model
Date of publication:	2018
Publisher:	World Scientific Publishing Co. Pte Ltd
Series/Issue:	Journal of Information and Knowledge Management, Vol. 17, No. 3, 1 (2018)
Abstract:	Clustering is an important data analysis technique. However, clustering high-dimensional data such as documents requires more effort to extract the rich, relevant information hidden in the high-dimensional space. Recently, document clustering algorithms based on metaheuristics have demonstrated their efficiency in exploring the search space and reaching the global best solution rather than a local one. However, most of these algorithms are not practical and suffer from several limitations: they require the number of clusters to be known in advance, they are neither incremental nor extensible, and the documents are indexed by a high-dimensional, sparse matrix. To overcome these limitations, this paper proposes a new dynamic and incremental approach (CS_LSI) for document clustering based on the recent cuckoo search (CS) optimization algorithm and latent semantic indexing (LSI). Experiments conducted on four well-known high-dimensional text datasets show that the LSI model reduces the dimensionality of the space efficiently, with higher precision and less computational time. In addition, CS_LSI determines the number of clusters automatically by employing a newly proposed index based on a significant distance measure. This index is also used in the incremental mode and to detect outlier documents, thereby keeping the clusters more coherent. Furthermore, comparison with conventional document clustering algorithms shows the superiority of CS_LSI in achieving high-quality clustering.
URI/URL:	DOI: 10.1142/S0219649218500338; http://dlibrary.univ-boumerdes.dz:8080/handle/123456789/6695
ISSN:	0219-6492
Collection(s):	Publications Internationales
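The abstract combines two generic techniques: latent semantic indexing, which projects a term-document matrix onto its top singular directions, and cuckoo search, a metaheuristic that improves a population of candidate solutions ("nests") through Lévy flights and by rebuilding a fraction pa of them with a random walk. The Python sketch below illustrates both under stated assumptions; it is not the authors' CS_LSI. Assumptions: raw term counts with no weighting, a fixed number of clusters supplied by the caller (CS_LSI instead estimates it with its own index, which is not reproduced here), a hypothetical fitness equal to the sum of cosine distances from each document to its nearest centroid, and parameter values (pa = 0.25, Lévy exponent 1.5 via Mantegna's algorithm) commonly used in the cuckoo search literature. All function names are hypothetical, and the incremental mode and outlier detection are not modelled.

```python
import numpy as np
from math import gamma, sin, pi


def lsi_reduce(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Truncated SVD (LSI): document coordinates in a k-dimensional latent space.

    term_doc has shape (n_terms, n_docs); A ~= U_k S_k V_k^T, and the documents
    are the rows of V_k S_k, so the result has shape (n_docs, k).
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]


def cosine_dist(docs: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances, shape (n_docs, n_clusters)."""
    dn = docs / np.maximum(np.linalg.norm(docs, axis=1, keepdims=True), 1e-12)
    cn = centroids / np.maximum(np.linalg.norm(centroids, axis=1, keepdims=True), 1e-12)
    return 1.0 - dn @ cn.T


def fitness(docs: np.ndarray, centroids: np.ndarray) -> float:
    """Hypothetical objective: sum of each document's distance to its nearest centroid."""
    return float(cosine_dist(docs, centroids).min(axis=1).sum())


def levy_step(rng: np.random.Generator, shape, beta: float = 1.5) -> np.ndarray:
    """Levy-distributed step lengths via Mantegna's algorithm."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)


def cuckoo_search_clustering(docs, n_clusters, n_nests=15, pa=0.25,
                             alpha=0.01, iters=100, seed=0):
    """Generic cuckoo search over centroid sets; the number of clusters is fixed."""
    rng = np.random.default_rng(seed)
    n_docs = docs.shape[0]
    # Each nest is one candidate solution: n_clusters centroids drawn from the documents.
    nests = np.stack([docs[rng.choice(n_docs, n_clusters, replace=False)]
                      for _ in range(n_nests)])
    scores = np.array([fitness(docs, n) for n in nests])
    best = nests[scores.argmin()].copy()
    for _ in range(iters):
        # Levy flights: step size scaled by the distance to the current best nest.
        for i in range(n_nests):
            step = alpha * levy_step(rng, nests[i].shape) * (nests[i] - best)
            cand = nests[i] + step * rng.normal(size=nests[i].shape)
            j = rng.integers(n_nests)  # a randomly chosen nest is replaced if worse
            f = fitness(docs, cand)
            if f < scores[j]:
                nests[j], scores[j] = cand, f
        # Abandonment: a fraction pa of components is rebuilt by a biased random walk.
        mask = rng.random(nests.shape) < pa
        p1, p2 = rng.permutation(n_nests), rng.permutation(n_nests)
        walked = nests + rng.random() * (nests[p1] - nests[p2]) * mask
        for i in range(n_nests):
            f = fitness(docs, walked[i])
            if f < scores[i]:  # greedy: keep the rebuilt nest only if it is better
                nests[i], scores[i] = walked[i], f
        best = nests[scores.argmin()].copy()
    labels = cosine_dist(docs, best).argmin(axis=1)
    return labels, best, float(scores.min())


# Toy term-document matrix: terms 0-2 belong to topic A, terms 3-5 to topic B.
A = np.zeros((6, 6))
A[0:3, 0:3] = np.array([1.0, 2.0, 3.0])  # docs 0-2 use only topic-A terms
A[3:6, 3:6] = np.array([1.0, 1.0, 2.0])  # docs 3-5 use only topic-B terms
docs = lsi_reduce(A, k=2)
labels, centroids, score = cuckoo_search_clustering(docs, n_clusters=2)
# Documents 0-2 and 3-5 should end up in two different clusters.
```

Because the two toy topics share no terms, their documents are orthogonal in the two-dimensional LSI space, so any centroid set covering both topics reaches a fitness near zero. On real corpora the term weighting, the number of latent dimensions k, and the stopping rule would all need tuning, and the paper's own choices are not stated here.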

File(s) in this item:

There are no files associated with this document.
