ARTICLE IN PRESS

Neural Networks xx (2005) 1–17 www.elsevier.com/locate/neunet

An incremental network for on-line unsupervised classification and topology learning
Shen Furao a,*, Osamu Hasegawa b,c

a Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, R2-52, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan
b Imaging Science and Engineering Lab., Tokyo Institute of Technology
c PRESTO, Japan Science and Technology Agency (JST), Japan

Received 12 December 2003; accepted 11 April 2005

Abstract

This paper presents an on-line unsupervised learning mechanism for unlabeled data that are polluted by noise. Using a similarity threshold-based and a local error-based insertion criterion, the system is able to grow incrementally and to accommodate input patterns of on-line non-stationary data distributions. The definition of a utility parameter, the error-radius, allows this system to learn the number of nodes needed to solve a task. The use of a new technique for removing nodes in low-probability-density regions can separate clusters with low-density overlaps and dynamically eliminate noise in the input data. The design of a two-layer neural network enables this system to represent the topological structure of unsupervised on-line data, report a reasonable number of clusters, and give typical prototype patterns of every cluster without prior conditions such as a suitable number of nodes or a good initial codebook. © 2005 Elsevier Ltd. All rights reserved.
Keywords: On-line unsupervised learning; Stationary environment; Non-stationary environment; Clustering; Topology representation
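The similarity threshold-based insertion criterion mentioned in the abstract can be illustrated with a minimal sketch: a new input becomes a node only when it lies farther than a threshold from every existing node. This is an illustration only, not the authors' full algorithm; the function name is hypothetical, and the fixed scalar threshold is a simplifying assumption (the paper's method adapts it per node).

```python
from math import dist

def insert_if_novel(nodes, x, threshold):
    """Insert x as a new node when its nearest existing node is
    farther away than the similarity threshold.

    `threshold` is a fixed radius here for simplicity; an adaptive
    scheme would derive it from each node's local neighborhood.
    """
    if not nodes or min(dist(x, w) for w in nodes) > threshold:
        nodes.append(x)
        return True
    return False

nodes = []
insert_if_novel(nodes, (0.0, 0.0), 1.0)  # first input: always inserted
insert_if_novel(nodes, (5.0, 5.0), 1.0)  # far from all nodes: inserted
insert_if_novel(nodes, (0.2, 0.1), 1.0)  # within threshold: rejected
print(len(nodes))  # → 2
```

Because nodes are only created where inputs are dissimilar to everything seen so far, the network grows on demand rather than from a preset codebook size.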

1. Introduction

One objective of unsupervised learning is the construction of decision boundaries based on unlabeled training data. Unsupervised classification is also known as data clustering and is defined as the problem of finding homogeneous groups of data points in a given multidimensional data set (Jain & Dubes, 1988). Each of these groups is called a cluster and defined as a region in which the density of objects is locally higher than in other regions. Clustering algorithms can be classified into hierarchical clustering, partitional clustering, fuzzy clustering, nearest-neighbor clustering, artificial neural networks for clustering, etc. (Murty, Jain, & Flynn, 1999). Hierarchical clustering algorithms such as single-link (Sneath & Sokal, 1973),
* Corresponding author. Tel.: +81 45 924 5180; fax: +81 45 924 5175. E-mail addresses: furaoshen@isl.titech.ac.jp (S. Furao), hasegawa@isl.titech.ac.jp (O. Hasegawa). URL: http://www.isl.titech.ac.jp/hasegawalab/shen/index.htm.

0893-6080/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.neunet.2005.04.006

complete-link (King, 1967), and CURE (Guha, Rastogi, & Shim, 1998) usually produce satisfactory clustering results but suffer from heavy computational load and large memory requirements (Murty et al., 1999). Hierarchical clustering algorithms are therefore unsuitable for large data sets or on-line data. BIRCH is an extremely efficient hierarchical clustering algorithm (Zhang, Ramakrishnan, & Livny, 1996), but it is properly applicable only to data sets consisting of isotropic clusters: a two-dimensional (2D) circle, a three-dimensional (3D) sphere, etc. Specifically, chain-like and concentric clusters are difficult to identify using BIRCH (Guha et al., 1998; Oyang, Chen, & Yang, 2001). Most partitional clustering algorithms run in linear time and work on large data sets (Lin & Chen, 2002). The k-means algorithm, a conventionally used partitional clustering algorithm, suffers from deficiencies such as dependence on initial starting conditions (Lozano, Pena, & Larranaga, 1999) and a tendency to converge to local minima. Likas, Vlassis, and Verbeek (2003) proposed the global k-means algorithm, an incremental approach to clustering that dynamically adds one cluster center at a time through a
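The k-means deficiencies discussed above can be made concrete with a plain Lloyd's-iteration sketch. This is an illustration of standard k-means only (not the global k-means of Likas et al.); the function and the toy data are hypothetical, and the quality of the result depends on which initial centers happen to be sampled.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's iteration: sample k initial centers, then
    alternate assignment and mean-update steps. The outcome depends
    on the random initialization, the deficiency noted in the text."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign p to its nearest center (squared Euclidean distance)
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # recompute each center as its cluster mean; keep old center if empty
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(sorted(kmeans(pts, 2)))  # two centers, one near each toy cluster
```

Note that k must be fixed in advance and the centers are re-sampled per run, which is exactly the kind of prior condition (a suitable number of nodes, a good initial codebook) that the system presented in this paper is designed to avoid.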
