Clustering
http://java-ml.sourceforge.net/print/book/export/html/72
Published on Java Machine Learning Library (Java-ML) (http://java-ml.sourceforge.net)
By Thomas Abeel. Created 12/16/2008 - 12:02
Clustering
This chapter will provide documentation on clustering algorithms, cluster evaluation measures
and other topics related to clustering data. We assume that you are already familiar with the
topics discussed in the Getting started [1] chapter.
Clustering basics
A clustering algorithm creates a division of the original dataset. In Java-ML this is done with the
method cluster of the Clusterer interface.
Creating and running a clustering algorithm
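import java.io.File;
import net.sf.javaml.clustering.Clusterer;
import net.sf.javaml.clustering.KMeans;
import net.sf.javaml.core.Dataset;
import net.sf.javaml.tools.data.FileHandler;

/* Load a dataset: the class label is in column 4, fields are comma-separated */
Dataset data = FileHandler.loadDataset(new File("iris.data"), 4, ",");
/* Create a new instance of the KMeans algorithm, with no options
 * specified. By default this will generate 4 clusters. */
Clusterer km = new KMeans();
/* Cluster the data; it will be returned as an array of datasets, with
 * each dataset representing a cluster. */
Dataset[] clusters = km.cluster(data);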
[Documented source code] [3]
The code above loads the example iris data set. Next it creates an instance of the K-means
algorithm and uses it to cluster the data. The results are returned in an array of Datasets,
where each Dataset represents a cluster.

Note that there is no guarantee that all original Instances will occur in the clusters or that each
Instance occurs only once. Some algorithms allow overlapping clusters, and some algorithms
remove 'noisy' data points. This is algorithm specific and you can find more
information on the API page for each algorithm.
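
Because a clustering result may drop or duplicate instances, it can be useful to check how the output relates to the input. The following is a minimal sketch of such a check in plain Java. It does not use the Java-ML API: the class ClusterCoverage and its methods are hypothetical names chosen for illustration, points are plain double[] arrays standing in for Instances, and each cluster is a List standing in for a Dataset.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Java-ML. */
final class ClusterCoverage {

    /**
     * Counts how often each original point occurs across all clusters.
     * Points are compared by reference, so two distinct rows that happen
     * to hold equal values are still counted separately.
     */
    public static Map<double[], Integer> countOccurrences(
            List<double[]> original, List<List<double[]>> clusters) {
        Map<double[], Integer> counts = new IdentityHashMap<>();
        for (double[] point : original) {
            counts.put(point, 0);
        }
        for (List<double[]> cluster : clusters) {
            for (double[] point : cluster) {
                // Ignore anything that did not come from the original data.
                counts.computeIfPresent(point, (p, c) -> c + 1);
            }
        }
        return counts;
    }

    /** Number of original points that occur in no cluster (for example, removed as noise). */
    public static int countMissing(
            List<double[]> original, List<List<double[]>> clusters) {
        int missing = 0;
        for (int count : countOccurrences(original, clusters).values()) {
            if (count == 0) {
                missing++;
            }
        }
        return missing;
    }

    /** Number of original points that occur more than once (overlapping clusters). */
    public static int countOverlapping(
            List<double[]> original, List<List<double[]>> clusters) {
        int overlapping = 0;
        for (int count : countOccurrences(original, clusters).values()) {
            if (count > 1) {
                overlapping++;
            }
        }
        return overlapping;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 1.0};
        double[] b = {1.2, 0.9};
        double[] c = {5.0, 5.1};
        double[] d = {9.0, 0.0}; // treated as noise: appears in no cluster

        List<double[]> original = Arrays.asList(a, b, c, d);
        // Made-up result: b sits in both clusters, d was dropped.
        List<List<double[]>> clusters =
                Arrays.asList(Arrays.asList(a, b), Arrays.asList(b, c));

        System.out.println("missing=" + countMissing(original, clusters)
                + ", overlapping=" + countOverlapping(original, clusters));
        // prints "missing=1, overlapping=1"
    }
}
```

With Java-ML itself the same idea would be applied to the Dataset[] returned by cluster; whether Instances can be matched by reference or need another form of identity depends on the algorithm, so check its API page before relying on a check like this.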
Cluster evaluation
Java-ML provides a large number of cluster evaluation measures in the
package net.sf.javaml.clustering.evaluation. All scores are...