Cluster assignment remapping
I have test classification datasets from the UCI Machine Learning Repository, which are labelled.
I am stripping off the labels and using the data to benchmark a few clustering algorithms, and I am planning to use external validation methods. I run each algorithm with different initial configurations, say 50 times, and take the mean value. Across the 50 iterations, the algorithm labels the data points of one single cluster with different numbers. Because the cluster labels can change in each run, and each iteration might have different cluster assignments, how can I remap each of the clusters to one uniform numbering?
My primary idea is to remap each cluster to the actual class label that most of its points intersect with. This can give incorrect remappings, though: when classes have more or less equal numbers of points, it does not work.
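To make the majority-overlap idea concrete, here is a minimal sketch in plain Python. The function name `majority_remap` and the toy labels are made up for illustration; this is not the code used in the benchmark. It also shows one way the approach can go wrong: two clusters can end up mapped to the same class.

```python
from collections import Counter, defaultdict


def majority_remap(labels_true, labels_pred):
    """Map each cluster id to the class label it overlaps with most.

    Hypothetical helper for illustration only.
    """
    overlap = defaultdict(Counter)
    for true_label, cluster_id in zip(labels_true, labels_pred):
        overlap[cluster_id][true_label] += 1
    # For every cluster, pick the class with the largest overlap count.
    return {
        cluster_id: counts.most_common(1)[0][0]
        for cluster_id, counts in overlap.items()
    }


# Toy data (assumed): 4 points of class "a", 2 points of class "b",
# split by the clustering into three clusters.
labels_true = ["a", "a", "a", "a", "b", "b"]
labels_pred = [0, 0, 1, 1, 1, 2]

mapping = majority_remap(labels_true, labels_pred)
print(mapping)  # clusters 0 and 1 both map to "a"
```

Nothing in this scheme forces the mapping to be one-to-one, so it breaks down whenever the clustering splits or merges classes.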
Another idea is to keep the labels while clustering but make the clustering algorithm ignore them. That way the clustered data would carry label tags. This is doable, but I have already benchmarked cluster assignment data waiting to be processed, so I am trying to avoid modifying and re-benchmarking my implementation of the cluster analysis algorithms (which would take quite some time and CPU) just to include label tag vectors and ignore them.
Is there a way I can compute the average accuracy from the cluster assignments I have right now?
Edit:
In the domain I am studying (metaheuristic clustering algorithms), I could not find a paper comparing these indexes. The one paper that does compare them seems to report incorrect values. Can anyone point me to a paper where clustering results are compared using any of these indexes?
What do I do when the number of clusters doesn't agree?
Do not try to map clusters.
Instead, use proper external validation measures for clustering, which do not require a 1:1 correspondence of clusters. There are plenty; for details, see Wikipedia.
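As one example of such a measure, here is a minimal sketch of the adjusted Rand index (ARI) in plain Python. The choice of ARI is only an illustration, since the answer does not name a specific index, and the function name and toy runs are assumptions. ARI compares pairs of points rather than cluster ids, so it is unaffected by how a run numbers its clusters and still works when the number of clusters differs from the number of classes.

```python
from collections import Counter
from math import comb
from statistics import mean


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree on

    # Pairs placed together in both labelings, in the true classes,
    # and in the predicted clusters, respectively.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Toy example (assumed data): the same ground truth scored against
# two runs that number and split their clusters differently.
labels_true = [0, 0, 1, 1]
runs = [
    [1, 1, 0, 0],  # same partition, ids swapped
    [0, 0, 1, 2],  # three clusters instead of two
]
scores = [adjusted_rand_index(labels_true, run) for run in runs]
print(scores, mean(scores))
```

Because the score is computed per run against the original class labels, the existing cluster assignments can be averaged as they are, without remapping or re-running the algorithms.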