QUICK CLUSTER
QUICK CLUSTER VAR_LIST
[/CRITERIA=CLUSTERS(K) [MXITER(MAX_ITER)] [CONVERGE(EPSILON)] [NOINITIAL] [NOUPDATE]]
[/MISSING={EXCLUDE,INCLUDE} {LISTWISE, PAIRWISE}]
[/PRINT={INITIAL} {CLUSTER}]
[/SAVE[=[CLUSTER[(MEMBERSHIP_VAR)]] [DISTANCE[(DISTANCE_VAR)]]]]
The QUICK CLUSTER command performs k-means clustering on the dataset.
This is useful when you wish to allocate cases into clusters of
similar values and you already know the number of clusters.
The minimum specification is QUICK CLUSTER followed by the names of
the variables which contain the cluster data. Normally you will also
want to specify /CRITERIA=CLUSTERS(K) where K is the number of
clusters. If this is not specified, then K defaults to 2.
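As a minimal sketch of this specification, the following syntax reads a
small inline dataset and clusters it into two groups. The variable names
height and weight and the data values are invented for illustration; only
the QUICK CLUSTER lines follow the syntax described above.

```pspp
* Hypothetical example data with two numeric clustering variables.
DATA LIST FREE / height weight.
BEGIN DATA.
150 52
152 55
149 50
181 84
183 88
179 82
END DATA.

* Minimum specification, so K defaults to 2.
QUICK CLUSTER height weight.

* The same request with the number of clusters stated explicitly.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(2).
```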
If you specify /CRITERIA=NOINITIAL, a naive algorithm is used to
select the initial clusters. This gives faster execution but less
well-separated initial clusters, and hence possibly an inferior final
result.
QUICK CLUSTER uses an iterative algorithm to select the cluster
centers. The subcommand /CRITERIA=MXITER(MAX_ITER) sets the maximum
number of iterations. During classification, PSPP will continue
iterating until MAX_ITER iterations have been done or the convergence
criterion (see below) is fulfilled. The default value of MAX_ITER
is 2.
If, however, you specify /CRITERIA=NOUPDATE, then after selecting the
initial centers, no further update to the cluster centers is done. In
this case, MAX_ITER, if specified, is ignored.
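The CRITERIA keywords that control initialization and iteration might be
combined as sketched here. The example assumes that the active dataset
already holds numeric variables named height and weight; those names are
hypothetical, as are the values chosen for K and MAX_ITER.

```pspp
* Allow up to 20 iterations instead of the default.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(3) MXITER(20).

* Classify against the initial centers only, with no updates.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(3) NOUPDATE.

* Use the faster, naive selection of initial clusters.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(3) NOINITIAL.
```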
The subcommand /CRITERIA=CONVERGE(EPSILON) is used to set the
convergence criterion. The value of the convergence criterion is
EPSILON times the minimum distance between the initial cluster
centers. Iteration stops when the mean cluster distance between one
iteration and the next is less than the convergence criterion. The
default value of EPSILON is zero.
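To make the arithmetic concrete: suppose EPSILON is 0.01 and the two
closest initial centers happen to be 40 units apart (an invented figure).
The criterion is then 0.01 × 40 = 0.4, so iteration stops once the mean
cluster distance between successive iterations drops below 0.4, or once
MAX_ITER iterations have been done. A request of that kind might look like
the following sketch, in which height and weight are hypothetical variables
assumed to exist in the active dataset.

```pspp
* Stop once the convergence criterion is met, or after 50 iterations.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(3) MXITER(50) CONVERGE(0.01).
```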
The MISSING subcommand determines the handling of missing values. If
INCLUDE is set, then user-missing values are considered at their face
value and not as missing values. If EXCLUDE is set, which is the
default, user-missing values are excluded as well as system-missing
values.
If LISTWISE is set, then the entire case is excluded from the
analysis whenever any of the clustering variables contains a missing
value. If PAIRWISE is set, then a case is considered missing only if
all the clustering variables contain missing values. Otherwise it is
clustered on the basis of the non-missing values. The default is
LISTWISE.
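The sketch below illustrates the MISSING keywords on a small invented
dataset in which -1 is declared as a user-missing code for weight. The
variable names, the data, and the missing-value code are assumptions made
for the example; the comments restate the behaviour described above rather
than verified output.

```pspp
* Hypothetical data in which -1 marks a missing weight.
DATA LIST FREE / height weight.
BEGIN DATA.
150 52
152 -1
181 84
183 88
END DATA.
MISSING VALUES weight (-1).

* The default, written out, which drops the second case entirely.
QUICK CLUSTER height weight
  /MISSING=EXCLUDE LISTWISE.

* Keep the second case and cluster it on height alone.
QUICK CLUSTER height weight
  /MISSING=EXCLUDE PAIRWISE.

* Treat -1 as an ordinary value of weight.
QUICK CLUSTER height weight
  /MISSING=INCLUDE.
```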
The PRINT subcommand requests additional output to be printed. If
INITIAL is set, then the initial cluster memberships will be printed.
If CLUSTER is set, the cluster memberships of the individual cases are
displayed (potentially generating lengthy output).
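Both PRINT keywords can be requested together, as in this short sketch. It
assumes hypothetical numeric variables height and weight in the active
dataset.

```pspp
* Request both kinds of additional output.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(2)
  /PRINT=INITIAL CLUSTER.
```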
You can specify the subcommand SAVE to ask that each case's cluster
membership and the Euclidean distance between the case and its cluster
center be saved to new variables in the active dataset. To save the
cluster membership use the CLUSTER keyword, and to save the distance
use the DISTANCE keyword. Each keyword may optionally be followed by
a variable name in parentheses to specify the new variable which is to
contain the saved parameter. If no variable name is specified, then
PSPP will create one.
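A sketch of the SAVE subcommand, with and without explicit variable names,
is given here. The names height, weight, grp and dist are hypothetical, and
the final LIST command is included only as one possible way to inspect the
saved variables.

```pspp
* Save membership and distance under chosen names.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(2)
  /SAVE=CLUSTER(grp) DISTANCE(dist).

* Let PSPP choose the names of the new variables.
QUICK CLUSTER height weight
  /CRITERIA=CLUSTERS(2)
  /SAVE=CLUSTER DISTANCE.

* Show the saved variables alongside the original data.
LIST.
```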