As each row of the map is traversed, the value of the current pixel
is compared to the value of the previous pixel in that row (west neighbor)
and the pixel value in the same column of the previous row (north
neighbor). Abiding by the NEWS neighborhood rule, if all pixels are in the same cluster,
the smallest cluster label is assigned to all three pixels and any changes in the
status of the west and north neighbors are recorded in the working arrays.
The smallest of the two values is chosen to reduce the memory requirements
of the csize array where the maximum size is a function of the density
of the map and the pixel clustering patterns. For the best use of memory
by the csize array, the best case scenario is a map
entirely of 1's or 0's which would contain only 1 cluster of size
.
The csize array size would be 1. Conversely, the worst case scenario is a
checkerboard pattern, alternating 1's and 0's throughout the map, resulting in
clusters each containing only 1 pixel. The csize array
size would be
. When the algorithm completes, each index in the csize
array holds one of three values: a zero, the number of pixels in the cluster
identified by the csize index value, or a pointer to another csize index.
The last two values are differentiated by the arithmetic sign of the
value---positive represents the pixel count in the indexed cluster
whereas negative represents a pointer to another csize array index.
De-referencing the pointer, which could follow a non-circular, multi-step path,
is accomplished by indexing into the csize array using the absolute value
of the pointer.
The HK algorithm does not guarantee a properly labeled map; however, relabeling,
or adjusting pixel values to accurately reflect cluster membership can
be accomplished in linear time using the working array produced by
the HK algorithm. Figure 6 illustrates the results of the working array
csize after merging the temporary cluster labeled 2 from the top row into the
temporary cluster labeled 1.
Before the merge, temporary cluster 1 contained 9 pixels, and temporary cluster 2 contained 2 pixels. After the addition of the current pixel (1 pixel) and the pixels from temporary cluster 2 (2 pixels), temporary cluster 1 now contains 12 pixels (9 + 2 + 1), and temporary cluster 2 shows a pointer to temporary cluster 1 in the csize working array. It is possible to process extremely large maps in linear time using a divide-and-conquer approach to cluster identification because at a point in time, the algorithm only requires access to the working array csize, the current pixel, the north neighboring pixel, and the pixel value of the west neighbor. In this work, both an improved sequential implementation of the HK algorithm and a parallel implementation on the Thinking Machines CM-5 are presented.