Term-document matrices are sparse, nonsymmetric and typically, highly overdetermined. As mentioned above, it is desirable that these matrices be reordered so that nonzeroes are clustered evenly about a line from the upper left to the lower right corner of the matrix. This line, though visually a diagonal, is not the conventional diagonal of a nonsquare matrix. We define metrics suitable for evaluating reorderings by adapting some well established metrics used in symmetric matrix computations.
The bandwidth () and envelope size (
) are two measures
used in the context of symmetric sparse matrices reordered to a band form.
Let C be an
symmetric matrix;
, the bandwidth of row i, is the
difference between i (the row number) and the smallest column number j
such that
. Let
be the maximum of bandwidth values
over all rows, i.e.,
.
The bandwidth is defined as
.
We chose this definition over other alternatives
(such as defining the bandwidth as
)
because its natural extension to overdetermined term-document matrices
seems more suitable as a metric for evaluating reorderings.
The envelope of C incorporates the variation in
bandwidth over all rows. The envelope size
is defined by

Observe that these terms capture the distance from the diagonal to the farthest nonzero on each row of the matrix.
To extend these definitions to evaluate reorderings of
a nonsymmetric
term-document matrix
A, we consider the straight line from the upper left
(row 1, column 1) to the lower right (row m, column n) corner.
The equation to this visual diagonal line can be easily computed;
the diagonal subscript
of a row
is defined as the abscissa
obtained using
as the ordinate value in this equation.
To account for nonsymmetry, we define
, the bandwidth of row i, as the
largest difference between
and any column number j
such that
.
The bandwidth and the envelope size
are as defined
as earlier but using the the new definition of
.
Defining as twice the maximum over row bandwidths
measures how evenly the nonzero clusters are
centered about the diagonal.
We also provide values of
, a quantity which measures the size of
the nonzero band but unlike does not take into account
the displacement from the diagonal.
Let
be the difference
between the largest and smallest nonzero subscript in row
i of the matrix. Define
as the maximum of
over all rows; now
differs from in
not being relative to the diagonal. However, the sum
of
over all rows is still the same
as
which was defined earlier in terms of
.