The envelope minimization problem for a term-by-document (or hypertext) matrix can be formulated and solved in purely symbolic terms by reordering vertices in a suitable graph representation of the matrix. The graph methods we describe in this section are based on reorderings for sparse symmetric matrices for Cholesky factorization.
Perhaps the most widely used envelope minimization method for symmetric
sparse matrices
is the Reverse Cuthill-McKee (RCM) method of Alan George [Geo71] which is
applied to the graph of the matrix.
For an
symmetric matrix B, the graph
is undirected with n vertices each corresponding to
a row or column and edges corresponding to each nonzero, i.e.
iff
. The RCM method generates
a new labeling or ordering of the rows and columns of B. Observe that
if
,
, row u has been labeled, but
rows v and z have not, then, to
minimize the bandwidth of row u, v should be numbered as soon as possible. Furthermore,
to minimize the bandwidth of row z, z should also be numbered as soon as possible
after u and v. In terms of
, notice that z is adjacent to v which
is in turn adjacent to u.
The RCM method makes use of this observation. The main
step involves a modified breadth first search (level search) from a
designated starting vertex; the modification to breadth first search is
that neighbors of a given vertex are explored in increasing order of
degree. The RCM
numbering is obtained by reversing the breadth first search numbering,
i.e., if vertex u is the i-th vertex to be explored then its RCM labeling
is n-i+1. This reversal was shown to produce a better envelope
[LS76]. The choice of the starting vertex is very significant and
a peripheral vertex is desired. The implementation of
of RCM [GL81] uses an approximation to a peripheral vertex by
choosing a vertex of high eccentricity, i.e., a
vertex whose distance to some other vertex in the graph is
close to the maximum distance between any two vertices in the graph.
For nonsymmetric overdetermined hypertext matrices, bipartite graphs
provide a natural extension of the graph model for symmetric matrices.
For the
hypertext matrix A, the associated undirected bipartite graph is
denoted by
and has m row vertices and n column vertices. The row
vertices are labeled
and the column vertices
are labeled
.
The graph has an edge
between row vertex
and column
vertex c for each
.
To compute reorderings of A we apply RCM to H but we maintain
two distinct numbering sequences during modified breadth first search: one for
the row vertices and another for the column vertices. We obtain
the final reordering by reversing each of these sequences.
For example, if
is a row (column) vertex
numbered
(l) during the search, then it is given the final
number
(n-l+1).
Figure 2 illustrates the main step in RCM for
the
term-by-document matrix from Section 2.2 and
Table 5 shows the reordered matrix.
example;
the search number, and the final RCM number are shown in parentheses
for row vertices.
term-by-document
matrix of the technical memoranda titles using RCM on a bipartite graph
representation.The complexity of the RCM for ordering H is proportional to the product of the maximum degree of any vertex in H and the total number of edges (nonzeroes in the matrix A). For hypertext matrices with small maximum degree, the method would be extremely fast. The strength of the method is its low time complexity but it does suffer from certain drawbacks. The heuristic for finding the starting vertex is influenced by the initial numbering of vertices and so the quality of the reordering can vary slightly for the same problem for different initial numberings. Next, the overall method does not accommodate dense rows (e.g., a common link used in every document), and if a row has a significantly large number of nonzeroes it might be best to process it separately; i.e., extract the dense rows, reorder the remaining matrix and augment it by the dense rows (or common links) numbered last.