As indicated by values in Tables 8 through
10,
, , and
are very large for the
hypertext matrices in their natural ordering.
Values in Table 8 show that
is
substantially reduced for all orderings with the largest
envelope
reduction obtained by the Fiedler vector approach discussed in
Section 3.4. With respect to , however,
the RCM ordering achieves the greatest bandwidth reduction
(see Table 9). Notice that for some matrices
( CCE-A, NHSE), the
value of
shown in Table 10 is
significantly lower than that of . This is due to the
clustering of nonzeros in a narrow band but the band itself
is significantly displaced from the diagonal. Also,
the orderings using Correspondence Analysis without
distances
(see Section 3.5) tend to produce
's more similar to those
of RCM than the Fiedler approach.
Table 11 illustrates the effects of choosing
different pairs of principal axes for Correspondence
Analysis with
distances (CACS). Since
the 8-largest generalized singular values
(see Equation 9) for these matrices
were all approximately equal to 1 (i.e., form
a cluster of generalized singular values near 1),
it is not clear which pairs
of principal axes best explains the variation in Equation
(11). By selecting the 10-th pair (or 10-th
largest) of principal axes, a reduction in
of
43% and 17% can be obtained for MAN1 and
NHSE, respectively. CACS(10) also achieves an average
reduction of 25% for and 29% for
for
these matrices.
) of the hypertext
matrices after reorderings; CACS(2) denotes the
use of Correspondence Analysis using
distances and the second principal axes, and
CANC(1) denotes the use of Correspondence Analysis with no
distances and the first principal axes.
distances and the second principal axes, and
CANC(1) denotes the use of Correspondence Analysis with no
distances and the first principal axes.
) of the hypertext
matrices after reorderings; CACS(2) denotes the
use of Correspondence Analysis (CA) using
distances and the second principal axes, and
CANC(1) denotes the use of CA with no
distances and the first principal axes.
distances. CACS(2) and CACS(10) denote
the use of the second and tenth principal axes, respectively.
Figures 3 through 5
illustrate the reorderings obtained for a few of the sample
hypertext matrices discussed in Section 3.2. Of particular
interest is the similarity of reorderings produced by the pairs
(RCM, CANC(1)) and (Fiedler, CACS(2)) for the MAN1
and MAN2 matrices. For the other two matrices
( CCE-A, NHSE) whose average number of documents
per link (see
from Table 3) is smaller,
the similarities were not as prominent. As shown
in Figure 4, the near-diagonal clusterings
obtained are quite different. In particular, CANC(1) produces
a definite block-diagonal pattern of nonzeros but at the
expense of the largest bandwidths ( and
)
among all reorderings of the four test matrices (see Tables
9 and 10). The reduction in bandwidth
achieved by CACS(10), i.e., using the 10-th largest pair of
principal axes or left and right generalized singular vectors of
A in Equation (9), is illustrated in Figure
5 for the MAN1 matrix.
The proper selection of principal axes for Correspondence
Analysis with
distances when the hypertext matrix
A has clustered generalized singular values (
's
from Equation (11)) is problematic. Nevertheless,
an iterative procedure which cycles through a subset of the largest
generalized singular vectors of A as computed by a Lanczos
or block Lanczos SVD method (see [B+93]) is
plausible. Future research in the use of principal axes for bandwidth
reduction in the presence of clustered spectra is warranted.