Next: Introduction
Sparse Matrix Reordering Schemes for Browsing Hypertext
Michael W. Berry
Department of Computer Science,
107 Ayres Hall,
University of Tennessee,
Knoxville, TN 37996-1301,
berry@cs.utk.edu
Bruce Hendrickson
Sandia National Laboratories,
M/S 1110,
Albuquerque, NM 87185-1110,
bah@cs.sandia.gov
Padma Raghavan
Department of Computer Science,
107 Ayres Hall,
University of Tennessee,
Knoxville, TN 37996-1301,
padma@cs.utk.edu
Proceedings of the AMS-SIAM Summer Seminar on Mathematics of Numerical
Analysis: Real Number Algorithms, Park City, UT, July 17 - August
11, 1995. Published in
Lectures in Applied Mathematics (LAM)
Vol. 32: The Mathematics of Numerical
Analysis, J. Renegar, M. Shub, and S. Smale (eds.),
American Mathematical Society, (1996), pp. 99-123.
Abstract
Many approaches for retrieving documents
from electronic databases depend on the literal matching of
words in a user's query to the keywords defining
database objects. Since there is great diversity in
the words people use to describe the same object, literal- or lexical-
based methods can often retrieve irrelevant documents. Another approach
to exploit the implicit higher-order structure
in the association of terms with text objects is to compute the
singular value decomposition (SVD)
of large sparse term by text-object matrices. Latent Semantic
Indexing (LSI) is a conceptual indexing method which employs
the SVD to represent terms and objects by dominant singular
subspaces so that user queries can be matched in a lower-rank
semantic space. This paper considers a third, intermediate approach
to facilitate the immediate detection of document (or term) clusters.
We demonstrate that both traditional sparse matrix reordering
schemes (e.g., Reverse Cuthill-McKee) and spectral-based
approaches (e.g., Correspondence Analysis or Fiedler vector-based
spectral bisection) can be used to permute original
term by document (hypertext) matrices
to a narrow-banded form suitable for the detection
of document (or term) clusters. Although this approach would
not exploit the higher-order semantic structure in the database,
it can be used to develop browsing tools for hypertext and
on-line information at a reduced computational cost.
The first author's research was supported by the
National Science Foundation under grant Nos. NSF-CDA-91-15428 and
NSF-ASC-92-03004.
The second author's research was supported by the
U.S. Dept. of Energy, Office of Energy Research
(Mathematical Sciences program) under contract No. DE-AC04-76DP00789.
The third author's research was supported by the
National Science Foundation under grant No. NSF-ASC-94-11394, and by
the Advanced Research Projects Agency under grant Army-DAAL03-91-C-0047.
Michael W. Berry (berry@cs.utk.edu)
Mon Jan 29 14:30:24 EST 1996