For purposes of comparing the reordering schemes discussed
in the next section, consider
the small database of Bellcore technical memoranda
first presented in [DDF+90]. Table 1 lists
a total of nine titles of technical memoranda: five of
them (c1-c5) relate to human-computer interaction and
four of them (m1-m4) relate to graph theory.
The bold-faced
words in Table 1 are the keywords
used as referents to the titles. The parsing
rule used for this sample database required that keywords appear
in more than one title. Of course, alternative parsing
strategies can increase or decrease the number of indexing keywords (or terms).
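The parsing rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the titles below are hypothetical stand-ins rather than the Bellcore titles of Table 1, and the stop-word list and the names `select_keywords`, `STOP_WORDS`, and `min_titles` are invented for this example. Raising `min_titles` decreases the number of indexing terms; lowering it to 1 increases it.

```python
from collections import Counter

# Assumed stop-word list; the text does not specify one.
STOP_WORDS = {"a", "of", "the", "in", "and", "for", "to"}

def select_keywords(titles, min_titles=2):
    """Return words that appear in at least `min_titles` distinct titles."""
    doc_freq = Counter()
    for title in titles:
        # A set counts each word at most once per title.
        words = {w for w in title.lower().split() if w not in STOP_WORDS}
        doc_freq.update(words)
    return sorted(w for w, n in doc_freq.items() if n >= min_titles)

# Hypothetical titles, not the Table 1 data.
toy_titles = [
    "graph search in large trees",
    "user interface for graph editing",
    "survey of user response time",
    "response time in interface design",
]

keywords = select_keywords(toy_titles)
print(keywords)
```

With these toy titles, only graph, interface, response, time, and user occur in more than one title, so they are the only terms retained.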
Table 1: Database of titles from Bellcore technical
memoranda. Bold-faced keywords appear in more than one
title.
Table 2: The term-by-document matrix corresponding to the technical
memoranda titles in Table 1.
Corresponding to the text in Table 1 is the
term-by-document matrix shown in Table 2.
Each element of this matrix is the frequency with which a term occurs in
a document (title). For example,
in title c5, the fifth column of the term-by-document matrix,
response, time, and user
all occur once. For simplicity, term weighting was not used to construct
this sample matrix.
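A minimal sketch of how such an unweighted term-by-document matrix might be assembled follows. The titles and the function name `term_document_matrix` are hypothetical and chosen only for illustration; they are not the data of Table 1. Rows correspond to terms, columns to documents, and each entry is a raw occurrence count.

```python
def term_document_matrix(terms, titles):
    """Build a list-of-rows matrix: entry [i][j] counts term i in title j."""
    matrix = []
    for term in terms:
        # Raw frequency of the term in each title; no term weighting is applied.
        row = [title.lower().split().count(term) for title in titles]
        matrix.append(row)
    return matrix

# Hypothetical terms and titles, not the Table 1 data.
terms = ["response", "time", "user"]
sample_titles = [
    "survey of user response time",
    "user interface design",
    "response time and time limits",
]

A = term_document_matrix(terms, sample_titles)
for term, row in zip(terms, A):
    print(f"{term:10s}", row)
```

Reading down the first column gives the terms of the first title, each occurring once, in the same way the fifth column of Table 2 describes title c5.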
Michael W. Berry (berry@cs.utk.edu)
Mon Jan 29 14:30:24 EST 1996