
2.2.1 Term-Document Representation

 

In the LSI model, terms and documents are represented by an incidence matrix $A = [a_{ij}]$. Each of the $m$ unique terms in the document collection is assigned a row of the matrix, and each of the $n$ documents in the collection is assigned a column, so $A$ is an $m \times n$ matrix. A nonzero element $a_{ij}$ indicates not only that term $i$ occurs in document $j$, but also the number of times the term appears in that document. Since the number of distinct terms in a given document is typically far smaller than the number of terms in the entire document collection, $A$ is usually very sparse [5].
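The construction of the term-document matrix can be illustrated with a small sketch. This is not the authors' code. It assumes naive tokenization (lowercasing and splitting on whitespace) and raw occurrence counts as the entries. The function name `term_document_matrix` is hypothetical. To reflect the sparsity of $A$, the sketch stores only the nonzero entries, in a dictionary keyed by `(row, column)`.

```python
from collections import Counter


def term_document_matrix(documents):
    """Build a sparse m x n term-document frequency matrix.

    Rows are terms and columns are documents. Entry (i, j) holds the number
    of times term i occurs in document j. Only nonzero entries are stored.
    """
    terms = []    # terms[i] is the term assigned to row i
    row_of = {}   # term -> row index, assigned in order of first appearance
    entries = {}  # (row, column) -> raw term frequency
    for j, doc in enumerate(documents):
        # Hypothetical tokenization: lowercase, then split on whitespace.
        counts = Counter(doc.lower().split())
        for term, freq in counts.items():
            if term not in row_of:
                row_of[term] = len(terms)
                terms.append(term)
            entries[(row_of[term], j)] = freq
    return terms, entries


docs = [
    "matrix algebra for retrieval",
    "retrieval of documents by term",
    "term matrix term",
]
terms, A = term_document_matrix(docs)
m, n = len(terms), len(docs)
density = len(A) / (m * n)  # fraction of the m x n entries that are nonzero
print(m, n, len(A))
```

In this toy collection, $m = 8$ and $n = 3$. Only 11 of the 24 entries are nonzero, and the entry for "term" in the third document is 2. For realistic collections, where each document uses a tiny fraction of the vocabulary, the stored fraction is far smaller. That is why sparse storage schemes are used rather than a dense array.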



Michael W. Berry (berry@cs.utk.edu)
Tue Jul 23 08:47:48 EDT 1996