
2.2.4 Query Projection and Matching

 

In the LSI model, a query is represented as a pseudo-document that specifies its location in the reduced term-document space. Let $q$ be a vector whose non-zero elements contain the weighted term-frequency counts of the terms that appear in the query. These counts use the same local and global weighting schemes that were applied to the document collection being searched (see Section 2.2.2). With $U_k$ and $\Sigma_k$ the factors of the rank-$k$ SVD, the pseudo-document $\hat{q}$ can be represented by

$$\hat{q} = q^{T} U_k \Sigma_k^{-1}.$$

Thus, the pseudo-document is the sum of the term vectors (the rows of $U_k$ selected and weighted by $q$, i.e. $q^{T} U_k$) corresponding to the terms specified in the query, scaled by the inverse of the singular values ($\Sigma_k^{-1}$). The singular values are used to individually weight each dimension of the term-document space [5].
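A minimal NumPy sketch of the projection formula may help. It assumes the term-by-document matrix has already been weighted as described in Section 2.2.2. The toy matrix `A`, the query `q`, and the helper names `truncated_svd` and `project_query` are hypothetical and chosen for illustration only. Dividing elementwise by the vector of singular values is equivalent to right-multiplying by $\Sigma_k^{-1}$.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k SVD factors of a terms x documents matrix A (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    return U[:, :k], s[:k], Vt[:k, :].T  # U_k (t x k), diagonal of Sigma_k (k,), V_k (d x k)

def project_query(q, U_k, s_k):
    """Pseudo-document q_hat = q^T U_k Sigma_k^{-1} (hypothetical helper).

    q holds the query's weighted term counts (same weighting as A).
    Dividing elementwise by s_k is the same as right-multiplying by Sigma_k^{-1}.
    """
    return (q @ U_k) / s_k

# Toy 5-term x 4-document matrix; the weights are made up for illustration.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
])
k = 2
U_k, s_k, V_k = truncated_svd(A, k)

q = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # query using terms 0 and 3
q_hat = project_query(q, U_k, s_k)
print(q_hat.shape)  # (2,)
```

As a sanity check of the formula, projecting column $j$ of `A` (a document already in the collection) returns row $j$ of $V_k$. For example, `np.allclose(project_query(A[:, 0], U_k, s_k), V_k[0])` is `True`. This is why a pseudo-document can be compared directly with the document coordinates in the reduced space.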

Once the query has been projected into the reduced term-document space, one of several similarity measures can be used to compare the position of the pseudo-document with the positions of the terms or documents in that space. A popular choice is the cosine similarity measure. It depends only on the angle between the pseudo-document and each term or document vector, so document lengths, which would otherwise affect the distance between the pseudo-document and the documents, are normalized away. After the similarities between the pseudo-document and all the terms and documents in the space have been computed, the terms or documents are ranked by similarity. Either the highest-ranking terms or documents, or all those whose similarity exceeds some threshold value, are returned to the user [5].
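The matching and ranking step can be sketched in NumPy as well. This sketch assumes that documents are represented by the unscaled rows of $V_k$. Other choices, such as scaling those rows by $\Sigma_k$, are possible, and this section does not specify one. The helper names `cosine_scores` and `rank_by_cosine`, and the `top` and `threshold` parameters, are hypothetical. They mirror the two return policies described above (highest-ranking items, or all items above a threshold). The toy matrix is made up, and the example ranks only documents. The same function could be applied to a matrix of term coordinates.

```python
import numpy as np

def cosine_scores(q_hat, X):
    """Cosine of the angle between q_hat and each row of X (zero-length rows score 0)."""
    denom = np.linalg.norm(X, axis=1) * np.linalg.norm(q_hat)
    dots = X @ q_hat
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

def rank_by_cosine(q_hat, X, top=None, threshold=None):
    """(row index, score) pairs in decreasing order of cosine similarity.

    `threshold` keeps only rows whose score is >= threshold;
    `top` then keeps only the best `top` rows. Both are optional.
    """
    scores = cosine_scores(q_hat, X)
    order = np.argsort(-scores, kind="stable")
    hits = [(int(i), float(scores[i])) for i in order]
    if threshold is not None:
        hits = [(i, s) for i, s in hits if s >= threshold]
    if top is not None:
        hits = hits[:top]
    return hits

# Toy 5-term x 4-document matrix (made-up weights) and its rank-2 SVD.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

q = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # query using terms 0 and 3
q_hat = (q @ U_k) / s_k                  # q^T U_k Sigma_k^{-1}

# Assumption: documents are the (unscaled) rows of V_k.
for doc, score in rank_by_cosine(q_hat, V_k, top=3):
    print(f"document {doc}: cosine {score:.3f}")
```

Because cosine similarity ignores vector length, the result does not change if the query weights are multiplied by a constant. The sign ambiguity of the SVD also does not affect the scores: flipping a column of $U_k$ flips the matching column of $V_k$ and the matching component of `q_hat`.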



Michael W. Berry (berry@cs.utk.edu)
Tue Jul 23 08:47:48 EDT 1996