next up previous
Next: Query Projection and Up: Latent Semantic Indexing Previous: Weighting

2.2.3 Computing the SVD

 

Once the matrix A has been created and properly weighted, a rank-k approximation () to A, , is computed using an orthogonal decomposition known as the singular value decomposition (SVD) [14]. The SVD of the matrix A is defined as the product of three matrices,

where the columns of U and V are the left and right singular vectors, respectively, corresponding to the monotonically decreasing (in value) diagonal elements of which are called the singular values of the matrix A. As illustrated in Figure 1, the first k columns of the U and V matrices and the first (largest) k singular values of A are used to construct a rank-k approximation to A via . The columns of U and V are orthogonal, such that , where r is the rank of the matrix A. A theorem due to Eckart and Young [15] suggests that , constructed from the k-largest singular triplets of A (a singular value and its corresponding left and right singular vectors are referred to as a singular triplet), is the closest rank-k approximation (in the least squares sense) to A [5].

 Figure 1

With regard to LSI, is the closest k-dimensional approximation to the original term-document space represented by the incidence matrix A. As stated previously, by reducing the dimensionality of A, much of the ``noise'' that causes poor retrieval performance is thought to be eliminated. Thus, although a high-dimensional representation appears to be required for good retrieval performance, care must be taken to not reconstruct A. If A is nearly reconstructed, the noise caused by variability of word choice and terms that span or nearly span the document collection won't be eliminated, resulting in poor retrieval performance [5].

In LSI, the left and right singular vectors specify the locations of the terms and documents, respectively, in the reduced term-document space. The singular values are often used to scale the term and document vectors, allowing clusters of terms and documents to be more readily identified. Within the reduced space, semantically-related terms and documents presumably lie near each other since the SVD attempts to derive the underlying, semantic structure of the term-document space [5].



next up previous
Next: Query Projection and Up: Latent Semantic Indexing Previous: Weighting



Michael W. Berry (berry@cs.utk.edu)
Tue Jul 23 08:47:48 EDT 1996