5.5 Future Work

Although LSI++ and the LSI++ WWW interface provide the flexibility and efficiency needed to search a document collection using LSI, a great deal of work remains before LSI is a viable model for large-scale information retrieval. In particular, the existing tools for preprocessing the document collection are slow and resource-intensive. Currently, whenever a document collection changes, the entire preprocessing phase, including the computation of the SVD of the term-document matrix, must be repeated. This keeps the singular vectors orthogonal and prevents the retrieval performance of LSI from degrading. Updating (adding documents to the collection) and downdating (removing documents from the collection) an existing SVD, rather than reprocessing the entire collection after every change, would require less time and memory, especially for rapidly changing collections. A prototype of an LSI updating algorithm was described in [20], but it was never fully integrated into the LSI system. Downdating the SVD is an interesting problem that has not yet been attempted with LSI. With efficient updating and downdating facilities, LSI would become useful for large, rapidly changing document collections such as the World-Wide Web and financial and legal databases.
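The code below illustrates the kind of updating and downdating described above. It is a minimal NumPy sketch under stated assumptions and is not the prototype of [20]. It assumes a dense term-document matrix small enough for NumPy. The function names truncated_svd, add_documents, and remove_documents are hypothetical. Adding p documents projects them onto the current term space, orthonormalizes the leftover part with a QR factorization, and then needs only the SVD of a small (k+p)-by-(k+p) matrix. Removing documents re-orthonormalizes the surviving rows of V with a QR factorization and then takes the SVD of a k-by-k matrix. In both cases the singular vectors stay orthogonal. Both functions act on the rank-k approximation A_k rather than on the original matrix A, so their results can differ from a full recomputation.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k SVD of A via a dense SVD (adequate for small examples)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def add_documents(U, s, V, D, k):
    """Hypothetical updating step: return the rank-k SVD of [A_k, D],
    where A_k = U diag(s) V^T and D (m x p) holds the new document columns.
    Assumes the new documents are not (numerically) inside span(U)."""
    m, p = D.shape
    n, kk = V.shape
    D_hat = U.T @ D                  # part of D inside the current term space
    R = D - U @ D_hat                # part of D orthogonal to span(U)
    Q_D, R_D = np.linalg.qr(R)       # m x p orthonormal basis, p x p triangle
    # [A_k, D] = [U, Q_D] @ M @ blockdiag(V, I_p)^T with this small middle matrix
    M = np.block([[np.diag(s), D_hat],
                  [np.zeros((p, kk)), R_D]])
    U_M, s_M, Vt_M = np.linalg.svd(M)
    U_new = np.hstack([U, Q_D]) @ U_M
    V_big = np.block([[V, np.zeros((n, p))],
                      [np.zeros((p, kk)), np.eye(p)]])
    V_new = V_big @ Vt_M.T
    return U_new[:, :k], s_M[:k], V_new[:, :k]


def remove_documents(U, s, V, drop, k):
    """Hypothetical downdating step: return the rank-k SVD of A_k with the
    document columns listed in `drop` deleted (A_k, not the original A).
    Assumes at least k documents remain."""
    keep = np.setdiff1d(np.arange(V.shape[0]), drop)
    V_keep = V[keep, :]              # rows of V for surviving documents
    Q_V, R_V = np.linalg.qr(V_keep)  # V_keep = Q_V R_V, Q_V orthonormal
    # A_k[:, keep] = U diag(s) V_keep^T = U (diag(s) R_V^T) Q_V^T
    U_s, s_new, Vt_s = np.linalg.svd(np.diag(s) @ R_V.T)
    return (U @ U_s)[:, :k], s_new[:k], (Q_V @ Vt_s.T)[:, :k]
```

As a usage example, starting from `U, s, V = truncated_svd(A, k)` for an existing collection, `add_documents(U, s, V, D, k)` folds in a new batch of document columns and `remove_documents(U, s, V, [1, 5], k)` drops documents 1 and 5. Neither call ever touches the full original matrix A again. The expensive SVD is applied only to matrices whose size depends on k and on the number of documents added, which is what makes this approach attractive for rapidly changing collections.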



Michael W. Berry (berry@cs.utk.edu)
Tue Jul 23 08:47:48 EDT 1996