The software originally described in Todd A. Letsche's
M.S. thesis (August 1996), "Toward Large-Scale Information Retrieval Using Latent
Semantic Indexing," is available upon request. This C++ software
(referred to as LSI++) was developed at the University of
Tennessee (Department of Computer Science) and constitutes a client/server
application for document retrieval. This public-domain software is provided
strictly on an "at your own risk" basis. It can be used
with any indexing scheme (especially one employing a vector
space model), but those wishing to implement Latent Semantic Indexing (LSI)
must be familiar with the LSI document files created by the
Telcordia
Technologies LSI software.
Please note that the LSI++ software will not construct
an index but will facilitate query matching for a previously
indexed collection. Users should also be aware
of Content Analyst's patent "Computer information retrieval using latent semantic
structure" (U.S. Patent No. 4,839,853, June 13, 1989)
before initiating any commercial product development based on LSI.
This software has been tested under Solaris 5.5.1 using
gcc version 2.7.2.1; © 1998, T.A. Letsche, D.W. Martin,
M.K. Hughey, and M.W. Berry, University of Tennessee.
General Text Parser (GTP) is an object-oriented (C++, Java) integrated software
package for creating data structures and encodings needed by IR models.
Developed by S. Howard, H. Tang, M. Berry, and D. Martin at the University of
Tennessee (Department of Computer Science), this software can be used to
- recursively parse ASCII files and directories,
- provide several term weighting options for both local and global scope,
- create sparse term-by-document matrices (compressed column sparse
format),
- produce vector encodings for both terms and documents in
k-dimensional space via matrix decompositions such as the
singular value decomposition (SVD) and the semidiscrete decomposition (SDD), and
- perform query matching using term and document encodings and
return a cosine-ranked list of documents and/or terms.
This software has been tested under