Software

(updated 03/28/07)


See AR Greenhouse for Telcordia Technologies LSI Demo (sh / csh / C)

SVDPACK/SVDPACKC libraries for sparse SVD via Netlib (Fortran 77 or ANSI C)

MC Toolkit and spkmeans (clustering) by Inderjit Dhillon at the Univ. of Texas (C++, Solaris).

LSI/SDD by Jason Dowling (C/C++)

InfoMap Project, Computational Semantics Laboratory at Stanford University

SVDLIBC Revision of Lanczos SVD from SVDPACKC by Doug Rohde (MIT)

SenseClusters Word Sense Discrimination system at Univ. of Minnesota

InfoVis CyberInfrastructure Repository at Indiana University/SLIS (Katy Borner)

Text to Matrix Generator (TMG) by D. Zeimpekis and S. Gallopoulos at Univ. of Patras, Greece (Matlab)

LIBMYSSVD C++ Wrapper for SVDPACKC by Suvrit Sra at Univ. of Texas at Austin

CLUTO Family of Data Clustering Software Tools by the Karypis Lab at the Univ. of Minnesota (Download)

GUP GTP Usability Prototype (developed by F. Brouns, Open University of the Netherlands, 2006) New!


Warning: All requests for software from UTK/CS require complete information on the online forms provided. A request that omits 1) your complete name, 2) institution name, 3) actual postal address, 4) e-mail address, 5) phone and fax numbers, or 6) a paragraph describing the intended use of the requested code is invalid and will not be processed.


The software originally described in Todd A. Letsche's MS Thesis (August 1996) entitled Toward Large-Scale Information Retrieval Using Latent Semantic Indexing is available upon request (Click here). This C++ software (referred to as LSI++) was developed at the University of Tennessee (Department of Computer Science) and constitutes a client/server application for document retrieval. This public domain software is provided strictly on an "at your own risk" basis. It can be used with any indexing scheme (especially those employing a vector space model), but those wishing to implement Latent Semantic Indexing (LSI) must be familiar with the LSI document files created by the Telcordia Technologies LSI software.

Please note that the LSI++ software will not construct an index but will facilitate query matching for a previously indexed collection. Users should also be aware of Content Analyst's patent, Computer information retrieval using latent semantic structure (U.S. Patent No. 4,839,853, June 13, 1989), before initiating any commercial product development based on LSI.

This software has been tested under Solaris 5.5.1 using gcc version 2.7.2.1; © 1998, T.A. Letsche, D.W. Martin, M.K. Hughey, and M.W. Berry, University of Tennessee.
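
To illustrate the kind of query matching described above, here is a minimal Python sketch. It is not the LSI++ code. It assumes the standard LSI query projection q_hat = S_k^{-1} U_k^T q followed by cosine ranking against precomputed document vectors. The function names and the in-memory layout of U_k, sigma_k, and doc_vectors are hypothetical and do not reflect the Telcordia LSI file formats.

```python
import math


def fold_in_query(query, U_k, sigma_k):
    """Project a term-space query into k-space: q_hat = S_k^{-1} U_k^T q.

    query   : list of term weights, one per term (length m)
    U_k     : m x k list of lists (term vectors), assumed precomputed
    sigma_k : list of the k largest singular values, assumed precomputed
    """
    k = len(sigma_k)
    return [
        sum(U_k[i][j] * query[i] for i in range(len(query))) / sigma_k[j]
        for j in range(k)
    ]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(q_hat, doc_vectors):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    scores = [(doc_id, cosine(q_hat, vec)) for doc_id, vec in doc_vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

A caller would load the term matrix, singular values, and document vectors produced by an earlier indexing run, fold the query in once, and then rank every document against it. No index is built at query time, which mirrors the division of labor noted above.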


General Text Parser (GTP) is an object-oriented (C++, Java) integrated software package for creating data structures and encodings needed by IR models. Developed by S. Howard, H. Tang, M. Berry, and D. Martin at the University of Tennessee (Department of Computer Science), this software can be used to

  1. parse ASCII files/directories (in a recursive fashion),
  2. provide several term weighting options for both local and global scope,
  3. create sparse term-by-document matrices (compressed column sparse format),
  4. produce vector encodings for both terms and documents in k-dimensional space via matrix decompositions such as the SVD and SDD, and
  5. perform query-matching using term and document encodings and return a cosine-ranked list of documents and/or terms.
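
As a rough illustration of steps 2 and 3, the following Python sketch builds a weighted term-by-document matrix and stores it in compressed column sparse form (one column per document). It is not GTP code. The weighting used here, log(1 + tf) locally and 1 + log(n/df) globally, is an assumption chosen for simplicity and is not necessarily one of GTP's options. The names build_ccs and column are hypothetical.

```python
import math
from collections import Counter


def build_ccs(docs):
    """Build a weighted term-by-document matrix in compressed column form.

    docs : list of token lists, one per document
    Returns (terms, values, row_idx, col_ptr), where column j occupies
    values[col_ptr[j]:col_ptr[j + 1]] and row_idx gives the matching term rows.
    """
    terms = sorted({t for d in docs for t in d})
    row_of = {t: i for i, t in enumerate(terms)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))

    values, row_idx, col_ptr = [], [], [0]
    for d in docs:
        tf = Counter(d)
        # Store entries of each column in increasing row order.
        for t in sorted(tf, key=row_of.get):
            local_w = math.log(1 + tf[t])              # assumed local weight
            global_w = 1.0 + math.log(n_docs / df[t])  # assumed global weight
            values.append(local_w * global_w)
            row_idx.append(row_of[t])
        col_ptr.append(len(values))
    return terms, values, row_idx, col_ptr


def column(j, values, row_idx, col_ptr):
    """Return the nonzero entries of document j as {row: weight}."""
    start, end = col_ptr[j], col_ptr[j + 1]
    return dict(zip(row_idx[start:end], values[start:end]))
```

Only the nonzero weights are stored, so memory grows with the number of distinct term occurrences rather than with the full term count times document count. A sparse SVD routine would then consume these three arrays to produce the k-dimensional encodings of step 4.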

This software has been tested under

Debian/Linux 2.6.8-2-686 using gcc version 3.4.4
minor changes to accommodate ANSI C++ standard (September 2005)
parser and query processing modules updated (April 2005)
out-of-core SVD capability added (April 2005)
new term-ranking and document subset options for query module added (October 2005)
© 2001, 2002, 2003, 2004, 2005, M. Berry, S. Howard, H. Tang, D. Martin, J. Giles, K. Heinrich, University of Tennessee.


Javac, JVM 1.4 available
query processing modules and GUI included (October 2002)
query processing modules updated (January 2003)
GUI modules updated (July 2003) © 2002, 2003, M. Berry, J. Giles, L. Wo, P. Lynn, University of Tennessee.


Windows XP/SP2 using the MinGW GCC compiler
access to the Windows GDBM port is required (March 2007)
Click here for GPL software to access the GDBM port (Sourceforge)
© 2007, M. Berry, S. Howard, H. Tang, D. Martin, J. Giles, K. Heinrich, and Barry Britt, University of Tennessee. New!

Parallel General Text Parser (PGTP) is an enhancement of GTP which employs MPI for the parallel/distributed computation of the SVD required by LSI (no parallel SDD is provided in the current release). This particular software has been tested under Solaris 5.7 using gcc version 2.8.1 and MPICH version 1.2.0.
© 2001, D. Martin and M. Berry, University of Tennessee.

Click here for the Horizon Day presentation of GTP at Indiana University on March 10, 2000.