LSI++ is written in C++, a language that supports the encapsulation of data and functions while retaining the efficiency and maintainability that are essential to a large software system.
The lsi class and its derived classes constitute the Application Programming Interface (API) for the LSI++ search engine. The lsi class serves as a base class from which more specific, architecture-dependent classes are derived. It allows applications to compose a query, perform the search, and set various attributes that influence the search (for example, the number of terms or documents to return and the number of factors to use in the query). In addition, it stores the terms and documents that were used to compose the query, the query terms that remain after common words and words not in the document collection are removed, and the total number of terms and documents in the collection. By accessing only the lsi class and its derived classes, the application programmer can pose a query to the search engine and receive the results without knowing the underlying LSI algorithms or implementations.
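The division of labor described above can be illustrated with a minimal sketch. It is not the LSI++ declaration: the names LsiSketch, ToyLsi, setMaxDocuments, setNumFactors, composeQuery, and the default attribute values are all hypothetical, a C++11 compiler is assumed, and only queries composed from terms are shown (queries composed from documents are omitted). The base class holds the query state and the search attributes, while the architecture-dependent operations are left to derived classes.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only: class and member names are hypothetical,
// not the actual LSI++ declarations.
class LsiSketch {
public:
    using Result = std::pair<std::string, double>;  // (document id, similarity)

    virtual ~LsiSketch() = default;

    // Attributes that influence the search (default values are arbitrary).
    void setMaxDocuments(std::size_t n) { maxDocuments_ = n; }
    void setNumFactors(std::size_t k) { numFactors_ = k; }
    std::size_t maxDocuments() const { return maxDocuments_; }
    std::size_t numFactors() const { return numFactors_; }

    // Compose a query from whitespace-separated terms. The original terms are
    // stored, as are the terms that survive the removal of common words and
    // of words absent from the collection.
    void composeQuery(const std::string& text) {
        queryTerms_.clear();
        retainedTerms_.clear();
        std::istringstream in(text);
        std::string term;
        while (in >> term) {
            queryTerms_.push_back(term);
            if (!isCommonWord(term) && inCollection(term)) {
                retainedTerms_.push_back(term);
            }
        }
    }

    const std::vector<std::string>& queryTerms() const { return queryTerms_; }
    const std::vector<std::string>& retainedTerms() const { return retainedTerms_; }

    // Architecture-dependent operations, supplied by derived classes.
    virtual std::vector<Result> search() = 0;
    virtual std::size_t totalTerms() const = 0;
    virtual std::size_t totalDocuments() const = 0;

protected:
    virtual bool isCommonWord(const std::string& term) const = 0;
    virtual bool inCollection(const std::string& term) const = 0;

private:
    std::size_t maxDocuments_ = 10;
    std::size_t numFactors_ = 100;
    std::vector<std::string> queryTerms_;
    std::vector<std::string> retainedTerms_;
};

// Toy derived class with a hard-coded vocabulary, used only to exercise the
// interface. Its search() is a placeholder and performs no LSI computation.
class ToyLsi : public LsiSketch {
public:
    std::vector<Result> search() override { return {}; }
    std::size_t totalTerms() const override { return vocabulary_.size(); }
    std::size_t totalDocuments() const override { return 0; }

protected:
    bool isCommonWord(const std::string& term) const override {
        return commonWords_.count(term) > 0;
    }
    bool inCollection(const std::string& term) const override {
        return vocabulary_.count(term) > 0;
    }

private:
    std::set<std::string> commonWords_{"the", "of", "and"};
    std::set<std::string> vocabulary_{"matrix", "retrieval", "search"};
};
```

In this sketch an application would construct a derived object, call composeQuery, adjust the attributes, and call search, touching only the members declared in the base class.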
Both the lsiSerial class and the lsiParallel class are derived from the lsi base class. Both classes provide the same basic services, and both support the API described in the lsi base class. The lsiSerial class is designed for applications that will execute on a single machine. To minimize memory requirements, the term vectors, document vectors, and singular values are incrementally loaded from secondary storage as they are needed. The lsiParallel class is designed for applications that require exceptionally fast query matching. It can use any number of processing elements, and it is written to use the Message Passing Interface (MPI) [22], making it highly portable. Since the time required to start a parallel application tends to be large [18], an application using the lsiParallel class will probably be started as a continuously-running backend server, with one or more clients passing queries to it and receiving the results.
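The incremental loading attributed to the lsiSerial class can be sketched as a load-on-first-use store. This is an assumption about one way such loading might be organized, not a description of the LSI++ code: LazyVectorStore and its loader callback are hypothetical, the callback stands in for a read from secondary storage, and a C++11 compiler is assumed.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative sketch only: LazyVectorStore and its loader callback are
// hypothetical and stand in for whatever on-disk format LSI++ reads.
class LazyVectorStore {
public:
    using Vector = std::vector<double>;
    using Loader = std::function<Vector(std::size_t)>;

    explicit LazyVectorStore(Loader loader) : loader_(std::move(loader)) {}

    // Return the vector with the given index, reading it from secondary
    // storage (through the loader) only the first time it is requested.
    const Vector& get(std::size_t index) {
        auto it = cache_.find(index);
        if (it == cache_.end()) {
            it = cache_.emplace(index, loader_(index)).first;
            ++loads_;
        }
        return it->second;
    }

    std::size_t loadsPerformed() const { return loads_; }
    std::size_t vectorsResident() const { return cache_.size(); }

private:
    Loader loader_;
    std::unordered_map<std::size_t, Vector> cache_;
    std::size_t loads_ = 0;
};
```

Only the vectors actually touched by a query occupy memory in this sketch. It never evicts a loaded vector; an implementation with a strict memory budget would also need a replacement policy.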
The GNU version of C++, g++, includes an excellent library [12] of low-level classes to aid the programmer. The library, called libg++, contains classes for easy and intuitive string manipulation (with regular expression matching and extraction), input/output, and data storage (such as sets, bags, and associative arrays), among other facilities. LSI++ uses libg++ extensively for low-level data manipulation and storage, although it does not depend on any feature specific to libg++. Any library implementing basic data structures will satisfy the demands of LSI++. See [18] for details on the design and use of all the classes listed in Figure 3.
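As one illustration of the claim that any library of basic data structures would suffice, the following sketch uses the C++11 standard library in place of libg++. It is not taken from LSI++: TermIndex and indexTerms are hypothetical names, and the pairing of std::set, std::multiset, and std::map with sets, bags, and associative arrays is our own choice.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>

// Illustrative sketch only: TermIndex and indexTerms are hypothetical helpers.
struct TermIndex {
    std::set<std::string> distinctTerms;       // set: each term once
    std::multiset<std::string> allTerms;       // bag: duplicates are kept
    std::map<std::string, int> termFrequency;  // associative array
};

// Extract alphabetic words with a regular expression and record each one
// in all three containers.
TermIndex indexTerms(const std::string& text) {
    static const std::regex word("[A-Za-z]+");
    TermIndex index;
    for (std::sregex_iterator it(text.begin(), text.end(), word), end;
         it != end; ++it) {
        const std::string term = it->str();
        index.distinctTerms.insert(term);
        index.allTerms.insert(term);
        ++index.termFrequency[term];
    }
    return index;
}
```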