Four LSI search engines were tested to determine their relative execution efficiency while processing queries. Each search engine received its input from CGI and wrote an HTML document as output. The four engines tested were lsiFinder, lsiQuery, lsiBackend, and plsiBackend.
To measure the performance of lsiFinder, lsiQuery, lsiBackend, and plsiBackend, twenty-five random queries were selected for each of four document collections (see Table 1 for a description of each document collection).
Each query consisted of one to five terms known to be in the document collection and zero to four terms not in the document collection. In addition, approximately half of the queries were relevance feedback queries, with one to five documents added to the query. Each set of twenty-five queries was posed to the search engines twice, once to find related terms and again to find related documents. Table 2 summarizes the queries selected for each document collection.
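The query mix described above can be sketched as a small generator. This is a minimal illustration under stated assumptions, not the procedure actually used to select the test queries: the function names, the dictionary layout, and the exact 0.5 probability for relevance feedback are hypothetical, and the term and document lists are supplied by the caller.

```python
import random

def make_query(known_terms, unknown_terms, doc_ids, rng):
    """Build one random test query (hypothetical helper, not from the paper)."""
    # One to five terms known to be in the document collection.
    terms = rng.sample(known_terms, rng.randint(1, 5))
    # Zero to four terms that are not in the document collection.
    terms += rng.sample(unknown_terms, rng.randint(0, 4))
    # Assumption: "approximately half" is modeled as a 0.5 probability
    # that the query carries one to five relevance feedback documents.
    feedback = []
    if rng.random() < 0.5:
        feedback = rng.sample(doc_ids, rng.randint(1, 5))
    return {"terms": terms, "feedback_docs": feedback}

def make_query_set(known_terms, unknown_terms, doc_ids, n=25, seed=0):
    """Build a reproducible set of n queries for one document collection."""
    rng = random.Random(seed)  # fixed seed so every engine sees the same queries
    return [make_query(known_terms, unknown_terms, doc_ids, rng) for _ in range(n)]
```

Seeding the generator means the same twenty-five queries can be replayed against each search engine, once for related terms and once for related documents.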
The total wall-clock time for each search was recorded by the search engine. Each search engine used the gettimeofday() system call to record the time the search began and the time the search finished. The total wall-clock time was found by subtracting the starting time from the ending time. Since Perl lacks adequate timers, lsiFinder determined the starting and ending times by calling an external program that reported the results of a call to gettimeofday().
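The timing scheme amounts to reading the wall clock before and after the search and taking the difference. The sketch below illustrates this in Python rather than in the languages of the tested engines; it assumes that time.time(), which returns wall-clock seconds since the epoch, is an acceptable stand-in for gettimeofday(). The names timed and timeofday_fields are hypothetical. The second function mirrors the idea of an external helper that reports the current time as whole seconds plus microseconds.

```python
import time

def timed(search, *args):
    """Run search(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.time()          # time the search began
    result = search(*args)
    end = time.time()            # time the search finished
    return result, end - start   # ending time minus starting time

def timeofday_fields():
    """Return the current time as (seconds, microseconds), the two fields
    that gettimeofday() reports. Hypothetical helper for illustration."""
    now = time.time()
    sec = int(now)
    usec = int((now - sec) * 1_000_000)
    return sec, usec
```

A caller that cannot read the clock directly could invoke a program that prints these two fields before and after the search, then subtract the two readings. The cost of launching that program is then part of the measured time.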
Due to the configuration of the machines on which the performance timings were taken, an HTTP server was not available to pass queries to the search engines. In order to perform the tests, the HTTP server was simulated by setting the environment variables required by CGI and passing the queries to each search engine as a CGI-encoded string. Thus, the time required by the HTTP server to receive a query from a remote browser and spawn either lsiQuery or lsiRemote is not included in the timings given in the following sections.
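Simulating the HTTP server in this way can be sketched as follows. The sketch assumes the query is delivered with the GET method through the QUERY_STRING variable; the source does not say whether GET or POST was used, nor which variables were set. The field names (query, mode), the script path, and the stand-in engine FAKE_ENGINE are hypothetical and exist only to make the example runnable.

```python
import os
import subprocess
import sys
from urllib.parse import urlencode

def run_as_cgi(command, fields):
    """Run a CGI program without an HTTP server and return its output."""
    env = dict(os.environ)
    env.update({
        # Variables a CGI/1.1 server would normally provide (assumed set).
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": "GET",
        "QUERY_STRING": urlencode(fields),   # the CGI-encoded query
        "SERVER_NAME": "localhost",
        "SERVER_PORT": "80",
        "SERVER_PROTOCOL": "HTTP/1.0",
        "SCRIPT_NAME": "/cgi-bin/search",    # hypothetical path
    })
    done = subprocess.run(command, env=env, capture_output=True,
                          text=True, check=True)
    return done.stdout

# Stand-in for a search engine: echoes QUERY_STRING inside an HTML document.
FAKE_ENGINE = [
    sys.executable, "-c",
    "import os; print('Content-Type: text/html'); print(); "
    "print('<html><body>' + os.environ['QUERY_STRING'] + '</body></html>')",
]

if __name__ == "__main__":
    print(run_as_cgi(FAKE_ENGINE, {"query": "matrix decomposition",
                                   "mode": "terms"}))
```

Because the program is started directly, the measurement excludes any time an HTTP server would spend receiving the request and spawning the process.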