CS 494/594 Final Exam, due Monday, December 11, 1995 at 5:00pm


Directions: This examination is a take-home exam. You should not discuss or work on this exam with other class members. To receive credit, you must show all your work. If you find it necessary to make assumptions, please explicitly state them. Before submitting your completed exam to either Dr. Berry or Dr. Browne by 5:00pm on December 11, you should write the disclaimer below at the top of your first page and provide your signature afterwards. Failure to acknowledge the originality of the answers on your exam may constitute a substantial reduction of your score.


Disclaimer: I have neither received/given assistance from/to any other CS 494/594 student on this final examination. Signed _____________________

  1. [20Pts]
    (a) Experiment with some WAIS systems that combine ranking with Boolean operators and explain, using some examples, how they evaluate a query that contains both. (See Homework 2 for some example WAIS systems).
    (b) Compare and contrast advantages, disadvantages, and limitations of the WAIS technique for combining ranking and Boolean operations with the MMM model described in Chapter 15 of your textbook.

  2. [60Pts] Consider the problem of routing user queries to a distributed set of document collections. Assume you receive the following information from each collection nightly: Once the initial information has been received for a collection, only changes are sent in updates. The term frequency information should be used to determine a weight w_ij for term i in collection j. The weights should have both local and global components. You need not use all of the available information.
    (a) Choose and explain a weighting scheme, explaining why you chose a particular strategy.
    (b) Describe what data structures you would use for an inverted index and how you would update them efficiently.
    (c) Given a user query, explain how you would compute a ranked list of collections for that query.
    (d) Assume expert relevance judgements have been obtained for a set of sample queries -- i.e., for each of these queries, an expert has determined which collections are relevant and which are not. Explain how you could incorporate this relevance information into your weighting scheme.
    (e) After some number of collections has been selected and searched for a query, and each has returned a list of documents ranked by relevance to the query, explain how to merge the ranked results into a single ranking so that documents from collections previously selected as more relevant are ranked higher.
    (f) Describe how you would evaluate the retrieval effectiveness of your distributed search system.

  3. [20Pts] Consider the following similarity (not dissimilarity) matrix for a collection of six documents. Show the complete link clusters that would be produced at decreasing levels of similarity by the Voorhees algorithm, assuming the document pairs are sorted lexicographically within each similarity level. Show the sorted list of document-document similarities as well.

    2 .5
    3 .8 0
    4 0 .9 0
    5 .7 0 .4 .2
    6 0 .5 .7 .5 0
    1 2 3 4 5


Total points possible on this exam : 100 Points