CS 494/594 Final Exam, due Monday, December 11, 1995 at 5:00pm
Directions: This examination is a take-home exam.
You should not discuss or work on this exam with other class members.
To receive credit, you must show all your work. If you find it necessary
to make assumptions, please explicitly state them. Before submitting
your completed exam to either Dr. Berry or Dr. Browne by 5:00pm on December 11,
you should write the disclaimer below at the top of your first page and
provide your signature afterwards. Failure to acknowledge the originality
of the answers on your exam may constitute a substantial reduction of your score.
Disclaimer: I have neither received/given assistance from/to any other CS 494/594
student on this final examination. Signed _____________________
- [20Pts]
- (a) Experiment with some WAIS systems that combine ranking with Boolean
operators and explain, using some examples, how they evaluate a query
that contains both. (See
Homework 2 for some example WAIS systems).
- (b) Compare and contrast advantages, disadvantages, and limitations
of the WAIS technique for combining ranking and Boolean operations
with the MMM model described in Chapter 15 of your textbook.
- [60Pts]
Consider the problem of routing user queries to a distributed set
of document collections. Assume you receive the following information
from each collection nightly:
- number of unique index terms in collection j: n_j
- document frequency of each term i in collection j:
df_ij
- maximum document frequency of any term in collection j:
max_df_j
- total frequency of each term i in collection j:
tf_ij
- maximum total frequency of any term in collection j:
max_tf_j
Once the initial information has been received for a collection,
only changes are sent in updates.
The term frequency information should be used to determine a weight
w_ij for term i in collection j.
The weights should have both local and global components.
You need not use all of the available information.
- (a) Choose and explain a weighting scheme, explaining
why you chose a particular strategy.
- (b) Describe what data structures you would use for an inverted
index and how you would update them efficiently.
- (c) Given a user query, explain how you would compute a
ranked list of collections for that query.
- (d) Assume expert relevance judgements have been obtained
for a set of sample queries -- i.e., for each of these queries,
an expert has determined which collections are relevant
and which are not. Explain how you could incorporate this
relevance information into your weighting scheme.
- (e) After some number of collections has been selected and
searched for a query,
and each has returned a list of documents ranked
by relevance to the query, explain how to merge the
ranked results into a single ranking so that documents from
collections previously selected as more relevant are
ranked higher.
- (f) Describe how you would evaluate the retrieval effectiveness
of your distributed search system.
- [20Pts]
Consider the following similarity (not dissimilarity)
matrix for a collection of six documents. Show the complete link
clusters that would be produced at decreasing levels of similarity
by the Voorhees algorithm, assuming the document pairs are
sorted lexicographically within each similarity level. Show
the sorted list of document-document similarities as well.
| 2 | | .5 |
| 3 | | .8 | 0 |
| 4 | | 0 | .9 | 0 |
| 5 | | .7 | 0 | .4 | .2 |
| 6 | | 0 | .5 | .7 | .5 | 0 |
| | 1 | 2 | 3 | 4 | 5 |
Total points possible on this exam : 100 Points