| Purpose: | To be exposed to real-world uses of Perl and to explore the client-server model of communication via Common Gateway Interface (CGI) scripts. |
| Available: | Tuesday, September 5, 1995 |
| Due: | Wednesday, September 13, 1995 at 11:59 pm |
One of the most important aspects of an information retrieval system (from the user's perspective, anyway) is the interface. The interface is used to hide the details of the query engine and make it easier for the user to enter queries and interpret the results of the query. However, graphical interfaces are inherently difficult to write. In this class, we are less concerned with the interface than with the algorithms that perform the queries themselves. On the other hand, every query engine we write will need some sort of interface. WWW browsers such as Netscape provide a convenient, easy way to produce relatively good interfaces.
The WWW client can collect information from the user and pass it to
the HTTP server. The HTTP server has the ability to execute scripts
and programs that will process the information and send the results
back to the user. The forms processing ability of the server is
called the Common Gateway Interface (CGI).
Pat Ryan's Introduction to Perl (handed out in lab last Wednesday)
and one of the many

Let's look at each part in detail:
Anything on this page that needs to be submitted to the HTTP
server must be enclosed within the <FORM> and </FORM> tags.
If you would like to test whether your interface is working
correctly, you can use a test server that has been set up
specifically for that purpose. To do this, change your ACTION URL
to http://hoohoo.ncsa.uiuc.edu/htbin-post/post-query.
When you submit your query, it will tell you what your form
actually submitted to the server.
As with all HTML documents you create, you should validate your
page at the
HALSoft HTML Validation Service. As in lab 1, you should set
the level of conformance to "Mozilla". Also, remember to make
query.html world readable. Otherwise, the server will not
be able to access it.
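When the server receives a CGI request, it uses the ACTION URL
to determine the name and location of the script that should
be invoked. Once the script is started, the server passes the user
input (the keywords that make up the query, in this case)
to the script in one of two ways. With METHOD="GET", the input
is placed in the QUERY_STRING environment variable. With
METHOD="POST", it is written to the script's standard input,
and its length in bytes is given in the CONTENT_LENGTH
environment variable.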
In part 1 of this lab, we specified that you should
use METHOD="POST". Since some flavors of UNIX might
truncate environment variables longer than an arbitrary
length (like 256 characters), POSTing is usually the safest
way to send input to the CGI script or program.
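For reference, the sketch below shows roughly what retrieving the input involves under each method. It is written for a modern Perl 5. It is not cgi-lib.pl's code: read_form_input and decode_form are hypothetical names. In this sketch, a field name that appears more than once simply keeps its last value.

```perl
use strict;
use warnings;

# Hypothetical sketch of a ReadParse-style routine (not cgi-lib.pl's code):
# fetch the raw form data according to REQUEST_METHOD, then decode it.
sub read_form_input {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $raw    = '';
    if ($method eq 'POST') {
        # POST: the server writes CONTENT_LENGTH bytes to standard input.
        my $length = $ENV{CONTENT_LENGTH} || 0;
        read(STDIN, $raw, $length) if $length > 0;
    }
    else {
        # GET: the server places the data in the QUERY_STRING variable.
        $raw = $ENV{QUERY_STRING} || '';
    }
    return decode_form($raw);
}

# Decode URL-encoded "name1=value1&name2=value2" into a list of pairs
# suitable for assigning to a hash. A repeated name keeps its last value.
sub decode_form {
    my ($raw) = @_;
    my %fields;
    foreach my $pair (split /&/, $raw) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX encodes one byte
        }
        $fields{$name} = $value;
    }
    return %fields;
}
```

Calling my %in = read_form_input(); leaves the text box contents in $in{keywords}.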
Luckily, though, we don't have to worry about the details
of retrieving user input from the HTTP server (although it
certainly is something of which one ought to be aware). Steven
Brenner has created a library of Perl subroutines that
simplify CGI scripts. You should copy the file
~cs494/public/cgi-lib.pl to your
~/www-home/cgi-bin directory and read the comments
in that file to determine how the library should be used.
Note that the library includes a subroutine called
PrintHeader. PrintHeader specifies
the content-type of the information that will be sent
back to the client. YOU MUST CALL PRINTHEADER
BEFORE YOU ATTEMPT TO SEND ANYTHING BACK TO THE CLIENT.
Otherwise, the client will assume it never received any
worthwhile information and provide an error message stating
it received an empty document.
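The header itself is small enough to show. The sketch below assumes that PrintHeader produces the standard text/html content-type line followed by the blank line that ends the header. The names html_header and send_page are hypothetical stand-ins and are not part of cgi-lib.pl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what PrintHeader is assumed to return: the
# content-type line plus the blank line that ends the HTTP header.
sub html_header {
    return "Content-type: text/html\n\n";
}

# Send a page: the header must go out before any document text, or the
# client cannot tell where the header ends and the document begins.
sub send_page {
    my ($body) = @_;
    print html_header();
    print $body;
}
```

With cgi-lib.pl loaded, the usual idiom is print &PrintHeader; placed before any other output.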
Since cgi-lib.pl will be needed to execute
your query engine, you should make it world readable.
To use it in your Perl script, you should load it with Perl's
require statement near the top of the script. The ReadParse
subroutine in cgi-lib is probably the most important routine. When
ReadParse is called, it accepts the user input
from the HTTP server and places it in the associative
array %in. Thus, after ReadParse is
called, if your script needs the user input from the text box
named keywords (NAME="keywords"), you can retrieve it
with $input = $in{keywords}.
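Once %in is filled in, the text box contents still have to be split into individual search terms. Here is a minimal sketch, assuming terms are separated by whitespace. The helper name query_terms is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split the raw text box contents into search terms.
# Whitespace separates terms; the empty strings produced by leading or
# repeated spaces are discarded, so a blank box yields an empty list.
sub query_terms {
    my ($keywords) = @_;
    return () unless defined $keywords;
    return grep { length } split /\s+/, $keywords;
}
```

For example, my @terms = query_terms($in{keywords}); yields ("africa", "humans") for the query "africa humans". An empty @terms means the user supplied no search terms.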
The document database you will be searching can be found at
/cloud/homes/cs494/public/lab2-documents.
YOU SHOULD NOT MAKE A COPY OF THIS FILE! Instead, you
should open this file by providing the full path and
filename as listed above. This will allow us to easily
change the document database during grading.
Lab2-documents contains a small collection of
documents separated by newline characters. Each
document except the first and the last consists
of several lines of text surrounded by single newline
characters.
To search the database, you should open the file
and search for the user-specified search terms. You
should not call Unix utilities (for example, grep)
from within your search engine. Instead, you should
use the Perl pattern-matching commands to perform the search.
If the user has supplied more than one term, you should
consider the query to be the logical OR of all the
terms. Thus, the query "africa humans" should find
all documents that contain either the word "africa", the
word "humans", or both. If one of the user-supplied
search terms is found in a document, the term should be
surrounded by
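The OR rule can be written as a per-document test. This sketch assumes the whole-word, case-insensitive matching that the lab requires. The name matches_any is hypothetical. \Q...\E makes any regex metacharacters the user typed match literally.

```perl
use strict;
use warnings;

# Logical OR of all terms: true if at least one term occurs in $text as a
# whole word (\b on both sides), ignoring case (/i).
sub matches_any {
    my ($text, @terms) = @_;
    foreach my $term (@terms) {
        return 1 if $text =~ /\b\Q$term\E\b/i;
    }
    return 0;
}
```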
Searching the database can be done in one of many
ways, but you should not load the entire document database
(that is, the entire lab2-documents file) into
memory at one time. You should find a way to incrementally
examine parts of the database (line-by-line, or
document-by-document, for example) that allows the
documents containing the user-supplied search terms to
be sent to the client with the search terms highlighted
without loading the entire document database into memory.
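One way to meet this requirement is to read the file line by line and keep only the current document in memory. This sketch rests on an explicit assumption: that documents are separated by blank lines. Check that against the actual layout of lab2-documents; the separator is a parameter for that reason. The name each_document is hypothetical, and it calls back with each complete document.

```perl
use strict;
use warnings;

# Read documents one at a time from $fh, holding only the current one in
# memory. ASSUMPTION: a line matching $separator (default: a blank line)
# ends a document. $handle_doc is called with each document's text.
sub each_document {
    my ($fh, $handle_doc, $separator) = @_;
    $separator = qr/^\s*$/ unless defined $separator;
    my $doc = '';
    while (my $line = <$fh>) {
        if ($line =~ $separator) {
            $handle_doc->($doc) if length $doc;   # a document just ended
            $doc = '';
        }
        else {
            $doc .= $line;
        }
    }
    $handle_doc->($doc) if length $doc;           # last document in file
}
```

In a modern Perl 5, open(my $db, '<', '/cloud/homes/cs494/public/lab2-documents') opens the database by its full path; check its return value before reading. The callback is where each document would be tested against the query terms and, if it matches, printed with the terms highlighted.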
If the user doesn't supply any search terms, the script
should send an error message to the client.
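Here is a sketch of that error reply, using a hypothetical error_page helper. It starts with the content-type header, just as PrintHeader would produce, so the browser shows the message instead of reporting an empty document.

```perl
use strict;
use warnings;

# Hypothetical helper: a complete CGI reply (header plus HTML page)
# carrying a short error message.
sub error_page {
    my ($message) = @_;
    return "Content-type: text/html\n\n"
         . "<HTML>\n<HEAD><TITLE>Query Error</TITLE></HEAD>\n<BODY>\n"
         . "<H1>Query Error</H1>\n<P>$message</P>\n"
         . "</BODY>\n</HTML>\n";
}
```

A typical use, with @terms holding the parsed search terms, might be: unless (@terms) { print error_page("Please enter at least one search term."); exit 0; }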
Searching should be case-insensitive. In addition,
you should match full words (that is, when searching for
"Africa," occurrences of "African" should not be highlighted).
You should not make your query engine specific to
the document database in any way. During grading, we will
replace lab2-documents with a different and much
larger collection of documents. By doing this, we will
be able to determine whether your query engine is specific
to the documents in lab2-documents and whether you
are attempting to load the entire document database into memory
at one time.
You should note that the documents in lab2-documents
are not complete HTML documents on their own. Before your
script returns the matching documents to the client, the
script will have to write the header information required
for every HTML document.
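Here is a sketch of the page that surrounds the matching documents. It assumes the layout of the sample output in this lab: a "Query Results" heading followed by the query string. The names page_start, page_end and html_escape are hypothetical. The query is echoed back to the browser, so it is escaped first.

```perl
use strict;
use warnings;

# Opening part of the results page; the documents are printed after it.
sub page_start {
    my ($query) = @_;
    my $safe = html_escape($query);
    return "<HTML>\n<HEAD><TITLE>Query Results</TITLE></HEAD>\n<BODY>\n"
         . "<H1>Query Results</H1>\n<P>Query String: $safe</P>\n";
}

# Closing part of the results page.
sub page_end {
    return "</BODY>\n</HTML>\n";
}

# Escape characters that are special in HTML ('&' first, so the
# entities produced by later substitutions are not escaped again).
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}
```

The script would print the content-type header, then page_start($in{keywords}), then the matching documents, then page_end().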
Ancient bones are the objective evidence of biological history. From my
standpoint as a paleontologist, they are vastly more informative about
extinct creatures than reconstructions or models, in whose creation art
plays at least as great a role as science. Yet I am also a museum
curator, and from that perspective I am keenly aware that nothing brings
the past alive in the public's eye like a well-crafted reconstruction.
For the average person, fossil bones are static things: beautiful or
majestic, perhaps, but hard to imbue with the attributes of a living,
breathing form.
When I was given the responsibility of curating the American Museum of
Natural History's new Hall of Human Biology and Evolution, it was
therefore evident to me and to Willard Whitson, the designer of the hall,
that we needed to include some reconstructions of early humans in the
exhibition. Furthermore, we wanted to portray these figures dynamically in
the context of situations that our ancestors might have faced long ago.
Only thus, we thought, could we truly bring these long-departed relatives
back to some semblance of life. We hoped that clever sculpting and modern
casting materials could provide us with a level of realism rivaling that
of the spectacular dioramas of modern animals in the adjacent galleries.
The objectivity of modern maps of the world is so taken for granted that
they serve as powerful metaphors for other sciences, on occasion even for
scientific objectivity itself. The canonical history of Western
cartography reinforces that assumption of objectivity. The history tells
of a gradual progression from crude Medieval views of the world to
depictions exhibiting contemporary standards of precision. In actuality,
all maps incorporate assumptions and conventions of the society and the
individuals who create them. Such biases seem blatantly obvious when one
looks at ancient maps but usually become transparent when one examines
maps from modern times. Only by being aware of the subjective omissions
and distortions inherent in maps can a user make intelligent sense of the
information they contain.
The putative history of cartography typically begins in earnest at the
time of the Egyptian and Babylonian mapmakers. The scene quickly shifts
to ancient Greek and Roman contributions, followed by an acknowledgment
of those of the Arabs during the Middle Ages. Mapmaking in Medieval
Europe has been long regarded as the nadir of the craft. From the 15th
century forward, cartography smoothly advanced, culminating in present
maps that benefit from sophisticated optics, satellite imaging and digital...
Debugging the CGI part of this lab can be somewhat tricky,
especially when you don't know whether the search engine itself
actually works. Therefore, you should write the search engine
first, and then attempt to make it work with CGI. Since the server
doesn't return any error messages to the client, it is often
difficult to determine why the search engine failed. To aid in
debugging, we can circumvent both the client and the server so we
can actually see the error messages that are produced. To do
this, set the following environment variables in the window in
which you would like to run the Perl script:
In addition, when you are not actually working on the script and form
themselves, you should read-protect them so others won't be tempted
to surf through directories to find a project that has already been
completed.
Expect to spend at least a few frustrating hours doing this lab. CGI
scripts are notoriously difficult to debug since the WWW browser
does not report errors in CGI scripts (well, this isn't quite true:
the server keeps a log file of HTTP errors, but mere mortals do not
have read access to this file on our system). Because of this, you should
start early.
Example 1: Suppose the user types the following query...


Example 2: On the other hand, if the user types the following...

Query Results
Query String:
ancient history
Evolution Comes to Life - SCIENCE IN PICTURES (August 1992)
by Ian Tattersall
======================
The Power of Maps - SCIENCE IN PICTURES (May 1993)
by Denis Wood
======================
Hints/tips in case you have trouble:
Debugging tips:
Suppose you named the text box on your interface (the HTML form
we named query.html earlier in the lab) "keywords". Then,
we can simulate the server by typing the following on the Unix command
line:
Lab2.cgi should act exactly as though the user had
typed "africa" in the text box on your interface and submitted the form.
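The same simulation can also be driven from Perl. This sketch assumes your script accepts GET input. ReadParse is expected to handle both GET and POST, but check the comments in cgi-lib.pl. The names encode_form and run_cgi_get are hypothetical. The form data is URL-encoded the way a browser would encode it. The variables are set with local, so they are restored when the call returns.

```perl
use strict;
use warnings;

# URL-encode name/value pairs as a browser does for a form: unsafe bytes
# become %XX and spaces become '+'. Pairs keep the order they are given in.
sub encode_form {
    my @pairs = @_;
    my @parts;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        for ($name, $value) {
            s/([^A-Za-z0-9_.\-~ ])/sprintf("%%%02X", ord($1))/ge;
            tr/ /+/;
        }
        push @parts, "$name=$value";
    }
    return join '&', @parts;
}

# Run a CGI script the way the server would for a GET request. The child
# process inherits %ENV; local restores the old values afterwards.
sub run_cgi_get {
    my ($script, @fields) = @_;
    local $ENV{REQUEST_METHOD} = 'GET';
    local $ENV{QUERY_STRING}   = encode_form(@fields);
    return system($script);
}
```

For example, run_cgi_get('./lab2.cgi', keywords => 'africa'); should behave like submitting "africa" from the form, and any Perl error messages appear directly in the terminal.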
Other resources that might be of interest:
A note about ethics: Yes, scripts and forms that do
everything in this lab can be found on the web. You may use any
references you desire for this lab (except other people's projects),
but you are here to learn. Part of learning, unfortunately, is struggling
to find solutions to problems you encounter. Please do not cheat
yourself and the rest of the class by relying on web sites that have
already implemented query engines similar to this lab.
Summary:
| Due: | Wednesday, September 13, 1995 at 11:59 pm |
| Deliverables: | By the due date, send the full URL of your validated interface
(the query.html file) to hudgens@cs.utk.edu.
When you send the URL to Watts, please use the subject line and put the full URL, only, at the beginning of the message, before any other words of greeting, or other comments. This will assist with grading and it will help Watts better differentiate between your lab submissions and other email he receives. |
| Grading: |
|
| Points: | This lab is worth 50 points. |