CS140 -- Lab 8


Thu Oct 11 11:40:12 EDT 2007
This is a lab for you to get practice with hashing. As usual, there is a working executable in the lab directory.

Scoreproc

This is a program that helps you process score files. The syntax is:
scoreproc hash-table-size composite-file score-files
In a score file, each line is either blank (in which case it should be ignored) or contains a name followed by a score. The name can be multiple words with any amount of white space between them. You should convert all names to strings with exactly one space between each word. The last word on each line is the score, which is a floating point number (as always, use a double, and check for the error where the last word is not a number).
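The line format above can be sketched as follows. This is a minimal sketch, not the lab solution: parse_score_line is a hypothetical helper name, and returning false on a bad line is an assumption, since the writeup does not say how errors should be reported. It also assumes that a line needs at least one name word before the score.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line of a score file into a normalized
// name and a score. Returns false for blank lines, for lines with no name
// before the score, and for lines whose last word is not a number.
bool parse_score_line(const std::string &line, std::string &name, double &score)
{
  std::istringstream ss(line);
  std::vector<std::string> words;
  std::string w;

  // operator>> skips any run of white space between words.
  while (ss >> w) words.push_back(w);
  if (words.size() < 2) return false;

  // The last word must be a number in its entirety.
  const std::string &last = words.back();
  char *end;
  score = std::strtod(last.c_str(), &end);
  if (end == last.c_str() || *end != '\0') return false;

  // Rejoin the remaining words with exactly one space between them.
  name = words[0];
  for (std::size_t i = 1; i + 1 < words.size(); i++) name += " " + words[i];
  return true;
}
```

Reading words with operator>> does the white space normalization for free, and checking the end pointer from strtod catches words such as "12abc" that only start with a number.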

Now, scoreproc takes a list of score files on the command line. It reads each score in every file, and for each name, it computes the average score for that name. In other words, a name can have multiple entries in a score file, and different score files can have different scores with the same name.

For example, the files sc-ex-1.txt and sc-ex-2.txt are two simple score files (quarterback passing ratings from 2007 and 2006):

UNIX> cat sc-ex-1.txt
Tom Brady      128.7
Jake Delhomme      111.9
Peyton Manning      108.6
Jeff Garcia      103.6
Kurt Warner      102.3
UNIX> cat sc-ex-2.txt
Peyton Manning 101.0
Damon Huard 98.0
Drew Brees 96.2
Jeff Garcia 95.8
Donovan McNabb 95.5
UNIX>
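If we call scoreproc with both files as command line arguments, the program will keep track of eight names: Tom Brady, Jake Delhomme, Peyton Manning, Jeff Garcia, Kurt Warner, Damon Huard, Drew Brees and Donovan McNabb. Peyton Manning and Jeff Garcia appear in both files, so each of them has two scores; every other name has one.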

First, scoreproc prints how many names it found. Then it prints the minimum and maximum average scores, each followed by all of the names that have that average. More than one name can have the same average, and when several names share the minimum or the maximum, they may be printed in any order (you'll see that below with the minimum average):

UNIX> scoreproc 100 composite.txt sc-ex-1.txt sc-ex-2.txt
# of Names:         8
Minimum average:   95.500
  Drew Brees
  Donovan McNabb
Maximum average:  128.700
  Tom Brady

...
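The minimum/maximum step, including ties, can be sketched in one pass over the averages. This is only an illustration: Extremes and find_extremes are hypothetical names, the input is assumed to hold at least one name, and ties are detected with exact equality on doubles, which is an assumption about how "the same average" should be decided.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: the extreme averages and every name that has them.
struct Extremes {
  double min_avg, max_avg;
  std::vector<std::string> min_names, max_names;
};

// avgs holds (name, average) pairs; it must not be empty.
Extremes find_extremes(const std::vector<std::pair<std::string, double> > &avgs)
{
  Extremes e;
  e.min_avg = e.max_avg = avgs[0].second;

  for (std::size_t i = 0; i < avgs.size(); i++) {
    double a = avgs[i].second;
    // A strictly better value discards the names collected so far.
    if (a < e.min_avg) { e.min_avg = a; e.min_names.clear(); }
    if (a > e.max_avg) { e.max_avg = a; e.max_names.clear(); }
    // A value equal to the current extreme joins the list.
    if (a == e.min_avg) e.min_names.push_back(avgs[i].first);
    if (a == e.max_avg) e.max_names.push_back(avgs[i].first);
  }
  return e;
}
```

With the two example files, this would collect both Drew Brees and Donovan McNabb for the minimum of 95.5, in whatever order the names are visited.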
At this point, scoreproc creates the composite-file, and prints all the names into it, in the following format:
average nscores name
Average should be printed with "%8.3lf", and nscores should be printed with "%8.0lf". The names may be printed in any order.

Here is composite.txt in the above example. The ordering of the names has no significance.

 104.800        2 Peyton Manning
  99.700        2 Jeff Garcia
 111.900        1 Jake Delhomme
 102.300        1 Kurt Warner
  95.500        1 Drew Brees
  98.000        1 Damon Huard
  95.500        1 Donovan McNabb
 128.700        1 Tom Brady
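As a small sketch of the composite line format, the helper below builds one line with the two format strings given above. The name composite_line is hypothetical, and the single space between fields is inferred from the sample output rather than stated in the writeup.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one line of the composite file.
// nscores is a double so that it matches the "%8.0lf" conversion.
std::string composite_line(double average, double nscores, const std::string &name)
{
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%8.3lf %8.0lf ", average, nscores);
  return std::string(buf) + name;
}
```

For example, composite_line(104.8, 2, "Peyton Manning") produces the first line of composite.txt shown above.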

Then it asks the user to enter a name, and it prints the average score for that name (to three decimal places) plus the number of scores. If the name wasn't specified in any of the score files, then it says that the name is not found. It repeats this until the user closes standard input, at which point scoreproc exits:

UNIX> scoreproc 100 composite.txt sc-ex-1.txt sc-ex-2.txt
# of Names:         8
Minimum average:   95.500
  Drew Brees
  Donovan McNabb
Maximum average:  128.700
  Tom Brady

Enter name: Peyton Manning
  Peyton Manning: Avg: 104.800   # Scores: 2
Enter name: Tom Brady
  Tom Brady: Avg: 128.700   # Scores: 1
Enter name: Dr. Plank
  Dr. Plank not found
Enter name:  <CNTL-D>
UNIX> cat composite.txt
 104.800        2 Peyton Manning
  99.700        2 Jeff Garcia
 111.900        1 Jake Delhomme
 102.300        1 Kurt Warner
  95.500        1 Drew Brees
  98.000        1 Damon Huard
  95.500        1 Donovan McNabb
 128.700        1 Tom Brady
UNIX> 

You should use a hash table with separate chaining to keep track of the names/scores. See the hints file for help in laying out all your data structures.
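For orientation, here is one possible layout for a separate chaining table, written as a minimal sketch rather than the layout from the hints file. Entry, ScoreTable and placeholder_hash are hypothetical names, and placeholder_hash is a generic multiply-and-add string hash used only so the example compiles; it is not one of the hash functions from the lecture notes. The table size is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

struct Entry {
  std::string name;
  double sum;      // running total of this name's scores
  double nscores;  // a double, so it can be printed with "%8.0lf"
};

// Placeholder only: swap in the hash function chosen from the lecture notes.
unsigned int placeholder_hash(const std::string &s)
{
  unsigned int h = 5381;
  for (std::string::size_type i = 0; i < s.size(); i++) {
    h = h * 33 + (unsigned char) s[i];
  }
  return h;
}

class ScoreTable {
  public:
    explicit ScoreTable(std::size_t size) : buckets(size) {}

    // Returns the entry for name, or nullptr if the name is not in the table.
    const Entry *find(const std::string &name) const {
      const std::list<Entry> &chain = buckets[placeholder_hash(name) % buckets.size()];
      for (std::list<Entry>::const_iterator it = chain.begin(); it != chain.end(); ++it) {
        if (it->name == name) return &(*it);
      }
      return nullptr;
    }

    // Adds one score: updates the name's entry if it is already on the
    // chain, otherwise appends a new entry to the chain.
    void add(const std::string &name, double score) {
      std::list<Entry> &chain = buckets[placeholder_hash(name) % buckets.size()];
      for (std::list<Entry>::iterator it = chain.begin(); it != chain.end(); ++it) {
        if (it->name == name) { it->sum += score; it->nscores += 1; return; }
      }
      Entry e = { name, score, 1.0 };
      chain.push_back(e);
      nnames++;
    }

    std::size_t nnames = 0;   // number of distinct names stored

  private:
    std::vector< std::list<Entry> > buckets;   // one chain per table slot
};
```

Storing the running sum and the count, instead of every score, is enough to compute each average on demand as sum / nscores.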

For the hashing function, you should use hash4, hash6 or hash8 from the hashing lecture notes.


I have some fun input files in these directories:

UNIX> scoreproc 200 composite.txt ANTM/*
# of Names:       100
Minimum average:    2.600
  Yoanna-2
Maximum average:   14.000
  Brita-4

Enter name: Jaslene-8
  Jaslene-8: Avg: 3.250   # Scores: 12
Enter name:  <CNTL-D>
UNIX> sort -n composite.txt | head -10
   2.600       10 Yoanna-2
   2.700       10 Mercedes-2
   3.000        8 Elyse-1
   3.000        9 Adrianne-1
   3.000        9 Shannon-1
   3.000       12 Eva-3
   3.000       12 Nik-5
   3.083       12 Danielle-6
   3.083       12 Joanie-6
   3.167       12 CariDee-7
UNIX> 

UNIX> scoreproc 50 composite.txt 2007-MLB/*
# of Names:        30
Minimum average:    4.122
  Washington
Maximum average:    6.006
  NY Yankees

Enter name: Philadelphia
  Philadelphia: Avg: 5.506   # Scores: 164
Enter name: Colorado
  Colorado: Avg: 5.289   # Scores: 166
Enter name:  <CNTL-D>
UNIX> 

UNIX> scoreproc 50 composite.txt 2007-Super-14/* < /dev/null
# of Names:        14
Minimum average:   13.462
  Lions
Maximum average:   29.692
  Crusaders

Enter name: UNIX> sort -nr composite.txt
  29.692       13 Crusaders
  29.538       13 Bulls
  28.692       13 Chiefs
  27.308       13 Blues
  26.385       13 Sharks
  21.077       13 Force
  20.692       13 Cheetahs
  20.308       13 Waratahs
  19.154       13 Stormers
  18.846       13 Hurricanes
  18.385       13 Highlanders
  18.000       13 Brumbies
  15.538       13 Reds
  13.462       13 Lions
UNIX> 

UNIX> scoreproc 1000 composite.txt 2007-Knox-Summer/* 
# of Names:       167
Minimum average:   31.830
  Rae Parker - Brenda Nichols
Maximum average:   65.830
  Robert Heller - Brooks McNeely

Enter name:  <CNTL-D>
UNIX> wc composite.txt
     167    1195    8040 composite.txt
UNIX> sort -nr composite.txt | cat -n | grep Plank
    12    59.545        2 Susan Plank - James Plank
UNIX> 

Note, your composite score file can be in any order (it depends on your hash function and the hash table size).