CS140 -- Lab 3 -- Reading Baseball Scores



Using lynx

Lynx is a text-based web browser, which is nice when you want to pull web pages and write programs that process them. You will be writing such a program for this lab. If you say:
lynx -dump -width 200 URL
Then lynx will download the URL and print it on standard output in a readable form. The "-width 200" flag says to assume that your reading device is at least 200 characters wide. This is nice because sometimes lynx wraps text when you'd rather it not.

Baseball scores on ESPN

ESPN provides web pages that have the scores and other information for Major League baseball games played in recent years. The format of the URL is:

http://sports.espn.go.com/mlb/scoreboard?date=yyyymmdd

where yyyy is the year in question, mm is the month, and dd is the day. Therefore, if you want to see the baseball scores for August 29, 2007, you would point your browser to http://sports.espn.go.com/mlb/scoreboard?date=20070829. Go ahead and look at that page. You'll see that 15 games were played that day. The first game listed is the Toronto-Oakland game, where Oakland won 5 to 4.
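
If you want to build these URLs in a program rather than by hand, the yyyymmdd pattern maps directly onto zero-padded printf conversions. The following is only a sketch: the helper name scoreboard_url() and its range checks are assumptions made for illustration and are not part of the lab.

```c
#include <stdio.h>

/* Build the scoreboard URL for a given date into buf.
   Returns 0 on success, -1 if the date is out of range or buf is too small.
   The helper name and the range checks are illustrative assumptions. */
int scoreboard_url(char *buf, size_t size, int year, int month, int day)
{
  int n;

  if (year < 1000 || year > 9999) return -1;
  if (month < 1 || month > 12 || day < 1 || day > 31) return -1;

  /* %04d%02d%02d zero-pads the pieces into the yyyymmdd form. */
  n = snprintf(buf, size,
               "http://sports.espn.go.com/mlb/scoreboard?date=%04d%02d%02d",
               year, month, day);

  /* snprintf returns the length it wanted to write; reject truncation. */
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Calling scoreboard_url(buf, sizeof(buf), 2007, 8, 29) with a large enough buffer should leave the August 29, 2007 URL shown above in buf.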

Use lynx to grab that page to a local file. You'll need to put the URL in single quotes so that the shell does not try to do something fancy with the question mark:

UNIX> lynx -dump -width 200 'http://sports.espn.go.com/mlb/scoreboard?date=20070829' > score.txt
I've done this already. Look at your score.txt and make sure that it matches my copy of score.txt.

When you look at the file, you'll note that there's a lot of junk in it. In fact, there are 222 lines of stuff before you see your first score. Here is a listing of the first two games in the file -- the listing starts at line 219 of the file. These games are Oakland beating Toronto 5 to 4 and the LA Angels beating Seattle 8 to 2.

   Season:
   [2007]

   - American League -
   - National League -
   Final
   [374]Toronto (67-66, 28-39 away)
   [375]Oakland (66-69, 34-34 home)
   3 4 5 6 7 8 9 10 11
   0 1 1 0 0 0 2 0  0
   2 0 0 0 0 0 0 0  1
   R H E
   4 7 0
   5 9 0
   [376]Hannahan's game-ending single helps A's snap losing streak 
   [377]Box Score | [378]Play-By-Play | [379]Watch
   Balls: Strikes: Outs:
   Pitching: () 0 IP, 0 ER, 0 K
   Batting: () 0-0
   End of the 11th
   TOR: (0-0, 0.00 ERA)
   OAK: (0-0, 0.00 ERA)
   W: [380]Lugo (5-0) L: [381]Frasor (1-4) S: (0)
   HR ? TOR: [382]M. Stairs (17), [383]L. Overbay (9), [384]A. Rios (21), [385]A. Hill (14)
   Final
   [386]LA Angels (79-54, 35-34 away)
   [387]Seattle (73-58, 41-27 home)
   1 2 3 4 5 6 7 8 9
   2 0 0 1 1 0 0 2 2
   0 1 0 1 0 0 0 0 0
   R H  E
   8 17 1
   2 7  0
   [388]Weaver outduels Hernandez and Angels sweep M's
   [389]Box Score | [390]Play-By-Play | [391]Watch
   Balls: Strikes: Outs:
   Pitching: () 0 IP, 0 ER, 0 K
   Batting: () 0-0
   End of the 9th
   LAA: [392]Weaver (10-6, 3.79 ERA)
   SEA: [393]Hernandez (10-7, 4.08 ERA)
   W: [394]Weaver (10-6) L: [395]Hernandez (10-7) S: (0)
   HR ? LAA: [396]V. Guerrero (22), [397]J. Mathis (2)

Here's how you can read this score file. First, there will be no scores until you see a line that contains the string "- American League -", "- National League -", "- Games in AL Stadiums -" or "- Games in NL Stadiums -". Then, you can recognize a game score in the following way:


MLB-Scores

Your job is to write the program MLB-Scores. It will read a game file, which should be the output of lynx on one of these baseball game files. It reads this on standard input. There are no command line parameters.

Now, for each game, MLB-Scores will print out a line in the following format:

Team-1-Name   Team-2-Name        Whether-Team-1-Won    Team-1-Score   Team-2-Score
Specifically:

After that, MLB-Scores should print out a blank line, and then the following information:

  • The average number of total runs scored in a game, where a game's total is the runs of both teams combined. For example, the first game in our example had 9 runs scored and the second had 10. Print this to two decimal places.
  • The maximum number of runs scored in a game by any one team.

So, for example:

    UNIX> lynx -dump -width 200 'http://sports.espn.go.com/mlb/scoreboard?date=20070829' | MLB-Scores
    Toronto        Oakland         Loss  4  5
    LA Angels      Seattle          Win  8  2
    Boston         NY Yankees      Loss  3  4
    Tampa Bay      Baltimore        Win  5  4
    Minnesota      Cleveland       Loss  3  4
    Detroit        Kansas City     Loss  0  5
    White Sox      Texas           Loss  4  5
    Washington     LA Dodgers      Loss  9 10
    Colorado       San Francisco    Win  8  0
    NY Mets        Philadelphia    Loss  2  3
    Cincinnati     Pittsburgh       Win  8  0
    Atlanta        Florida          Win  7  4
    Milwaukee      Cubs             Win  6  1
    St. Louis      Houston         Loss  0  7
    Arizona        San Diego       Loss  1  3
    
    Average number of runs scored in a game: 8.33
    Maximum number of runs scored by a team: 10
    UNIX> lynx -dump -width 200 'http://sports.espn.go.com/mlb/scoreboard?date=20070612' | MLB-Scores
    Washington     Baltimore        Win  7  4
    Colorado       Boston          Loss  1  2
    Milwaukee      Detroit         Loss  0  4
    Arizona        NY Yankees      Loss  1  4
    San Diego      Tampa Bay       Loss  4 11
    St. Louis      Kansas City     Loss  1  8
    Atlanta        Minnesota       Loss  3  7
    White Sox      Philadelphia    Loss  3  7
    Texas          Pittsburgh      Loss  5  7
    Cleveland      Florida         Loss  0  3
    LA Angels      Cincinnati      Loss  3  5
    Seattle        Cubs             Win  5  3
    Oakland        Houston         Loss  4  5
    NY Mets        LA Dodgers      Loss  1  4
    Toronto        San Francisco   Loss  2  3
    
    Average number of runs scored in a game: 7.80
    Maximum number of runs scored by a team: 11
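
The column layout and the two summary values in these examples can be produced with printf-style formatting and two short loops. This is a sketch under assumptions: the widths in the format string are inferred from the sample output rather than from a stated specification, and format_game(), average_runs() and max_runs() are hypothetical helper names.

```c
#include <stdio.h>

/* Format one result line into buf and return the length snprintf reports.
   The widths are inferred from the sample output, not from a stated spec.
   Team 1 is reported as "Win" only if it scored strictly more runs. */
int format_game(char *buf, size_t size, const char *team1, const char *team2,
                int score1, int score2)
{
  return snprintf(buf, size, "%-15s%-15s %4s %2d %2d",
                  team1, team2, (score1 > score2) ? "Win" : "Loss",
                  score1, score2);
}

/* Average of the total runs per game (both teams' runs combined). */
double average_runs(const int *score1, const int *score2, int ngames)
{
  int i, total = 0;

  if (ngames <= 0) return 0.0;
  for (i = 0; i < ngames; i++) total += score1[i] + score2[i];
  return (double) total / ngames;
}

/* Largest number of runs scored by any single team. */
int max_runs(const int *score1, const int *score2, int ngames)
{
  int i, best = 0;

  for (i = 0; i < ngames; i++) {
    if (score1[i] > best) best = score1[i];
    if (score2[i] > best) best = score2[i];
  }
  return best;
}
```

The average would then be printed with a "%.2f" conversion to get two decimal places. For the first two games of the 20070829 file (4-5 and 8-2), average_runs() gives 9.5 and max_runs() gives 8.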
    
    If you look at that last file (the one for 20070612), you'll see that it had inter-league games, and so it used the string "- Games in AL Stadiums -" to delimit the start of the scores.
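
Since any one of four strings can mark the start of the scores, it may help to wrap the test in one small function. The sketch below uses strstr(); the function name is_scores_delimiter() is a hypothetical choice, not something the lab requires.

```c
#include <string.h>

/* Return 1 if the line contains one of the four strings that mark
   the start of the scores, and 0 otherwise. */
int is_scores_delimiter(const char *line)
{
  static const char *markers[] = {
    "- American League -",
    "- National League -",
    "- Games in AL Stadiums -",
    "- Games in NL Stadiums -"
  };
  size_t i;

  for (i = 0; i < sizeof(markers) / sizeof(markers[0]); i++) {
    if (strstr(line, markers[i]) != NULL) return 1;
  }
  return 0;
}
```

Because strstr() searches anywhere in the string, the leading spaces that lynx puts in front of each line do not matter.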

    Assumptions


    Hints

    The best way to structure things is to keep an integer called state that starts with a value of zero. Then the main loop of your program can look like:
       while (get_line(is) >= 0) {

          if (state == 0) {
             /* Look for "- American League -", etc. and set the state to 1
                if you see it */
          } else if (state == 1) {
             if (is->NF == 1 && strcmp("Final", is->fields[0]) == 0) {
                ....
                ....
             }
          }
       }
    
    You get the picture. Since I have limited the number of total games, you never need to call malloc() in this program. I called strdup() but you don't have to.

    In my program, I used strcpy(), strcmp(), strchr() and strstr(), and I used both the fields and text1 fields of the IS struct.
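
As one illustration of how strchr() and strstr() can work together, here is a sketch that pulls the team name out of a line such as "[374]Toronto (67-66, 28-39 away)". It assumes that the name always sits between the closing ']' of the link number and the " (" that starts the win-loss record, as in the listing above. The helper team_name() is hypothetical, and it uses memcpy() where other solutions might use strcpy().

```c
#include <string.h>

/* Copy the team name out of a line such as
   "   [374]Toronto (67-66, 28-39 away)" into name.
   Returns 0 on success, -1 if the line does not have the assumed shape
   or the name does not fit in the buffer. */
int team_name(const char *line, char *name, size_t size)
{
  const char *start, *end;
  size_t len;

  start = strchr(line, ']');      /* end of the "[374]" link marker */
  if (start == NULL) return -1;
  start++;

  end = strstr(start, " (");      /* start of the win-loss record */
  if (end == NULL) return -1;

  len = (size_t) (end - start);
  if (len == 0 || len >= size) return -1;

  memcpy(name, start, len);
  name[len] = '\0';
  return 0;
}
```

Searching for " (" rather than for the first space keeps multi-word names such as "LA Angels" and "St. Louis" intact.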