CS302 Lecture notes -- Topological Sort / Cycle Detection

  • Jim Plank (source code modified by Brad Vander Zanden)
  • Directory: /sunshine/homes/bvz/cs302/notes/GraphIntro
  • Lecture notes: http://www.cs.utk.edu/~bvz/cs302/notes/GraphIntro/index.html
    Here's a little code for topological sort and cycle detection. Before going into it: whenever you represent graphs in files, you have to decide on a format. I like to specify vertices by name and then specify edges between vertices. An example is a file that specifies the prerequisite structure for courses; for instance, CS140 is a prerequisite for CS302. The file schedule.ts defines (more or less) the prerequisite structure of all the computer science classes:
    CLASS CS140
      PREREQ CS102
    
    CLASS CS160
      PREREQ CS102
    
    CLASS CS302
      PREREQ CS140
    
    CLASS CS311
      PREREQ MATH300
      PREREQ CS302
    ...
    
    Note, the class names are vertices, and are defined either by a "CLASS" line, or by a "PREREQ" line (for example, there is no "CLASS" line for CS102). The "PREREQ" lines also define an edge from the specified class to the most recently defined class. For example, the first "PREREQ" line above defines an edge from CS102 to CS140, because CS102 is a prerequisite for CS140.

    Thus, the above file defines a directed graph.

    The first challenge when dealing with a graph is to read it in. The program GraphReader.cpp does just that. First, it defines classes for vertices and edges:

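    class Edge;   // forward declaration: Vertex stores pointers to Edge

    class Vertex {
    protected:
        string name;
        list<Edge *> edges;                  // outgoing edges (adjacency list)
        list<Edge *>::iterator edgesIter;    // cursor used by the methods below
    public:
      Vertex(string n) {
        name = n;
      }
      string getName() { return name; }
      // Called by main() below; a plain append is assumed here.
      void addEdge(Edge *e) { edges.push_back(e); }
      void firstEdge() { edgesIter = edges.begin(); }
      bool endOfEdges() { return edgesIter == edges.end(); }
      void nextEdge() { edgesIter++; }
      Edge *getEdge() { return *edgesIter; }
    };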
    
    class Edge {
    public:
      Edge(Vertex *vtx1, Vertex *vtx2) {
        v1 = vtx1;
        v2 = vtx2;
      }
      Vertex *getVertex1() { return v1; }
      Vertex *getVertex2() { return v2; }
    
    protected:
        Vertex *v1;
        Vertex *v2;
    };
    
    Note, this is an adjacency list representation. Also note that we were forced into a somewhat awkward decision to use our own set of methods for returning edges. The STL design for iterators would force us to expose the adjacency list's representation as a list if we wanted to export an iterator for the edges, because we would have to declare the iterator as being a list iterator. To give us the flexibility to change our edge implementation at a later date, we instead provide our own methods for iterating through the edges.

    Now, a graph is simply a collection of vertices. Since we access vertices by name in the specification file, it is better to keep them in a data structure that supports lookup by name than in a plain list. A hash table is probably best, but hash tables were not part of the STL when this code was written (C++11 later added std::unordered_map), so the STL map class, which is typically implemented as a red-black tree, is an acceptable data structure for holding the vertices. The map class lets us access a vertex by name in O(log n) time, where n is the number of vertices. A hash table would have been preferable because the expected access time would be O(1).
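
    To make the trade-off concrete, here is a minimal sketch of the "look up a vertex by name, create it if missing" step written so that it works with either container. It assumes a C++11 compiler for std::unordered_map. The helper name find_or_create and the stripped-down Vertex struct are illustrative stand-ins, not part of GraphReader.cpp.

```cpp
#include <map>
#include <string>
#include <unordered_map>
using namespace std;

// Stripped-down stand-in for the Vertex class in these notes.
struct Vertex {
  string name;
  Vertex(const string &n) : name(n) {}
};

// Look up a vertex by name, creating it on first use.
// Graph may be any map-like container keyed by string:
// std::map gives O(log n) lookups, std::unordered_map O(1) on average.
// (Hypothetical helper; vertices are never freed, as in the notes' code.)
template <class Graph>
Vertex *find_or_create(Graph &g, const string &name)
{
  typename Graph::iterator it = g.find(name);
  if (it != g.end()) return it->second;   // already present
  Vertex *v = new Vertex(name);
  g[name] = v;
  return v;
}
```

    Calling find_or_create(g, "CS302") twice on the same container should return the same pointer both times, whether g is a map<string, Vertex *> or an unordered_map<string, Vertex *>. One visible difference: iterating a map visits names in sorted order, while an unordered_map makes no ordering promise.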

    Ok -- here is GraphReader.cpp. Pretty straightforward. When it is done, it prints out all the nodes and their edges.

    int main()
    {
      Fields *f;
      map<string, Vertex *> g;
      Vertex *v;
      Vertex *v2;
      Edge *e;
      string s;
    
      v = 0;
    
      f = new Fields();
    
      while (f->get_line() >= 0) {
        if (f->get_NF() > 0) {
          if (f->get_field(0) == "CLASS") {
            if (f->get_NF() != 2) {
              fprintf(stderr, "%d: CLASS name\n", f->get_line_number());
              exit(1);
            } 
            s = f->get_field(1);
            v = g[s];
            if (v == 0) {
              v = new Vertex(s);
              g[s] = v;
            } 
          } else if (f->get_field(0) == "PREREQ") {
            if (f->get_NF() != 2) {
              fprintf(stderr, "%d: PREREQ class\n", f->get_line_number());
              exit(1);
            } 
            if (v == 0) {
              fprintf(stderr, "%d: PREREQ -- no current vertex\n", 
                      f->get_line_number());
              exit(1);
            } 
            s = f->get_field(1);
            v2 = g[s];
            if (v2 == 0) {
              v2 = new Vertex(s);
              g[s] = v2;
            } 
            e = new Edge(v2, v);
            v2->addEdge(e);
        
          } else {
              fprintf(stderr, "%d: lines must be CLASS or PREREQ\n", 
                       f->get_line_number());
              exit(1);
          }
        }
      }
    
      map<string, Vertex *>::iterator vertices;
      for (vertices = g.begin(); vertices != g.end(); vertices++) {
        v = vertices->second;
        printf("Class %s\n", v->getName().c_str());
        for (v->firstEdge(); !v->endOfEdges(); v->nextEdge()) {
          e = v->getEdge();
          printf("   is a prereq for %s\n", e->getVertex2()->getName().c_str());
        }
        printf("\n");
      }
    }
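
    The Fields class used above is a course library that is not shown in these notes. As a rough sketch of the same parsing logic using only standard streams, the function below reads the CLASS/PREREQ format into a simplified graph that maps each class name to the names of the classes it is a prerequisite for. The Graph typedef and the name read_graph are assumptions made for illustration; the real program builds Vertex and Edge objects instead.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Hypothetical simplified graph: class name -> classes it is a prereq for.
typedef map<string, vector<string> > Graph;

// Reads CLASS/PREREQ lines from in into g.
// Returns false on the first malformed line, true otherwise.
bool read_graph(istream &in, Graph &g)
{
  string line, keyword, name, extra, current;
  bool have_current = false;

  while (getline(in, line)) {
    istringstream fields(line);
    if (!(fields >> keyword)) continue;            // blank line: skip
    // Every non-blank line must have exactly two fields.
    if (!(fields >> name) || (fields >> extra)) return false;

    if (keyword == "CLASS") {
      current = name;
      have_current = true;
      g[current];                                  // create vertex if new
    } else if (keyword == "PREREQ") {
      if (!have_current) return false;             // PREREQ before any CLASS
      g[name].push_back(current);                  // edge: name -> current
    } else {
      return false;                                // unknown keyword
    }
  }
  return true;
}
```

    As in GraphReader.cpp, a PREREQ line creates its vertex on first mention, so CS102 ends up in the graph even though it has no CLASS line.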
    

    Topological Sort

    The book describes topological sort. Read it. In the example of classes and prerequisites, a topological sort will return a schedule of classes that does not violate the prerequisite structure. As the book says, a simple way to do this is to first find a class with no incoming edges (i.e. no prerequisites). Print that out, and then remove it and its outgoing edges from the graph. Repeat until the graph is empty.

    TS1.cpp does this. First, it adds a field nincident to each vertex. This is the number of prerequisites that the vertex has. This is set when the graph is created from the input file. Then, the routine find_zero_incident() returns a pointer to the RBNode of a vertex with no prerequisites. Note, it does this by traversing the tree.

    Make sure you understand this code. This is very simple graph code. Test it on schedule.ts:

    UNIX> TS1 < schedule.ts
    CS102
    CS140
    CS160
    CS302
    CS340
    CS360
    CS365
    CS530
    CS560
    MATH231
    CS370
    MATH300
    CS311
    CS380
    CS411
    CS580
    UNIX> 
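
    The remove-a-vertex-with-no-prerequisites approach can be sketched in a self-contained form as follows. This is not TS1.cpp itself: it assumes a simplified graph that maps each class name to the names of its successors, keeps the nincident counts in a separate map rather than in a Vertex field, and the function name topological_sort is made up for the example. Like the description of find_zero_incident() above, it finds the next vertex by a linear scan.

```cpp
#include <map>
#include <string>
#include <vector>
using namespace std;

// Hypothetical simplified graph: vertex name -> names of its successors.
typedef map<string, vector<string> > Graph;

// Returns the vertices in a topological order. If the graph contains a
// cycle, the result has fewer entries than the graph has vertices.
vector<string> topological_sort(const Graph &g)
{
  // nincident[v] = number of incoming edges (prerequisites) of v.
  map<string, int> nincident;
  for (Graph::const_iterator v = g.begin(); v != g.end(); v++) {
    nincident[v->first];                         // make sure v has an entry
    for (size_t i = 0; i < v->second.size(); i++) nincident[v->second[i]]++;
  }

  vector<string> order;
  while (!nincident.empty()) {
    // Linear scan for a remaining vertex with no incoming edges.
    map<string, int>::iterator z = nincident.begin();
    while (z != nincident.end() && z->second != 0) z++;
    if (z == nincident.end()) break;             // none left: there is a cycle

    order.push_back(z->first);
    // "Remove" its outgoing edges by decrementing its successors' counts.
    Graph::const_iterator out = g.find(z->first);
    if (out != g.end())
      for (size_t i = 0; i < out->second.size(); i++)
        nincident[out->second[i]]--;
    nincident.erase(z);                          // remove the vertex itself
  }
  return order;
}
```

    Because each step rescans the remaining vertices, this sketch does O(V) work per removed vertex. Keeping the zero-count vertices in a queue would avoid the rescans, at the cost of a little more bookkeeping.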
    
    Ok, now an easier way to do topological sort is to use depth-first search and to enumerate nodes in reverse topological order. This code is in TS2.cpp. Again, make sure that you can trace through this. Note that when you run it, the output is different than the output for TS1. Both lists represent topological orders and show that topological orders are not necessarily unique.
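
    A minimal sketch of the depth-first approach, under the same simplifying assumption of a graph that maps each vertex name to its successors' names (this is not TS2.cpp, and the names dfs and dfs_topological_sort are illustrative). Each vertex is recorded only after everything reachable from it has been recorded, which yields a reverse topological order; reversing the list at the end gives a topological order. The sketch assumes the graph has no cycles.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

// Hypothetical simplified graph: vertex name -> names of its successors.
typedef map<string, vector<string> > Graph;

// Post-order DFS: append name only after all of its successors.
static void dfs(const Graph &g, const string &name,
                set<string> &visited, vector<string> &postorder)
{
  if (!visited.insert(name).second) return;      // already visited
  Graph::const_iterator it = g.find(name);
  if (it != g.end())
    for (size_t i = 0; i < it->second.size(); i++)
      dfs(g, it->second[i], visited, postorder);
  postorder.push_back(name);
}

// Returns a topological order of g. Assumes g is acyclic.
vector<string> dfs_topological_sort(const Graph &g)
{
  set<string> visited;
  vector<string> order;
  for (Graph::const_iterator it = g.begin(); it != g.end(); it++)
    dfs(g, it->first, visited, order);
  reverse(order.begin(), order.end());           // reverse post-order
  return order;
}
```

    On a graph with more than one valid schedule, this will generally produce a different order than the repeated-removal approach, since the order depends on where the search starts and which edges it follows first.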


    Cycle Detection

    One problem with both TS1.cpp and TS2.cpp is that if you give them an input file with a cycle, such as cycle.ts, they fail:
    UNIX> TS1 < cycle.ts
    MATH231
    CS370
    MATH300
    Problems.....
    UNIX> TS2 < cycle.ts
    MATH300
    MATH231
    CS370
    
    The standard way to recognize cycles in a graph is to do a depth-first search, marking vertices along the way. If you hit a vertex that is marked as still being visited, then you have a cycle. This is done in CycleTest1.cpp. The important code is visit(), which does the depth-first search. Note that we allow visited to be one of three values so that we can distinguish between a vertex that is being visited, and hence could be in a cycle, and one that is done being visited, and hence cannot be in a cycle:
    const int NOT_VISITED = 0;
    const int BEING_VISITED = 1;
    const int DONE_VISITED = 2;
    
      void Vertex::visit()
      {
        Edge *e;

        // Reaching a vertex that is still on the current search path
        // means we followed a back edge: the graph has a cycle.
        if (visited == BEING_VISITED) {
          printf("Cycle detected\n");
          exit(1);
        }
        if (visited == DONE_VISITED) return;

        visited = BEING_VISITED;
        for (edges.first(); !edges.endOfList(); edges.next()) {
          e = edges.get();
          e->getVertex2()->visit();
        }
        visited = DONE_VISITED;
        return;
      }
    
    Finally, CycleTest2.cpp actually prints out the cycle that is detected. This is done by a simple modification to visit(), which now returns the vertex at which the cycle was detected, or 0 if no cycle was found. The changed lines are marked with "// -->":
      Vertex *Vertex::visit()
      {
        Edge *e;
        Vertex *v2;

        if (visited == BEING_VISITED) {
          printf("Cycle: %s", name.c_str());             // -->
          return this;                                   // -->
        }
        if (visited == DONE_VISITED) return 0;

        visited = BEING_VISITED;
        for (edges.first(); !edges.endOfList(); edges.next()) {
          e = edges.get();
          v2 = e->getVertex2()->visit();
          if (v2 != 0) {                                 // -->
            printf(" <- %s", name.c_str());              // -->
            if (v2 == this) { printf("\n"); exit(1); }   // -->
            return v2;                                   // -->
          }
        }
        visited = DONE_VISITED;
        return 0;
      }
    
    See how it works on cycle.ts:
    UNIX> CycleTest2 < cycle.ts
    Cycle: CS102 <- CS580 <- CS380 <- CS311 <- CS302 <- CS140 <- CS102
    UNIX>
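
    For experimenting outside the course code, here is a self-contained sketch of the same three-state depth-first search. It is not CycleTest2.cpp: it assumes a simplified graph that maps each vertex name to its successors' names, and instead of printing and exiting it returns the cycle as a list of names (empty if there is none). The names visit and find_cycle and the explicit path vector are choices made for this example.

```cpp
#include <map>
#include <string>
#include <vector>
using namespace std;

// Hypothetical simplified graph: vertex name -> names of its successors.
typedef map<string, vector<string> > Graph;

enum State { NOT_VISITED, BEING_VISITED, DONE_VISITED };

// DFS from name. path holds the vertices currently BEING_VISITED, in
// the order they were entered. Returns true and fills cycle if a vertex
// that is still being visited is reached again.
static bool visit(const Graph &g, const string &name,
                  map<string, State> &state,
                  vector<string> &path, vector<string> &cycle)
{
  if (state[name] == DONE_VISITED) return false;
  if (state[name] == BEING_VISITED) {
    // The cycle is the part of the path from name's first entry onward.
    size_t i = 0;
    while (path[i] != name) i++;
    cycle.assign(path.begin() + i, path.end());
    cycle.push_back(name);                       // close the loop
    return true;
  }

  state[name] = BEING_VISITED;
  path.push_back(name);
  Graph::const_iterator it = g.find(name);
  if (it != g.end())
    for (size_t i = 0; i < it->second.size(); i++)
      if (visit(g, it->second[i], state, path, cycle)) return true;
  path.pop_back();
  state[name] = DONE_VISITED;
  return false;
}

// Returns one cycle as a list of names (first == last), or an empty
// vector if the graph is acyclic.
vector<string> find_cycle(const Graph &g)
{
  map<string, State> state;                      // defaults to NOT_VISITED
  vector<string> path, cycle;
  for (Graph::const_iterator it = g.begin(); it != g.end(); it++)
    if (visit(g, it->first, state, path, cycle)) break;
  return cycle;
}
```

    Note that this sketch lists the cycle in edge direction (each name is a prerequisite for the next), whereas CycleTest2 prints it with "<-" arrows as the recursion unwinds.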