Graph connectivity

For these exercises you will be looking at some algorithms on graphs.

This topic of "graph algorithms" is huge, and very interesting. Some aspects of it relate to management maths and optimisation. Graphs (in some representation or other) are used throughout programming and computing. For example: think how a SatNav stores information about a road network and how it can calculate the shortest distance between two points. Graph algorithms are obviously important in combinatorics, but they are also important in other areas of mathematics, such as logic, algebra, and elsewhere.

Note that the algorithms described in this web page are slightly simplified for these particular exercises, and may not be optimal for "real world applications". Please do not use them for "mission critical" applications.

1. Representation of graphs

A graph (studied in first year combinatorics) is a set of vertices (singular: vertex) with a set of edges. An edge connects two vertices together, and we say two vertices that are joined by some edge are adjacent.

There are various varieties of graphs: finite graphs and infinite graphs; graphs with or without loops (a loop is an edge connecting a vertex to itself); graphs with or without multiple edges (more than one edge connecting the same pair of vertices); graphs where the edges have a direction; and graphs where edges have no direction. A simple graph is one where the edges do not have direction and there are no loops or multiple edges. The system of representation to be given here will allow rather more general finite graphs to be represented than just the simple graphs.

To represent graphs on the computer we must first say what the vertex set will be. For this project, the vertex set will always be V = 0 , 1 , , n - 1 where n is a C++ "int" (and n will be sufficiently small so overflow problems do not occur). We will always use the letter n to denote the number of vertices.

To represent the edges we clearly need some sort of array. In fact we will use a "double array", i.e. an array of arrays, because each edge connects two vertices. Quite a lot of the ideas here will be similar to the previous exercises on representing sets in C++. I will repeat any information that is absolutely essential for this project here, but you may find it useful to review that material.

Thus a graph g will be a double vector of ints declared as follows,

vector< vector<int> > g;

where the size of the graph g, calculated as g.size() and always denoted here as n , is the number of vertices of the graph.

For such a g, and given i in the range 0,..,n-1, the vector g[i] will be a list of the edges that start at the vertex i. For example, if the vector<int> object g[i] has size k equal to 5, and this vector has the int entries

  2,6,8,0,2

then this vertex i has five edges leaving it, going to vertices 2,6,8,0, and 2 respectively. (The order of listing these vertices is not intended to have any significance.) Note that these are directed edges and there are two edges from i to 2.

This enables you to represent graphs with directed edges. Note that you can give g[4] an array such as

  2,4,5,6

so there is a loop from 4 to 4.

Your code therefore should work for digraphs (i.e. directed graphs or graphs with directed edges) that may have loops and multiple edges. For a simple graph, loops and multiple edges are not allowed, and all edges are undirected, so if you really want to use this representation to give a simple graph (with single undirected edges and no loops) you should make sure that each coming out of i is a single edge, does not go to i, and if it goes to j there had better be a single edge going from j back to i. Thus a simple graph is represented by a structure with exactly 2 e "directed" edges, where e is the number of "undirected" edges.

Exercise.

Your first task is to write a function that determines when two vertices i, j in a graph are adjacent in the sense that there is an edge from i to j. (Note that the order of i, j matters: say i is adjacent to j if there is an edge from i to j. This is not necssarily the same as saying j is adjacent to i.) Your function will look like

/* 
 * returns true when i,j are adjacent in g, 
 * i.e. when there is an edge from vertex i to vertex j, 
 * i.e. when j is an entry in the vector g[i] 
 */
bool adjacent( vector< vector<int> >& g, int i, int j ) {
  // your code goes here
}

Note: You should start by checking that inputs i,j are in the right range and returning "false" if they are not. If you don't do this some user might give your code bad values and get unexpected (and incorrect) answers. You should not assume that the input graph g is simple.

Hint: Because "adjacent" is searching a vector g[i] to see if it contains a value j, the basic idea in elementof in the section on representing sets in C++ is going to be similar.

The following "template" can be used for your program. It contains other example graph functions for you to study as examples, with comments saying what they all do. Note that all the code here depends on your function adjacent and will not work if there is a problem in adjacent.

In particular, issimple checks if a graph is simple or not, and printgraph prints a graph in a reasonably nice way, including a check whether it is simple or not. Both these functions depend on your adjacent function to work properly. The function randomdigraph randomly generates a digraph with a certain probability of an edge, and randomsimplegraph creates a "random" graph for testing, which should be simple and for each i,j has an edge between i,j with a particular probability. You can use any of these functions in your "main" for testing and experiments.

#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>

using namespace std;

/* 
 * returns true when i,j are adjacent in g, 
 * i.e. when there is an edge from vertex i to vertex j, 
 * i.e. when j is an entry in the vector g[i] 
 */
bool adjacent( vector< vector<int> >& g, int i, int j ) {
  // your code goes here
}

/* using "adjacent", determines if the graph g is simple
 * i.e. has no loops, no multiple edges, and has 
 * for each edge (i,j) an edge (j,i)
 */
bool issimple( vector< vector<int> >& g ) {
  int n = g.size();
  for (int i=0; i<n; i++) {
    // test for double edges
    for (int j=0; j<g[i].size(); j++) {
      for (int k=j+1; k<g[i].size(); k++) {
	if (g[i][j] == g[i][k]) return false; // double edge
      }
    }
    // test for loops
    if (adjacent(g,i,i)) return false; // has loop
    // test for directed edges
    for (int j=0; j<n; j++) {
      if (adjacent(g,i,j) && !adjacent(g,j,i)) return false; // directed
    }
  }
  return true;
}

/* 
 * determines number of edges in g; this is the
 * "correct" number of edges if the graph g is simple
 * or else the total number of directed edges otherwise
 */
int nedges( vector< vector<int> >& g ) {
  int n = g.size();
  int nedges = 0;
  for (int i=0; i<n; i++) {
    nedges = nedges + g[i].size(); // number of edges incident with vertex i
  }
  if (issimple(g)) {
    return nedges/2; // edges all counted twice
  } else {
    return nedges;
  }
}

/* 
 * prints the graph g to cout, checking for simplicity and
 * printing g appropriately if it is simple 
 */
void printgraph( vector< vector<int> >& g ) {
  int n = g.size();
  bool simp = issimple(g);
  if (simp) { cout << "Simple "; }
  cout << "Graph";
  if (simp) { cout << " with " << nedges(g) << " edges"; }
  else { cout << " with " << nedges(g) << " directed edges"; }
  cout << ": V = {";
  for (int i=0; i<n-1; i++) { cout << i << ","; }
  cout << (n-1) << "};";
  cout << " E = {";
  bool started = false;
  for (int i=0; i<n; i++) {
    int s = g[i].size();
    for (int j=0; j<s; j++) {
      if (g[i][j]<=i && simp) continue; // don't repeat edges if it is simple
      if (!started) { started=true; } else { cout << ","; } 
      cout << "(" << i << "," << g[i][j] << ")";
    }
  }
  cout << "}" << endl; 
}

/* 
 * sets g to be a random digraph (with no multiple edges) for which 
 * (i,j) is an edge with probability p, for all i,j 
 */
void randomdigraph( vector< vector<int> >& g, int n, double p ) {
  vector<int> empty(0);
  g.resize(n);
  for (int i=0; i<n; i++) {
    g[i] = empty; // initialise list for vertex i
    // make new edges
    for (int j=0; j<n; j++) {
      double q = double(rand())/RAND_MAX; // a random value in 0..1
      if (q<p) {
	g[i].push_back(j);
      }
    }
  }
}

/* 
 * sets g to be a simple graph (no loops or directed edges) for which 
 * (i,j) is an edge with probability p, for all i<j 
 */
void randomsimplegraph( vector< vector<int> >& g, int n, double p ) {
  vector<int> empty(0);
  g.resize(n);
  for (int i=0; i<n; i++) {
    g[i] = empty; // initialise list for vertex i
    // make new edges
    for (int j=i+1; j<n; j++) {
      double q = double(rand())/RAND_MAX; // a random value in 0..1
      if (q<p) {
	g[i].push_back(j);
      }
    }
    // check for old edges connected to i
    for (int j=0; j<i; j++) {
      // did we already add edge j:i ?
      for (int k=0; k<g[j].size(); k++) {
	if (g[j][k]==i) {
	  // yes we did
	  g[i].push_back(j);
	  break; // break "for (int k"
	}
      }
    }
  }
}

int main() {
  srand(time(NULL)); // initialialise random numbers against the time
  vector< vector<int> > g; // declare a variable for the graph
  randomsimplegraph(g,10,0.5);
  printgraph(g);
  if (adjacent(g,1,2)) {
    cout << "vertex 1 is adjacent to vertex 2" << endl;
  } else {
    cout << "vertex 1 is not adjacent to vertex 2" << endl;
  }
  return 0;
}

2. Connectivity of graphs

You will already know the following definitions.

Definition.

A trail in a graph g is a sequence of vertices v 0 , v 1 , , v k so that for all i k - 1 there is an edge from vertex v i to vertex v i + 1 . The length of this trail is the number of edges in it, i.e. k . A path is a trail with no repeated vertices.

Note that trails and paths are directed. They go from one vertex to another via directed edges traversed in the correct direction.

Important. In this module we allow trails and paths to have length 0, i.e. to consist of a single vertex with no edges. This will be assumed from now on. It makes the definitions and code much shorter. It will have the very reasonable consequence that every vertex is connected to itself.

Definition.

The distance from vertex v to vertex w in a graph g is the length of a shortest trail from v to w . If there is no such trail this distance is .

Therefore the distance from a vertex to itself is zero.

Definition.

A (di)graph g is connected if for every pair of vertices v and w in g the distance from v to w is finite.

Note: In the case when the graph is directed, some people emphasise this kind of connectivity (and the direction of the paths, etc.) and say the graph is strongly connected if the definition above holds. (We will not look at the alternative "weak" connectivity at all here.) Just remember that is it possible for a vertex v to be connected to w and that w is not connected to v .

To calculate the distance between two vertices on the computer it is necessary to calculate many such trails, and it isn't really possible to calculate the distance from vertex v to w efficiently and directly without also looking at lots of other distances.

Exercise.

Your next task, and the main one for this exercise sheet, is to write a C++ function that calculates distances in a (di)graph. Specifically you need to write a function as follows.

/* 
 * computes a vector "dist" of size n = size of g
 * that gives the distance of each vertex u from v0 as dist[u]
 * 
 * here, the distance -1 represents "infinity"
 */
void distances( vector< vector<int> >& g, int v0, vector<int>& dist) {
  int n = g.size();
  dist.resize(n);
  // the rest is over to you ...			  
}

Note that since "int" does not have a value "infinity" and since distances (if finite) are always nonnegative, we can use the "distance" of -1 to represent "infinity". This does mean that the < and <= operators on "int" don't always do exactly what you want when you are comparing finite and infinite values (for example x < y will be false is x is infinite and y is finite) so be careful! Note also that we make the definition that a graph with no vertices is deemed to be connected.

The distances function is a bit more complicated, and details of a suitable algorithm are given below. If you want, you can use the adjacent function you have written, but you might find it just as straightforward to write your code without using it.

Here is a suggested algorithm, written in words. You must read it and write it in C++.

There is a loop in the first of the two bullet points here, and a treble loop in the other one. You'll need to consider how to break out of the outermost loop over values d. For testing, you can try your function out on a "random" graph, like before. It will probably be a good idea if you use some code to write out the vector of distances, something like,

vector<int> dist;
distances(g,2,dist); // using 2 as an example vertex here
for (int i=0; i<dist.size(); i++) {
  cout << "distance from 2 to " << i << " is " << dist[i] << endl;
}

You may want to experiment with different values for the probability of an edge. (This is the 0.5 in randomsimplegraph(g,10,0.5);).

Exercise.

Use your distances function to write a function that determines if a graph is connected.

/* 
 * determines if the graph g is connected
 * i.e. in g, every vertex is connected to every other vertex
 * 
 * here you should NOT assume g that is simple:  you need to check
 * that every vertex is connected to every other vertex.
 */
bool connected( vector< vector<int> >& g ) {
  // over to you
}

The idea is simple: a graph is connected if every vertex has finite distance to every other vertex. You'll have to declare a "dist" vector inside the "connected" function, but your "distances" function should be used to initialise it.

Remark.

The notion of connectedness of a simple graph is slightly different and a bit easier to compute. This will be addressed in Quiz questions elsewhere. If you have time you can also implement the following function.

/* 
 * determines if the graph g is simple and connected
 * returns 
 *   false if g is not simple or not connected, 
 *   true otherwise 
 *
 * if skipgissimple==true then this function skips the test to see if
 * g is simple and assumes g is simple.  If skipgissimple==false then
 * the test to see if g is simple is performed first
 */
bool connectedsimple( vector< vector<int> >& g, bool skipgissimple=false ) {
  // ...
}

Exercise.

If the graph is simple, the algorithm for testing connectedness will be easier. How can we use simplicity to improve the connectedness algorithm?

To answer this question fully you need to analyse (a) the time it takes to determine a graph is simple and (b) how much time is saved calculating distances by the improved method when simplicity is known.

3. An application

Given a "random" simple graph g with n vertices such that for each i , j ( i < j ) these are adjacent with probability p and not adjacent otherwise, it might be interesting to know what the probability is that the whole graph is connected. This can be calculated experimentally with the following "main":

int main() {
  srand(time(NULL)); // initialialise random numbers against the time
  vector< vector<int> > g; // declare a variable for the graph

  // set these three parameters for testing however you like:
  int nvertices = 20;     // number of vertices
  int samplesize = 1000;  // number of graphs to test
  double probedge = 0.15; // probability of an edge
  
  int maxnedges = nvertices*(nvertices-1)/2; // maximum possible number of edges
  int conn = 0; // count of number of connected graphs
  int ne = 0;   // count of number of edges
  for (int i=0; i<samplesize; i++) {
    randomsimplegraph(g,nvertices,probedge);
    // printgraph(g); // optional: you won't want this for large samplesize!
    if (connected(g)) { conn++; }
    ne = ne + nedges(g);
  }
  double avenedges = ((double)ne/(double)samplesize); // average number of edges
  double aveconn = ((double)conn/(double)samplesize); // average number of connected graphs
  cout << "Testing simple graphs with " << nvertices << " vertices and at most " << maxnedges <<  " edges." << endl ;
  cout << "Out of " << samplesize << " samples with p=="<< probedge <<", " << conn <<  "(==" << samplesize << "*" << aveconn << ") were connected." << endl;
  cout << "Average " << avenedges << "(==" << maxnedges << "*" << (avenedges/maxnedges) << ") edges per graph." << endl;

  return 0;
}

Note here the C++ technique of converting an int variable n to double by preceding it with (double).

Enjoy this program!

Some more information on graphs on the computer is provided in the next page.