Arrays and Vectors in C++

1. Introduction

In computer programming, an array a is a special variable containing a list of normal variables so that you can access the $n$ th item with a[n]. This is a very powerful feature and allows a great number of interesting things to be programmed.

The very best general kind of array to use in C++ to use in the one called vector. Vectors have a great deal of useful functions, including the ability to shrink or expand the vector as needed.

Remark.

C++ also shares the same method of writing arrays with its predecessor C. These arrays are very efficient and fast to use but for many reasons are quite inconvenient. For example a C array has a fixed size which cannot grow.

For people who already know C:

People that already know C's arrays sometimes think that this means that there is just something else to learn (groan) and it is "easy enough" to code up a few of the "vector features" in one's own program using the familiar C-style arrays. This is silly and very bad style: the people that wrote "vector" knew what they were doing and did it well. It is silly to duplicate their work, especially when your version is likely to be inconvenient, buggy, or inefficient, or all three, due to you not thinking of all the problems that might occur or not dealing with them properly. The correct advice is: even if you know C-style arrays, the C++ "vector" is better.

Advice.

Unless you are doing something special and really know you need C's arrays, always use vector in C++. Any possible (very slight) performance disadvantage is easily worth the improvements in memory usage, ease to program, and portability and robustness of your code. Vectors are much less error prone than C arrays.

Advice.

For more advanced work, if C++'s vector really doesn't do what you want, then C++ provides a wide range of alternatives. You can look them up and select the best option and you should do so before coding your own.

2. Vectors

Here is an example program that uses a vector of strings.

// months.cpp (click here to download)

// illustrates vectors and sorting
#include <iostream>
#include <string>
#include <vector>

using namespace std;

/*
  quick and easy sorting, but not recommended for anything other
  than very short arrays and non-critical applications
*/  
void sort( vector<string>& v ) {  
  for (int i = 0; i < 12; i++ ) {
    // try each index i and compare v[i] with each v[j] for j>i to
    // find the smallest
    for (int j = i + 1; j < 12 ; j++ ) {
      // see if v[j] is smaller than v[i]
      if (v[i] > v[j]) {
	// if it is, swap
	string s = v[i];
	v[i] = v[j];
	v[j] = s;
      }
    }
  }
}

int main() {
  // create a vector of string with 12 entries
  vector<string> months(12);

  // fill the entries
  months[0] = "January";
  months[1] = "February";
  months[2] = "March";
  months[3] = "April";
  months[4] = "May";
  months[5] = "June";
  months[6] = "July";
  months[7] = "August";
  months[8] = "September";
  months[9] = "October";
  months[10] = "November";
  months[11] = "December";

  // sort the vector
  sort(months);

  // print it
  for (int i = 0; i < 12; i++) {
    cout << months[i] << endl;
  }
}

Explanation.

Here we use "strings" and "vectors". So we have to write #include <string> and also #include <vector>. A vector called months is used in this program. It is a vector of strings, so its type is vector<string>. The (12) initialises the vector so that it initially contains 12 places. (Alternatively vector<string> months; would initialise the vector to be initially empty. If we wanted to initialise the vector to initially have 12 copies of the string "unknown" we could use vector<string> months(12,"unknown");.)
The indices of the items in the vector months are the numbers 0,1,2,...,11. Computing, like logic and a number of other mathematical disciplines uses $ℕ$ "correctly" and 0 is the first counting number. If you have a habit of always starting at 1, please try to get out of the habit. In this module, $0 \in ℕ$ .
The rest of the program should be studied carefully.

Note that, in a vector, all elements of the vector have the same type, here "string". It would be easy to have a vector of int or double values. Just use vector<int> or vector<double> instead.

The function sort in the last example is worth studying. It takes a vector-of-strings argument called v and sorts it alphabetically. Note that the argument is called by reference i.e. v is declared with vector<string>& v. This is normal for vectors. Copying a large vector and using call-by-value is possible but usually very slow. In any case, functions that do something to vectors are very useful.

Vectors and strings have a number of useful functions associated with them. The syntax is vectorname.functionname(functioninputs). You can look up these functions at sites such as www.cplusplus.com or www.cppreference.com but we'll only use a very small number of these here. The main one is vectorname.size() which returns the size (i.e. number of elements currently in the vector). A list of some of the other useful ones is given later in this page.

Example.

There is a small problem with sort which is that the "12" (the size of the vector) is "compiled in", which means this sorting function cannot be used for vectors of any other size. This is easy to fix using special function for a vector that returns its size.

void sort( vector<string>& v ) {
  int s = v.size();
  for (int i = 0; i < s; i++ ) {
    for (int j = i + 1; j < s ; j++ ) {
      if (v[i] > v[j]) {
	string s = v[i];
	v[i] = v[j];
	v[j] = s;
      }
    }
  }
}

Technical note: unsigned values

There is a technical issue with the last version of "sort" that you might come up against at some point. This is that the size of the vector is represented by a slightly different kind of "int", usually called "size_t" instead of "int". (On the computers you are using "size_t" values probably go from $0$ to $2^{64} - 1$ .) It is just conceivable that this "size_t" is too big to go in a "int" variable in which case the i and j would have to be "size_t" too, as in,

void sort( vector<string>& v ) {
  size_t s = v.size();
  for (size_t i = 0; i < s; i++ ) {
    for (size_t j = i + 1; j < s ; j++ ) {
      if (v[i] > v[j]) {
	string s = v[i];
	v[i] = v[j];
	v[j] = s;
      }
    }
  }
}

The compiler can and sometimes does warn about this problem (the option -Wconversion in GCC). No doubt good programmers should heed these warnings, but the chances of having a vector of having a size more than $2^{31} - 1$ in this module is very slim indeed.

Technical note: sorting several types in the same function

There is a further issue with the last version of "sort" that it only sorts vectors of strings and not vectors of doubles or vectors of ints. C++ does have a really interesting feature called "templates" that would enable you to sort a vector of any type just using a single function, but this takes us well beyond the scope of this module.

3. Sorting

The algorithm for sorting you see here is just about the simplest one that is usable. It's called "selection sort". There are a number of clever algorithms for sorting, but this isn't one of them! To sort a large vector selection sort wouldn't be a particularly good method, but it is OK for small vectors of size 12 or so. We will look at some better sorting routines later.

To be brief, what selection sort does is: for each index i, the outer for loop selects the $i$ th smallest entry and places it at index $i$ . It does this by comparing with entries i+1 up to the size of the vector, and this requires an inner for loop. The contents of the inner "if" statement just swap two values round.

Have a go to check you understand why it works.

4. Beginner's traps with arrays and vectors.

Unless you are absolutely sure you know what you are doing and understand the costs of copying a vector, always use call-by-reference when you want to pass a vector as an argument. Similarly, don't have several copies of essentially the "same" vector in memory: try to make do with just one copy, to save memory and copying costs.
By far the most important issue is be to careful that indices to vectors are always in range. If in doubt, add extra code to check the index.

I want to explain the last point here in more detail because it is so important.

In C++ (unlike some other languages) the index to a vector or array is not checked to see if it is in the right range. (This is compatible with C++'s philosophy of doing things in a way that is a fast as possible. Normal code, such as the function "sort" above, already contains all the range checking that is required. So it would be wasteful of computer time if the compiler should automatically add additional index-checking as well.)

What this means is that , if you do months[i] = "silly"; and i is negative or greater than 11, something strange will certainly happen. In principle, some random area of memory on the computer will be written to in an unexpected way. In practice, serious problems of this type may be trapped by the system, but if they are not trapped then your program could behave in a strange and unpredictable way. Even things like mystring = months[i]; could give unexpected and unpredictable results when the value of i is out of range.

It is difficult to over estimate the problem of programs behaving in strange and unpredictable ways. These programs are often almost impossible to debug, certainly debugging them is costly in time and resources. If you are the programmer it is your responsibility to debug it, and if you write in this style then you will waste masses of time.

One the other hand, compiler error messages, or even run-time error messages that you build into your own code, are frustrating but indicate rapidly there is a problem and roughly where it is. These problems can be fixed fairly rapidly without wasting time.

If you are writing a program and still at the debugging stage, you can insert checks on your indices.

So if you want to add a new month name, instead of months[i] = "Januarember"; consider writing instead,

  if (i<0 || i>=months.size()) { cerr << "Something bad has happened. Index out of range: " << i << endl; }
  months[i] = "Januarember";

(cerr is just like cout except it is for error messages: the messages might go to a special place to help with debugging, and the computer is instructed to regard these messages as "important", print them immediately, and not queue them for printing later, as can happen with cout.)

If you want to write mystring = months[i]; consider writing instead,

  if (i<0 || i>=months.size()) { cerr << "Error setting mystring. Index out of range: " << i << endl; }
  mystring = months[i];

You don't have to do this in places you are sure are correct, but are you really sure? Why are you sure?

Of course you can make the error message anything you want, and if you know about exceptions or have some other error-processing feature in your program you can use that. The above is enough for this module, however.

This may seem like a lot of extra work, but it is really worth it. Debugging programs in C++ where array indices have gone wrong is extremely painful and difficult. When you know your program is correct and want to speed it up you can remove the error checking.

5. Other functions available for vectors

As indicated, C++ provides a lot of convenient functions for vectors that are not available for old-style C arrays. Here we suppose "vector<sometype> v" is a vector of objects of type "sometype" and list some of the things you can do with v.

n = v.size(): Set int n to be the number of items in the vector.
if (v.empty()) { ... }: Test if the vector is empty.
v.clear(): Reset the vector to be empty.
v.resize(n): Resize the vector to have size n.
v.resize(n,d): Resize the vector to have size n and with each entry initialised to d.
v.front(): The first element in the vector.
v.back(): The last element in the vector.
v.push_back(x): Add a new item x to the end of the vector, changing its size.
v.pop_back(): Remove the last item in the vector, changing its size.

Even more operations (such as inserting and erasing specific elements in the middle of a vector) are possible via iterators, but this is outside the scope of this web page.

6. Another example

Example.

Here is a program that takes two numbers n,k and computes the set $\{i < n: \exists j i = k^{j} mod n\}$ . Results from this program can be interesting. I leave the program for you to read and understand and try out.

// modn.cpp (click here to download)

// computes the multiplicative group generated by k mod n
#include <iostream>
#include <vector>

using namespace std;

/* sort a vector of ints into increasing order */
void sort( vector<int>& v ) {
  int n = v.size();
  for (int i = 0; i < n; i++ ) { 
    for (int j = i + 1; j < n ; j++ ) {
      if (v[i] > v[j]) {
	int s = v[i];
	v[i] = v[j];
	v[j] = s;
      }
    }
  }
}

int main() {

  // get modulus, n
  int n = 0;
  while (n <= 0) {
    cout << "Enter positive modulus:" << endl;
    cin >> n;
  }
  
  // get generator, k
  int k;
  cout << "Enter generator:" << endl;
  cin >> k;
  if ( k < 0 ) {
    // make k nonnegative
    int r = -k / n;
    k = k + (r + 1)*n;
  }
  k = k % n; // make k < n
  //
  // note that k = k % n would not work on its own
  // because C++ defines -20%6==-2 (not +4) etc.
  
  vector<int> values;
  int x = k;
  while (true) {
    // add values found to the vector
    values.push_back(x);
    // get the next value, k*lastvalue (mod n)
    x = ( x * k ) % n ;
    // if we return to k we have finished
    if ( x == k ) break;
  }

  // the size of the vector is the number of elements in the subgroup
  int size = values.size();

  // list the elements in the order they were found
  cout << "In order of generation: " << endl;  
  for (int i = 0; i < size; i++) {
    cout << values[i] << " ";
  }
  cout << endl;

  // sort them
  sort(values);

  // list the elements in sorted order
  cout << "Sorted: " << endl;
  for (int i = 0; i < size; i++) {
    cout << values[i] << " ";
  }
  cout << endl;
  
  return 0;
}

7. Strings

Strings are lists of characters. Internally they are very similar to arrays (so you can think of them as arrays of characters). In fact the size() function and the [ ] operators work in strings perfectly well as well as a few others for extracting substrings and modifying strings.

Example.

Some of these functions on strings are illustrated in the following.

// teststring.cpp (click here to download)

// illustrates strings
#include <iostream>
#include <vector>

using namespace std;


int main() {
  
  string quotation = "I propoze to consider the question, \"Can machines think?\"";
  cout << quotation << endl;
  
  cout << "length == " << quotation.length() << endl;
  cout << "size (same thing!) == " << quotation.size() << endl;
  
  quotation[7] = 's'; // fix spelling error
  cout << quotation << endl;

  cout << quotation.substr(37,19) << endl;
  cout << quotation.substr(36) << endl;

  quotation.insert(40, "\'t");
  cout << quotation << endl;
  
  quotation.replace(13,8,"examine");
  cout << quotation << endl;
  
  quotation.erase(56,2);
  quotation.append(" straight?\"");
  cout << quotation << endl;
  
  return 0;
}

There are many functions available. Go to the documentation for more details.