The new C++ standard, called C++11, is finally here.
It enriches both the language and its standard library, bringing features that many users have long awaited, such as lambdas, the “auto” type specifier, and so on. I’m not going to talk about these, though: there are plenty of good references on the net.

Some days ago, after viewing Bartosz Milewski’s excellent tutorial on C++11 concurrency, I started playing with the language additions, trying to mimic the behavior of some OpenMP directives. (Again, if you want to learn more about OpenMP, there is plenty of material online. You can start from the official about page.)

I’ll show you some of these experiments. Of course we aren’t going to fully implement even a single directive, but maybe we can learn something about the new standard. Readers’ comments and suggestions are welcome!

testing the code

C++11 is a young standard, and the compilers still don’t support it fully. I’ve used GCC 4.7 to compile the code below, but you can try with your favorite C++ compiler. Here’s a nice table summarizing support for the new features in various popular compilers: C++0xCompilerSupport.

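To enable support for the new features in g++, add the switch -std=c++0x (GCC 4.7 also accepts -std=c++11). To compile OpenMP code, add -fopenmp too. On Linux, programs that use std::thread usually need -pthread as well.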

parallel

With OpenMP, a programmer can introduce parallelism by adding compiler directives and calling its library functions. OpenMP uses a fork-join execution model: a master thread forks a team of threads at the start of a parallel region and joins them at its end. The simplest way to enable parallel execution of a region of code is via the parallel directive. Here’s a very simple example:

#include <iostream>
#include <omp.h>

using namespace std;

int main() {

  #pragma omp parallel num_threads(4)
  {
    cout << "I'm thread number " << omp_get_thread_num() << endl;
  }
  cout << "This code is executed by one thread\n";

  return 0;
}

The example is self-explanatory: the code block after the OpenMP pragma "parallel" is executed by 4 threads.
At the end of the region there's an implicit barrier, so the last cout is executed only when all the threads have left the parallel region.
Copy this code into a file, compile it, run it and look at the output (e.g. g++ -std=c++0x -fopenmp para1.c -o para1; ./para1).

Let's see how to emulate this behavior using C++'s std::thread.

threads in C++11

To start a new thread in C++11, we just need to create a std::thread object. The simplest (and useless!) example I can imagine is this:

#include <iostream>
#include <thread>

using namespace std;

void hello() {
  cout << "Hello from a thread\n";
}

int main() {
  thread aThread(&hello);
  aThread.join();

  return 0;  
}

(From now on I'm going to omit some of the includes and other repeated code for brevity. It should be easy to add the missing parts.)
We can avoid passing a function pointer to the thread constructor and make things nicer by using a lambda:

int main() {
  thread aThread([]() {
    cout << "Hello from a thread\n";
  });
  aThread.join();

  return 0;  
}
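
The OpenMP example printed omp_get_thread_num(), while std::thread has no built-in index of that kind. One way to get something similar, shown here only as a sketch, is to pass an index to each thread through the extra arguments of the std::thread constructor. The helper name run_with_ids and the vector "seen" are made up for this illustration and are not part of the code in this post.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not from the post): starts n threads, hands each one
// its own index, waits for all of them and returns what each thread recorded.
std::vector<int> run_with_ids(unsigned n) {
  std::vector<int> seen(n, -1);
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < n; ++i) {
    // Arguments after the callable are copied and passed to it as parameters.
    workers.emplace_back([&seen](unsigned id) {
      assert(id < seen.size());
      // Each thread writes only to its own slot, so no locking is needed.
      seen[id] = static_cast<int>(id);
    }, i);
  }
  for (auto& t : workers) t.join();
  return seen;
}
```

Calling run_with_ids(4) should return a vector holding 0, 1, 2, 3: the index plays the role of the OpenMP thread number, under the assumption that we number the threads ourselves in creation order.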

let's use the threads

Ok, we can use threads to execute some work in parallel. Let's write a trivial thread pool class for the purpose.

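#include <algorithm>  // for_each
#include <thread>
#include <vector>

using namespace std;

// A task is a plain function pointer; captureless lambdas convert to it.
typedef void (*task) ();

class thread_pool {
  private:
    vector<thread> the_pool;

  public:
    // Start num_threads threads, each running tbd.
    thread_pool(unsigned int num_threads, task tbd) {
      for(unsigned int i = 0; i < num_threads; ++i) {
        the_pool.push_back(thread(tbd));
      }
    }

    // Wait for every thread to finish.
    void join() {
      for_each(the_pool.begin(), the_pool.end(), 
                 [] (thread& t) {t.join();});
    }

    // Let the threads run on their own; they can no longer be joined.
    void nowait() {
      for_each(the_pool.begin(), the_pool.end(), 
                 [] (thread& t) {t.detach();});
    }
};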

It's just a wrapper over a vector of threads, with a few methods we'll find useful later.
We can use it this way:

thread_pool pool(4, []() {
  cout << "Here I am: " << this_thread::get_id() << endl;
});
cout << "I can do other things before waiting for them to finish!" << endl;
pool.join();

Put these lines in a main and run this example. As you can see:

  • Four threads are started, and each of them executes the code in the lambda (it says "Here I am")
  • The main thread can get some other work done before joining them
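
One limitation is worth noting: because task is a plain function pointer, only lambdas without captures can be passed to the pool. If captures are needed, a possible variation is to store the task as a std::function instead. The sketch below is an assumption about how that could look, not the class used in this post; the names capturing_pool and count_visits are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Variant of thread_pool (an assumption, not the post's class): the task is a
// std::function, so lambdas with captures are accepted too.
class capturing_pool {
  std::vector<std::thread> the_pool;

 public:
  capturing_pool(unsigned num_threads, std::function<void()> tbd) {
    assert(tbd != nullptr);  // an empty task would throw when called
    for (unsigned i = 0; i < num_threads; ++i) {
      the_pool.emplace_back(tbd);  // each thread gets its own copy of tbd
    }
  }

  void join() {
    for (auto& t : the_pool) t.join();
  }
};

// Hypothetical demo: every thread bumps a counter owned by the caller.
int count_visits(unsigned num_threads) {
  std::atomic<int> visits(0);
  capturing_pool pool(num_threads, [&visits]() { ++visits; });
  pool.join();  // after this, all increments are visible
  return visits.load();
}
```

With this variant, count_visits(4) should return 4. The price is a little overhead from std::function; the function-pointer version stays simpler when no captures are required.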

syntactic sugar

With a bit of syntactic sugar, we can make the code resemble the OpenMP version more closely. A couple of macros will help us:

// class thread_pool omitted...

#define parallel_do_(N) thread_pool (N, []()
#define parallel_end ).join();

int main() {

    parallel_do_(4)
    {
      cout << "Here I am: " << this_thread::get_id() << endl;
    }
    parallel_end

    return 0;
}

a bit more

As I said at the beginning, we are not trying to emulate the parallel construct fully: it does a lot more and has many clauses that control its behavior. However, we can easily add support for a couple of nice things:

  1. Let the system choose an appropriate number of threads
  2. Avoid the implicit barrier at the end of the parallel region

Omit the number of threads

When you don't specify the num_threads clause, OpenMP works out by itself how many threads to start, based on the hardware resources available (and a lot of other things!). We can achieve a similar result using thread::hardware_concurrency() as a default value for num_threads. Keep in mind that this function is only a hint and may return 0 when the value cannot be determined.
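
Since thread::hardware_concurrency() is allowed to return 0, a small guard can keep the pool from ending up with no threads at all. The following is a minimal sketch; the helper name default_num_threads and the fallback value of 2 are arbitrary choices made for this example.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: choose a thread count from the hardware hint,
// falling back to a caller-supplied value when the hint is unavailable (0).
unsigned default_num_threads(unsigned fallback = 2) {
  assert(fallback > 0);  // the fallback itself must be usable
  unsigned hint = std::thread::hardware_concurrency();
  return hint != 0 ? hint : fallback;
}
```

A helper like this could be used wherever a default thread count is needed, for example as the argument passed to the thread_pool constructor.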

Don't wait at the barrier

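In OpenMP, the nowait clause removes the implicit barrier at the end of a worksharing construct such as for or single (the parallel directive itself does not accept it). We can borrow the idea for our parallel region by detaching from the threads in the pool instead of joining them. Note that detached threads are not waited for, so the program may exit before they have finished.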
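
Once a thread is detached it can no longer be joined, so the master needs some other way to find out whether the work is done. The sketch below illustrates this with a global atomic counter and a simple polling loop. The names finished, start_detached and wait_until_finished are invented for the example, and a condition variable or a future would be the more usual choice in real code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical global counter: each detached thread bumps it when it is done.
std::atomic<int> finished(0);

// Start n threads and detach them immediately (the "nowait" behavior).
void start_detached(unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    std::thread t([]() { ++finished; });
    assert(t.joinable());
    t.detach();
    assert(!t.joinable());  // a detached thread cannot be joined any more
  }
}

// Poll until at least `expected` threads have reported completion.
void wait_until_finished(int expected) {
  while (finished.load() < expected) {
    std::this_thread::yield();  // give the workers a chance to run
  }
}
```

The master can call start_detached(4), do other work, and call wait_until_finished(4) before exiting, which restores a barrier at the point where it is actually needed.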

The following listing shows the new code and a usage sample.

// class thread_pool omitted...

#define parallel_do_(N) thread_pool (N, []()
#define parallel_do parallel_do_(thread::hardware_concurrency())
#define parallel_end ).join();
#define parallel_end_nowait ).nowait();

int main() {

    parallel_do_(4)
    {
      cout << "Here I am: " << this_thread::get_id() << endl;
    }
    parallel_end_nowait

    cout << "[MASTER] I can do other things while they complete...\n";

    //With default number of threads
    parallel_do
    {
      cout << "Let's count ourselves. I'm  " 
             << this_thread::get_id() << endl;
    }
    parallel_end

    cout << "[MASTER] Goodbye.\n";
    return 0;
}

Today we will stop here. I hope you enjoyed the reading.
Share your thoughts in the comments.

