Domenico on September 15th, 2011

…what were we talking about?

Last time, we coded a small OpenMP-style parallel construct using some macro directives and a class wrapping a vector of threads.

This time we will add replacements for two OpenMP library functions: omp_get_num_threads() and omp_get_thread_num(). These are among the most used (and most useful) OpenMP functions.

I’ll show you several implementations, but to start we need a little clean-up in the thread_pool class.

first try: adding methods

Below is the source for the thread_pool class, with a few modifications I’ll explain later. I assume you put it in a file called “thread_pool.h” in your working directory; if you put it elsewhere, change the #include in the samples to make them work.

#include <thread>
#include <algorithm>
#include <vector>
#include <iostream>
#include <functional>

using namespace std;

typedef function <void ()> task;
class thread_pool {
  private:
    vector<thread> the_pool;

  public:
    // Start num_threads threads, each running the same task
    thread_pool(unsigned int num_threads, task tbd) {
      for(unsigned int i = 0; i < num_threads; ++i) {
        the_pool.push_back(thread(tbd));
      }
    }

    // Wait for every thread in the pool to finish
    void join() {
      for_each(the_pool.begin(), the_pool.end(), 
        [] (thread& t) {t.join();});
    }

    // Let the threads run on their own, without waiting for them
    void nowait() {
      for_each(the_pool.begin(), the_pool.end(), 
        [] (thread& t) {t.detach();});
    }
    
    int get_num_threads() { return the_pool.size(); }
    
    // Index of the calling thread within the pool, or -1 if it's not in the pool
    int get_thread_num() {
      for(unsigned int i = 0; i < the_pool.size(); ++i)
        if(the_pool[i].get_id()==this_thread::get_id()) return i;
      return -1;
    }

};

#define parallel_do_(N) thread_pool (N, []()
#define parallel_do parallel_do_(thread::hardware_concurrency())
#define parallel_end ).join();
#define parallel_end_nowait ).nowait();

The first important change in the thread_pool class is the definition of task: it is no longer a function pointer but, more correctly, a std::function returning void and taking no arguments. This way it works with both function pointers and lambdas as arguments, and lets us capture variables in lambdas.
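For instance, both of the following are accepted as tasks (a minimal illustration with made-up names, relying on the header above):

#include "thread_pool.h"

void say_hello() { cout << "Hello!\n"; }

int main() {
  int n = 42;
  task t1 = &say_hello;                                // a plain function pointer
  task t2 = [n] () { cout << "n is " << n << endl; };  // a lambda capturing n
  t1(); t2();                                          // both callable through the same task type
  return 0;
}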

The mandatory usage example:

#include "thread_pool.h"

int main() {

    thread_pool p(4, [&p] () {
        cout << "I'm thread n. " << p.get_thread_num() 
             << " in a pool of " << p.get_num_threads() << endl;
    });

    // You could do other things before joining...
    p.join();

    return 0;
}

As you see, there's a thread_pool instance named p, and in the code passed to the constructor (and executed by four threads), the object p itself is used to call the methods get_num_threads() and get_thread_num(). This "magic" is made possible because the variable p has been captured. The square brackets in lambdas are used for this purpose (as usual, I'm not explaining the whole thing, but there's plenty of information on the net).
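Very briefly, for those who haven't met captures yet, these are the two basic forms (a minimal, made-up illustration):

int main() {
  int x = 1;
  auto by_value = [x]  () { return x; };    // works on a copy of x
  auto by_ref   = [&x] () { return ++x; };  // works on the original x
  return by_value() + by_ref();             // 1 + 2
}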

This solution works, but requires our pools to be named, so we should modify our macro definition to include the pool name as a parameter. We can do better.
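For the record, a named variant of the macros could look like this (just a sketch, not used in the rest of the post; note that the pool name has to appear at both ends of the region):

#define parallel_named_(P, N) thread_pool P(N, [&P]()
#define parallel_named_end(P) ); P.join();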

second try: the global map

I want to say it loud and clear: I don't like this second solution at all; it's not elegant and it uses a global object. I'm not even going to show you a complete example, just a modified version of the thread_pool class to give you an idea of how it could be done.

//... includes omitted
int get_thread_num();
int get_num_threads();

typedef function <void ()> task;
class thread_pool {
  private:
    typedef map<thread::id, thread_pool*> thread_map;
    static thread_map allthreads;
    friend int get_thread_num();
    friend int get_num_threads();
    vector<thread> the_pool;

  public:
    thread_pool(unsigned int num_threads, task tbd) {
      for(int i = 0; i < num_threads; ++i) {
        the_pool.push_back(thread(tbd));
        allthreads.insert(
          thread_map::value_type(the_pool[i].get_id(), this));
      }
    }
    
    ~thread_pool() {
       for_each(the_pool.begin(), the_pool.end(), [] (thread& t) { 
         allthreads.erase(t.get_id()); 
        });
    }

    void join() {
      for_each(the_pool.begin(), the_pool.end(), 
        [] (thread& t) {t.join();});
    }

    void nowait() {
      for_each(the_pool.begin(), the_pool.end(), 
        [] (thread& t) {t.detach();});
    }
    
    int get_num_threads() { return the_pool.size(); }
    
    int get_thread_num() {
      for(int i = 0; i < the_pool.size(); ++i)
        if(the_pool[i].get_id()==this_thread::get_id()) return i;
      return -1;
    }

};

int get_thread_num() {
  thread_pool * p = thread_pool::allthreads[this_thread::get_id()];
  return p->get_thread_num();
}

int get_num_threads() {
  thread_pool * p = thread_pool::allthreads[this_thread::get_id()];
  return p->get_num_threads();
}

// This should be in a .cc file!
map<thread::id, thread_pool*> thread_pool::allthreads;

How it works:

  • the object allthreads associates every thread in a thread_pool with its pool.
  • thread_pool's constructor and destructor take care of adding entries to the map and removing them.
  • the friend functions get_num_threads() and get_thread_num() use allthreads to get a pointer to the calling thread's pool and invoke the member functions of the same name.

I'm not going to complete this example or discuss it further, because I want to show you a better way.

a better way: thread local storage

Thread-local storage is a way to let each thread maintain its own version of a global variable or memory region. The idea is to use two thread-local variables to store num_threads and thread_num.

C++11 introduces the storage specifier thread_local to declare thread-local variables. Sadly, many compilers don't support it yet, and GCC is one of them, so I'll use the __thread builtin for this compiler, but the principle is the same.

Here is the resulting thread_pool class.

#include <thread>
#include <algorithm>
#include <vector>
#include <iostream>
#include <functional>

using namespace std;

#ifdef __GNUG__
static __thread int thread_num;
static __thread int num_threads;
#else
static thread_local int thread_num;
static thread_local int num_threads;
#endif 

typedef function <void ()> task;
class thread_pool {
  private:
    vector<thread> the_pool;

  public:
    thread_pool(unsigned int n_threads, task tbd) {
      for(unsigned int i = 0; i < n_threads; ++i) {
        // Each new thread first sets its own thread-local variables,
        // then runs the user-supplied task
        the_pool.push_back(thread([=] () {
          thread_num = i;
          num_threads = n_threads;
          tbd();
        }));
      }
    }
    
    void join() {
      for_each(the_pool.begin(), the_pool.end(), 
        [] (thread& t) {t.join();});
    }

    void nowait() {
      for_each(the_pool.begin(), the_pool.end(), 
        [] (thread& t) {t.detach();});
    }
    
};

#define parallel_(N) thread_pool (N, []()
#define parallel parallel_(thread::hardware_concurrency())
#define parallel_end ).join();
#define parallel_end_nowait ).nowait();
#define single if(thread_num==0) 

The thread-local copies of num_threads and thread_num are initialized in the small wrapper lambda built in thread_pool's constructor, which captures i, n_threads and tbd by value and then runs the user's task. Again, we are using variable capture to access the needed values inside the thread code.

Example:

#include "thread_pool.h"
#include <iostream>

int main() {

    parallel_(4)
    {
      cout << "I'm thread " << thread_num << " of " 
           << num_threads << endl;
      single
      {
        cout << "This region is executed only by thread " 
             << thread_num << endl;
      }
    }
    parallel_end
    
    return 0;
}

In the example we have a parallel region executed by four threads, with a nested region executed only by the first thread in the pool (strictly speaking, this is closer to OpenMP's master than to its single directive, which is executed by whichever thread reaches it first). Thanks to the thread-local variables and to the macros, the code is both readable and concise.

That's it. Leave a comment to ask a question, suggest an improvement or share a thought.


Domenico on September 12th, 2011

The new C++ standard, called C++11, is finally here.
It enriches both the language and its standard library, bringing some features that many users have been waiting for, like lambdas, the “auto” keyword, and so on. But I’m not going to talk about these; there are a lot of good references on the net.

Some days ago, after viewing Bartosz Milewski’s excellent tutorial on C++11 concurrency, I started playing with the language additions, trying to mimic the behavior of some OpenMP directive. (Again, if you want to learn more about OpenMP, surf the internet. You can start from the official about page).

I’ll show you some of these experiments. Of course we aren’t going to fully implement even a single directive, but maybe we can learn something about the new standard. Readers’ comments and suggestions are welcome!

testing the code

C++11 is a young standard, and the compilers still don’t support it fully. I’ve used GCC 4.7 to compile the code below, but you can try with your favorite C++ compiler. Here’s a nice table summarizing support for the new features in various popular compilers: C++0xCompilerSupport.

To enable support for the new features in g++, add the switch -std=c++0x , to compile OpenMP code add -fopenmp too.

parallel

With OpenMP, a programmer can introduce parallelism by adding compiler directives and using its library functions. It uses a fork-join execution model. The simplest way to enable parallel execution of a region of code is via the parallel directive. Here’s a very simple example:

#include <iostream>
#include <omp.h>

using namespace std;

int main() {

  #pragma omp parallel num_threads(4)
  {
    cout << "I'm thread number " << omp_get_thread_num() << endl;
  }
  cout << "This code is executed by one thread\n";

  return 0;
}

The example is self-explanatory: the code block after the OpenMP "parallel" pragma is executed by 4 threads.
At the end of the region there's an implicit barrier, so the last cout is executed only when all the threads have left the parallel region.
Copy this code into a file, compile it, run it and look at the output (e.g. g++ -std=c++0x -fopenmp para1.c -o para1; ./para1).

Let's see how to emulate this behavior using C++'s std::thread.

threads in C++11

To start a new thread in C++11, we just need to create a std::thread object. The simplest (and useless!) example I can imagine is this:

#include <iostream>
#include <thread>

using namespace std;

void hello() {
  cout << "Hello from a thread\n";
}

int main() {
  thread aThread(&hello);
  aThread.join();

  return 0;  
}

We can avoid passing a function pointer when constructing the thread, and make the whole thing nicer, by using a lambda:
(From now on I'm going to omit some of the includes and other repeated code for brevity. It should be easy to add the missing parts.)

int main() {
  thread aThread([]() {
    cout << "Hello from a thread\n";
  });
  aThread.join();

  return 0;  
}

let's use the threads

Ok, we can use threads to execute some work in parallel. Let's write a trivial thread pool class for the purpose.

using namespace std;

typedef void (*task) ();
class thread_pool {
  private:
    vector<thread> the_pool;

  public:
    thread_pool(unsigned int num_threads, task tbd) {
      for(int i = 0; i < num_threads; ++i) {
        the_pool.push_back(thread(tbd));
      }
    }

    void join() {
      for_each(the_pool.begin(), the_pool.end(), 
                 [] (thread& t) {t.join();});
    }

    void nowait() {
      for_each(the_pool.begin(), the_pool.end(), 
                 [] (thread& t) {t.detach();});
    }
};

It's just a wrapper over a vector of threads, with some methods we'll find useful later.
We can use it this way:

thread_pool pool(4, []() {
  cout << "Here I am: " << this_thread::get_id() << endl;
});
cout << "I can do other things before waiting for them to finish!" << endl;
pool.join();

Put these lines in a main and run this example. As you can see:

  • Four threads are started, each of them executes the code in the lambda (and says "Here I am")
  • The main thread can get some other work done before joining them

syntactic sugar

With a bit of syntactic sugar, we can make the code resemble the OpenMP version more closely. A couple of macros will help us:

// class thread_pool omitted...

#define parallel_do_(N) thread_pool (N, []()
#define parallel_end ).join();

int main() {

    parallel_do_(4)
    {
      cout << "Here I am: " << this_thread::get_id() << endl;
    }
    parallel_end

    return 0;
}
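For clarity, this is roughly what the preprocessor turns the parallel region above into: an unnamed thread_pool is constructed with the block as the body of its lambda, and join() is called on it at the end of the statement.

thread_pool (4, []()
{
  cout << "Here I am: " << this_thread::get_id() << endl;
}
).join();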

a bit more

As I said at the beginning, we are not trying to fully emulate the parallel construct: it does a lot more and has a lot of clauses that control its behavior. However, we can easily add support for a couple of nice things:

  1. Let the system choose an appropriate number of threads
  2. Avoid the implicit barrier at the end of the parallel region

Omit the number of threads

When you don't specify the num_threads clause, OpenMP figures out by itself the number of threads to start, based on the available hardware resources (and a lot of other things!). We can achieve a similar result by using thread::hardware_concurrency() as the default value for num_threads.
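One caveat, left out of the macros below for simplicity: hardware_concurrency() is allowed to return 0 when the value is not computable, so defensive code may want a fallback, for example:

unsigned int n = thread::hardware_concurrency();
if (n == 0) n = 2;   // arbitrary fallback when no hint is available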

Don't wait at the barrier

The nowait clause instructs OpenMP not to generate a barrier at the end of the parallel region. We can do the same by detaching the threads in the pool instead of joining them.

The following listing shows the new code and a sample of use.

// class thread_pool omitted...

#define parallel_do_(N) thread_pool (N, []()
#define parallel_do parallel_do_(thread::hardware_concurrency())
#define parallel_end ).join();
#define parallel_end_nowait ).nowait();

int main() {

    parallel_do_(4)
    {
      cout << "Here I am: " << this_thread::get_id() << endl;
    }
    parallel_end_nowait

    cout << "[MASTER] I can do other things while they complete...\n";

    //With default number of threads
    parallel_do
    {
      cout << "Let's count ourselves. I'm  " 
             << this_thread::get_id() << endl;
    }
    parallel_end

    cout << "[MASTER] Goodbye.\n";
    return 0;
}

Today we will stop here. I hope you enjoyed the reading.
Share your thoughts in the comments.


Domenico on August 26th, 2010

People learning Git usually realize very soon that it can make their daily development tasks, and collaboration with other developers, a lot easier.
This is true even for people proficient with traditional version control systems (like SVN), which lack the advanced branching and merging functionality typical of modern DVCSes.

You can be even more productive by setting up your client and environment so that they better fit your needs. Here are some tips, based on my configuration; I hope you find them useful too!

Tweaking the configuration

Configuration basics

Every Git repository has a configuration file called config, located in the .git directory. Moreover, there’s a global config file called .gitconfig, located in the user’s home directory.
Options specific to each repository (e.g. remotes and branches configuration) go in .git/config, while general options should be put in .gitconfig. For instance, I’ve put my name and email there:

[user]
        name = Domenico Rotiroti
        email = xxxxx@xxxx.com

Git uses this information to fill in the author’s data every time I make a commit.

Configuration options can be read and modified by editing the files by hand or using the git config command. So I could set my name this way:

git config --global user.name "Domenico Rotiroti"

If you’re new to git I recommend that you read the git-config man page to see how many things you can configure!

A prettier output

One of the most used (and most useful) git commands is “diff”. If you add the --color option you will see that removals are printed in red and additions in green.

@@ -1,6 +1,6 @@
 #!/usr/bin/env php
 <?
-include('utils.inc');
+include('repos-tools/utils.inc');
 $vcs = array ( "git", "svn", "bzr", "hg" );

Nice, isn’t it?
If you like it, you can make diff output colored text by default by setting the color.diff option:

git config --global color.diff auto

Diff has another nice option, called --color-words, that shows the changed words in a line side by side instead of outputting the two whole lines. Personally, I prefer the traditional format, but if you don’t agree I suggest adding an alias:

git config --global alias.wdiff 'diff --color-words'

With this alias in place, just run ‘git wdiff’ to get a word-colored diff.
With aliases you can do a lot of nice things; for instance, I found it useful to have an alias with my favorite options for the log command:

git config --global alias.oneline 'log --oneline --decorate'

Extending git

We’ve seen that new commands can be added to git using aliases. This is not the only way!
The command line client git acts as a command wrapper: whenever you run “git somecommand” it looks for git-somecommand in the path and executes it, passing along all the parameters.
Adding new functionality is really simple; indeed, someone already did it! :)
I recommend taking a look at visionmedia’s git-extras here http://github.com/visionmedia/git-extras, it adds several useful commands.

repos-tools

Of course I couldn’t forget to mention our repos-tools, which could save you some time when working with multiple repositories.
If your repos are on GitHub, you can fork, watch, list issues and more from the command line.

Goodbye

For today, that’s it! Got tips to share? Write us a comment while they’re open.


Domenico on July 29th, 2010

Repos-tools is the newest addition to theSaguaros’ open source projects.

It is really just a PHP script, runnable from the shell, that manages your code repositories by doing things on all of them at once.

For instance, to stay up-to-date with the changes made by others just run:

repost pull

Currently it can handle Git, Subversion and Bazaar repositories.

Read more at its page and fork it on GitHub.

Update (Aug 4th): repos-tools is growing… I just added a command line client for GitHub and another action to repost. Watch the repository to follow its development!


Kaikko on May 3rd, 2010

I’m trying the XML Google Maps plugin to show my road trips and some pics… take a look!

Unfortunately some things don’t work yet and the page is only in Italian… but… stay tuned! :)


Kaikko on April 14th, 2010

Say “hi” to our new theme… and thanks to Vladimir Prelovac for the “Amazing Grace” theme! :)

Kaikko on April 12th, 2010

NextGEN Gallery PlugIn installed! :)

And first gallery created!


Kaikko on April 12th, 2010

Welcome to the new version of theSaguaros.com :)