What is OpenMP
Easy multithreading programming for C++.
It is a simple C/C++/Fortran compiler extension that allows you to add parallelism to existing source code without significantly rewriting it.
Example
An example that initializes an array in parallel:
#include <vector>
int main()
{
    std::vector<int> arr;
    arr.resize(1000); // resize, not reserve: the elements must actually exist
    #pragma omp parallel for
    for(int i = 0; i < 1000; i++)
    {
        arr[i] = i * 2;
    }
    return 0;
}
You can compile it like this:
g++ tmp.cpp -fopenmp
If you remove the #pragma lines, the result is still a valid C++ program that runs and does the expected thing.
If the compiler encounters a #pragma that it does not support, it will ignore it.
The syntax
the parallel construct
It creates a team of N threads (where N is the number of CPU cores by default) that all execute the next statement or block. After the statement, the threads join back into one.
#pragma omp parallel
{
// code inside this region runs in parallel
printf("Hello\n");
}
Loop construct: for
The for construct splits the for-loop so that each thread in the current team handles a different portion of it:
#pragma omp for
for(int n = 0; n < 10; ++n)
{
printf(" %d", n);
}
Note: #pragma omp for only delegates portions of the loop to the different threads in the current team. A team is the group of threads executing the program. At program start, the team consists of only the main thread.
To create a new team of threads, you need to specify the parallel keyword:
#pragma omp parallel
{
#pragma omp for
for(int n = 0; n < 10; n++) printf(" %d", n);
}
Or use:
#pragma omp parallel for
You can explicitly specify the number of threads to be created in the team, using the num_threads clause:
#pragma omp parallel for num_threads(3)
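A minimal runnable sketch (the loop bound and output format are illustrative, not from the original): each of the three threads reports which iterations it was given:
#include <omp.h>
#include <stdio.h>

int main()
{
    #pragma omp parallel for num_threads(3)
    for(int n = 0; n < 9; n++)
    {
        // omp_get_thread_num() returns this thread's id within the team (0-2 here)
        printf("thread %d handles iteration %d\n", omp_get_thread_num(), n);
    }
    return 0;
}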
scheduling
The scheduling algorithm for the for-loop can be explicitly controlled.
The default is the static schedule, where the iterations are divided among the threads in advance, in equal-sized blocks:
#pragma omp for schedule(static)
In the dynamic schedule, each thread asks the OpenMP runtime library for an iteration number, handles it, then asks for the next one.
The chunk size can also be specified, to lessen the number of calls to the runtime library:
#pragma omp parallel for schedule(dynamic, 3)
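A sketch of the difference (the loop bound and output are illustrative): with schedule(dynamic, 3), which thread gets which chunk of three iterations is decided at run time, by whichever thread asks first:
#pragma omp parallel for schedule(dynamic, 3)
for(int n = 0; n < 12; ++n)
{
    // with a static schedule this assignment would be fixed in advance;
    // with dynamic it can vary from run to run
    printf("thread %d got iteration %d\n", omp_get_thread_num(), n);
}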
the ordered clause
It is possible to force certain events within the loop to happen in a predicted order, using the ordered clause:
#pragma omp parallel for ordered schedule(dynamic)
for(int n = 0; n < 100; ++n)
{
files[n].compress();
#pragma omp ordered
send(files[n]);
}
Here the files can be compressed in parallel, but each one is sent strictly in loop order: the ordered block for iteration n waits until the ordered blocks of all earlier iterations have completed.
the collapse clause
When you have nested loops, you can use the collapse clause to parallelize over the combined iteration space of both loops:
#pragma omp parallel for collapse(2)
for(int i = 0; i < 10; i++)
{
for(int j = 0;j < 10;j++)
{
doSth();
}
}
sections
Sometimes it is handy to indicate that “this and this can run in parallel”. The sections construct is just for that:
#pragma omp parallel sections
{
{
work1();
}
#pragma omp section
{
work2();
work3();
}
#pragma omp section
{
work4();
}
}
This code indicates that any of the tasks work1, work2 + work3, and work4 may run in parallel, and that work2 and work3 run in sequence.
Thread-safety
Atomicity
The atomic keyword in OpenMP specifies that the denoted action happens atomically, for example:
#pragma omp atomic
counter += value;
atomic read expression:
#pragma omp atomic read
var = x;
atomic write expression:
#pragma omp atomic write
x = expr;
atomic update expression:
#pragma omp atomic update
x++;
atomic capture expression
The capture expression combines the read and update features:
#pragma omp atomic capture
var = x++;
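A minimal runnable sketch (the counter and iteration count are illustrative); without the atomic construct, concurrent increments could be lost:
#include <stdio.h>

int main()
{
    int counter = 0;
    #pragma omp parallel for
    for(int n = 0; n < 100000; ++n)
    {
        // the increment is performed as one indivisible operation
        #pragma omp atomic
        counter += 1;
    }
    printf("counter = %d\n", counter); // always 100000
    return 0;
}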
the critical construct
The critical construct restricts the execution of the associated statement/block to a single thread at a time:
#pragma omp critical
{
doSth();
}
Note: a critical construct can be given a name, as in #pragma omp critical(dataupdate), and the critical section names are global to the entire program: critical sections sharing the same name are mutually exclusive wherever they appear.
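A short sketch (the function and variable names are my own illustration): two critical sections that share a name exclude each other even across functions:
int x = 0, y = 0;

void add(int v)
{
    #pragma omp critical(dataupdate)
    {
        x += v;
    }
}

void subtract(int v)
{
    // same name as in add(): the two blocks can never run at the same time
    #pragma omp critical(dataupdate)
    {
        x -= v;
        y += v;
    }
}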
locks
The OpenMP runtime library provides a lock type, omp_lock_t, in omp.h.
The lock type has five manipulator functions:
omp_init_lock: initializes the lock
omp_destroy_lock: destroys the lock; the lock must be unset before the call
omp_set_lock: acquires the lock, blocking until it becomes available
omp_unset_lock: releases the lock
omp_test_lock: attempts to set the lock without blocking; if the lock was already set by another thread, it returns 0; if it managed to set the lock, it returns 1
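A minimal sketch of the typical life cycle (the printed message is illustrative):
#include <omp.h>
#include <stdio.h>

int main()
{
    omp_lock_t lock;
    omp_init_lock(&lock);
    #pragma omp parallel num_threads(4)
    {
        omp_set_lock(&lock);   // blocks until this thread owns the lock
        printf("thread %d has the lock\n", omp_get_thread_num());
        omp_unset_lock(&lock); // release so the next thread can proceed
    }
    omp_destroy_lock(&lock);   // must be unset at this point
    return 0;
}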
the flush directive
Even when variables used by threads are supposed to be shared, the compiler may take liberties and optimize them as register variables. This can skew concurrent observations of the variable. The flush directive can be used to forbid this:
/* first thread */
b=1;
#pragma omp flush(a,b)
if(a == 0)
{
/* critical section*/
}
/*second thread*/
a = 1;
#pragma omp flush(a,b)
if(b==0)
{
/* critical section*/
}
Controlling which data to share between threads
int a, b = 0;
#pragma omp parallel private(a) shared(b)
In this parallel region, a is private (each thread has its own copy of it) and b is shared (each thread accesses the same variable).
the difference between private and firstprivate
private does not copy in the value that the variable had in the surrounding context; each thread's copy starts out uninitialized:
#include <string>
#include <iostream>
int main()
{
    std::string a = "x", b = "y";
    int c = 3;
    #pragma omp parallel private(a,c) shared(b) num_threads(2)
    {
        // a and c are fresh, uninitialized copies here; b is the shared original
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}
// This roughly equals the following:
OpenMP_thread_fork(2);
{ // Start new scope
std::string a; // Note: It is a new local variable.
int c; // This too.
a += "k";
c += 7;
std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
} // End of scope for the local variables
OpenMP_join();
If you actually need a copy of the original value, use the firstprivate clause instead:
#include <string>
#include <iostream>
int main()
{
    std::string a = "x", b = "y";
    int c = 3;
    #pragma omp parallel firstprivate(a,c) shared(b) num_threads(2)
    {
        a += "k";
        c += 7;
        std::cout << "A becomes (" << a << "), b is (" << b << ")\n";
    }
}
Now each thread starts with a copy of the original values, a = "x" and c = 3, so every thread prints "A becomes (xk)".
Execution synchronization
the barrier directive and the nowait clause
The barrier directive causes each thread encountering it to wait until all the other threads in the same team have reached the same barrier:
#pragma omp parallel
{
// all threads execute this
SomeCode();
#pragma omp barrier
// all threads execute this,but not before all threads have finished executing SomeCode()
SomeMoreCode();
}
Note: there is an implicit barrier at the end of each parallel block, and at the end of each sections, for and single statement, unless the nowait clause is used.
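A minimal sketch of nowait (the printf calls are illustrative): each thread proceeds past the loop as soon as it has finished its own share of the iterations, instead of waiting for the whole team:
#pragma omp parallel
{
    #pragma omp for nowait
    for(int n = 0; n < 10; ++n)
        printf(" %d", n);
    // no implicit barrier after the loop: this runs as soon as
    // this particular thread is done with its own iterations
    printf(".");
}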
the single and master constructs
The single construct specifies that the given statement/block is executed by only one thread. It is unspecified which thread. The other threads skip the statement/block and wait at an implicit barrier at the end of the construct:
#pragma omp parallel
{
Work1();
#pragma omp single
{
Work2();
}
Work3();
}
The master construct is similar, except that the statement/block is run by the master thread (thread 0), and there is no implied barrier; other threads skip the construct without waiting.
static const char* FindAnyNeedle(const char* haystack, size_t size, char needle)
{
    const char* result = haystack + size;
    #pragma omp parallel
    {
        unsigned num_iterations = 0;
        #pragma omp for
        for(size_t p = 0; p < size; ++p)
        {
            ++num_iterations;
            if(haystack[p] == needle)
            {
                #pragma omp atomic write
                result = haystack + p;
                // Signal cancellation.
                #pragma omp cancel for
            }
            // Check for cancellations signalled by other threads:
            #pragma omp cancellation point for
        }
        // All threads reach here eventually; sooner if the cancellation was signalled.
        printf("Thread %d: %u iterations completed\n", omp_get_thread_num(), num_iterations);
    }
    return result;
}
Loop nesting
This code will not do the expected thing:
#pragma omp parallel for
for(int y=0; y<25; ++y)
{
#pragma omp parallel for
for(int x=0; x<80; ++x)
{
tick(x,y);
}
}
The reason is that nested parallelism is disabled by default, so the inner parallel for is executed by a team of just one thread. The solution is to collapse the two loops into a single parallel loop:
#pragma omp parallel for collapse(2)
for(int y=0; y<25; ++y)
{
for(int x=0; x<80; ++x)
{
tick(x,y);
}
}
Read more
http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
https://en.wikipedia.org/wiki/OpenMP