This three-step tutorial shows you how to take advantage of compilers and libraries to quickly accelerate your code on CPUs and GPUs so that you can spend more time on real breakthroughs.
This tutorial uses the PGI OpenACC compiler for C, C++, and Fortran, along with tools from the PGI Community Edition, but we encourage you to find the compilers and tools that fit your project requirements among the variety of products offered by OpenACC members.
1. Analyze
Analyze your code using profiling tools. Identify the functions and loops that are likely to run faster on GPUs. A baseline CPU profile shows where the executable spends most of its time. Check whether any of the operations identified by the profiler have already been accelerated by existing GPU libraries, and then proceed with OpenACC directives.
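When a profiler is not at hand, a rough baseline can be taken by timing a candidate function directly. The following is only a minimal sketch: the helper cpu_seconds, the struct sum_job, and the sample hot spot sum_squares are hypothetical names introduced for illustration, and a real profiler will give a far more complete picture.

```c
#include <time.h>

/* Hypothetical helper: CPU seconds spent in one call to fn(arg). */
double cpu_seconds(void (*fn)(void *), void *arg)
{
    clock_t start = clock();
    fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical hot spot: a simple loop that might later be offloaded. */
struct sum_job {
    int n;          /* number of terms to sum */
    double result;  /* filled in by sum_squares */
};

void sum_squares(void *p)
{
    struct sum_job *job = p;
    double s = 0.0;
    for (int i = 0; i < job->n; ++i) {
        s += (double)i * (double)i;
    }
    job->result = s;
}
```

Calling cpu_seconds(sum_squares, &job) for each candidate and comparing the results gives a first indication of which loops dominate the CPU run time.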
2. Parallelize
Now you can begin exposing parallelism, starting with the functions and loops that take the most time on the CPU. An OpenACC compiler offloads to the GPU the regions of code that you mark with directives (pragmas in C and C++). Use #pragma acc parallel to start parallel execution of a region, #pragma acc kernels to let the compiler generate GPU kernels from the loops in a region, and #pragma acc loop to describe how a particular loop should be parallelized.
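The directives named above can be sketched in C as follows. This is a minimal illustration, not code from the tutorial: the function names saxpy_parallel and scale_kernels are assumptions, and the data clauses simply copy each array to the GPU and back around the loop.

```c
/* y[i] = a * x[i] + y[i], offloaded with a combined parallel loop directive.
 * The programmer asserts that the loop iterations are independent. */
void saxpy_parallel(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* x[i] = a * x[i], offloaded with a kernels region.
 * Here the compiler analyzes the loop and decides how to parallelize it. */
void scale_kernels(int n, float a, float *restrict x)
{
    #pragma acc kernels copy(x[0:n])
    {
        for (int i = 0; i < n; ++i) {
            x[i] = a * x[i];
        }
    }
}
```

With the PGI compiler this would typically be built with something like pgcc -acc -Minfo=accel, where -Minfo=accel reports what the compiler generated for each region. A compiler without OpenACC support ignores the pragmas and runs the loops serially, which gives the same results.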
3. Optimize
Optimizing data movement between the CPU and the GPU can bring a significant performance increase. Use loop optimizations to achieve even faster results. Note that if you use a Pascal GPU, data movement will be performed by the GPU itself, without the need for additional directives.
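One common way to reduce data movement is to wrap several offloaded loops in a single data region, so that arrays stay on the GPU between loops instead of being copied for each one. The sketch below assumes a hypothetical three-point smoothing function named smooth; the stencil itself is only an illustration.

```c
/* Repeatedly replaces each interior element of a with the average of
 * itself and its two neighbors. tmp is scratch space of the same length. */
void smooth(int n, int iters, float *restrict a, float *restrict tmp)
{
    /* a is copied to the GPU once on entry and back once on exit;
     * tmp is allocated on the GPU only and never transferred. */
    #pragma acc data copy(a[0:n]) create(tmp[0:n])
    {
        for (int it = 0; it < iters; ++it) {
            /* Both loops reuse the device copies made by the data region. */
            #pragma acc parallel loop present(a[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i) {
                tmp[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0f;
            }

            #pragma acc parallel loop present(a[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i) {
                a[i] = tmp[i];
            }
        }
    }
}
```

Without the enclosing data region, each parallel loop would need its own data clauses, and the arrays could be transferred twice per iteration. Loop-level clauses such as collapse or gang and vector can then be added to the individual loops as a further tuning step.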
Resources
Tutorials, guides, online courses, books, and more.



