It’s been a year since we announced the completion of OpenACC 3.1, and I’m pleased to announce that we have now completed version 3.2. For the past several OpenACC releases we have released on an annual cadence, so that our releases aren’t held up by unfinished business and our implementers don’t have to read and understand monolithic changes with each release. It’s sometimes heard on OpenACC technical calls that we’d rather have a feature miss a release so we can get it right than hold up completed work just to finish one more thing.
Delve deeper into breakthroughs fueled by the most transformative technologies of our time.
NVIDIA’s GPU Technology Conference (GTC) is right around the corner, promising to be one of the premier conferences of 2021.
A year ago the OpenACC organization put out version 3.0 of the specification, a major upgrade that, among other things, moved forward the support for our base languages (C, C++, and Fortran) to their latest versions. The technical committee didn’t stop working though, and I’m pleased to announce the release of OpenACC 3.1 for November 2020. It’s hard to follow a major release like 3.0, but I believe the changes we made this year will help to make OpenACC implementations better, more interoperable, and easier to use with modern C++ and Fortran.
We have brought the Data Encryption Standard (DES) block cipher out of retirement for a second career as a Pseudo Random Number Generator (PRNG). DES PRNG is intended for simulations that benefit from generating pseudorandom numbers at the granularity of lightweight (GPU) threads. One example of this type of simulation is Particle-in-Cell (PIC) with Monte-Carlo Collisions (MCC). In PIC-MCC, a large set of charged particles is evolved in time in self-consistent electromagnetic fields, with particle collisions against background neutral species modelled as probabilistic events.
OpenACC has provided a high-level option for GPU programmers for years. Application developers interested in GPU-accelerated performance without the details, complications, and overhead of programming in a language such as CUDA have found OpenACC to be an attractive solution. However, OpenACC's potential as an efficient option for other types of accelerators, such as Field Programmable Gate Arrays (FPGAs), is still under exploration.
In this blog article I describe recent work on optimizing the computational performance of structural analysis software with OpenACC directives. High-fidelity structural analysis of buildings is computationally expensive in terms of both memory and time. As designers and “hazard engineers” sacrifice some accuracy by moving to low-fidelity simulations, the need for faster computing has risen in the construction industry and for regional structural assessment.
When OpenACC 3.0 was released in November 2019, the most exciting feature, in my opinion at least, was one that might easily be overlooked: updating our base languages. If you’re not familiar with this term, the base languages are the programming languages that we, as a directive-based parallel programming model, support, namely C, C++, and Fortran. When we released OpenACC 1.0 in November of 2011, the most important programming languages in scientific and high performance computing were C99, C++98, and Fortran 2003.
Something that started as a preliminary investigation for an undergraduate class project has become the cover of a prestigious journal publication.
Discover the OpenACC talks, training and posters featured at GTC Digital
Accelerated computing is fueling some of the most exciting scientific discoveries today. For scientists and researchers seeking faster application performance, OpenACC’s directive-based programming model provides a simple yet powerful approach to accelerators without significant programming effort. With OpenACC, a single version of the source code will deliver performance portability across platforms.
The ever-increasing heterogeneity in supercomputing applications has given rise to complex compute node architectures offering multiple, heterogeneous levels of massive parallelism. Extracting the maximum available parallelism from such systems requires sophisticated programming approaches that can provide scalable and portable solutions without compromising on performance.