The number of accelerators featured in high-performance computing (HPC) systems is rapidly increasing. As this trend continues, compute nodes are expected to become more heterogeneous and complex than ever before. Consequently, applications will need the right set of programming models and tools to take full advantage of the massive performance these systems offer. It is therefore important to ask how to provide performance, portability, and programmability in current and future HPC systems and applications.
General-purpose graphics processing units (GPGPUs) have dominated the accelerator market in HPC. The first generation of applications for these systems used vendor-specific application programming interfaces (APIs) such as CUDA and HIP. However, these solutions lack portability: supporting multiple systems means maintaining a different version of the source code for each API. As the number of vendors and devices increases and new features are introduced, programming languages and frameworks that offer an abstraction usable across vendors and devices become more attractive.
One Workshop: Multiple Approaches
The Workshop on Accelerator Programming Using Directives (WACCPD), whose ninth edition, WACCPD 2022, was held in conjunction with SC22, has traditionally centered on programming these devices using directives, with an emphasis on programming frameworks like OpenACC and OpenMP. In recent years, however, the workshop has extended its scope to include other alternatives that provide scalable and portable solutions without compromising application performance, encouraging the submission of novel ideas around the use of Fortran/C++, SYCL, DPC++, Kokkos, and RAJA for programming accelerated devices.
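To make the directive-based approach concrete, here is a minimal sketch of a SAXPY loop annotated once with OpenACC and once with OpenMP target offload. It is an illustration only and is not taken from any workshop paper; the function names `saxpy_openacc` and `saxpy_openmp` are hypothetical. The loop body is identical in both versions, and only the directive changes.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] using an OpenACC directive.
 * A compiler without OpenACC support ignores the pragma, so the same
 * source still builds and runs serially on the host. */
void saxpy_openacc(size_t n, float a, const float *x, float *y)
{
    /* copyin: x is only read on the device; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* The same loop expressed with an OpenMP target offload directive.
 * Without OpenMP enabled, this pragma is likewise ignored. */
void saxpy_openmp(size_t n, float a, const float *x, float *y)
{
    /* map(to:) sends x to the device; map(tofrom:) sends y and copies it back. */
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the directives are plain annotations on standard C, a single source file can serve both accelerated and host-only builds, which is the portability argument made above. Whether the loop is actually offloaded depends on the compiler and the flags used to build it.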
The 2022 workshop was chaired by Sridutt Bhalachandra from Lawrence Berkeley National Laboratory, Sunita Chandrasekaran from the University of Delaware, and Guido Juckeland from Helmholtz-Zentrum Dresden-Rossendorf. The organization was led by Christopher Daley from Lawrence Berkeley National Laboratory, Jose M Monsalve Diaz from Argonne National Laboratory, and Verónica G. Melesse Vergara from Oak Ridge National Laboratory. The workshop brought together a diverse group of leaders in the field of HPC, resulting in an outstanding program with 28 participating authors from different parts of the world.
Of the high-quality submissions received, six were accepted for publication and presentation, featuring OpenACC, Kokkos, OmpSs-2, and oneAPI, with a wide range of contributions in applications, programming framework interoperability, verification and validation, and software tools. Since reproducibility is an important aspect of the WACCPD workshop, the organizers emphasized the importance of accompanying submissions with the Artifact Description (AD) and Artifact Evaluation (AE) outlined in the SC conference guidelines. Five of the six papers presented were granted the three ACM reproducibility badges (version 1.1): Artifacts Available, Artifacts Evaluated, and Results Reproduced.
A Closer Look at the Accepted Papers
Kicking off WACCPD 2022 was the workshop’s “Best Paper” award recipient, “KokkACC: Enhancing Kokkos with OpenACC,” presented by Pedro Valero Lara from Oak Ridge National Laboratory. This work showcased the integration of OpenACC as a Kokkos backend for device offloading to GPGPUs. Kokkos is a template metaprogramming model that presents a high-level abstraction for representing parallelism while allowing the execution of code on multiple device backends. In Kokkos, the user describes the parallelism and the compiler is in charge of lowering the high-level metaprogramming abstraction to vendor-specific code. A given code can therefore target multiple backends built on different programming frameworks. In this case, the authors demonstrated that the OpenACC backend matches the performance of the CUDA backend and is competitive with the OpenMP backend. For a deeper look at this work, consider attending the upcoming webinar on September 12th presented by the author.
Another highlight of the workshop was “Extending MAGMA Portability with OneAPI,” presented by Anna Fortenberry, an undergraduate student in the Department of Computer Science and Engineering at the University of North Texas. This paper explored a DPC++ implementation of the GEMM kernel in the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library and compared it with other implementations such as Intel’s MKL, OpenMP, and CUDA. The result of a research opportunity at the University of Tennessee working with Professor Stan Tomov, this paper demonstrates the importance of research experience programs for undergraduates.
Other presentations included:
- “OmpSs-2 and OpenACC interoperation,” which illustrates a methodology for the clear separation of roles between the two directive-based programming models.
- “Analysis of Validating and Verifying OpenACC Compilers 3.0 and Above,” which highlights results from the OpenACC Verification and Validation Testsuite.
- “SPEL: Software tool for Porting E3SM Land Model with OpenACC in a Function Unit Test Framework,” which presents a tool for generating GPU-ready test modules for the E3SM Land Model.
- “GPU-Accelerated Sparse Matrix Vector Product based on Element-by-Element Method for Unstructured FEM using OpenACC,” which discusses the acceleration of the computationally expensive Sparse Matrix-Vector Product (SpMV) using OpenACC.
Proceedings for WACCPD 2022 are now available.
Don’t Miss WACCPD 2023
Missed the opportunity to participate in the 2022 workshop? Don’t despair! The WACCPD workshop will once again take place at SC23 this November. The tenth edition of the workshop has a slight change in name: Workshop on Accelerator Programming and Directives. The new name emphasizes the inclusion of other programming languages and models for accelerator devices, as well as heterogeneous systems. The call for participation is now open, and the deadline for submission is August 4, 2023.