Beyond Directives: Unveiling the Latest Trends in Accelerator Programming at WACCPD 2022

2023

The number of accelerators featured in high-performance computing (HPC) systems is rapidly increasing. As this trend continues, compute nodes are expected to become more heterogeneous and complex than ever before. Consequently, applications will need the right set of programming models and tools to take full advantage of the massive performance these systems offer. Thus, it is important to answer the question of how to provide performance, portability, and programmability in current and future HPC systems and applications.

General-purpose graphics processing units (GPGPUs) have dominated the HPC accelerator market. The first generation of applications for these systems used vendor-specific application programming interfaces (APIs) such as CUDA and HIP. However, these solutions lack portability: supporting multiple systems requires maintaining a separate version of the source code for each API. As the number of vendors and devices grows and new features are introduced, programming languages and frameworks that offer an abstraction usable across vendors and devices become more attractive.

One Workshop: Multiple Approaches

The Workshop on Accelerator Programming Using Directives (WACCPD), whose ninth edition (WACCPD 2022) was held in conjunction with SC22, has traditionally centered on programming these devices using directives, with an emphasis on programming frameworks like OpenACC and OpenMP. In recent years, however, the workshop has extended its scope to include alternatives that provide scalable and portable solutions without compromising application performance, encouraging the submission of novel ideas around the use of Fortran/C++, SYCL, DPC++, Kokkos, and RAJA for programming accelerated devices.

The 2022 workshop was chaired by Sridutt Bhalachandra from Lawrence Berkeley National Laboratory, Sunita Chandrasekaran from the University of Delaware, and Guido Juckeland from Helmholtz-Zentrum Dresden-Rossendorf. The organization was led by Christopher Daley from Lawrence Berkeley National Laboratory, Jose M. Monsalve Diaz from Argonne National Laboratory, and Verónica G. Melesse Vergara from Oak Ridge National Laboratory. The workshop brought together a diverse group of leaders in the field of HPC, resulting in an outstanding program with 28 authors from different parts of the world participating.

Of the high-quality submissions received, six were accepted for publication and presentation, featuring OpenACC, Kokkos, OmpSs, and oneAPI, with a wide range of contributions in applications, programming framework interoperability, verification and validation, and software tools. Since reproducibility is an important aspect of the WACCPD workshop, the organizers emphasized the importance of accompanying submissions with the Artifact Description (AD) and Artifact Evaluation (AE) outlined in the SC conference guidelines. Five of the six papers presented were granted the three ACM reproducibility badges (version 1.1): Artifacts Available, Artifacts Evaluated, and Results Reproduced.

A Closer Look at the Accepted Papers

Kicking off WACCPD 2022 was the workshop’s “Best Paper” award recipient, “KokkACC: Enhancing Kokkos with OpenACC,” presented by Pedro Valero-Lara from Oak Ridge National Laboratory. This work showcased the integration of OpenACC as a Kokkos backend for device offloading (i.e., to GPGPUs). Kokkos is a C++ programming model, built on template metaprogramming, that presents a high-level abstraction for expressing parallelism while allowing the same code to execute on multiple device backends. In Kokkos, the user describes the parallelism, and the compiler, through template instantiation, lowers the high-level abstraction to backend-specific code. For a given code, it is therefore possible to target multiple backends implemented in different programming frameworks. In this case, the authors demonstrated matching performance between the OpenACC backend and the CUDA backend, and competitive performance against the OpenMP backend. For a deeper look at this work, consider attending the upcoming webinar on September 12th presented by the author.
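To make the "describe the parallelism once, choose the backend separately" idea more concrete, here is a minimal C++ sketch. It is not Kokkos code and not the KokkACC implementation: the names `SerialBackend`, `OpenMPBackend`, `toy_parallel_for`, and `axpy` are hypothetical and exist only for illustration. The kernel is written once as a lambda, and an overload chosen by a tag type decides how the loop is actually executed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tags for this sketch. Kokkos calls its real
// equivalents "execution spaces"; these names are NOT Kokkos API.
struct SerialBackend {};
struct OpenMPBackend {};

// Serial lowering: a plain loop over [0, n).
template <class Kernel>
void toy_parallel_for(SerialBackend, std::size_t n, Kernel kernel) {
  for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Directive-based lowering: the same loop annotated with OpenMP.
// If OpenMP is not enabled at compile time, the pragma is ignored
// and this overload simply runs serially.
template <class Kernel>
void toy_parallel_for(OpenMPBackend, std::size_t n, Kernel kernel) {
  #pragma omp parallel for
  for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// "Application" code written once for any backend: returns a*x + y.
template <class Backend>
std::vector<double> axpy(Backend backend, double a,
                         const std::vector<double>& x,
                         std::vector<double> y) {
  assert(x.size() == y.size());
  const double* xp = x.data();
  double* yp = y.data();
  // Each iteration touches a distinct element, so iterations are independent.
  toy_parallel_for(backend, x.size(),
                   [=](std::size_t i) { yp[i] += a * xp[i]; });
  return y;
}
```

Calling `axpy(SerialBackend{}, 2.0, x, y)` and `axpy(OpenMPBackend{}, 2.0, x, y)` should produce the same vector; only the tag passed at the call site changes, while the kernel body stays untouched.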

Recipients of the “Best Paper” Award at WACCPD 2022. Pictured from left to right: Seyong Lee (ORNL), Joel Denny (ORNL), Verónica G. Melesse Vergara (ORNL), Pedro Valero-Lara (ORNL), and Marc Gonzalez-Tallada (ORNL). Not shown is Jeffrey S. Vetter (ORNL).

Another highlight of the workshop was “Extending MAGMA Portability with OneAPI,” presented by Anna Fortenberry, an undergraduate student in the Department of Computer Science and Engineering at the University of North Texas. This paper explored a DPC++ implementation of the GEMM (general matrix-matrix multiplication) kernel in the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library and compared this version against other implementations such as Intel’s MKL, OpenMP, and CUDA. The result of a research opportunity at the University of Tennessee working with Professor Stan Tomov, this paper demonstrates the importance of research experience programs for undergraduates.
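For readers unfamiliar with GEMM, the operation computes C = alpha * A * B + beta * C for matrices A, B, and C. The following plain C++ reference is only a sketch of what the kernel computes. It is not the MAGMA or DPC++ implementation discussed in the paper, the function name `gemm_reference` is hypothetical, and the row-major layout is an assumption made for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference GEMM: C = alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n, all stored row-major
// (an assumption for this sketch; BLAS-style libraries
// conventionally use column-major storage).
void gemm_reference(std::size_t m, std::size_t n, std::size_t k,
                    double alpha, const std::vector<double>& A,
                    const std::vector<double>& B,
                    double beta, std::vector<double>& C) {
  assert(A.size() == m * k);
  assert(B.size() == k * n);
  assert(C.size() == m * n);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // Dot product of row i of A with column j of B.
      double acc = 0.0;
      for (std::size_t p = 0; p < k; ++p) {
        acc += A[i * k + p] * B[p * n + j];
      }
      C[i * n + j] = alpha * acc + beta * C[i * n + j];
    }
  }
}
```

Optimized implementations reorganize these three loops (tiling, vectorization, offloading to a device) but are expected to produce the same result up to floating-point rounding, which is why a simple reference like this is commonly used for validation.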

Don’t Miss WACCPD 2023

Missed the opportunity to participate in the 2022 workshop? Don’t despair! The WACCPD workshop will once again take place at SC23 this November. The tenth edition of the workshop has a slightly changed name, Workshop on Accelerator Programming and Directives, emphasizing the inclusion of other programming languages and models for accelerator devices, as well as heterogeneous systems. The call for participation is now open, and the deadline for submission is August 4, 2023.

Author

Jose M. Monsalve Diaz

Jose M. Monsalve Diaz is a postdoctoral appointee at Argonne National Laboratory working on exploring innovative ideas for future computer architectures. He obtained his Ph.D. and master’s degree in Electrical and Computer Engineering from the University of Delaware. Over the years, he has worked as a research assistant in the CAPSL research group for Prof. Guang R. Gao and in the CRPL research group for Prof. Sunita Chandrasekaran. His areas of interest are parallel computer architecture design, parallel computer systems, and parallel programming models. He has worked on the validation and verification of OpenMP target offloading, as well as on OpenACC programming targeting CPUs and heterogeneous systems based on GPGPUs. Other projects have involved unconventional dataflow-based programming models and computer architectures.