The Open Accelerated Computing Summit reflects our Organization's evolution and commitment to helping the research and developer communities advance science by expanding their accelerated and parallel computing skills.
This year's Summit brings together preeminent researchers from national laboratories, research institutions, and supercomputing centers worldwide to discuss work aligned with our Organization's focus areas: developing and using the OpenACC directives-based programming model to port, accelerate, or optimize scientific applications; sharing experiences and lessons learned from hackathon and bootcamp training events; and contributing to ecosystem development through work that enables parallel programming models in compilers and tools and advances performance interoperability.
The 2023 Summit is scheduled for October 4-5 and will include two keynote speakers, invited talks, tutorials, and interactive panel discussions. The agenda will be updated regularly and is subject to change.
Keynotes

The Long but “Straight” Road towards Integration of Simulations, Data, and Learning on Oakforest-PACS II
Kengo Nakajima, University of Tokyo
Supercomputing is changing dramatically. The integration and convergence of simulation, data, and learning (S+D+L) is important as we move toward Society 5.0, a concept proposed by the Japanese Government for integrating cyberspace and physical space. In 2015, we started the Big Data & Extreme Computing (BDEC) project to develop supercomputers and software for the integration of (S+D+L). In May 2021, we began operating Wisteria/BDEC-01, the first BDEC system, consisting of computing nodes for computational science and engineering with A64FX processors (Odyssey) and nodes for data analytics/AI with NVIDIA A100 GPUs (Aquarius).
Additionally, we developed a software platform, “h3-Open-BDEC”, for the integration of (S+D+L) on Wisteria/BDEC-01, designed to extract maximum performance from the supercomputers with minimum energy consumption by focusing on (1) innovative methods for numerical analysis based on adaptive precision, accuracy verification, and automatic tuning; (2) a hierarchical data-driven approach based on machine learning; and (3) software for heterogeneous systems. Integration of (S+D+L) with h3-Open-BDEC enables a significant reduction in computation and power consumption compared with conventional simulations. In January 2025, together with the University of Tsukuba, we will start operating the Oakforest-PACS II system (OFP-II), which will consist of NVIDIA H100 nodes with a total peak performance of 100-150 PFLOPS. This is our next platform for the integration of (S+D+L). In October 2022, we started helping our users migrate their applications to the GPU-based OFP-II in collaboration with NVIDIA. This talk will describe and discuss our activities in the integration of (S+D+L) and our efforts towards OFP-II.
Thomas Schulthess, ETH Zurich / The Swiss National Supercomputing Center (CSCS)
Compiler directives, like OpenACC and OpenMP, have played an important role in accelerated scientific computing, especially for imperative languages like Fortran. Directives typically provide a parallel programming model for legacy software, making them useful for exploring new architectural features in a wide range of scientific applications. They are also thought to enhance the portability of software. However, to achieve optimal performance on different architectures, algorithms, and thus their imperative implementations, need to be changed. Sustainable, performance-portable software development is therefore better served by parallel programming constructs in standard languages (e.g., C++ or Fortran) or by descriptive programming models that can be implemented in Python with domain-specific libraries. As these new technologies mature, it will be natural and advantageous for the role of directives to evolve in managing the expression of parallelism in scientific software. This talk presents my views on the role of compiler directives in scientific computing going forward.
Tutorials

GPU Programming with Python Tutorial
Aswin Kumar, NVIDIA and Mozhgan Kabiri Chimeh, NVIDIA
This two-hour tutorial covers the basics of GPU programming and provides an overview of how to port Python-based scientific applications to GPUs using CuPy and Numba. Throughout the session, attendees will learn how to analyze and accelerate Python codes on GPUs and apply these concepts to real-world problems. Participants will be given access to hands-on exercises and expert help during and after the tutorial. This tutorial will be held two times, once in an Asia-Pacific-friendly time zone and again in a North America and Europe-friendly time zone.
Panels

The Accelerated Evolution of Programming
Jeff Larkin (Moderator), Chair of the OpenACC Technical Committee, NVIDIA
The way we program computers is evolving at an accelerated pace. Classical programming languages are expanding, new programming languages are emerging, and fundamentally different paradigms are on the horizon. This panel will discuss the historical context, rapid development, and future direction of how computers are programmed. Experts will share their insights on how classical programming languages and models have progressed to lay the groundwork for new approaches, examine newly emerged models and developer enablement, and, lastly, look toward emergent areas such as quantum computing and large language models and how they could completely change the relationship between the programmer and the computer.

Towards Sustainable Computing Competence through Mentorship
Open Hackathons help researchers and computational scientists advance science by pairing them with expert mentors to work collaboratively on AI and HPC applications. Our mentors are at the core of the hackathons’ and teams’ success.
This community-driven mentor panel will discuss how mentorship is evolving given the rapid advancement of high-performance systems used to accelerate both traditional HPC and AI. We'll discuss ways of examining and addressing the diverse, newly emerging workloads that teams bring to hackathons. Lastly, we'll discuss the benefits of being part of a mentor program and what additional steps will help existing mentors deepen their knowledge, increase networking opportunities, and inspire new mentors.
Lucas Gasparino Ferreira da Silva, Barcelona Supercomputing Center
SOD2D is a code developed at BSC-CNS for performing large eddy simulations (LES) and direct numerical simulations (DNS) of compressible turbulent flows in realistic, complex cases, with a particular focus on the aviation industry. It employs a high-order continuous Galerkin spectral element method (SEM) in conjunction with an entropy viscosity method adapted to SEM to achieve high accuracy and robustness at a reasonable computational cost. The aim of this project is to bridge the gap between industry needs and highly accurate flow simulations, which require high-fidelity models and large meshes. This in turn requires HPC systems, which increasingly rely on GPUs to achieve the required performance.
In this talk, we will present how we used OpenACC to enable our scale-resolving CFD code to run efficiently on multiple GPUs. Using OpenACC to port most of the code (in particular, the entire compute algorithm) to GPUs, we achieved excellent performance on NVIDIA GPUs, including the latest A100 and H100 architectures. As a preview of the most striking results: a single H100 GPU can run a heavily refined mesh (~48M nodes) composed of 3rd-order hexahedra at a cost of about 370 ms per time-step, a feat that would require several nodes of the MareNostrum4 supercomputer at BSC-CNS. We will also present scalability data demonstrating how well the code performs on multiple GPUs, communicating data using both MPI and the NVIDIA Collective Communications Library (NCCL) in tandem with OpenACC directives.

Porting an OpenACC Fortran HPC Code to the AMD GPU-Based Frontier System
Igor Sfiligoi, University of California, San Diego
NVIDIA has been the main provider of GPU hardware in HPC systems for over a decade, and many applications that benefit from GPUs have been developed and optimized for the NVIDIA software stack. Recent exascale HPC systems, however, are introducing GPUs from other vendors, such as the recently available AMD GPU-based Frontier system at the Oak Ridge Leadership Computing Facility (OLCF). AMD GPUs cannot be accessed directly through the NVIDIA software stack and require a porting effort by application developers.
This talk provides an overview of our experience porting and optimizing the CGYRO code, a widely used fusion simulation tool written in Fortran with OpenACC-based GPU acceleration. While porting from the NVIDIA compilers to the CRAY compilers on the AMD system was relatively straightforward, performance optimization required more fine-tuning. During the optimization effort, we uncovered code sections that had performed well on NVIDIA GPUs but were unexpectedly slow on AMD GPUs. After AMD-targeted code optimizations, performance on AMD GPUs increased to meet our expectations. Modest speed improvements were also seen on NVIDIA GPUs, an unexpected benefit of this exercise.

GPU-Acceleration of the WEST Code for Large-Scale Many-Body Perturbation Theory
Victor Yu, Argonne National Laboratory
Many-body perturbation theory (MBPT) is a powerful method for simulating electronic excitations in molecules and materials. In this talk, we present a massively parallel, GPU-accelerated implementation of MBPT in the WEST code (www.west-code.org). Outstanding performance and scalability are achieved by employing a hierarchical parallelization strategy, nonblocking MPI communications, and mixed precision in selected portions of the code. The capability of the GPU version of WEST is demonstrated by large-scale MBPT calculations using up to 25,920 GPUs. Finally, we delve into our experience of switching our GPU programming model from CUDA to OpenACC, which is enabling us to attain enhanced performance portability.

Clacc: OpenACC, Clang/LLVM, and Kokkos
Joel Denny, Co-Chair of the OpenACC Technical Committee, Oak Ridge National Laboratory
Clacc has developed OpenACC compiler, runtime, and profiling support for C and C++ by extending Clang and LLVM under the Exascale Computing Project (ECP). OpenACC support in Clang and LLVM can facilitate the programming of GPUs and other accelerators in HPC applications and provide a popular compiler platform on which to perform research and development for related optimizations and tools for heterogeneous computing architectures. A key Clacc design decision is to translate OpenACC to OpenMP to leverage the OpenMP offloading support that is actively being developed for Clang and LLVM. A benefit of this design is support for two compilation modes: a traditional compilation mode that translates OpenACC source to an executable, and a source-to-source mode that translates OpenACC source to OpenMP source. Clacc is hosted publicly on GitHub as part of the LLVM Department of Energy (DOE) Fork maintained at Oak Ridge National Laboratory (ORNL) (https://github.com/llvm-doe-org/llvm-project/wiki).
This talk presents the latest developments in the Clacc project as well as future plans in light of the end of ECP later this year. We will cover topics including recent Clacc support for OpenACC in C++, support for KokkACC (a new OpenACC backend for Kokkos, which won the best paper award at WACCPD 2022), and a general summary of Clacc's current OpenACC feature support. We will also invite the community to give feedback on their interest in seeing OpenACC support in the LLVM ecosystem going forward.

Porting CaNS Using OpenACC for Fast Fluid Dynamics Simulations at Scale
Pedro Costa, Delft University of Technology
Direct numerical simulations of the Navier-Stokes equations have greatly enhanced our understanding of turbulent fluid flows, impacting numerous environmental and industrial applications. Still, important unresolved issues remain, requiring massive computing power that has only recently come within reach through GPU computing at scale.
This talk focuses on the GPU porting effort for the numerical solver CaNS, a code for fast, massively parallel simulations of canonical fluid flows that has gained popularity over the years. CaNS is written in modern Fortran and was recently ported to GPUs using OpenACC together with a hardware-adaptive pencil domain decomposition library. We exploited OpenACC directives for host/device data movement, loop offloading, asynchronous kernel launching, and interoperability with CUDA and external GPU libraries. More importantly, during the porting we identified several practices in which standard Fortran combined with OpenACC yields a sustainable, flexible implementation while retaining the efficiency of the numerical tool. The exchange between domain-specific and GPU computing experts was key to the success of this effort.
We will cover the high-level implementation and performance of CaNS, including how OpenACC enabled swift interoperability with the external libraries that underpin its hardware-adaptive implementation. Additionally, we will discuss fine implementation details, highlighting simple yet impactful approaches, not widely documented, that are applicable to other applications. Finally, we will demonstrate performance and show how CaNS is being used on the supercomputer Leonardo to simulate wall turbulence at unprecedented Reynolds numbers (i.e., high flow speeds).