The GTC 2022 conference offers several opportunities to learn more about the intersection of HPC, AI, and Data Science. Browse a variety of talks, tutorials, and posters across topics such as OpenACC and programming languages, developer tools, and industry-specific research and applications. Recorded sessions are available on demand after April 11, 2022. Learn more.

Connect with the Experts

Directive-Based GPU Programming with OpenACC

Monday, March 21 | 3:00 - 3:50 PM PDT 

OpenACC is a programming model designed to help scientists and developers get started with GPUs faster and work more efficiently by maintaining a single source code for multiple platforms. Ask our OpenACC experts how to start accelerating your code on GPUs, continue optimizing your GPU code, start teaching OpenACC, host or participate in a hackathon, and more! Connect with the Experts sessions are interactive sessions that give you a unique opportunity to meet, in either a group or one-on-one setting, with the brilliant minds behind NVIDIA’s products and research to get your questions answered.
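
To see what "single source code" means in practice, here is a minimal sketch (our illustration, not material from the session): a loop annotated with an OpenACC directive that nvfortran can offload to a GPU, while any other Fortran compiler simply treats the directive as a comment and runs the loop serially.

```fortran
! saxpy.f90 -- a minimal single-source OpenACC sketch.
! GPU build (assumption): nvfortran -acc -gpu=managed saxpy.f90
! Serial build: any Fortran compiler; the !$acc line is just a comment.
program saxpy
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  x = 1.0
  y = 0.0

  ! One directive expresses the parallelism; the same source runs on
  ! GPUs, multicore CPUs, or serially, depending on the compiler flags.
  !$acc parallel loop
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do

  print *, 'y(1) =', y(1)   ! expect 2.0
end program saxpy
```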

Best Practices for Fortran on GPUs

Monday, March 21 | 11:00 - 11:50 AM PDT 

NVIDIA software tools provide strong support for both direct programming and performance libraries for Fortran users. This Connect With Experts session will provide the latest information on all of our programming-model support, as well as the library interfaces found in the NVHPC software development kit. This includes established models like CUDA Fortran and OpenACC, as well as more recent additions like OpenMP 4.5 and Fortran standard parallelism (StdPar). We'll also cover the mapping of Fortran standard intrinsics to NVIDIA performance libraries and how to use CUDA library interfaces directly. You should be familiar with some dialect of Fortran, but you don't need any knowledge of hardware architecture or any specific application domain.
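
For readers new to these models, here is a minimal CUDA Fortran sketch, one of the established models the session covers (our illustration; the build flag is an assumption based on the NVHPC toolchain):

```fortran
! scale.cuf -- a minimal CUDA Fortran kernel (illustrative sketch).
! Build (assumption): nvfortran -cuda scale.cuf
module kernels
  use cudafor
contains
  attributes(global) subroutine scale(y, a, n)
    real, device :: y(*)
    real, value :: a
    integer, value :: n
    integer :: i
    ! Global thread index (1-based in CUDA Fortran).
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * y(i)
  end subroutine scale
end module kernels

program main
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real, device :: y_d(n)   ! array in GPU memory
  real :: y(n)

  y = 1.0
  y_d = y                  ! host-to-device copy via assignment
  call scale<<<(n + 255) / 256, 256>>>(y_d, 2.0, n)
  y = y_d                  ! device-to-host copy
  print *, 'y(1) =', y(1)  ! expect 2.0
end program main
```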

GPU Performance Analysis and Optimization

Tuesday, March 22 | 3:00 - 4:00 AM PDT | Thursday, March 24 | 9:00 - 9:50 AM PDT

Come talk to experts in GPU programming and code optimization, share your experience with them, and get guidance on how to achieve maximum performance on NVIDIA's platform. 

Talks

No More Porting: Coding for GPUs with Standard C++, Fortran, and Python

Jeff Larkin, HPC Architect, NVIDIA

Monday, March 21 | 9:00 - 9:50 AM PDT 

CUDA C++, CUDA Fortran, and OpenACC are hugely successful approaches to GPU programming, but wouldn’t it be nice to write an application that can run on GPUs and multicore CPUs out of the box, without any additional APIs? The parallelism features available in ISO C++ and ISO Fortran enable developers to write their codes such that the baseline code is parallel and ready to run on any parallel platform they encounter. Using libraries like cuNumeric, Python developers can write to standard APIs like NumPy and scale to a full data center. We'll demonstrate the current state-of-the-art in writing application code that is parallel and ready to run on GPUs, CPUs, and more, using only C++, Fortran, and Python. See what’s possible and learn best practices in writing parallel code with standard language parallelism.
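
As a taste of the standard-language approach (our example, not the speaker's code), here is a Jacobi-style update written in nothing but ISO Fortran; with NVHPC, the assumed flag -stdpar=gpu lets the compiler run the DO CONCURRENT loop on a GPU, while any Fortran 2008 compiler runs it on the CPU.

```fortran
! stencil.f90 -- baseline-parallel ISO Fortran, no directives.
! GPU build (assumption): nvfortran -stdpar=gpu stencil.f90
program stencil
  implicit none
  integer, parameter :: n = 4096
  real, allocatable :: u(:,:), unew(:,:)
  integer :: i, j, it

  allocate(u(n,n), unew(n,n))
  u = 0.0
  u(1,:) = 1.0                ! a fixed boundary condition

  do it = 1, 100
     ! ISO Fortran parallelism: the compiler may run this loop nest on
     ! any parallel platform, with no vendor extensions in the source.
     do concurrent (j = 2:n-1, i = 2:n-1)
        unew(i,j) = 0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
     end do
     u(2:n-1, 2:n-1) = unew(2:n-1, 2:n-1)
  end do

  print *, 'u(2,2) =', u(2,2)
end program stencil
```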

A Deep Dive into the Latest HPC Software 

Timothy Costa, Group Product Manager, HPC Software, NVIDIA

Tuesday, March 22 | 10:00 - 10:50 AM PDT 

Take a deep dive into the latest developments in NVIDIA software for high performance computing applications, including a comprehensive look at what’s new in programming models, compilers, libraries, and tools. We'll cover topics of interest to HPC developers, targeting traditional HPC modeling and simulation, quantum computing, HPC+AI, scientific visualization, and high-performance data analytics.

C++ Standard Parallelism

Bryce Adelstein Lelbach, Standard C++ Library Design Committee Chair, NVIDIA

Tuesday, March 22 | 2:00 - 2:50 PM PDT

Imagine writing parallel code that can run on any platform (CPUs, GPUs, DPUs, specialized accelerators, and more) without any language or vendor extensions, external libraries, or special compilation tools. It's no longer just a dream: you can do it today in Standard C++! Parallelism is increasingly common in software, from supercomputer simulations to mobile applications. But writing parallel code is increasingly challenging due to an explosion of diversity in hardware, a trend that's likely to continue. To meet this challenge, the C++ Committee has developed C++ Standard Parallelism, a parallel programming model for Standard C++ that is portable to all platforms, from your smartwatch to your supercomputer, and delivers reasonable performance and efficiency for most use cases. We'll dive into the roadmap for C++ Standard Parallelism and discuss what we already have today, what's coming down the line, and where the future may lead us.

JACC: Automatically Retargeting OpenACC Kernels for Multi-GPUs

Kazuaki Matsumura, Ph.D. Student, Barcelona Supercomputing Center

Thursday, March 24 | 7:00 - 7:25 AM PDT 

Rapid development in computing technology has paved the way for directive-based programming models to take a principal role in maintaining software portability for performance-critical applications. However, optimizations are often challenging. We'll introduce JACC, an OpenACC runtime framework that enables the dynamic extension of OpenACC programs by serving as a transparent layer between the program and the compiler. We add a versatile code-translation method for multi-device utilization, by which manually optimized applications can be distributed automatically while keeping the original code structure and parallelism. We'll show nearly linear scaling of kernel execution in some cases on NVIDIA V100 GPUs. When adaptively using multiple GPUs, the resulting performance improvements amortize the latency of GPU-to-GPU communication.
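
JACC's own interfaces are not reproduced here; for context, this hand-written sketch shows the kind of multi-device OpenACC code such a runtime aims to generate automatically, splitting one kernel's iteration range across the visible GPUs with the standard OpenACC runtime API (assumes at least one NVIDIA GPU is present).

```fortran
! multigpu.f90 -- hand-written multi-GPU OpenACC of the sort JACC
! generates automatically (illustrative sketch, not JACC code).
program multigpu
  use openacc
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n)
  integer :: ngpus, g, lo, hi, chunk, i

  x = 1.0
  y = 0.0
  ngpus = acc_get_num_devices(acc_device_nvidia)   ! assumes ngpus >= 1
  chunk = n / ngpus

  ! Split one logical kernel across all visible GPUs by index range.
  do g = 0, ngpus - 1
     call acc_set_device_num(g, acc_device_nvidia)
     lo = g * chunk + 1
     hi = merge(n, lo + chunk - 1, g == ngpus - 1) ! last GPU takes the remainder
     !$acc parallel loop copyin(x(lo:hi)) copy(y(lo:hi)) async
     do i = lo, hi
        y(i) = 2.0 * x(i) + y(i)
     end do
  end do

  ! Drain every device's queue before using the results.
  do g = 0, ngpus - 1
     call acc_set_device_num(g, acc_device_nvidia)
     !$acc wait
  end do

  print *, 'y(1) =', y(1), 'y(n) =', y(n)
end program multigpu
```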

Using OpenACC to Accelerate Wave Propagation Simulations Combining Equation-based and Data-driven Methods 

Kohei Fujita, Associate Professor, The University of Tokyo

Wednesday, March 23 | 6:00 - 6:25 PM PDT 

As an example of porting HPC applications comprising both equation-based and data-analytics methods to GPUs with the directive-based parallel programming model OpenACC, we port an implicit solver with a neural-network (NN) type preconditioner for large-scale wave propagation simulations. The target solver attains fast time-to-solution without loss of accuracy by converting equation-based computations in the preconditioner of an iterative solver into NN-type computations that are better suited to recent computer architectures. Such an algorithm is especially suitable for GPUs; we attained 64.4% of the FP64 hardware peak on A100 GPUs for the whole solver. On a compute node with a 14-fold difference between CPU and GPU hardware peak FLOP/s (eight A100 GPUs and dual 36-core Xeon CPUs), a 158-fold speedup was obtained using GPUs compared to a conventional equation-based solver running on all CPU cores.

Multi-GPU Programming with MPI (a Magnum IO Session) 

Jiri Kraus, Principal DevTech Compute, NVIDIA

Tuesday, March 22 | 1:00 - 1:50 PM PDT 

Learn how to program multi-GPU systems or GPU clusters using the message-passing interface (MPI) and OpenACC or NVIDIA CUDA. We'll start with a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Then we'll cover advanced topics like CUDA-aware MPI and how to overlap communication with computation to hide communication times. We'll also cover the latest improvements in CUDA-aware MPI, the Multi-Process Service (MPS), and MPI support in NVIDIA performance analysis tools.
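
The key enabling construct can be sketched in a few lines (our example; it assumes a CUDA-aware MPI library): OpenACC's host_data construct hands the device address of an array directly to MPI, so a halo exchange skips the host staging copies.

```fortran
! halo.f90 -- CUDA-aware MPI from OpenACC Fortran (illustrative sketch).
! exchange_halo swaps one-cell halos with the left and right neighbors.
subroutine exchange_halo(u, n, left, right, comm)
  use mpi
  implicit none
  integer, intent(in) :: n, left, right, comm
  real(8), intent(inout) :: u(0:n+1)   ! assumed already present on the GPU
  integer :: ierr

  ! host_data exposes the device address of u; with a CUDA-aware MPI,
  ! these calls read and write GPU memory directly (GPUDirect).
  !$acc host_data use_device(u)
  call MPI_Sendrecv(u(n),   1, MPI_DOUBLE_PRECISION, right, 0, &
                    u(0),   1, MPI_DOUBLE_PRECISION, left,  0, &
                    comm, MPI_STATUS_IGNORE, ierr)
  call MPI_Sendrecv(u(1),   1, MPI_DOUBLE_PRECISION, left,  1, &
                    u(n+1), 1, MPI_DOUBLE_PRECISION, right, 1, &
                    comm, MPI_STATUS_IGNORE, ierr)
  !$acc end host_data
end subroutine exchange_halo
```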

Parallel Scalability Optimization of a GPU-enabled Computational Fluid Dynamics Solver 

Nicholson Koukpaizan, Postdoctoral Research Associate, Oak Ridge National Laboratory

Thursday, March 24 | 7:00 - 7:25 AM PDT

We present parallel performance analysis and optimization of a GPU-enabled computational fluid dynamics (CFD) code. The code is written in free-format Fortran 90, solves the Navier-Stokes equations on block-structured grids using a finite-volume formulation, and uses MPI for parallelism. Earlier work described the relevant OpenACC directives for targeting NVIDIA GPUs and reported the code's performance on up to 2,082 GPUs on the Summit supercomputer at the Oak Ridge Leadership Computing Facility. Recent developments have focused on improving parallel efficiency at larger GPU counts and on comparing the OpenACC implementation with an OpenMP offloading implementation, for additional flexibility when evaluating different compilers or GPU architectures. We demonstrate a 10-15% performance gain by leveraging GPUDirect and CUDA-aware MPI on Summit, and we'll discuss approaches such as overlapping communication and computation for strong scaling.
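
One way to picture the communication/computation overlap (our sketch, reusing the illustrative exchange_halo routine from the MPI session above, not the authors' code): launch the interior update on an asynchronous queue, exchange halos while it runs, then synchronize before updating the boundary points.

```fortran
! overlap.f90 -- hiding halo-exchange latency behind interior work
! with OpenACC async queues (illustrative sketch).
subroutine step(u, unew, n, left, right, comm)
  implicit none
  integer, intent(in) :: n, left, right, comm
  real(8), intent(inout) :: u(0:n+1), unew(0:n+1) ! assumed present on the GPU
  integer :: i

  ! 1) Interior points need no halo data: start them asynchronously.
  !$acc parallel loop async(1) present(u, unew)
  do i = 2, n - 1
     unew(i) = 0.5d0 * (u(i-1) + u(i+1))
  end do

  ! 2) Exchange halos while the interior kernel runs on the GPU
  !    (exchange_halo as sketched for the MPI session above).
  call exchange_halo(u, n, left, right, comm)

  ! 3) Wait for the interior kernel, then finish the two boundary points.
  !$acc wait(1)
  !$acc parallel loop present(u, unew)
  do i = 1, n, n - 1                   ! i = 1 and i = n
     unew(i) = 0.5d0 * (u(i-1) + u(i+1))
  end do
end subroutine step
```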

From Directives to DO CONCURRENT: A Case Study in Standard Parallelism

Miko Stulajter, Solar Physics Intern, Predictive Science, Inc.

Thursday, March 24 | 1:00 - 1:25 PM PDT

Recently, there's been growing interest in using standard language constructs (for example, Fortran's DO CONCURRENT) for accelerated computing as an alternative to directive-based APIs such as OpenMP and OpenACC. These constructs promise greater portability and shorter code, and NVIDIA's HPC SDK already supports Fortran's DO CONCURRENT on both CPUs and GPUs. We'll look at the current capabilities, portability, and performance of replacing directives with DO CONCURRENT in two real-world applications. We replace as many directives as possible with DO CONCURRENT, testing various configurations and compiler options, and we also test portability with other compilers. We find that with NVIDIA's nvfortran, many directives can be replaced without loss of performance or portability. We'll discuss limitations that might apply to more complicated codes, as well as future language additions that could mitigate them.
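
The swap at the heart of the talk looks roughly like this (a schematic sketch, not the applications' actual loops):

```fortran
! before_after.f90 -- the directive-to-DO CONCURRENT swap, schematically.
! Build (assumption): nvfortran -acc -stdpar=gpu before_after.f90
program before_after
  implicit none
  integer, parameter :: n = 1024
  real :: a(n,n), b(n,n)
  integer :: i, j
  call random_number(a)

  ! Before: directive-based (OpenACC).
  !$acc parallel loop collapse(2)
  do j = 1, n
     do i = 1, n
        b(i,j) = 2.0 * a(i,j)
     end do
  end do

  ! After: ISO Fortran only; nvfortran can offload this with -stdpar=gpu.
  do concurrent (j = 1:n, i = 1:n)
     b(i,j) = 2.0 * a(i,j)
  end do

  print *, 'b(1,1) =', b(1,1)
end program before_after
```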

Shifting through the Gears of GPU Programming: Understanding Performance and Portability Trade-offs 

Jeff Hammond, Principal Engineer, NVIDIA

Thursday, March 24 | 8:00 - 8:25 AM PDT

We'll show implementations of standard linear algebra algorithms in a range of programming models, including standard language parallelism, directives/pragmas, and CUDA, and how performance and productivity vary across them. Fortran, C, C++, and Python will all be covered, so that programmers can understand the trade-offs within a given base language as well as across languages.

Advanced Technologies and Techniques for Debugging CUDA GPU Applications

Nikolay Piskun, Director of Continuing Engineering, TotalView Products, Perforce Software

Thursday, March 24 | 1:00 - 1:50 PM PDT

Debugging and analyzing NVIDIA GPU-based software requires a tool that supports the demands of today's complex CUDA applications. Debuggers must not only handle extensive use of C++ templates, the STL, many shared libraries, and optimized code; they also need to seamlessly support debugging both host and GPU code, debug mixed-language Python and C/C++ applications, and scale to the complexities of today's multi-GPU supercomputers, such as Perlmutter. We'll exhibit the advanced technologies in the TotalView debugger and show how they're used to analyze and debug complex CUDA applications so that code is easily understood and difficult problems are quickly solved. Using TotalView's intuitive user interface, you'll learn how to debug multi-GPU environments, leverage the new GPU Status View to see how your code is running on the GPU, debug OpenACC applications, and see a unified debugging view for Python applications that use C++ Python extensions.

First Hands-on Experiences Using the NVIDIA Arm HPC Developer Kit

Filippo Spiga, Developer Relations Manager, NVIDIA | Ross Miller, Software Developer, National Center for Computational Sciences, Oak Ridge National Laboratory | Sanjay Wandhekar, Senior Director, C-DAC | Jon Wakelin, Team Lead for Research Computing, University of Leicester | Steve Poole, Chief Architect of Next Generation Platforms, Los Alamos National Laboratory

Wednesday, March 23 | 10:00 - 10:50 AM PDT

The NVIDIA Arm HPC Developer Kit allows the HPC community to prepare for future NVIDIA Arm CPUs and other Arm CPU-plus-GPU deployments. Dozens of institutions are currently working with the devkit to port and optimize a wide range of software and identify potential challenges. Some of these early adopters will present their positive and negative results, share lessons learned, and help you understand what to do next.