The GTC 2019 conference offers several opportunities to learn about OpenACC and collaborate with your fellow OpenACC users through a variety of talks, tutorials, posters, and meet-the-experts hangouts. In addition, you're invited to socialize with others interested in OpenACC at the OpenACC User Group meeting on Tuesday night.

User Group Meeting

Tuesday, March 19th, 7:00-10:00 PM, Mosaic Restaurant, San Jose, CA.

The OpenACC User Group meets a few times a year during key HPC events to discuss training, provide feedback on the specification, collaborate on OpenACC-related research and activities, share experiences and best practices, and have a good time with great company! Join us March 19th at GTC19 - food and drinks are on us. Seating is limited, so please register in advance to attend.

GPU Bootcamp

Sunday, March 17th, 9:00-18:00, NVIDIA Endeavor, Santa Clara, CA

GPU Bootcamp is a free, full-day event designed to teach scientists and researchers how to quickly start accelerating codes on GPUs. Participants will be introduced to available libraries, programming models, and platforms, and will learn the basics of GPU programming through extensive hands-on collaboration based on real-life codes using the OpenACC programming model. Please apply to participate.

Connect with the Experts

Monday, March 18 - Wednesday, March 20, one hour daily

This session is designed for anyone who is either looking to start with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC directives are used by others. Detailed schedule.

Talks

S9279 - BoF: OpenACC Programming Model - User Stories, Vendor Reaction, Relevance, and Roadmap

Tuesday, March 19th, 16:00 - 16:50, Room 210F

Learn about the roadmaps and latest developments in the OpenACC specification, a directive-based high-level parallel programming model that has gained momentum among scientific application users. OpenACC is an entry-level programming model for TOP500 supercomputers such as Summit, Sunway TaihuLight, and Piz Daint. The user-friendly programming model has facilitated acceleration of over 130 applications, including CAM, ANSYS Fluent, Gaussian, VASP, and Synopsys, on multiple platforms. As part of our talk, we'll invite scientists, programmers, and researchers to discuss their experiences in adopting OpenACC for scientific applications.

S9275 - Panel: Quo Vadis Programmer - Which Accelerator Programming Ecosystem to Choose?

Thursday, March 21st, 16:00 - 16:50, Room 220C

Emerging heterogeneous systems are opening up a wealth of programming options. This panel will discuss the latest developments in accelerator programming, where programmers can choose among OpenMP, OpenACC, CUDA, Kokkos, or Alpaka for GPU programming. The panel will shed light on the primary objectives behind the choice of a model, whether that is availability across multiple platforms, a rich feature set, applicability to a certain type of scientific code, compiler stability, or other factors. This will be an interactive Q&A session where participants can discuss their experiences with programming-model experts and developers.

S9288 - Porting MURaM (Max Planck/University of Chicago Radiative MHD) to GPUs using OpenACC

Tuesday, March 19th, 13:00 - 13:55 – Hilton Market 120

We'll discuss the Max Planck/University of Chicago Radiative MHD code (MURaM), the primary model for simulating the sun's upper convection zone, its surface, and the corona. Accelerating MURaM allows physicists to interpret high-resolution solar observations. We'll describe the programmatic challenges and optimization techniques we employed while using the OpenACC programming model to accelerate MURaM on GPUs and multicore architectures. We will also examine what we learned and how it could be broadly applied to atmospheric applications that use radiation-transport methods.

S9277 - OpenACC-based GPU Acceleration of Chemical Shift Prediction

Tuesday, March 19th, 15:00 - 15:50 – Room 211B-D

The chemical shift of a protein structure offers a wealth of information about the protein's physical properties. Being able to accurately predict this shift is essential in drug discovery and in other areas of molecular dynamics research. But because chemical shift prediction algorithms are so computationally intensive, no application can predict the chemical shift of large protein structures in a realistic amount of time. We explored this problem by porting an algorithm called PPM_One to NVIDIA V100 GPUs using the directive-based OpenACC programming model. When testing several protein structure datasets ranging from 1M to 11M atoms, we observed an average speedup of ~45X across the datasets and a maximum speedup of 61X. We'll discuss techniques to overcome programmatic challenges and highlight the scientific advances enabled by OpenACC.

S9263 - Taming the Hydra: Multi-GPU Programming with OpenACC

Thursday, March 21st, 10:00 - 10:50 – Room 210F

As GPU computing nodes begin packing in an increasing number of GPUs, programming to maximize performance across GPUs in a system is becoming a challenge. We'll discuss techniques to extend your GPU applications from using one GPU to using many GPUs. By the end of the session, you'll understand the relative trade-offs in each of these approaches and how to choose the best approach for your application. Some prior OpenACC or GPU computing experience is recommended for this talk.

S9626 - Optimizing Large Reductions in BerkeleyGW with CUDA, OpenACC, OpenMP 4.5, and Kokkos

Wednesday, March 20th, 16:00 - 16:50 – Room 210G

Learn how to optimize large complex-number reductions in material science code BerkeleyGW on NVIDIA GPUs. Our talk will showcase two BerkeleyGW kernels implemented with four frameworks — CUDA, OpenACC, OpenMP 4.5, and Kokkos. We'll share optimization techniques used to achieve decent performance across all four implementations. We'll also report on the status of OpenACC and OpenMP 4.5 compilers and compare the performance portability capabilities of OpenACC, OpenMP 4.5, and Kokkos.

S9731 - Combining Machine Learning and GPU Acceleration to Transform Atmospheric Science

Tuesday, March 19th, 15:00 - 15:50, Hilton Market 120

Scientific model performance has begun to stagnate over the last decade due to plateauing core speeds, increasing model complexity, and mushrooming data volumes. Learn how our team at the National Center for Atmospheric Research is pursuing an end-to-end hybrid approach to surmounting these barriers. We'll discuss how combining ML-based emulation with GPU acceleration of numerical models can pave the way toward new scientific modeling capabilities. We'll also detail our approach, which uses machine learning and GPU acceleration to produce what we hope will be a new generation of ultra-fast meteorological and climate models that provide enhanced fidelity with nature and increased value to society.

S9378 - Advanced Technologies and Techniques for Debugging CUDA GPU HPC Applications

Monday, March 18th, 11:00 - 11:50 – Room 210F

Debugging and analyzing NVIDIA GPU-based HPC applications requires a tool that supports the demands of today's complex CUDA applications. Debuggers must deal with the extensive use of C++ templates, STL, many shared libraries, and debugging optimized code. They need to seamlessly support debugging both host and GPU code, Python, and C/C++ mixed-language applications. They must also scale to the complexities of today's multi-GPU cluster supercomputers such as Summit and Sierra. We'll discuss the advanced technologies provided by the TotalView HPC debugger and explain how they're used to analyze and debug complex CUDA applications to make code easily understood and to quickly solve difficult problems. We'll also show TotalView's new user interface. Learn how to easily debug multi-GPU environments and OpenACC, and see a unified debugging view for Python applications that leverage C++ Python extensions such as TensorFlow.

S9594 - Bringing State-of-the-Art GPU-Accelerated Molecular Modeling Tools to the Research Community

Wednesday, March 20th, 10:00 - 10:50 – Hilton San Carlos 91

We'll showcase the latest successes with GPU acceleration of challenging molecular simulation analysis tasks on the latest Volta and Turing GPUs paired with both Intel and IBM/OpenPOWER CPUs on petascale computers such as ORNL Summit. This presentation will highlight the performance benefits obtained from die-stacked memory, NVLink interconnects, and the use of advanced features of CUDA such as just-in-time compilation to increase the performance of key analysis algorithms. We will present results obtained with OpenACC parallel programming directives, as well as discuss current challenges and future opportunities. We'll also describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations. To make our tools easy to deploy for non-traditional users of HPC, we publish GPU-accelerated container images in NGC, and Amazon EC2 AMIs for GPU instance types.

S9770 - C++ Standard Parallel Algorithms for NVIDIA GPUs

Wednesday, March 20th, 10:00 - 10:50 – Room 210G

We'll discuss the C++17 parallel algorithms, which were designed to support GPU parallel programming. They include parallel versions of many existing algorithms, and a few new algorithms designed for efficient parallel execution of scans and reductions. The PGI C++ compiler has implemented these parallel algorithms for NVIDIA GPUs, making it possible in some cases to run standard C++ on GPUs with no directives, pragmas, or annotations. We will share our experiences and performance results for several of the parallel algorithms. We'll also explain the capabilities of the PGI implementation relative to CUDA, Thrust, and OpenACC.

S9665 - Acceleration of an Adaptive Cartesian Mesh CFD Solver in the Current Generation Processor Architectures

Wednesday, March 20th, 13:00 - 13:50 – Hilton Hotel San Carlos Room

We'll explore the challenges of accelerating an adaptive Cartesian mesh CFD solver, PARAS-3D, on existing CPUs and GPUs. The memory-bound nature of CFD codes is an obstacle to higher performance, and the octree structure of adaptive Cartesian meshes adds the challenge of data parallelism. Cartesian mesh solvers have higher memory bandwidth requirements due to their larger and varying stencil. We'll detail how redesigning and implementing a legacy Cartesian mesh CFD solver and improving algorithms and data structures helped us achieve higher performance on CPUs. We'll also explain how we used a structure-of-arrays data layout and GPU features like Unified Memory and Multi-Process Service to improve GPU performance over a CPU-only version.

S9476 - MVAPICH2-GDR: High-Performance and Scalable CUDA-Aware MPI Library for HPC and AI

Tuesday, March 19th, 15:00 - 15:50 – Room 211A-C

Learn about advanced features in MVAPICH2 that accelerate HPC and AI on modern dense GPU systems. We'll talk about how MVAPICH2 supports MPI communication from GPU memory and improves it using the CUDA toolkit for optimized performance on different GPU configurations. We'll examine recent advances in MVAPICH2 that support large message collective operations and heterogeneous clusters with GPU and non-GPU nodes. We'll explain how we use the popular OSU micro-benchmark suite, and we'll provide examples from HPC and AI to demonstrate how developers can take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We'll also provide guidance on issues like processor affinity to GPUs and networks that can significantly affect the performance of MPI applications using MVAPICH2.

S9289 - PGI Compilers, The NVIDIA HPC SDK: Updates for 2019

Thursday, March 21st, 10:00 - 10:50 – Room 211A-C

Come hear the latest PGI news and learn about what we'll develop in the year ahead. We'll talk about the latest PGI OpenACC Fortran/C++ and CUDA Fortran compilers and tools, which are supported on x64 and OpenPOWER systems with NVIDIA GPUs. We'll discuss new CUDA Fortran features, including Tensor Core support and cooperative groups, and we'll cover our current work on half-precision. We'll explain new OpenACC 2.7 features, along with beta true deep-copy directives and support for OpenACC programs on unified memory systems. The PGI compiler-assisted software testing feature helps determine where differences arise between CPU and GPU versions of a program or when porting to a new system. Learn about upcoming projects, which include a high-performance PGI subset of OpenMP for NVIDIA GPUs, support for GPU programming with standard C++17 parallel STL and Fortran, and incorporation of GPU-accelerated math libraries to support porting and optimization of HPC applications on NVIDIA GPUs.

Tutorials and Labs

S9262 - Tutorial: Zero to GPU Hero with OpenACC

Monday, March 18th, 9:00 - 10:20 – Room 210G

Learn how to take an application from slow, serial execution to blazing fast GPU execution using OpenACC, a directive-based parallel programming model that works with C, C++, and Fortran. By the end of this session, participants will know the basics of using OpenACC to write an accelerated application that runs on multicore CPUs and GPUs with minimal code changes. No prior GPU programming experience is required, but the ability to understand C, C++, or Fortran code is necessary.

S9347 - Tutorial: Performance Analysis for Large-Scale GPU-Accelerated Applications and DL Frameworks

Monday, March 18th, 9:00 - 10:20 – Room 210F

Get your hands on the latest versions of Score-P and Vampir to profile the execution behavior of your large-scale GPU-Accelerated applications. See how these HPC community tools pick up as other tools (such as NVVP) drop off when your application spans multiple compute nodes. Regardless of whether your application uses CUDA, OpenACC, OpenMP or OpenCL for acceleration, or whether it is written in C, C++, Fortran or Python, you will receive a high-resolution timeline view of all program activity alongside the standard profiles to identify hot spots and avenues for optimization. The novel Python support now also enables performance studies for optimizing the inner workings of deep learning frameworks.

L9112 - Programming GPU-Accelerated POWER Systems with OpenACC

Wednesday, March 20th, 8:00 - 10:00 – Room LL21E

How To Prepare: All attendees must bring their own laptop and charger. We recommend using a current version of Chrome, Firefox, or Safari for an optimal experience. Create an account at http://courses.nvidia.com/join before you arrive. In this training, you will learn how to harness the massive computing performance offered by POWER systems with NVLink-attached GPUs – the technology powering Sierra and Summit, two of the fastest supercomputers in the US. We will present IBM's POWER architecture and highlight the available software stack before we dive into programming the attached GPUs with OpenACC. Using real-world examples, we will get to know the hardware architectures of both CPU and GPU and learn the most important OpenACC directives along the way. The resulting GPU-accelerated program can easily be used on other GPU-equipped machines and architectures, thanks to OpenACC's portable approach. We will work on the training partition of Oak Ridge National Lab's Summit supercomputer.

DLIT903 - OpenACC - 2X in 4 Steps

Wednesday, March 20th, 10:00 - 12:00 – Room LL21A

How To Prepare: All attendees must bring their own laptop and charger. We recommend using a current version of Chrome, Firefox, or Safari for an optimal experience. Create an account at http://courses.nvidia.com/join before you arrive. Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs. OpenACC is a directive-based approach to computing where you provide compiler hints to accelerate your code, instead of writing the accelerator code yourself. Get started on the four-step process for accelerating applications using OpenACC: (1) characterize and profile your application, (2) add compute directives, (3) add directives to optimize data movement, and (4) optimize your application using kernel scheduling. Upon completion, you'll be ready to use a profile-driven approach to rapidly accelerate your C/C++ applications using OpenACC directives.

L9121 - How to Boost the Performance of HPC/AI Applications Using MVAPICH2 Library?

Wednesday, March 20th, 8:00 - 10:00 – Room LL21D

How To Prepare: All attendees must bring their own laptop and charger. We recommend using a current version of Chrome, Firefox, or Safari for an optimal experience. Create an account at http://courses.nvidia.com/join before you arrive. Learn about the current wave of advances in HPC and AI technologies for improving application performance on modern dense GPU-enabled systems in this instructor-led, hands-on training. The training will begin with an introductory session that provides an overview of relevant technologies and concepts. We will also discuss the various exciting challenges and opportunities for HPC and AI researchers. Then, the instructors will guide the attendees on how to use the popular OSU micro-benchmark suite and example applications from HPC and AI to demonstrate how one can effectively take advantage of MVAPICH2 in HPC and AI applications using MPI and CUDA/OpenACC. Attendees will log in to remote high-performance compute clusters to execute HPC and AI applications in a distributed environment.