## Higher Performance Superelement Stiffness Analysis with OpenACC

In this blog article I describe recent work on optimizing the computational performance of structural analysis software with OpenACC directives. High-fidelity structural analysis of buildings is computationally expensive in both memory and time. As designers and “hazard engineers” sacrifice some accuracy by moving to low-fidelity simulations, the need for faster computing has risen in the construction industry and in regional structural assessment. Hybrid BEM-FEM (boundary element and finite element method) simulations have accurately represented high-rise structures, yet the stiffness computation of each floor remains a computational bottleneck. OpenACC-augmented Fortran is used to compute these floors, producing stiffness matrices and load vectors that accurately represent each floor as a superelement. The OpenACC implementation of the algorithms that compute these super-floor-elements is tested on two NVIDIA GPU architectures and one multicore processor. Industrial-level structural floors showed improved computational performance on GPUs, and could therefore be integrated into hybrid models of actual whole structures much faster than before.

Accurate high-rise structural analysis models are possible through software that includes BIM integration, hybrid BEM-FEM numerical modelling, superstructure and substructure coupling, construction-stage analysis, and high-performance computing. As structures become more intricate, floor stiffness calculations become extremely expensive, whether they are done using fine, accurate finite element meshes or serial super-floor-element computation. In addition, with the dawn of AI-supported structural optimization and structural health monitoring of mega-cities, faster structural analysis software is needed to obtain vital results in the least time possible.

The developed technique [1] for modelling floors is compute-intensive, as 12 different matrices and vectors must be computed for the FEM-like super-floor-element. This becomes problematic when the structural floor is riddled with detail (openings, beams, drops, vertical elements, etc.). To reduce the computational cost of the superelement, an OpenACC implementation was proposed. An academic code was developed that uses BEM discretization of structural floors to compute and assemble the stiffness matrices [2]. This code was originally written in serial Fortran and has now been converted to OpenACC Fortran, with the dependent and independent nested loops of every kernel turned into parallel routines: first the calculation of the BEM influence matrices [3], then the solution of the system of linear equations, and finally the computation of the stiffness matrices and load vectors. Conventional BEM does not solve whole buildings on its own, so the benefit of the new method is that the super-floor-element can be easily integrated into a finite element analysis of the whole structure. Analysis with conventional BEM requires solving the system of equations *(Hu=Gt)*, then post-processing for internal domain values. The proposed method instead solves a finite element-like system of equations *(Ku=F)*, after which only the internal domain of each floor is post-processed using BEM.
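To make the distinction between dependent and independent loops concrete, here is a minimal Python sketch of a dense solve of *Ku=F*. It is an illustration under assumptions, not the code from the article: the article does not say which solver is used, so plain Gaussian elimination with partial pivoting is swapped in, and the function name `solve_dense` and the small test matrix are hypothetical.

```python
def solve_dense(K, F):
    """Solve K u = F by Gaussian elimination with partial pivoting.

    Illustrative only: the solver used in the original Fortran code
    is not specified in the article.
    """
    n = len(F)
    # Work on an augmented copy so the inputs are left untouched.
    A = [list(map(float, K[i])) + [float(F[i])] for i in range(n)]

    # DEPENDENT loop: step k needs the result of step k-1, so it stays serial.
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to the diagonal.
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0.0:
            raise ValueError("matrix is singular")
        A[k], A[p] = A[p], A[k]

        # INDEPENDENT loop: each row i below the pivot is updated on its own,
        # so these iterations are candidates for parallel execution.
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= factor * A[k][j]

    # Back substitution: row i depends on the rows below it (serial over i),
    # while the inner sum over j is a reduction.
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        rsum = sum(A[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (A[i][n] - rsum) / A[i][i]
    return u


# Hypothetical 3x3 system, chosen so the exact solution is u = [1, 1, 1].
K = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 3.0]]
F = [3.0, 2.0, 2.0]
u = solve_dense(K, F)
```

The pivot loop over `k` cannot be parallelized because each elimination step uses the previous one, whereas the row updates inside it can. That is the kind of split an accelerator port has to respect.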

With accuracy kept the same, the performance of each kernel on the multicore CPU and on the GPUs is compared against an optimized serial CPU code containing the same kernels. **Figure 1** shows a flowchart of the superelement creation, including the stiffness matrices and load vectors for a structural floor.

As a simple example that briefly shows how the directives were placed, **Algorithm 1** is a short pseudocode excerpt of the Fortran code in which intensive nested loops can be computed on accelerators. **Algorithm 2** adds three directives to handle accelerator data management and reductions. Data that is copied in does not need to be copied back out, and vice versa, so the *!$acc data copyin/copyout* clauses are used. The loops are parallelized with directives such as *!$acc parallel loop* and *!$acc loop reduction(+:rsum)*. A sum reduction is needed to compute the vector uovertB0Fc. The publications [3, 4] describe in detail the meaning and ingredients of the matrices and load vectors of any superelement.
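The loop shape that these directives target can be sketched as a serial Python reference, with comments marking where each OpenACC Fortran directive would sit. This is a hedged illustration, not the article's algorithm: the operands `B0` and `Fc` and the assumption that the vector is a matrix-vector product are hypothetical, and only the result name `uovertB0Fc` and the accumulator `rsum` come from the text.

```python
def matvec_with_reduction(B0, Fc):
    """Serial reference for a nested loop whose inner loop is a sum reduction.

    B0 (matrix) and Fc (vector) are hypothetical operands chosen for
    illustration; only the names uovertB0Fc and rsum come from the article.
    """
    n_rows = len(B0)
    n_cols = len(Fc)
    uovertB0Fc = [0.0] * n_rows

    # In OpenACC Fortran, the loops below would sit inside a region like
    #   !$acc data copyin(B0, Fc) copyout(uovertB0Fc)
    # so inputs move host -> device only and the result device -> host only.

    # Outer loop: every row is independent  ->  !$acc parallel loop
    for i in range(n_rows):
        rsum = 0.0
        # Inner loop: all iterations add into rsum
        #   ->  !$acc loop reduction(+:rsum)
        for j in range(n_cols):
            rsum += B0[i][j] * Fc[j]
        uovertB0Fc[i] = rsum
    return uovertB0Fc


# Hypothetical 2x2 example.
result = matvec_with_reduction([[1.0, 2.0], [3.0, 4.0]], [10.0, 1.0])
```

Without the reduction clause, parallel iterations of the inner loop would race on `rsum`. The clause gives each thread a private partial sum and combines the partial sums at the end of the loop.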

To demonstrate how useful this implementation is, a multistory building on a fixed base is analyzed. It includes five typical floors with an area of 2144.56 m² each. The effect of increasing the number of floors is tested, with each floor’s superelement stiffness and load cases computed. Each floor carries a uniform gravity load of 4 t/m². **Figure 2** displays the case study’s BIM 3D model, the PLGen model of the floor, the BEM elements, and the results after post-processing of the whole structure using hybrid BEM-FEM. The structural analysis is performed four times: on a serial CPU, a multicore CPU, GPU1, and GPU2.

The serial code took almost 11 hours to analyze the whole structure, whereas the multicore CPU implementation took almost 2 hours. The same multistory building took 26 and 21 minutes on GPU1 and GPU2 respectively, including all transfers between the host and the devices. The average computation time for each floor on each device is presented in **Table 1**, with the GPUs performing best for this example. It should be noted that, although its latency is higher, GPU2 (GeForce GTX Titan X) regularly outperforms GPU1 (Tesla K40c) because it is better suited to single-precision arithmetic. As GPU architectures improve, the developed code can still be used with simple compilation adjustments.
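The whole-structure timings quoted above translate into rough speedup factors with a few lines of arithmetic. The sketch below takes "almost 11 hours" and "almost 2 hours" as 660 and 120 minutes, which is an assumption, so the ratios are estimates rather than measured results.

```python
# Approximate wall-clock times from the text, in minutes.
# "Almost 11 hours" and "almost 2 hours" are rounded up here, so the
# resulting ratios are rough estimates, not measured speedups.
times_min = {
    "serial CPU": 11 * 60,
    "multicore CPU": 2 * 60,
    "GPU1 (Tesla K40c)": 26,
    "GPU2 (GeForce GTX Titan X)": 21,
}


def speedups(times, baseline="serial CPU"):
    """Return baseline_time / time for every entry in `times`."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}


approx = speedups(times_min)
for name, s in approx.items():
    print(f"{name}: about {s:.1f}x")
```

Under these rounded inputs the multicore CPU comes out at roughly 5.5 times the serial code, GPU1 at roughly 25 times, and GPU2 at roughly 31 times.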

**In Closing**

The code is converted to run on parallel hardware, such as multicore CPUs and GPUs, by adding appropriate OpenACC directives. This approach delivers the benefits of high-performance computing at low development cost. The example demonstrated the feasibility of computing structural super-floor-elements and analyzing straining actions on parallel processors with OpenACC. We look forward to optimizing the computational performance of “whole structures” with nonlinear effects soon.

**Acknowledgments**

I would like to acknowledge the support of NVIDIA Corp. and thank them for providing a Tesla K40c through their NVIDIA GPU Grant. I also acknowledge the financial support of the Science and Technology Development Fund, Egypt (STDF) through grant no. 14910.

**References**

[1] A.A. Torky, Y.F. Rashed, High-performance practical stiffness analysis of high-rise buildings using superfloor elements, Journal of Computational Design and Engineering, Volume 7, Issue 2, April 2020, Pages 211–227, https://doi.org/10.1093/jcde/qwaa018.

[2] Y.F. Rashed, Boundary element modelling of flat plate floors under vertical loading, Int. J. Numer. Methods Eng. (2005). doi:10.1002/nme.1236.

[3] A.A. Torky, Y.F. Rashed, GPU acceleration of the boundary element method for shear-deformable bending of plates, Eng. Anal. Bound. Elem. (2017). doi:10.1016/j.enganabound.2016.10.006.

[4] M. Wagdy, Y.F. Rashed, Boundary element analysis of multi-thickness shear-deformable slabs without sub-regions, Eng. Anal. Bound. Elem. (2014). doi:10.1016/j.enganabound.2014.03.011.