1. Introduction
1.1. What This Document Is
This document explains tuning techniques that are broadly applicable to other programs, based on real cases presented in “Meetings for application code tuning on A64FX computer systems”. These are practical techniques drawn from experience with the real application programs listed below:
| Application area | Program name |
|---|---|
| Electromagnetic | |
| Fluid dynamics | |
| Molecular dynamics | |
| Quantum chromodynamics | |
| Weather, climate | |
The techniques are grouped by objective (i.e., by tuning effect) so that readers can identify candidate techniques from their program's profiling data, such as CPU performance reports.
1.2. Structure of This Document
This document explains eight techniques, grouped by the following three objectives to focus on:
The explanation of each technique consists of the following parts:
- Motivation for applying the technique
- An applied example showing the performance improvement
- Reference links to the real cases presented in “Meetings for application code tuning on A64FX computer systems”
- Reference links to related information, such as compiler user's guides and programming guides
Readers who have already profiled their program are encouraged to look for techniques whose objectives, as listed above, match the issues revealed by their profiling data.
Interested readers can learn more by following each technique's reference links to related information, such as documents published in “Meetings for application code tuning on A64FX computer systems” and tuning advice in programming guides.
1.3. Environment for Performance Measurement
The performance data shown in this document were measured under the following conditions. Although the C/C++ compilers were used in trad mode, the ideas behind the techniques explained in this document also apply in clang mode.
| Measured date | November 2023 |
|---|---|
| Machine | Supercomputer Fugaku |
| Language environment | Fujitsu Fortran/C/C++ Compiler 4.9.0 (tcsds-1.2.37) |
| Compiler optimization flags | -Kfast,openmp,ocl |
| Number of processes and threads at run time | 4 processes, 12 threads per process |
The CPU performance reports were used to observe the performance improvements achieved by the explained techniques. For details on how to use them, please refer to the following documents, such as the profiler user's guide.
Notice: Access rights to the Fugaku User Portal are required to read the above documents.