5. Summary
This document described tuning techniques that are broadly applicable to other programs, drawn from real cases presented at the “Meetings for application code tuning on A64FX computer systems”. In those cases, the techniques achieved the following speedups on the loops to which they were applied.
| Objective | Technique | Speedup for applied loop |
|---|---|---|
| | | 3.04 x |
| | | 2.02 x |
| | | 1.19 x |
| | | 1.78 x |
| | | 1.37 x |
| | Full-Unrolling of Innermost Loop with Non-Contiguous Data Access | 3.35 x |
| | | 1.51 x |
| | | 1.79 x |
Readers who want to speed up their own programs are encouraged to look among these techniques for ones that fit the program, using its profiling data, such as CPU performance reports, as a guide.
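
As a reminder of what one of the listed techniques looks like in code, the following is a minimal sketch of fully unrolling a short innermost loop whose accesses are non-contiguous. It is not the code from the presented case: the function names, the array layout `a[k * n + i]`, and the fixed trip count `NCOMP` are assumptions made for illustration only. With the short inner loop removed, the remaining loop over `i` reads each array slice contiguously, which may make it easier for a compiler to vectorize; whether this helps in a given program should be confirmed with profiling data.

```c
#include <stddef.h>

#define NCOMP 4 /* hypothetical: small, fixed trip count of the innermost loop */

/* Before: the innermost loop over k is short and strides through `a`
 * in steps of n elements, so its accesses are non-contiguous. */
void accumulate_rolled(size_t n, const double *a, const double *w, double *out)
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t k = 0; k < NCOMP; k++) {
            s += w[k] * a[k * n + i];
        }
        out[i] = s;
    }
}

/* After: the k loop is fully unrolled by hand (valid only for NCOMP == 4).
 * Only the loop over i remains, and each term is contiguous in i. */
void accumulate_unrolled(size_t n, const double *a, const double *w, double *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = w[0] * a[i]
               + w[1] * a[n + i]
               + w[2] * a[2 * n + i]
               + w[3] * a[3 * n + i];
    }
}
```

Both functions compute the same weighted sum for each `i`; only the loop structure differs. Floating-point results can differ in the last bits if the compiler contracts multiply-add pairs differently in the two forms.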