Documentation of Tuning Techniques for A64FX Processors
Tuning Tips for A64FX Processors
1. Introduction
1.1. What is This Document
1.2. Structure of This Document
1.3. Environment for Performance Measurement
2. Promoting Vectorization
2.1. Interchange of Innermost Loop with Data Dependency
2.2. Interchange of Innermost Loop with Small Iteration Count
2.3. Fission of Imperfectly Nested Loops
3. Reduction of Waiting Time for Calculation
3.1. Fission of Loop with Large Loop Body
3.2. Striping of Innermost Loop with Small Iteration Count
4. Reduction of Waiting Time for Cache Access
4.1. Full-Unrolling of Innermost Loop with Non-Contiguous Data Access
4.2. Interchange of Array Dimension for AoS Layout
4.3. Specifying CONTIGUOUS Attribute to Array Pointer
5. Summary