Tuning Tips for A64FX Processors

  • 1. Introduction
    • 1.1. What is This Document
    • 1.2. Structure of This Document
    • 1.3. Environment for Performance Measurement
  • 2. Promoting Vectorization
    • 2.1. Interchange of Innermost Loop with Data Dependency
    • 2.2. Interchange of Innermost Loop with Small Iteration Count
    • 2.3. Fission of Imperfectly Nested Loops
  • 3. Reduction of Waiting Time for Calculation
    • 3.1. Fission of Loop with Large Loop Body
    • 3.2. Striping of Innermost Loop with Small Iteration Count
  • 4. Reduction of Waiting Time for Cache Access
    • 4.1. Full-Unrolling of Innermost Loop with Non-Contiguous Data Access
    • 4.2. Interchange of Array Dimension for AoS Layout
    • 4.3. Specifying CONTIGUOUS Attribute to Array Pointer
  • 5. Summary

© Copyright 2023, RIKEN Center for Computational Science.