4.3. Specifying the CONTIGUOUS Attribute for Array Pointers

4.3.1. Motivation

In Fortran, an array pointer declared with the CONTIGUOUS attribute can be associated only with contiguous targets. When the attribute is specified for an array pointer, the Fujitsu Fortran compiler can therefore optimize the object code for array accesses on the assumption that the data the pointer refers to is laid out contiguously in memory.
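
The association rule can be illustrated with the standard IS_CONTIGUOUS intrinsic (Fortran 2008). The following is a minimal sketch, not taken from the benchmark; the names buf, p_all and p_strided are hypothetical.

```fortran
! Minimal sketch: which targets a CONTIGUOUS pointer may be associated with.
! The names buf, p_all and p_strided are hypothetical, chosen for illustration.
PROGRAM contiguous_check
  IMPLICIT NONE
  REAL(KIND=8), DIMENSION(:), ALLOCATABLE, TARGET :: buf
  REAL(KIND=8), DIMENSION(:), POINTER, CONTIGUOUS :: p_all
  REAL(KIND=8), DIMENSION(:), POINTER :: p_strided

  ALLOCATE(buf(8))
  buf = 0.0D0

  ! A whole allocatable array is contiguous, so this association is valid.
  p_all => buf

  ! A stride-2 section with several elements is not contiguous. It is
  ! associated here with a pointer that lacks the CONTIGUOUS attribute.
  p_strided => buf(1:8:2)

  PRINT *, 'p_all contiguous:    ', IS_CONTIGUOUS(p_all)      ! true
  PRINT *, 'p_strided contiguous:', IS_CONTIGUOUS(p_strided)  ! false

  DEALLOCATE(buf)
END PROGRAM contiguous_check
```

Associating p_all with buf(1:8:2) would not conform to the standard, so the attribute should be added only when every target of the pointer is known to be contiguous.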

This means that, when the CONTIGUOUS attribute can be specified for an array pointer, array accesses are performed with contiguous load/store instructions. The busy time for cache access is reduced, which may in turn reduce execution time.

4.3.2. Applied Example

The performance improvement obtained with this technique is shown below, using an example based on the STREAM benchmark. In this example, the CONTIGUOUS attribute is specified for the array pointers pa, pb and pc, which are associated with allocatable arrays.

Original
TYPE dtype
  REAL(KIND=8), DIMENSION(:), ALLOCATABLE :: a, b, c
END TYPE dtype
    TYPE(dtype), TARGET :: dtarg
    REAL(KIND=8), DIMENSION(:), POINTER :: pa, pb, pc

    pa => dtarg%a
    pb => dtarg%b
    pc => dtarg%c
!$OMP PARALLEL DO
!OCL NORECURRENCE
    DO i = 1, n
       pa(i) = pb(i) + scalar * pc(i)
    END DO

Technique applied
    TYPE(dtype), TARGET :: dtarg
    REAL(KIND=8), DIMENSION(:), POINTER, CONTIGUOUS :: pa, pb, pc

    pa => dtarg%a
    pb => dtarg%b
    pc => dtarg%c
!$OMP PARALLEL DO
!OCL NORECURRENCE
    DO i = 1, n
       pa(i) = pb(i) + scalar * pc(i)
    END DO
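
The fragment above can be completed into a self-contained program for experimentation. The following is a minimal sketch, not the benchmark source: the ALLOCATE statement, the value of scalar, the initial array contents and the final check are assumptions added for illustration.

```fortran
! Minimal sketch of the "technique applied" variant as a complete program.
! Assumptions (not in the original fragment): scalar = 3, the initial
! values of a, b and c, and the result check at the end.
PROGRAM triad_contiguous
  IMPLICIT NONE
  TYPE dtype
    REAL(KIND=8), DIMENSION(:), ALLOCATABLE :: a, b, c
  END TYPE dtype
  INTEGER, PARAMETER :: n = 196608
  REAL(KIND=8), PARAMETER :: scalar = 3.0D0
  TYPE(dtype), TARGET :: dtarg
  REAL(KIND=8), DIMENSION(:), POINTER, CONTIGUOUS :: pa, pb, pc
  INTEGER :: i

  ALLOCATE(dtarg%a(n), dtarg%b(n), dtarg%c(n))
  dtarg%a = 0.0D0
  dtarg%b = 1.0D0
  dtarg%c = 2.0D0

  ! Whole allocatable arrays are contiguous, so these are valid targets
  ! for pointers with the CONTIGUOUS attribute.
  pa => dtarg%a
  pb => dtarg%b
  pc => dtarg%c

!$OMP PARALLEL DO
!OCL NORECURRENCE
  DO i = 1, n
     pa(i) = pb(i) + scalar * pc(i)
  END DO

  ! Every element should equal 1 + 3*2 = 7, which is exact in REAL(KIND=8).
  IF (ANY(pa /= 7.0D0)) ERROR STOP 'unexpected triad result'
  PRINT *, 'triad OK, pa(1) =', pa(1)

  DEALLOCATE(dtarg%a, dtarg%b, dtarg%c)
END PROGRAM triad_contiguous
```

Removing CONTIGUOUS from the pointer declaration gives the "Original" variant, so the two can be compared with the same driver. Compilers other than the Fujitsu compiler treat the !OCL line as a comment.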

The cycle accounting results for runs before and after applying the technique are shown in the graphs below. The loop length n was chosen as follows, assuming that the data is blocked to fit in the L2 cache:

n = 196608
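
As a rough check of this choice, the working set of the loop can be computed from n. The sketch below assumes that only the three REAL(KIND=8) arrays contribute; the comparison with an 8 MiB L2 cache per CMG on the A64FX is an assumption about the target system, not a figure stated in this section.

```fortran
! Minimal sketch: working-set size of the triad loop for n = 196608.
! Assumption: only the three REAL(KIND=8) arrays a, b and c are counted.
PROGRAM working_set
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 196608
  INTEGER, PARAMETER :: narrays = 3
  REAL(KIND=8) :: x            ! used only to query the element size
  INTEGER(KIND=8) :: bytes

  ! STORAGE_SIZE returns the element size in bits; divide by 8 for bytes.
  bytes = INT(narrays, 8) * INT(n, 8) * INT(STORAGE_SIZE(x) / 8, 8)

  PRINT *, 'working set [bytes]:', bytes
  PRINT *, 'working set [MiB]  :', REAL(bytes, 8) / (1024.0D0 * 1024.0D0)
END PROGRAM working_set
```

Three arrays of 196608 eight-byte elements occupy 4718592 bytes, that is 4.5 MiB, which is below the assumed 8 MiB of L2 cache.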

Comparing the right graph (technique applied) with the left graph (original), the busy time for L1D cache access was halved, the waiting time for L2 cache access became dominant, and the execution time was reduced by 44%.

[Figure: cycle accounting graphs. Left: original. Right: technique applied.]

4.3.3. Real Cases

A real case related to this technique is presented in “Meetings for application code tuning on A64FX computer systems” as follows:

4.3.4. References

Notice: Access rights for the Fugaku User Portal are required to read the above documents.