LLVM  6.0.0svn
Public Attributes | List of all members
hsa_ext_control_directives_s Struct Reference

The hsa_ext_control_directives_t specifies the values for the HSAIL control directives. More...

#include "Target/AMDGPU/AMDKernelCodeT.h"

Collaboration diagram for hsa_ext_control_directives_s:
Collaboration graph
[legend]

Public Attributes

hsa_ext_control_directive_present64_t enabled_control_directives
 This is a bit set indicating which control directives have been specified. More...
 
hsa_ext_exception_kind16_t enable_break_exceptions
 If enableBreakExceptions is not enabled then must be 0, otherwise must be non-0 and specifies the set of HSAIL exceptions that must have the BREAK policy enabled. More...
 
hsa_ext_exception_kind16_t enable_detect_exceptions
 If enableDetectExceptions is not enabled then must be 0, otherwise must be non-0 and specifies the set of HSAIL exceptions that must have the DETECT policy enabled. More...
 
uint32_t max_dynamic_group_size
 If maxDynamicGroupSize is not enabled then must be 0, and any amount of dynamic group segment can be allocated for a dispatch, otherwise the value specifies the maximum number of bytes of dynamic group segment that can be allocated for a dispatch. More...
 
uint32_t max_flat_grid_size
 If maxFlatGridSize is not enabled then must be 0, otherwise must be greater than 0. More...
 
uint32_t max_flat_workgroup_size
 If maxFlatWorkgroupSize is not enabled then must be 0, otherwise must be greater than 0. More...
 
uint32_t requested_workgroups_per_cu
 If requestedWorkgroupsPerCu is not enabled then must be 0, and the finalizer is free to generate ISA that may result in any number of work-groups executing on a single compute unit. More...
 
hsa_dim3_t required_grid_size
 If not enabled then all elements for Dim3 must be 0, otherwise every element must be greater than 0. More...
 
hsa_dim3_t required_workgroup_size
 If requiredWorkgroupSize is not enabled then all elements for Dim3 must be 0, and the produced code can be dispatched with any legal work-group range consistent with the dispatch dimensions. More...
 
uint8_t required_dim
 If requiredDim is not enabled then must be 0 and the produced kernel code can be dispatched with 1, 2 or 3 dimensions. More...
 
uint8_t reserved [75]
 Reserved. Must be 0. More...
 

Detailed Description

The hsa_ext_control_directives_t specifies the values for the HSAIL control directives.

These control how the finalizer generates code. This struct is used both as an argument to hsaFinalizeKernel to specify values for the control directives, and is used in HsaKernelCode to record the values of the control directives that the finalize used when generating the code which either came from the finalizer argument or explicit HSAIL control directives. See the definition of the control directives in HSA Programmer's Reference Manual which also defines how the values specified as finalizer arguments have to agree with the control directives in the HSAIL code.

Definition at line 210 of file AMDKernelCodeT.h.

Member Data Documentation

◆ enable_break_exceptions

hsa_ext_exception_kind16_t hsa_ext_control_directives_s::enable_break_exceptions

If enableBreakExceptions is not enabled then must be 0, otherwise must be non-0 and specifies the set of HSAIL exceptions that must have the BREAK policy enabled.

If this set is not empty then the generated code may have lower performance than if the set is empty. If the kernel being finalized has any enablebreakexceptions control directives, then the values specified by this argument are unioned with the values in these control directives. If any of the functions the kernel calls have an enablebreakexceptions control directive, then they must be equal or a subset of, this union.

Definition at line 227 of file AMDKernelCodeT.h.

◆ enable_detect_exceptions

hsa_ext_exception_kind16_t hsa_ext_control_directives_s::enable_detect_exceptions

If enableDetectExceptions is not enabled then must be 0, otherwise must be non-0 and specifies the set of HSAIL exceptions that must have the DETECT policy enabled.

If this set is not empty then the generated code may have lower performance than if the set is empty. However, an implementation should endeavour to make the performance impact small. If the kernel being finalized has any enabledetectexceptions control directives, then the values specified by this argument are unioned with the values in these control directives. If any of the functions the kernel calls have an enabledetectexceptions control directive, then they must be equal or a subset of, this union.

Definition at line 239 of file AMDKernelCodeT.h.

◆ enabled_control_directives

hsa_ext_control_directive_present64_t hsa_ext_control_directives_s::enabled_control_directives

This is a bit set indicating which control directives have been specified.

If the value is 0 then there are no control directives specified and the rest of the fields can be ignored. The bits are accessed using the hsa_ext_control_directives_present_mask_t. Any control directive that is not enabled in this bit set must have the value of all 0s.

Definition at line 216 of file AMDKernelCodeT.h.

◆ max_dynamic_group_size

uint32_t hsa_ext_control_directives_s::max_dynamic_group_size

If maxDynamicGroupSize is not enabled then must be 0, and any amount of dynamic group segment can be allocated for a dispatch, otherwise the value specifies the maximum number of bytes of dynamic group segment that can be allocated for a dispatch.

If the kernel being finalized has any maxdynamicsize control directives, then the values must be the same, and must be the same as this argument if it is enabled. This value can be used by the finalizer to determine the maximum number of bytes of group memory used by each work-group by adding this value to the group memory required for all group segment variables used by the kernel and all functions it calls, and group memory used to implement other HSAIL features such as fbarriers and the detect exception operations. This can allow the finalizer to determine the expected number of work-groups that can be executed by a compute unit and allow more resources to be allocated to the work-items if it is known that fewer work-groups can be executed due to group memory limitations.

Definition at line 256 of file AMDKernelCodeT.h.

◆ max_flat_grid_size

uint32_t hsa_ext_control_directives_s::max_flat_grid_size

If maxFlatGridSize is not enabled then must be 0, otherwise must be greater than 0.

See HSA Programmer's Reference Manual description of maxflatgridsize control directive.

Definition at line 261 of file AMDKernelCodeT.h.

◆ max_flat_workgroup_size

uint32_t hsa_ext_control_directives_s::max_flat_workgroup_size

If maxFlatWorkgroupSize is not enabled then must be 0, otherwise must be greater than 0.

See HSA Programmer's Reference Manual description of maxflatworkgroupsize control directive.

Definition at line 266 of file AMDKernelCodeT.h.

◆ requested_workgroups_per_cu

uint32_t hsa_ext_control_directives_s::requested_workgroups_per_cu

If requestedWorkgroupsPerCu is not enabled then must be 0, and the finalizer is free to generate ISA that may result in any number of work-groups executing on a single compute unit.

Otherwise, the finalizer should attempt to generate ISA that will allow the specified number of work-groups to execute on a single compute unit. This is only a hint and can be ignored by the finalizer. If the kernel being finalized, or any of the functions it calls, has a requested control directive, then the values must be the same. This can be used to determine the number of resources that should be allocated to a single work-group and work-item. For example, a low value may allow more resources to be allocated, resulting in higher per work-item performance, as it is known there will never be more than the specified number of work-groups actually executing on the compute unit. Conversely, a high value may allocate fewer resources, resulting in lower per work-item performance, which is offset by the fact it allows more work-groups to actually execute on the compute unit.

Definition at line 283 of file AMDKernelCodeT.h.

◆ required_dim

uint8_t hsa_ext_control_directives_s::required_dim

If requiredDim is not enabled then must be 0 and the produced kernel code can be dispatched with 1, 2 or 3 dimensions.

If enabled then the value is 1..3 and the code produced must only be dispatched with a dimension that matches. Other values are illegal. If the kernel being finalized, or any of the functions it calls, has a requireddimsize control directive, then the values must be the same. This can be used to optimize the code generated to compute the absolute and flat work-group and work-item id, and the dim HSAIL operations.

Definition at line 311 of file AMDKernelCodeT.h.

◆ required_grid_size

hsa_dim3_t hsa_ext_control_directives_s::required_grid_size

If not enabled then all elements for Dim3 must be 0, otherwise every element must be greater than 0.

See HSA Programmer's Reference Manual description of requiredgridsize control directive.

Definition at line 288 of file AMDKernelCodeT.h.

◆ required_workgroup_size

hsa_dim3_t hsa_ext_control_directives_s::required_workgroup_size

If requiredWorkgroupSize is not enabled then all elements for Dim3 must be 0, and the produced code can be dispatched with any legal work-group range consistent with the dispatch dimensions.

Otherwise, the code produced must always be dispatched with the specified work-group range. No element of the specified range must be 0. It must be consistent with required_dimensions and max_flat_workgroup_size. If the kernel being finalized, or any of the functions it calls, has a requiredworkgroupsize control directive, then the values must be the same. Specifying a value can allow the finalizer to optimize work-group id operations, and if the number of work-items in the work-group is less than the WAVESIZE then barrier operations can be optimized to just a memory fence.

Definition at line 301 of file AMDKernelCodeT.h.

◆ reserved

uint8_t hsa_ext_control_directives_s::reserved[75]

Reserved. Must be 0.

Definition at line 314 of file AMDKernelCodeT.h.


The documentation for this struct was generated from the following file: