You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
sub_group_elect is one of the many functions added in https://reviews.llvm.org/D79781 that perform implicit communication within a subgroup, i.e. threads implemented as SIMD on GPU. All the added functions are affected.
Marking subgroups operations that are to be called in non-uniform control flow with convergent attribute doesn't help because convergent attribute was added for functions called within uniform CF (by all work items): https://clang.llvm.org/docs/AttributeReference.html#convergent
Extended Description
The following OpenCL code:
kernel void test(global int *data)
{
uint id = (uint) get_global_id(0);
if (id < 4)
data[id] = sub_group_elect();
else
data[id] = sub_group_elect();
}
when compiled with the current master (f2c9765) as follows: clang -cl-std=CL2.0 -include opencl-c.h -S -emit-llvm sub_group_elect_opt.cl
has an invalid optimization of combining the branches:
; Function Attrs: convergent norecurse nounwind uwtable
define dso_local spir_kernel void @test(i32* nocapture %data) local_unnamed_addr #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 {
entry:
%call = tail call i64 @"?get_global_id@@$$J0YAKI@Z"(i32 0) #3
%call2 = tail call i32 @"?sub_group_elect@@$$J0YAHXZ"() #4
%idxprom = and i64 %call, 4294967295
%arrayidx = getelementptr inbounds i32, i32* %data, i64 %idxprom
store i32 %call2, i32* %arrayidx, align 4, !tbaa !8
ret void
}
sub_group_elect is one of the many functions added in https://reviews.llvm.org/D79781 that perform implicit communication within a subgroup, i.e. threads implemented as SIMD on GPU. All the added functions are affected.
https://reviews.llvm.org/D68994 is a proposal of addressing this problem with the
convergent
attribute.The text was updated successfully, but these errors were encountered: