The following OpenCL code: kernel void test(global int *data) { uint id = (uint) get_global_id(0); if (id < 4) data[id] = sub_group_elect(); else data[id] = sub_group_elect(); } when compiled with the current master (f2c97656644e783622a6e60fe452b41ffe0f1d18) as follows: clang -cl-std=CL2.0 -include opencl-c.h -S -emit-llvm sub_group_elect_opt.cl has an invalid optimization of combining the branches: ; Function Attrs: convergent norecurse nounwind uwtable define dso_local spir_kernel void @test(i32* nocapture %data) local_unnamed_addr #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 { entry: %call = tail call i64 @"?get_global_id@@$$J0YAKI@Z"(i32 0) #3 %call2 = tail call i32 @"?sub_group_elect@@$$J0YAHXZ"() #4 %idxprom = and i64 %call, 4294967295 %arrayidx = getelementptr inbounds i32, i32* %data, i64 %idxprom store i32 %call2, i32* %arrayidx, align 4, !tbaa !8 ret void } sub_group_elect is one of the many functions added in https://reviews.llvm.org/D79781 that perform implicit communication within a subgroup, i.e. threads implemented as SIMD on GPU. All the added functions are affected. https://reviews.llvm.org/D68994 is a proposal of addressing this problem with the `convergent` attribute.
Marking subgroups operations that are to be called in non-uniform control flow with convergent attribute doesn't help because convergent attribute was added for functions called within uniform CF (by all work items): https://clang.llvm.org/docs/AttributeReference.html#convergent