[OpenCL] Invalid optimization of subgroup functions with non-uniform control flow #45544

Open
PiotrFusik opened this issue Jun 4, 2020 · 1 comment
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party OpenCL

Comments

@PiotrFusik

Bugzilla Link 46199
Version trunk
OS Windows NT
CC @AnastasiaStulova,@kpet

Extended Description

The following OpenCL code:

kernel void test(global int *data)
{
    uint id = (uint) get_global_id(0);
    if (id < 4)
        data[id] = sub_group_elect();
    else
        data[id] = sub_group_elect();
}

when compiled with the current master (f2c9765) as follows: clang -cl-std=CL2.0 -include opencl-c.h -S -emit-llvm sub_group_elect_opt.cl

is miscompiled: the optimizer merges the two branches into a single unconditional call to sub_group_elect:

; Function Attrs: convergent norecurse nounwind uwtable
define dso_local spir_kernel void @test(i32* nocapture %data) local_unnamed_addr #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 {
entry:
  %call = tail call i64 @"?get_global_id@@$$J0YAKI@Z"(i32 0) #3
  %call2 = tail call i32 @"?sub_group_elect@@$$J0YAHXZ"() #4
  %idxprom = and i64 %call, 4294967295
  %arrayidx = getelementptr inbounds i32, i32* %data, i64 %idxprom
  store i32 %call2, i32* %arrayidx, align 4, !tbaa !8
  ret void
}

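sub_group_elect is one of the many functions added in https://reviews.llvm.org/D79781 that perform implicit communication within a subgroup, i.e. a set of work-items executed together as SIMD lanes on a GPU. The result of such a call depends on which work-items are active when it executes, so the calls in the two branches are not equivalent to a single call hoisted above the branch. All the functions added in that review are affected.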
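To illustrate why merging the branches changes the result, here is a minimal host-side sketch in Python. It is not compiler or driver code. It assumes a single subgroup of 8 work-items whose global IDs equal their subgroup local IDs, that each branch executes with only the work-items that took it active, and that sub_group_elect returns 1 for the active work-item with the smallest ID and 0 for the others. The helper names (elect, run_original, run_merged) are hypothetical and exist only for this illustration.

```python
def elect(active_ids):
    """Model of sub_group_elect (assumed semantics): among the currently
    active work-items, the one with the smallest ID gets 1, the rest get 0."""
    winner = min(active_ids)
    return {i: int(i == winner) for i in active_ids}


def run_original(subgroup_size=8):
    """Model of the source kernel: each branch calls elect() with only
    the work-items that took that branch active."""
    ids = range(subgroup_size)
    data = [None] * subgroup_size
    then_branch = [i for i in ids if i < 4]
    else_branch = [i for i in ids if i >= 4]
    for active in (then_branch, else_branch):
        if active:  # a branch nobody took makes no call at all
            for i, value in elect(active).items():
                data[i] = value
    return data


def run_merged(subgroup_size=8):
    """Model of the optimized IR: one unconditional call made by all
    work-items, followed by the store."""
    ids = list(range(subgroup_size))
    result = elect(ids)
    return [result[i] for i in ids]


if __name__ == "__main__":
    print("original:", run_original())
    print("merged:  ", run_merged())
```

Under these assumptions the original kernel elects one work-item per branch (IDs 0 and 4), while the merged form elects only ID 0, so data[4] differs between the two versions.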

https://reviews.llvm.org/D68994 proposes addressing this problem with the convergent attribute.

@AnastasiaStulova
Contributor

Marking subgroup operations that may be called in non-uniform control flow with the convergent attribute doesn't help, because the convergent attribute was added for functions that are called in uniform control flow (i.e. by all work-items):
https://clang.llvm.org/docs/AttributeReference.html#convergent

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022