Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Incorrect RETURN_ADDRESS in tests #35661

Closed
rovka opened this issue Feb 9, 2018 · 35 comments
Closed

[AArch64] Incorrect RETURN_ADDRESS in tests #35661

rovka opened this issue Feb 9, 2018 · 35 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla openmp

Comments

@rovka
Copy link
Collaborator

rovka commented Feb 9, 2018

Bugzilla Link 36313
Resolution FIXED
Resolved on Feb 15, 2018 03:00
Version unspecified
OS Linux
Blocks #35152
Attachments Full failure report
CC @aemerson,@dsandersllvm,@hahnjo,@jprotze,@qcolombet

Extended Description

I'm getting these failures on AArch64 in 6.0.0-rc2:

Failing Tests (6):
libomp :: ompt/misc/control_tool.c
libomp :: ompt/synchronization/flush.c
libomp :: ompt/synchronization/master.c
libomp :: ompt/synchronization/taskwait.c
libomp :: ompt/synchronization/test_lock.c
libomp :: ompt/synchronization/test_nest_lock_parallel.c

They all look something like this:
******************** TEST 'libomp :: ompt/synchronization/test_nest_lock_parallel.c' FAILED ********************
Script:

/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang -fopenmp -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src -L /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c -o /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp -lm -latomic && /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp | sort --numeric-sort --stable | /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c

Exit Code: 1

Command Output (stdout):

$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang" "-fopenmp" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src" "-L" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c" "-o" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp" "-lm" "-latomic"
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp"
$ "sort" "--numeric-sort" "--stable"
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c"

command stderr:

/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c:47:17: error: expected string not found in input
// CHECK-NEXT: {{^}}[[MASTER_ID]]: current_address={{.*}}[[RETURN_ADDRESS]]
^
:20:1: note: scanning from here
281474976710657: current_address=0x403658 or 0x403654
^
:20:1: note: with variable "MASTER_ID" equal to "281474976710657"
281474976710657: current_address=0x403658 or 0x403654
^
:20:1: note: with variable "RETURN_ADDRESS" equal to "0x403650"
281474976710657: current_address=0x403658 or 0x403654
^
:20:13: note: possible intended match here
281474976710657: current_address=0x403658 or 0x403654
^

error: command failed with exit status: 1

@rovka
Copy link
Collaborator Author

rovka commented Feb 9, 2018

assigned to @zmodem

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

+Paul who added support for OMPT on AArch64

The check for the return address is known to be a bit fragile, relies on the instructions emitted by the compiler and needs changes for every architecture. Paul did them for AArch64 but it looks like Clang 6.0 inserts another instruction (off by 4 bytes). In particular this might be related to GlobalIsel?

Anyway, can you get me an objdump of the related code region? There should be

  • a call to a _kmpc* function
  • possibly a store instruction to save the return value
  • a NOP
    If Clang now emits another instruction we can just add another possible address.

@jprotze
Copy link
Collaborator

jprotze commented Feb 9, 2018

https://reviews.llvm.org/D43115 should address these issues. I agree with Jonas, that we would like to look into the assembly to understand what is going on there. We would appreciate to get an object dump for the respective code region.

@rovka
Copy link
Collaborator Author

rovka commented Feb 9, 2018

Thanks! I'll post the assembly as soon as I can.

@rovka
Copy link
Collaborator Author

rovka commented Feb 9, 2018

Assembly for the synchronization/flush.c test
I've attached the disassembly for the flush.c test (since it superficially seemed smaller than the others). Hope that helps, please let me know if you need anything else. Sorry about the delay.

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

Created attachment 19848 [details]
Assembly for the synchronization/flush.c test

I've attached the disassembly for the flush.c test (since it superficially
seemed smaller than the others). Hope that helps, please let me know if you
need anything else. Sorry about the delay.

That's the interesting part:
403344: 97fff4f3 bl 400710 __kmpc_flush@plt
-> call runtime function

403348: d503201f nop
-> this nop is the return instruction and inserted by the test via inline assembly

40334c: 14000001 b 403350 <.omp_outlined.+0x30>
-> ... an unconditional branch to the next instruction

403350: f0000088 adrp x8, 416000 <GLOBAL_OFFSET_TABLE+0x28>
403354: f9408d0 ldr x8, [x8,#280]
403358: d63f0100 blr x8
40335c: f9400001 ldr x1, [x0]
-> ompt_get_thread_data()

403360: 90000008 adrp x8, 403000 <on_ompt_callback_task_schedule+0x6c>
403364: 910d4108 add x8, x8, #​0x350
-> address of the label: 0x403350

403368: f1001102 subs x2, x8, #​0x4
40336c: f1002103 subs x3, x8, #​0x8
-> two possible values: 4 and 8 bytes before the label

403370: 90000000 adrp x0, 403000 <on_ompt_callback_task_schedule+0x6c>
403374: 913dfc00 add x0, x0, #​0xf7f
-> formatting string

403378: 97fff50e bl 4007b0 printf@plt
-> printf()

So I think this test would be fine if it actually checks for the second address (done in the linked patch). However, this only works if the runtime call returns no value. If it does have a return value and there is an additional store instruction, I'd assume that the branch is there as well which kills the assumptions.

I'm not sure where the branch instruction comes from, but if it's really related to GlobalIsel we'll end up having the same problem with other architectures if they do the switch...

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

I'm not sure where the branch instruction comes from, but if it's really
related to GlobalIsel we'll end up having the same problem with other
architectures if they do the switch...

It is, small example:
$ cat label.c
void test() {
asm("nop");
label:
printf("%p\n", &&label);
}

$ clang -target aarch64 -S label.c -o -
[...]
//APP
nop
//NO_APP
b .LBB0_1
.Ltmp0: // Block address taken
.LBB0_1:
[...]

$ clang -target aarch64 -fno-experimental-isel -S label.c -o -
[...]
//APP
nop
//NO_APP
.Ltmp0: // Block address taken
// %bb.1:
[...]

And I can get the same on x86:
$ clang -target x86_64 -S label.c -o -
[...]
#APP
nop
#NO_APP
.Ltmp0: # Block address taken

%bb.1:

[...]

$ clang -target x86_64 -fexperimental-isel -S label.c -o -
[...]
#APP
nop
#NO_APP
jmp .LBB0_4
.Ltmp0: # Block address taken
.LBB0_4:
[...]

So if we don't "fix" GlobalIsel in the compiler we need to adapt the return addresses of all architectures. Diana, do you think it's worth messing with GlobalIsel?

@rovka
Copy link
Collaborator Author

rovka commented Feb 9, 2018

I think this is worth investigating deeper.

If I use -mllvm -print-after-all on your example, I get this:

*** IR Dump After InstructionSelect ***:

Machine code for function test: IsSSA, TracksLiveness, FailedISel, Legalized, RegBankSelected, Selected

%bb.1: derived from LLVM BB %0
%2:_ = G_GLOBAL_VALUE @.str;
%1:_ = COPY %2;
INLINEASM $nop [sideeffect] [attdialect]
Successors according to CFG: %bb.2(?%)

%bb.2: derived from LLVM BB %1, ADDRESS TAKEN
Predecessors according to CFG: %bb.1
ADJCALLSTACKDOWN 0, 0, implicit-def %sp, implicit %sp
%x0 = COPY %1;
%x1 = COPY %3:;
BL @​printf, <regmask %fp %lr %b8 %b9 %b10 %b11 %b12 %b13 %b14 %b15 %d8 %d9 %d10 %d11 %d12 %d13 %d14 %d15 %h8 %h9 %h10 %h11 %h12 %h13 %h14 %h15 %s8 %s9 %s10 %s11 %s12 %s13 %s14 and 63 more...>, implicit-def %lr, implicit %sp, implicit %x0, implicit %x1, implicit-def %w0
%0:
= COPY %w0;
ADJCALLSTACKUP 0, 0, implicit-def %sp, implicit %sp
RET_ReallyLR

End machine code for function test.

*** IR Dump After AArch64 Instruction Selection ***:

Machine code for function test: IsSSA, TracksLiveness

%bb.3: derived from LLVM BB %0
INLINEASM $nop [sideeffect] [attdialect], !​2
B %bb.4
Successors according to CFG: %bb.4(?%)

%bb.4: derived from LLVM BB %1, ADDRESS TAKEN
Predecessors according to CFG: %bb.3
ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
%0:gpr64 = MOVaddr target-flags(aarch64-page) @.str, target-flags(aarch64-pageoff, aarch64-nc) @.str; G#64 :%0
%1:gpr64 = MOVaddrBA target-flags(aarch64-page) blockaddress(@test, %ir-block.1), target-flags(aarch64-pageoff, aarch64-nc) blockaddress(@test, %ir-block.1); G#64 :%1
%x0 = COPY %0; G#64 :%0
%x1 = COPY %1; G#64 :%1
BL @​printf, <regmask %fp %lr %b8 %b9 %b10 %b11 %b12 %b13 %b14 %b15 %d8 %d9 %d10 %d11 %d12 %d13 %d14 %d15 %h8 %h9 %h10 %h11 %h12 %h13 %h14 %h15 %s8 %s9 %s10 %s11 %s12 %s13 %s14 and 63 more...>, implicit-def dead %lr, implicit %sp, implicit %x0, implicit %x1, implicit-def %sp, implicit-def %w0
ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp
%2:gpr32all = COPY %w0; G#32 all:%2
RET_ReallyLR

End machine code for function test.

So it seems we're running both GlobalISel and DAGISel, and it's actually the DAGISel that introduces the branch... I'm not sure why we fallback here.

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

It gets even better: I tried running the full OpenMP test suite with -fexperimental-isel on x86 and there are two jumps in ompt/synchronization/master.c

403ae6: e8 15 cc ff ff callq 400700 __kmpc_end_master@plt
403aeb: e9 00 00 00 00 jmpq 403af0 <.omp_outlined.+0xb0>
403af0: 90 nop
403af1: e9 00 00 00 00 jmpq 403af6 <.omp_outlined.+0xb6>

-> this is the position of the label
403af6: 48 8b 05 4b 36 20 00 mov 0x20364b(%rip),%rax # 607148 <ompt_get_thread_data>
403afd: ff d0 callq *%rax
403aff: 48 8b 30 mov (%rax),%rsi
403b02: b9 eb 47 40 00 mov $0x4047eb,%ecx
403b07: 89 cf mov %ecx,%edi
403b09: 48 8d 15 e5 ff ff ff lea -0x1b(%rip),%rdx # 403af5 <.omp_outlined.+0xb5>
-> one NOP = 1 byte offset

403b10: 48 8d 0d db ff ff ff lea -0x25(%rip),%rcx # 403af2 <.omp_outlined.+0xb2>
-> NOP + MOV (for non-void functions)

403b17: 4c 8d 05 d2 ff ff ff lea -0x2e(%rip),%r8 # 403af0 <.omp_outlined.+0xb0>
-> NOP + JMPQ, trying to account for GlobalIsel

403b1e: 4c 8d 0d c8 ff ff ff lea -0x38(%rip),%r9 # 403aed <.omp_outlined.+0xad>
-> NOP + MOV + JMPQ

What would actually be needed here is JMPQ + NOP + JMPQ = 11 bytes :-(

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

So it seems we're running both GlobalISel and DAGISel, and it's actually the
DAGISel that introduces the branch... I'm not sure why we fallback here.

label.c:1:1: remark: unable to translate constant: i8* [-Rpass-missed=gisel-irtranslator]

Next question: Why does the fallback insert these branches and jumps?

@dsandersllvm
Copy link
Collaborator

In the output Diana provided in comment #​7, I see the FailedISel attribute is present after the InstructionSelect pass. Going by the IR Translator message that Jonas mentioned in comment #​9, it was most likely added by the IR Translator in which case most of the GlobalISel passes won't have done anything to that function. I don't see anything taking the address of the block so the IR Translator most likely failed on that.

The strange thing is that the FailedISel attribute causes the function to be reset before DAGISel runs on it so you should see the same behaviour between DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some state that the ResetMachineFunction pass isn't resetting?

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

I don't see anything taking the address of the
block so the IR Translator most likely failed on that.

That's probably "&&label" which is a GNU extension: It should return the address of the next instruction right behind the label...

@dsandersllvm
Copy link
Collaborator

I don't see anything taking the address of the
block so the IR Translator most likely failed on that.

That's probably "&&label" which is a GNU extension: It should return the
address of the next instruction right behind the label...

I mean I don't see it in the MIR. The DAGISel pass has blockaddress() operands referring to %ir-block.1 but GlobalISel's pass doesn't.

@qcolombet
Copy link
Collaborator

The strange thing is that the FailedISel attribute causes the function to be
reset before DAGISel runs on it so you should see the same behaviour between
DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some
state that the ResetMachineFunction pass isn't resetting?

I agree with Daniel, when GISel falls back we should get the same output as DAGISel-only. If that's not the case, that's a bug.

@hahnjo
Copy link
Member

hahnjo commented Feb 9, 2018

The strange thing is that the FailedISel attribute causes the function to be
reset before DAGISel runs on it so you should see the same behaviour between
DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some
state that the ResetMachineFunction pass isn't resetting?

I agree with Daniel, when GISel falls back we should get the same output as
DAGISel-only. If that's not the case, that's a bug.

Okay, do you want to move this bug to GlobalISel and update the information accordingly?

@aemerson
Copy link
Contributor

aemerson commented Feb 9, 2018

I'll investigate this.

@aemerson
Copy link
Contributor

aemerson commented Feb 9, 2018

Ok this isn't really a bug, this is a code generation difference between using SelectionDAG and FastISel. GlobalISel is only involved indirectly, since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas before FastISel would handle the block with the inline ASM and not fall back to SDAG.

FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that will check if the successor block is the target of the unconditional branch, in which case it will not emit anything. SelectionDAG doesn't do that, so when GlobalISel gives up on the block, instead of trying FastISel, we fall back to SDAG and end up with the extra branch.

This is expected behavior in terms of falling back to SDAG, so I think the test needs to be updated to handle this new codegen.

@hahnjo
Copy link
Member

hahnjo commented Feb 10, 2018

Ok this isn't really a bug, this is a code generation difference between
using SelectionDAG and FastISel. GlobalISel is only involved indirectly,
since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas
before FastISel would handle the block with the inline ASM and not fall back
to SDAG.

FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that
will check if the successor block is the target of the unconditional branch,
in which case it will not emit anything. SelectionDAG doesn't do that, so
when GlobalISel gives up on the block, instead of trying FastISel, we fall
back to SDAG and end up with the extra branch.

Ok, this is the answer to "why do we have branches". But it doesn't explain why GlobalISel falls back at all.

This is expected behavior in terms of falling back to SDAG, so I think the test > needs to be updated to handle this new codegen.

As said yesterday I don't think it's possible to do this reliably. And I'd argue that the generated code, while correct, doesn't make any sense.

@aemerson
Copy link
Contributor

Fallbacks are part of the design of GlobalISel, and as development continues fallbacks may increase, although we plan to reduce them over time. Why it fell back here I don't know as I'm not at my main machine, but it doesn't really matter since falling back is not a bug.

As for the code quality issue, we should have a separate PR for that.

@hahnjo
Copy link
Member

hahnjo commented Feb 10, 2018

But it doesn't explain why GlobalISel falls back at all.

I think it's because GlobalISel doesn't handle BlockAddress in IRTranslator::translate(). I tried to hack together some support for this but failed because I'm not familiar with the code at all.

So unless somebody else implements this in time for 6.0 we need to decide what to do:

  1. Disable OMPT on AArch64 for now.
  2. Disable OMPT tests on AArch64 for now - not really convincing to me.
  3. Workaround test failures by disabling GlobalISel for OpenMP tests.

For #​3 we could use -fno-experimental-isel, is that available in 6.0? Another option would be compiling the tests with "-O2 -fno-omit-frame-pointer": If I understood the discussions on llvm-dev correctly, that should still use the "old" path with FastISel? But I see at least one test failing on x86_64 that we would need to look into: ompt/tasks/serialized.c complains about frame addresses that don't match...

@aemerson
Copy link
Contributor

The -fno-experimental-isel flag is available on the 6.0 branch. I think as a short term workaround however using -O2 would be better as I’d rather not put the isel flag in places permanently if we can avoid it. Also add a comment for why it’s needed. I’ll raise a bug on implementing block address support at some point, so for the next release we can remove the workaround.

@hahnjo
Copy link
Member

hahnjo commented Feb 12, 2018

The -fno-experimental-isel flag is available on the 6.0 branch. I think as a
short term workaround however using -O2 would be better as I’d rather not
put the isel flag in places permanently if we can avoid it.
Thinking about it, I don't think we should do this kind of hack: We really want to disable GlobalISel so we should make this explicit. I posted https://reviews.llvm.org/D43195, please let me know if that works for you.

[...] so for the next release we can remove the workaround.

Probably not: The OpenMP runtime library also supports older versions of compilers, not only the latest. So we need to keep this "workaround" as long as we want to support Clang 6.0.0. We can only drop it for future versions of the compiler...

@aemerson
Copy link
Contributor

I don't understand the issue it seems, why do the tests rely closely on code generation of a specific compiler? This is something that can't be seen as stable across any version of the compiler, and in cases like these the tests would be updated to support the particular of the compiler being supported, and also branched for a particular release.

Forcing the compiler to be generate code in a specific way despite it being correct is fragile, and in this case its just pure luck that we have a flag to revert to the old behavior.

So perhaps as you say the tests need to work with various different compilers, in which case the tests themselves need to be tolerant of the differences. I have no investment in ompt so it's your choice how you choose to deal with this;

@hahnjo
Copy link
Member

hahnjo commented Feb 13, 2018

I don't understand the issue it seems, why do the tests rely closely on code
generation of a specific compiler? This is something that can't be seen as
stable across any version of the compiler, and in cases like these the tests
would be updated to support the particular of the compiler being supported,
and also branched for a particular release.

Yes, the tests rely on a specific code generation and we do everything possible to make the compiler do what we want (inline assembly for example). This seems to work across all tested versions of GCC, Intel Compiler and Clang on all supported platforms - except the latest Clang 6.0.0 on AArch64 which uses GlobalISel.
Some background: We want to test return and frame addresses that are given to a tool during runtime. If you have a better idea how to do this, I think we would be happy to listen.

We can't just update the test because the difference with GlobalISel isn't stable either: In some cases it produces one additional branch instruction, in some others there are two and I think if I keep on digging, I'll be able to construct a test where there are three of them.

Forcing the compiler to be generate code in a specific way despite it being
correct is fragile, and in this case its just pure luck that we have a flag
to revert to the old behavior.

In your proposed workaround of using optimization it's yet greater luck that GlobalISel is only enabled by default for O0. I'm guessing you want to change that at some point in the future, so if we don't want GlobalISel in the tests we should explicitly say so.

So perhaps as you say the tests need to work with various different
compilers, in which case the tests themselves need to be tolerant of the
differences. I have no investment in ompt so it's your choice how you choose
to deal with this;

As I said we have tested various different compilers and GlobalISel actually generates the first assembly that fails these tests. And I think we don't need to discuss that branches to the very next instruction are not the best the compiler can do. I understand that this will be solved in the future, but until we have support for blockaddress I think it's just the best option to disable GlobalISel.

@rovka
Copy link
Collaborator Author

rovka commented Feb 13, 2018

You're really making it sound like GlobalISel is the main villain here, but IIUC those branches are generated by DAGISel. The only "questionable" thing that GlobalISel is doing in this case is that it's falling back directly to DAGISel instead of FastISel. Should we maybe reconsider falling back to FastISel? Naturally, this will come with some extra overhead in cases where FastISel can't handle it either and it needs to fallback again to DAGISel, but maybe preserving the current behaviour is more important?

@hahnjo
Copy link
Member

hahnjo commented Feb 13, 2018

You're really making it sound like GlobalISel is the main villain here, but
IIUC those branches are generated by DAGISel. The only "questionable" thing
that GlobalISel is doing in this case is that it's falling back directly to
DAGISel instead of FastISel. Should we maybe reconsider falling back to
FastISel? Naturally, this will come with some extra overhead in cases where
FastISel can't handle it either and it needs to fallback again to DAGISel,
but maybe preserving the current behaviour is more important?

I didn't want to blame GlobalISel and I acknowledge that the branches are inserted by SelectionDAG. I'm also sure that all people working on it are doing that for a reason (not very familiar with this matter, didn't yet find enough time to carefully listen to the recorded talks about GlobalISel from the past DevMtgs).

What I tried to express is that the tests "broke" when switching from FastISel to GlobalISel and that we can temporarily workaround this by disabling GlobalISel. I don't really like that kind of workaround, but I think it's the best thing we can do after you found the problem in rc2. And I surely won't say no when we can drop the flag again from trunk.
So please just take the patch as my opinion of what we should do in the short-term and merge to release_60.

@aemerson
Copy link
Contributor

Unless Diana thinks differently, if you feel this is the right choice then go ahead. Chances are though it'll be libomp developers that will need to remember to remove these workarounds when they're not needed.

As for whether we should be falling back to FastISel, I'm open to the idea but I think we need stronger justification than these tests.

@rovka
Copy link
Collaborator Author

rovka commented Feb 13, 2018

Unless Diana thinks differently, if you feel this is the right choice then
go ahead. Chances are though it'll be libomp developers that will need to
remember to remove these workarounds when they're not needed.

As for whether we should be falling back to FastISel, I'm open to the idea
but I think we need stronger justification than these tests.

As long as the libomp devs are happy with this workaround, I have nothing against it. I was just proposing alternatives. I agree that this late in the release process we should have better justification for more intrusive changes like changing the fallback or fixing SelectionDAG. Feel free to merge the proposed patch.

@qcolombet
Copy link
Collaborator

FWIW, when we changed the fallback path of GlobalISel to explicitly not used FastISel I said it wasn't a good idea for exactly the problem we have today:

  • We cannot fix codegen in GlobalISel since we don't support that case yet
  • Users still get a different codegen than what they are used to and we are not going to fix SDISel to behave like FastISel

In other words, I would be in favor of not doing anything special for GISel when we fallback (revert r323369?)

@hahnjo
Copy link
Member

hahnjo commented Feb 14, 2018

Unless Diana thinks differently, if you feel this is the right choice then
go ahead. Chances are though it'll be libomp developers that will need to
remember to remove these workarounds when they're not needed.

If you subscribe me on the bug to implement blockaddress, I can probably take care of posting a patch to remove the workaround afterwards.

As long as the libomp devs are happy with this workaround, I have nothing
against it. I was just proposing alternatives. I agree that this late in the
release process we should have better justification for more intrusive
changes like changing the fallback or fixing SelectionDAG. Feel free to
merge the proposed patch.

Could you verify that this indeed works?

@rovka
Copy link
Collaborator Author

rovka commented Feb 14, 2018

Could you verify that this indeed works?

Yes, I've already started it, but it's still in progress, so I'll most likely let you know tomorrow (Europe time).

@hahnjo
Copy link
Member

hahnjo commented Feb 15, 2018

Could you verify that this indeed works?

Yes, I've already started it, but it's still in progress, so I'll most
likely let you know tomorrow (Europe time).

Verified and committed in r325218. Hans, can we merge this to release_60?

@zmodem
Copy link
Collaborator

zmodem commented Feb 15, 2018

Could you verify that this indeed works?

Yes, I've already started it, but it's still in progress, so I'll most
likely let you know tomorrow (Europe time).

Verified and committed in r325218. Hans, can we merge this to release_60?

Merged in r325230.

@qcolombet
Copy link
Collaborator

mentioned in issue llvm/llvm-bugzilla-archive#36390

@hahnjo
Copy link
Member

hahnjo commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#40131

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla openmp
Projects
None yet
Development

No branches or pull requests

7 participants