LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 36313 - [AArch64] Incorrect RETURN_ADDRESS in tests
Summary: [AArch64] Incorrect RETURN_ADDRESS in tests
Status: RESOLVED FIXED
Alias: None
Product: OpenMP
Classification: Unclassified
Component: Runtime Library (show other bugs)
Version: unspecified
Hardware: PC Linux
: P enhancement
Assignee: Hans Wennborg
URL:
Keywords:
Depends on:
Blocks: release-6.0
  Show dependency tree
 
Reported: 2018-02-08 23:30 PST by Diana Picus
Modified: 2018-02-15 03:00 PST (History)
7 users (show)

See Also:
Fixed By Commit(s):


Attachments
Full failure report (24.42 KB, text/plain)
2018-02-08 23:30 PST, Diana Picus
Details
Assembly for the synchronization/flush.c test (219.21 KB, text/x-tex)
2018-02-09 03:48 PST, Diana Picus
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Diana Picus 2018-02-08 23:30:59 PST
Created attachment 19846 [details]
Full failure report

I'm getting these failures on AArch64 in 6.0.0-rc2:

Failing Tests (6):
    libomp :: ompt/misc/control_tool.c
    libomp :: ompt/synchronization/flush.c
    libomp :: ompt/synchronization/master.c
    libomp :: ompt/synchronization/taskwait.c
    libomp :: ompt/synchronization/test_lock.c
    libomp :: ompt/synchronization/test_nest_lock_parallel.c

They all look something like this:
******************** TEST 'libomp :: ompt/synchronization/test_nest_lock_parallel.c' FAILED ********************
Script:
--
/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang -fopenmp  -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src -L /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib  -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c -o /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp -lm -latomic && /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp | sort --numeric-sort --stable | /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c
--
Exit Code: 1

Command Output (stdout):
--
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang" "-fopenmp" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src" "-L" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c" "-o" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp" "-lm" "-latomic"
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp"
$ "sort" "--numeric-sort" "--stable"
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c"
# command stderr:
/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c:47:17: error: expected string not found in input
 // CHECK-NEXT: {{^}}[[MASTER_ID]]: current_address={{.*}}[[RETURN_ADDRESS]]
                ^
<stdin>:20:1: note: scanning from here
281474976710657: current_address=0x403658 or 0x403654
^
<stdin>:20:1: note: with variable "MASTER_ID" equal to "281474976710657"
281474976710657: current_address=0x403658 or 0x403654
^
<stdin>:20:1: note: with variable "RETURN_ADDRESS" equal to "0x403650"
281474976710657: current_address=0x403658 or 0x403654
^
<stdin>:20:13: note: possible intended match here
281474976710657: current_address=0x403658 or 0x403654
            ^

error: command failed with exit status: 1
Comment 1 Jonas Hahnfeld 2018-02-09 00:36:12 PST
+Paul who added support for OMPT on AArch64

The check for the return address is known to be a bit fragile, relies on the instructions emitted by the compiler and needs changes for every architecture. Paul did them for AArch64 but it looks like Clang 6.0 inserts another instruction (off by 4 bytes). In particular this might be related to GlobalIsel?

Anyway, can you get me an objdump of the related code region? There should be
 * a call to a __kmpc_* function
 * possibly a store instruction to save the return value
 * a NOP
If Clang now emits another instruction we can just add another possible address.
Comment 2 Joachim Protze 2018-02-09 03:26:57 PST
https://reviews.llvm.org/D43115 should address these issues. I agree with Jonas, that we would like to look into the assembly to understand what is going on there. We would appreciate to get an object dump for the respective code region.
Comment 3 Diana Picus 2018-02-09 03:28:37 PST
Thanks! I'll post the assembly as soon as I can.
Comment 4 Diana Picus 2018-02-09 03:48:09 PST
Created attachment 19848 [details]
Assembly for the synchronization/flush.c test

I've attached the disassembly for the flush.c test (since it superficially seemed smaller than the others). Hope that helps, please let me know if you need anything else. Sorry about the delay.
Comment 5 Jonas Hahnfeld 2018-02-09 04:13:54 PST
(In reply to Diana Picus from comment #4)
> Created attachment 19848 [details]
> Assembly for the synchronization/flush.c test
> 
> I've attached the disassembly for the flush.c test (since it superficially
> seemed smaller than the others). Hope that helps, please let me know if you
> need anything else. Sorry about the delay.

That's the interesting part:
  403344:	97fff4f3 	bl	400710 <__kmpc_flush@plt>
-> call runtime function

  403348:	d503201f 	nop
-> this nop is the return instruction and inserted by the test via inline assembly

  40334c:	14000001 	b	403350 <.omp_outlined.+0x30>
-> ... an unconditional branch to the next instruction

  403350:	f0000088 	adrp	x8, 416000 <_GLOBAL_OFFSET_TABLE_+0x28>
  403354:	f9408d08 	ldr	x8, [x8,#280]
  403358:	d63f0100 	blr	x8
  40335c:	f9400001 	ldr	x1, [x0]
-> ompt_get_thread_data()

  403360:	90000008 	adrp	x8, 403000 <on_ompt_callback_task_schedule+0x6c>
  403364:	910d4108 	add	x8, x8, #0x350
-> address of the label: 0x403350

  403368:	f1001102 	subs	x2, x8, #0x4
  40336c:	f1002103 	subs	x3, x8, #0x8
-> two possible values: 4 and 8 bytes before the label

  403370:	90000000 	adrp	x0, 403000 <on_ompt_callback_task_schedule+0x6c>
  403374:	913dfc00 	add	x0, x0, #0xf7f
-> formatting string

  403378:	97fff50e 	bl	4007b0 <printf@plt>
-> printf()

So I think this test would be fine if it actually checks for the second address (done in the linked patch). However, this only works if the runtime call returns no value. If it does have a return value and there is an additional store instruction, I'd assume that the branch is there as well which kills the assumptions.

I'm not sure where the branch instruction comes from, but if it's really related to GlobalIsel we'll end up having the same problem with other architectures if they do the switch...
Comment 6 Jonas Hahnfeld 2018-02-09 04:28:39 PST
(In reply to Jonas Hahnfeld from comment #5)
> I'm not sure where the branch instruction comes from, but if it's really
> related to GlobalIsel we'll end up having the same problem with other
> architectures if they do the switch...

It is, small example:
$ cat label.c
void test() {
  __asm__("nop");
label:
  printf("%p\n", &&label);
}

$ clang -target aarch64 -S label.c -o -
[...]
        //APP
        nop
        //NO_APP
        b       .LBB0_1
.Ltmp0:                                 // Block address taken
.LBB0_1:
[...]

$ clang -target aarch64 -fno-experimental-isel -S label.c -o -
[...]
        //APP
        nop
        //NO_APP
.Ltmp0:                                 // Block address taken
// %bb.1:
[...]

And I can get the same on x86:
$ clang -target x86_64 -S label.c -o -
[...]
        #APP
        nop
        #NO_APP
.Ltmp0:                                 # Block address taken
# %bb.1:
[...]

$ clang -target x86_64 -fexperimental-isel -S label.c -o -
[...]
        #APP
        nop
        #NO_APP
        jmp     .LBB0_4
.Ltmp0:                                 # Block address taken
.LBB0_4:
[...]

So if we don't "fix" GlobalIsel in the compiler we need to adapt the return addresses of all architectures. Diana, do you think it's worth messing with GlobalIsel?
Comment 7 Diana Picus 2018-02-09 04:47:16 PST
I think this is worth investigating deeper.

If I use -mllvm -print-after-all on your example, I get this:
# *** IR Dump After InstructionSelect ***:
# Machine code for function test: IsSSA, TracksLiveness, FailedISel, Legalized, RegBankSelected, Selected

%bb.1: derived from LLVM BB %0
        %2:_ = G_GLOBAL_VALUE @.str;
        %1:_ = COPY %2;
        INLINEASM $nop [sideeffect] [attdialect]
    Successors according to CFG: %bb.2(?%)

%bb.2: derived from LLVM BB %1, ADDRESS TAKEN
    Predecessors according to CFG: %bb.1
        ADJCALLSTACKDOWN 0, 0, implicit-def %sp, implicit %sp
        %x0 = COPY %1;
        %x1 = COPY %3:_;
        BL @printf, <regmask %fp %lr %b8 %b9 %b10 %b11 %b12 %b13 %b14 %b15 %d8 %d9 %d10 %d11 %d12 %d13 %d14 %d15 %h8 %h9 %h10 %h11 %h12 %h13 %h14 %h15 %s8 %s9 %s10 %s11 %s12 %s13 %s14 and 63 more...>, implicit-def %lr, implicit %sp, implicit %x0, implicit %x1, implicit-def %w0
        %0:_ = COPY %w0;
        ADJCALLSTACKUP 0, 0, implicit-def %sp, implicit %sp
        RET_ReallyLR

# End machine code for function test.

# *** IR Dump After AArch64 Instruction Selection ***:
# Machine code for function test: IsSSA, TracksLiveness

%bb.3: derived from LLVM BB %0
        INLINEASM $nop [sideeffect] [attdialect], !2
        B %bb.4
    Successors according to CFG: %bb.4(?%)

%bb.4: derived from LLVM BB %1, ADDRESS TAKEN
    Predecessors according to CFG: %bb.3
        ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
        %0:gpr64 = MOVaddr target-flags(aarch64-page) @.str, target-flags(aarch64-pageoff, aarch64-nc) @.str; GPR64:%0
        %1:gpr64 = MOVaddrBA target-flags(aarch64-page) blockaddress(@test, %ir-block.1), target-flags(aarch64-pageoff, aarch64-nc) blockaddress(@test, %ir-block.1); GPR64:%1
        %x0 = COPY %0; GPR64:%0
        %x1 = COPY %1; GPR64:%1
        BL @printf, <regmask %fp %lr %b8 %b9 %b10 %b11 %b12 %b13 %b14 %b15 %d8 %d9 %d10 %d11 %d12 %d13 %d14 %d15 %h8 %h9 %h10 %h11 %h12 %h13 %h14 %h15 %s8 %s9 %s10 %s11 %s12 %s13 %s14 and 63 more...>, implicit-def dead %lr, implicit %sp, implicit %x0, implicit %x1, implicit-def %sp, implicit-def %w0
        ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp
        %2:gpr32all = COPY %w0; GPR32all:%2
        RET_ReallyLR

# End machine code for function test.

So it seems we're running both GlobalISel and DAGISel, and it's actually the DAGISel that introduces the branch... I'm not sure why we fallback here.
Comment 8 Jonas Hahnfeld 2018-02-09 04:53:43 PST
It gets even better: I tried running the full OpenMP test suite with -fexperimental-isel on x86 and there are *two* jumps in ompt/synchronization/master.c

  403ae6:       e8 15 cc ff ff          callq  400700 <__kmpc_end_master@plt>
  403aeb:       e9 00 00 00 00          jmpq   403af0 <.omp_outlined.+0xb0>
  403af0:       90                      nop
  403af1:       e9 00 00 00 00          jmpq   403af6 <.omp_outlined.+0xb6>

-> this is the position of the label
  403af6:       48 8b 05 4b 36 20 00    mov    0x20364b(%rip),%rax        # 607148 <ompt_get_thread_data>
  403afd:       ff d0                   callq  *%rax
  403aff:       48 8b 30                mov    (%rax),%rsi
  403b02:       b9 eb 47 40 00          mov    $0x4047eb,%ecx
  403b07:       89 cf                   mov    %ecx,%edi
  403b09:       48 8d 15 e5 ff ff ff    lea    -0x1b(%rip),%rdx        # 403af5 <.omp_outlined.+0xb5>
-> one NOP = 1 byte offset

  403b10:       48 8d 0d db ff ff ff    lea    -0x25(%rip),%rcx        # 403af2 <.omp_outlined.+0xb2>
-> NOP + MOV (for non-void functions)

  403b17:       4c 8d 05 d2 ff ff ff    lea    -0x2e(%rip),%r8        # 403af0 <.omp_outlined.+0xb0>
-> NOP + JMPQ, trying to account for GlobalIsel

  403b1e:       4c 8d 0d c8 ff ff ff    lea    -0x38(%rip),%r9        # 403aed <.omp_outlined.+0xad>
-> NOP + MOV + JMPQ

What would actually be needed here is JMPQ + NOP + JMPQ = 11 bytes :-(
Comment 9 Jonas Hahnfeld 2018-02-09 05:30:58 PST
(In reply to Diana Picus from comment #7)
> So it seems we're running both GlobalISel and DAGISel, and it's actually the
> DAGISel that introduces the branch... I'm not sure why we fallback here.

label.c:1:1: remark: unable to translate constant: i8* [-Rpass-missed=gisel-irtranslator]

Next question: Why does the fallback insert these branches and jumps?
Comment 10 daniel_l_sanders 2018-02-09 08:30:52 PST
In the output Diana provided in comment #7, I see the FailedISel attribute is present after the InstructionSelect pass. Going by the IR Translator message that Jonas mentioned in comment #9, it was most likely added by the IR Translator in which case most of the GlobalISel passes won't have done anything to that function. I don't see anything taking the address of the block so the IR Translator most likely failed on that.

The strange thing is that the FailedISel attribute causes the function to be reset before DAGISel runs on it so you should see the same behaviour between DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some state that the ResetMachineFunction pass isn't resetting?
Comment 11 Jonas Hahnfeld 2018-02-09 08:36:20 PST
(In reply to daniel_l_sanders from comment #10)
> I don't see anything taking the address of the
> block so the IR Translator most likely failed on that.

That's probably "&&label" which is a GNU extension: It _should_ return the address of the next instruction right behind the label...
Comment 12 daniel_l_sanders 2018-02-09 08:39:41 PST
(In reply to Jonas Hahnfeld from comment #11)
> (In reply to daniel_l_sanders from comment #10)
> > I don't see anything taking the address of the
> > block so the IR Translator most likely failed on that.
> 
> That's probably "&&label" which is a GNU extension: It _should_ return the
> address of the next instruction right behind the label...

I mean I don't see it in the MIR. The DAGISel pass has blockaddress() operands referring to %ir-block.1 but GlobalISel's pass doesn't.
Comment 13 Quentin Colombet 2018-02-09 11:09:51 PST
(In reply to daniel_l_sanders from comment #10)
> The strange thing is that the FailedISel attribute causes the function to be
> reset before DAGISel runs on it so you should see the same behaviour between
> DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some
> state that the ResetMachineFunction pass isn't resetting?

I agree with Daniel, when GISel falls back we should get the same output as DAGISel-only. If that's not the case, that's a bug.
Comment 14 Jonas Hahnfeld 2018-02-09 11:28:35 PST
(In reply to Quentin Colombet from comment #13)
> (In reply to daniel_l_sanders from comment #10)
> > The strange thing is that the FailedISel attribute causes the function to be
> > reset before DAGISel runs on it so you should see the same behaviour between
> > DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some
> > state that the ResetMachineFunction pass isn't resetting?
> 
> I agree with Daniel, when GISel falls back we should get the same output as
> DAGISel-only. If that's not the case, that's a bug.

Okay, do you want to move this bug to GlobalISel and update the information accordingly?
Comment 15 Amara Emerson 2018-02-09 11:31:00 PST
I'll investigate this.
Comment 16 Amara Emerson 2018-02-09 13:34:35 PST
Ok this isn't really a bug, this is a code generation difference between using SelectionDAG and FastISel. GlobalISel is only involved indirectly, since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas before FastISel would handle the block with the inline ASM and not fall back to SDAG.

FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that will check if the successor block is the target of the unconditional branch, in which case it will not emit anything. SelectionDAG doesn't do that, so when GlobalISel gives up on the block, instead of trying FastISel, we fall back to SDAG and end up with the extra branch.

This is expected behavior in terms of falling back to SDAG, so I think the test needs to be updated to handle this new codegen.
Comment 17 Jonas Hahnfeld 2018-02-10 00:39:51 PST
(In reply to Amara Emerson from comment #16)
> Ok this isn't really a bug, this is a code generation difference between
> using SelectionDAG and FastISel. GlobalISel is only involved indirectly,
> since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas
> before FastISel would handle the block with the inline ASM and not fall back
> to SDAG.
> 
> FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that
> will check if the successor block is the target of the unconditional branch,
> in which case it will not emit anything. SelectionDAG doesn't do that, so
> when GlobalISel gives up on the block, instead of trying FastISel, we fall
> back to SDAG and end up with the extra branch.

Ok, this is the answer to "why do we have branches". But it doesn't explain why GlobalISel falls back at all.

> This is expected behavior in terms of falling back to SDAG, so I think the test > needs to be updated to handle this new codegen.

As said yesterday I don't think it's possible to do this reliably. And I'd argue that the generated code, while correct, doesn't make any sense.
Comment 18 Amara Emerson 2018-02-10 02:53:39 PST
Fallbacks are part of the design of GlobalISel, and as development continues fallbacks may increase, although we plan to reduce them over time. Why it fell back here I don't know as I'm not at my main machine, but it doesn't really matter since falling back is not a bug.

As for the code quality issue, we should have a separate PR for that.
Comment 19 Jonas Hahnfeld 2018-02-10 03:51:38 PST
(In reply to Jonas Hahnfeld from comment #17)
> But it doesn't explain why GlobalISel falls back at all.

I think it's because GlobalISel doesn't handle BlockAddress in IRTranslator::translate(). I tried to hack together some support for this but failed because I'm not familiar with the code at all.

So unless somebody else implements this in time for 6.0 we need to decide what to do:
1) Disable OMPT on AArch64 for now.
2) Disable OMPT *tests* on AArch64 for now - not really convincing to me.
3) Workaround test failures by disabling GlobalISel for OpenMP tests.

For #3 we could use -fno-experimental-isel, is that available in 6.0? Another option would be compiling the tests with "-O2 -fno-omit-frame-pointer": If I understood the discussions on llvm-dev correctly, that should still use the "old" path with FastISel? But I see at least one test failing on x86_64 that we would need to look into: ompt/tasks/serialized.c complains about frame addresses that don't match...
Comment 20 Amara Emerson 2018-02-10 12:14:33 PST
The -fno-experimental-isel flag is available on the 6.0 branch. I think as a short term workaround however using -O2 would be better as I’d rather not put the isel flag in places permanently if we can avoid it. Also add a comment for why it’s needed. I’ll raise a bug on implementing block address support at some point, so for the next release we can remove the workaround.
Comment 21 Jonas Hahnfeld 2018-02-12 07:54:50 PST
(In reply to Amara Emerson from comment #20)
> The -fno-experimental-isel flag is available on the 6.0 branch. I think as a
> short term workaround however using -O2 would be better as I’d rather not
> put the isel flag in places permanently if we can avoid it.
Thinking about it, I don't think we should do this kind of hack: We really want to disable GlobalISel so we should make this explicit. I posted https://reviews.llvm.org/D43195, please let me know if that works for you.

> [...] so for the next release we can remove the workaround.

Probably not: The OpenMP runtime library also supports older versions of compilers, not only the latest. So we need to keep this "workaround" as long as we want to support Clang 6.0.0. We can only drop it for future versions of the compiler...
Comment 22 Amara Emerson 2018-02-12 22:06:04 PST
I don't understand the issue it seems, why do the tests rely closely on code generation of a specific compiler? This is something that can't be seen as stable across any version of the compiler, and in cases like these the tests would be updated to support the particular of the compiler being supported, and also branched for a particular release.

Forcing the compiler to be generate code in a specific way despite it being correct is fragile, and in this case its just pure luck that we have a flag to revert to the old behavior.

So perhaps as you say the tests need to work with various different compilers, in which case the tests themselves need to be tolerant of the differences. I have no investment in ompt so it's your choice how you choose to deal with this;
Comment 23 Jonas Hahnfeld 2018-02-13 01:12:08 PST
(In reply to Amara Emerson from comment #22)
> I don't understand the issue it seems, why do the tests rely closely on code
> generation of a specific compiler? This is something that can't be seen as
> stable across any version of the compiler, and in cases like these the tests
> would be updated to support the particular of the compiler being supported,
> and also branched for a particular release.

Yes, the tests rely on a specific code generation and we do everything possible to make the compiler do what we want (inline assembly for example). This seems to work across all tested versions of GCC, Intel Compiler and Clang on all supported platforms - except the latest Clang 6.0.0 on AArch64 which uses GlobalISel.
Some background: We want to test return and frame addresses that are given to a tool during runtime. If you have a better idea how to do this, I think we would be happy to listen.

We can't just update the test because the difference with GlobalISel isn't stable either: In some cases it produces one additional branch instruction, in some others there are two and I think if I keep on digging, I'll be able to construct a test where there are three of them.

> Forcing the compiler to be generate code in a specific way despite it being
> correct is fragile, and in this case its just pure luck that we have a flag
> to revert to the old behavior.

In your proposed workaround of using optimization it's yet greater luck that GlobalISel is only enabled by default for O0. I'm guessing you want to change that at some point in the future, so if we don't want GlobalISel in the tests we should explicitly say so.

> So perhaps as you say the tests need to work with various different
> compilers, in which case the tests themselves need to be tolerant of the
> differences. I have no investment in ompt so it's your choice how you choose
> to deal with this;

As I said we have tested various different compilers and GlobalISel actually generates the first assembly that fails these tests. And I think we don't need to discuss that branches to the very next instruction are not the best the compiler can do. I understand that this will be solved in the future, but until we have support for blockaddress I think it's just the best option to disable GlobalISel.
Comment 24 Diana Picus 2018-02-13 02:54:41 PST
You're really making it sound like GlobalISel is the main villain here, but IIUC those branches are generated by DAGISel. The only "questionable" thing that GlobalISel is doing in this case is that it's falling back directly to DAGISel instead of FastISel. Should we maybe reconsider falling back to FastISel? Naturally, this will come with some extra overhead in cases where FastISel can't handle it either and it needs to fallback again to DAGISel, but maybe preserving the current behaviour is more important?
Comment 25 Jonas Hahnfeld 2018-02-13 05:19:38 PST
(In reply to Diana Picus from comment #24)
> You're really making it sound like GlobalISel is the main villain here, but
> IIUC those branches are generated by DAGISel. The only "questionable" thing
> that GlobalISel is doing in this case is that it's falling back directly to
> DAGISel instead of FastISel. Should we maybe reconsider falling back to
> FastISel? Naturally, this will come with some extra overhead in cases where
> FastISel can't handle it either and it needs to fallback again to DAGISel,
> but maybe preserving the current behaviour is more important?

I didn't want to blame GlobalISel and I acknowledge that the branches are inserted by SelectionDAG. I'm also sure that all people working on it are doing that for a reason (not very familiar with this matter, didn't yet find enough time to carefully listen to the recorded talks about GlobalISel from the past DevMtgs).

What I tried to express is that the tests "broke" when switching from FastISel to GlobalISel and that we can temporarily workaround this by disabling GlobalISel. I don't really like that kind of workaround, but I think it's the best thing we can do after you found the problem in rc2. And I surely won't say no when we can drop the flag again from trunk.
So please just take the patch as my opinion of what we should do in the short-term and merge to release_60.
Comment 26 Amara Emerson 2018-02-13 06:19:16 PST
Unless Diana thinks differently, if you feel this is the right choice then go ahead. Chances are though it'll be libomp developers that will need to remember to remove these workarounds when they're not needed.

As for whether we should be falling back to FastISel, I'm open to the idea but I think we need stronger justification than these tests.
Comment 27 Diana Picus 2018-02-13 06:27:50 PST
(In reply to Amara Emerson from comment #26)
> Unless Diana thinks differently, if you feel this is the right choice then
> go ahead. Chances are though it'll be libomp developers that will need to
> remember to remove these workarounds when they're not needed.
> 
> As for whether we should be falling back to FastISel, I'm open to the idea
> but I think we need stronger justification than these tests.

As long as the libomp devs are happy with this workaround, I have nothing against it. I was just proposing alternatives. I agree that this late in the release process we should have better justification for more intrusive changes like changing the fallback or fixing SelectionDAG. Feel free to merge the proposed patch.
Comment 28 Quentin Colombet 2018-02-13 14:16:53 PST
FWIW, when we changed the fallback path of GlobalISel to explicitly not used FastISel I said it wasn't a good idea for exactly the problem we have today:
- We cannot fix codegen in GlobalISel since we don't support that case yet
- Users still get a different codegen than what they are used to and we are not going to fix SDISel to behave like FastISel

In other words, I would be in favor of not doing anything special for GISel when we fallback (revert r323369?)
Comment 29 Jonas Hahnfeld 2018-02-14 10:10:28 PST
(In reply to Amara Emerson from comment #26)
> Unless Diana thinks differently, if you feel this is the right choice then
> go ahead. Chances are though it'll be libomp developers that will need to
> remember to remove these workarounds when they're not needed.

If you subscribe me on the bug to implement blockaddress, I can probably take care of posting a patch to remove the workaround afterwards.

(In reply to Diana Picus from comment #27)
> As long as the libomp devs are happy with this workaround, I have nothing
> against it. I was just proposing alternatives. I agree that this late in the
> release process we should have better justification for more intrusive
> changes like changing the fallback or fixing SelectionDAG. Feel free to
> merge the proposed patch.

Could you verify that this indeed works?
Comment 30 Diana Picus 2018-02-14 10:12:29 PST
(In reply to Jonas Hahnfeld from comment #29)
> Could you verify that this indeed works?

Yes, I've already started it, but it's still in progress, so I'll most likely let you know tomorrow (Europe time).
Comment 31 Jonas Hahnfeld 2018-02-15 00:20:33 PST
(In reply to Diana Picus from comment #30)
> (In reply to Jonas Hahnfeld from comment #29)
> > Could you verify that this indeed works?
> 
> Yes, I've already started it, but it's still in progress, so I'll most
> likely let you know tomorrow (Europe time).

Verified and committed in r325218. Hans, can we merge this to release_60?
Comment 32 Hans Wennborg 2018-02-15 03:00:50 PST
(In reply to Jonas Hahnfeld from comment #31)
> (In reply to Diana Picus from comment #30)
> > (In reply to Jonas Hahnfeld from comment #29)
> > > Could you verify that this indeed works?
> > 
> > Yes, I've already started it, but it's still in progress, so I'll most
> > likely let you know tomorrow (Europe time).
> 
> Verified and committed in r325218. Hans, can we merge this to release_60?

Merged in r325230.