-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] Incorrect RETURN_ADDRESS in tests #35661
Comments
assigned to @zmodem |
+Paul who added support for OMPT on AArch64 The check for the return address is known to be a bit fragile, relies on the instructions emitted by the compiler and needs changes for every architecture. Paul did them for AArch64 but it looks like Clang 6.0 inserts another instruction (off by 4 bytes). In particular this might be related to GlobalIsel? Anyway, can you get me an objdump of the related code region? There should be
|
https://reviews.llvm.org/D43115 should address these issues. I agree with Jonas, that we would like to look into the assembly to understand what is going on there. We would appreciate to get an object dump for the respective code region. |
Thanks! I'll post the assembly as soon as I can. |
Assembly for the synchronization/flush.c test |
That's the interesting part: 403348: d503201f nop 40334c: 14000001 b 403350 <.omp_outlined.+0x30> 403350: f0000088 adrp x8, 416000 <GLOBAL_OFFSET_TABLE+0x28> 403360: 90000008 adrp x8, 403000 <on_ompt_callback_task_schedule+0x6c> 403368: f1001102 subs x2, x8, #0x4 403370: 90000000 adrp x0, 403000 <on_ompt_callback_task_schedule+0x6c> 403378: 97fff50e bl 4007b0 printf@plt So I think this test would be fine if it actually checks for the second address (done in the linked patch). However, this only works if the runtime call returns no value. If it does have a return value and there is an additional store instruction, I'd assume that the branch is there as well which kills the assumptions. I'm not sure where the branch instruction comes from, but if it's really related to GlobalIsel we'll end up having the same problem with other architectures if they do the switch... |
It is, small example: $ clang -target aarch64 -S label.c -o - $ clang -target aarch64 -fno-experimental-isel -S label.c -o - And I can get the same on x86: %bb.1:[...] $ clang -target x86_64 -fexperimental-isel -S label.c -o - So if we don't "fix" GlobalIsel in the compiler we need to adapt the return addresses of all architectures. Diana, do you think it's worth messing with GlobalIsel? |
I think this is worth investigating deeper. If I use -mllvm -print-after-all on your example, I get this: *** IR Dump After InstructionSelect ***:Machine code for function test: IsSSA, TracksLiveness, FailedISel, Legalized, RegBankSelected, Selected%bb.1: derived from LLVM BB %0 %bb.2: derived from LLVM BB %1, ADDRESS TAKEN End machine code for function test.*** IR Dump After AArch64 Instruction Selection ***:Machine code for function test: IsSSA, TracksLiveness%bb.3: derived from LLVM BB %0 %bb.4: derived from LLVM BB %1, ADDRESS TAKEN End machine code for function test.So it seems we're running both GlobalISel and DAGISel, and it's actually the DAGISel that introduces the branch... I'm not sure why we fallback here. |
It gets even better: I tried running the full OpenMP test suite with -fexperimental-isel on x86 and there are two jumps in ompt/synchronization/master.c 403ae6: e8 15 cc ff ff callq 400700 __kmpc_end_master@plt -> this is the position of the label 403b10: 48 8d 0d db ff ff ff lea -0x25(%rip),%rcx # 403af2 <.omp_outlined.+0xb2> 403b17: 4c 8d 05 d2 ff ff ff lea -0x2e(%rip),%r8 # 403af0 <.omp_outlined.+0xb0> 403b1e: 4c 8d 0d c8 ff ff ff lea -0x38(%rip),%r9 # 403aed <.omp_outlined.+0xad> What would actually be needed here is JMPQ + NOP + JMPQ = 11 bytes :-( |
label.c:1:1: remark: unable to translate constant: i8* [-Rpass-missed=gisel-irtranslator] Next question: Why does the fallback insert these branches and jumps? |
In the output Diana provided in comment #7, I see the FailedISel attribute is present after the InstructionSelect pass. Going by the IR Translator message that Jonas mentioned in comment #9, it was most likely added by the IR Translator in which case most of the GlobalISel passes won't have done anything to that function. I don't see anything taking the address of the block so the IR Translator most likely failed on that. The strange thing is that the FailedISel attribute causes the function to be reset before DAGISel runs on it so you should see the same behaviour between DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some state that the ResetMachineFunction pass isn't resetting? |
That's probably "&&label" which is a GNU extension: It should return the address of the next instruction right behind the label... |
I mean I don't see it in the MIR. The DAGISel pass has blockaddress() operands referring to %ir-block.1 but GlobalISel's pass doesn't. |
I agree with Daniel, when GISel falls back we should get the same output as DAGISel-only. If that's not the case, that's a bug. |
Okay, do you want to move this bug to GlobalISel and update the information accordingly? |
I'll investigate this. |
Ok this isn't really a bug, this is a code generation difference between using SelectionDAG and FastISel. GlobalISel is only involved indirectly, since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas before FastISel would handle the block with the inline ASM and not fall back to SDAG. FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that will check if the successor block is the target of the unconditional branch, in which case it will not emit anything. SelectionDAG doesn't do that, so when GlobalISel gives up on the block, instead of trying FastISel, we fall back to SDAG and end up with the extra branch. This is expected behavior in terms of falling back to SDAG, so I think the test needs to be updated to handle this new codegen. |
Ok, this is the answer to "why do we have branches". But it doesn't explain why GlobalISel falls back at all.
As said yesterday I don't think it's possible to do this reliably. And I'd argue that the generated code, while correct, doesn't make any sense. |
Fallbacks are part of the design of GlobalISel, and as development continues fallbacks may increase, although we plan to reduce them over time. Why it fell back here I don't know as I'm not at my main machine, but it doesn't really matter since falling back is not a bug. As for the code quality issue, we should have a separate PR for that. |
I think it's because GlobalISel doesn't handle BlockAddress in IRTranslator::translate(). I tried to hack together some support for this but failed because I'm not familiar with the code at all. So unless somebody else implements this in time for 6.0 we need to decide what to do:
For #3 we could use -fno-experimental-isel, is that available in 6.0? Another option would be compiling the tests with "-O2 -fno-omit-frame-pointer": If I understood the discussions on llvm-dev correctly, that should still use the "old" path with FastISel? But I see at least one test failing on x86_64 that we would need to look into: ompt/tasks/serialized.c complains about frame addresses that don't match... |
The -fno-experimental-isel flag is available on the 6.0 branch. I think as a short term workaround however using -O2 would be better as I’d rather not put the isel flag in places permanently if we can avoid it. Also add a comment for why it’s needed. I’ll raise a bug on implementing block address support at some point, so for the next release we can remove the workaround. |
Probably not: The OpenMP runtime library also supports older versions of compilers, not only the latest. So we need to keep this "workaround" as long as we want to support Clang 6.0.0. We can only drop it for future versions of the compiler... |
I don't understand the issue it seems, why do the tests rely closely on code generation of a specific compiler? This is something that can't be seen as stable across any version of the compiler, and in cases like these the tests would be updated to support the particular of the compiler being supported, and also branched for a particular release. Forcing the compiler to be generate code in a specific way despite it being correct is fragile, and in this case its just pure luck that we have a flag to revert to the old behavior. So perhaps as you say the tests need to work with various different compilers, in which case the tests themselves need to be tolerant of the differences. I have no investment in ompt so it's your choice how you choose to deal with this; |
Yes, the tests rely on a specific code generation and we do everything possible to make the compiler do what we want (inline assembly for example). This seems to work across all tested versions of GCC, Intel Compiler and Clang on all supported platforms - except the latest Clang 6.0.0 on AArch64 which uses GlobalISel. We can't just update the test because the difference with GlobalISel isn't stable either: In some cases it produces one additional branch instruction, in some others there are two and I think if I keep on digging, I'll be able to construct a test where there are three of them.
In your proposed workaround of using optimization it's yet greater luck that GlobalISel is only enabled by default for O0. I'm guessing you want to change that at some point in the future, so if we don't want GlobalISel in the tests we should explicitly say so.
As I said we have tested various different compilers and GlobalISel actually generates the first assembly that fails these tests. And I think we don't need to discuss that branches to the very next instruction are not the best the compiler can do. I understand that this will be solved in the future, but until we have support for blockaddress I think it's just the best option to disable GlobalISel. |
You're really making it sound like GlobalISel is the main villain here, but IIUC those branches are generated by DAGISel. The only "questionable" thing that GlobalISel is doing in this case is that it's falling back directly to DAGISel instead of FastISel. Should we maybe reconsider falling back to FastISel? Naturally, this will come with some extra overhead in cases where FastISel can't handle it either and it needs to fallback again to DAGISel, but maybe preserving the current behaviour is more important? |
I didn't want to blame GlobalISel and I acknowledge that the branches are inserted by SelectionDAG. I'm also sure that all people working on it are doing that for a reason (not very familiar with this matter, didn't yet find enough time to carefully listen to the recorded talks about GlobalISel from the past DevMtgs). What I tried to express is that the tests "broke" when switching from FastISel to GlobalISel and that we can temporarily workaround this by disabling GlobalISel. I don't really like that kind of workaround, but I think it's the best thing we can do after you found the problem in rc2. And I surely won't say no when we can drop the flag again from trunk. |
Unless Diana thinks differently, if you feel this is the right choice then go ahead. Chances are though it'll be libomp developers that will need to remember to remove these workarounds when they're not needed. As for whether we should be falling back to FastISel, I'm open to the idea but I think we need stronger justification than these tests. |
As long as the libomp devs are happy with this workaround, I have nothing against it. I was just proposing alternatives. I agree that this late in the release process we should have better justification for more intrusive changes like changing the fallback or fixing SelectionDAG. Feel free to merge the proposed patch. |
FWIW, when we changed the fallback path of GlobalISel to explicitly not used FastISel I said it wasn't a good idea for exactly the problem we have today:
In other words, I would be in favor of not doing anything special for GISel when we fallback (revert r323369?) |
If you subscribe me on the bug to implement blockaddress, I can probably take care of posting a patch to remove the workaround afterwards.
Could you verify that this indeed works? |
Yes, I've already started it, but it's still in progress, so I'll most likely let you know tomorrow (Europe time). |
Verified and committed in r325218. Hans, can we merge this to release_60? |
Merged in r325230. |
mentioned in issue llvm/llvm-bugzilla-archive#36390 |
mentioned in issue llvm/llvm-bugzilla-archive#40131 |
Extended Description
I'm getting these failures on AArch64 in 6.0.0-rc2:
Failing Tests (6):
libomp :: ompt/misc/control_tool.c
libomp :: ompt/synchronization/flush.c
libomp :: ompt/synchronization/master.c
libomp :: ompt/synchronization/taskwait.c
libomp :: ompt/synchronization/test_lock.c
libomp :: ompt/synchronization/test_nest_lock_parallel.c
They all look something like this:
******************** TEST 'libomp :: ompt/synchronization/test_nest_lock_parallel.c' FAILED ********************
Script:
/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang -fopenmp -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src -L /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c -o /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp -lm -latomic && /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp | sort --numeric-sort --stable | /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c
Exit Code: 1
Command Output (stdout):
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang" "-fopenmp" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src" "-L" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c" "-o" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp" "-lm" "-latomic"
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp"
$ "sort" "--numeric-sort" "--stable"
$ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c"
command stderr:
/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c:47:17: error: expected string not found in input
// CHECK-NEXT: {{^}}[[MASTER_ID]]: current_address={{.*}}[[RETURN_ADDRESS]]
^
:20:1: note: scanning from here
281474976710657: current_address=0x403658 or 0x403654
^
:20:1: note: with variable "MASTER_ID" equal to "281474976710657"
281474976710657: current_address=0x403658 or 0x403654
^
:20:1: note: with variable "RETURN_ADDRESS" equal to "0x403650"
281474976710657: current_address=0x403658 or 0x403654
^
:20:13: note: possible intended match here
281474976710657: current_address=0x403658 or 0x403654
^
error: command failed with exit status: 1
The text was updated successfully, but these errors were encountered: