Created attachment 19846 [details] Full failure report I'm getting these failures on AArch64 in 6.0.0-rc2: Failing Tests (6): libomp :: ompt/misc/control_tool.c libomp :: ompt/synchronization/flush.c libomp :: ompt/synchronization/master.c libomp :: ompt/synchronization/taskwait.c libomp :: ompt/synchronization/test_lock.c libomp :: ompt/synchronization/test_nest_lock_parallel.c They all look something like this: ******************** TEST 'libomp :: ompt/synchronization/test_nest_lock_parallel.c' FAILED ******************** Script: -- /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang -fopenmp -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src -L /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib -I /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c -o /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp -lm -latomic && /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp | sort --numeric-sort --stable | /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c -- Exit Code: 1 Command Output (stdout): -- $ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/clang" "-fopenmp" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/src" "-L" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/lib" "-I" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c" "-o" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp" "-lm" "-latomic" $ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/projects/openmp/runtime/test/ompt/synchronization/Output/test_nest_lock_parallel.c.tmp" $ "sort" "--numeric-sort" "--stable" $ "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/Phase3/Release/llvmCore-6.0.0-rc2.obj/./bin/FileCheck" "/home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c" # command stderr: /home/tcwg-buildslave/workspace/tcwg-llvm-release/tcwg-apm_64-build/rc2/llvm.src/projects/openmp/runtime/test/ompt/synchronization/test_nest_lock_parallel.c:47:17: error: expected string not found in input // CHECK-NEXT: {{^}}[[MASTER_ID]]: current_address={{.*}}[[RETURN_ADDRESS]] ^ <stdin>:20:1: note: scanning from here 281474976710657: current_address=0x403658 or 0x403654 ^ <stdin>:20:1: note: with variable "MASTER_ID" equal to "281474976710657" 281474976710657: current_address=0x403658 or 0x403654 ^ <stdin>:20:1: note: with variable "RETURN_ADDRESS" equal to "0x403650" 281474976710657: current_address=0x403658 or 0x403654 ^ <stdin>:20:13: note: possible intended match here 281474976710657: current_address=0x403658 or 0x403654 ^ error: command failed with exit status: 1
+Paul who added support for OMPT on AArch64 The check for the return address is known to be a bit fragile, relies on the instructions emitted by the compiler and needs changes for every architecture. Paul did them for AArch64 but it looks like Clang 6.0 inserts another instruction (off by 4 bytes). In particular this might be related to GlobalIsel? Anyway, can you get me an objdump of the related code region? There should be * a call to a __kmpc_* function * possibly a store instruction to save the return value * a NOP If Clang now emits another instruction we can just add another possible address.
https://reviews.llvm.org/D43115 should address these issues. I agree with Jonas, that we would like to look into the assembly to understand what is going on there. We would appreciate to get an object dump for the respective code region.
Thanks! I'll post the assembly as soon as I can.
Created attachment 19848 [details] Assembly for the synchronization/flush.c test I've attached the disassembly for the flush.c test (since it superficially seemed smaller than the others). Hope that helps, please let me know if you need anything else. Sorry about the delay.
(In reply to Diana Picus from comment #4) > Created attachment 19848 [details] > Assembly for the synchronization/flush.c test > > I've attached the disassembly for the flush.c test (since it superficially > seemed smaller than the others). Hope that helps, please let me know if you > need anything else. Sorry about the delay. That's the interesting part: 403344: 97fff4f3 bl 400710 <__kmpc_flush@plt> -> call runtime function 403348: d503201f nop -> this nop is the return instruction and inserted by the test via inline assembly 40334c: 14000001 b 403350 <.omp_outlined.+0x30> -> ... an unconditional branch to the next instruction 403350: f0000088 adrp x8, 416000 <_GLOBAL_OFFSET_TABLE_+0x28> 403354: f9408d08 ldr x8, [x8,#280] 403358: d63f0100 blr x8 40335c: f9400001 ldr x1, [x0] -> ompt_get_thread_data() 403360: 90000008 adrp x8, 403000 <on_ompt_callback_task_schedule+0x6c> 403364: 910d4108 add x8, x8, #0x350 -> address of the label: 0x403350 403368: f1001102 subs x2, x8, #0x4 40336c: f1002103 subs x3, x8, #0x8 -> two possible values: 4 and 8 bytes before the label 403370: 90000000 adrp x0, 403000 <on_ompt_callback_task_schedule+0x6c> 403374: 913dfc00 add x0, x0, #0xf7f -> formatting string 403378: 97fff50e bl 4007b0 <printf@plt> -> printf() So I think this test would be fine if it actually checks for the second address (done in the linked patch). However, this only works if the runtime call returns no value. If it does have a return value and there is an additional store instruction, I'd assume that the branch is there as well which kills the assumptions. I'm not sure where the branch instruction comes from, but if it's really related to GlobalIsel we'll end up having the same problem with other architectures if they do the switch...
(In reply to Jonas Hahnfeld from comment #5) > I'm not sure where the branch instruction comes from, but if it's really > related to GlobalIsel we'll end up having the same problem with other > architectures if they do the switch... It is, small example: $ cat label.c void test() { __asm__("nop"); label: printf("%p\n", &&label); } $ clang -target aarch64 -S label.c -o - [...] //APP nop //NO_APP b .LBB0_1 .Ltmp0: // Block address taken .LBB0_1: [...] $ clang -target aarch64 -fno-experimental-isel -S label.c -o - [...] //APP nop //NO_APP .Ltmp0: // Block address taken // %bb.1: [...] And I can get the same on x86: $ clang -target x86_64 -S label.c -o - [...] #APP nop #NO_APP .Ltmp0: # Block address taken # %bb.1: [...] $ clang -target x86_64 -fexperimental-isel -S label.c -o - [...] #APP nop #NO_APP jmp .LBB0_4 .Ltmp0: # Block address taken .LBB0_4: [...] So if we don't "fix" GlobalIsel in the compiler we need to adapt the return addresses of all architectures. Diana, do you think it's worth messing with GlobalIsel?
I think this is worth investigating deeper. If I use -mllvm -print-after-all on your example, I get this: # *** IR Dump After InstructionSelect ***: # Machine code for function test: IsSSA, TracksLiveness, FailedISel, Legalized, RegBankSelected, Selected %bb.1: derived from LLVM BB %0 %2:_ = G_GLOBAL_VALUE @.str; %1:_ = COPY %2; INLINEASM $nop [sideeffect] [attdialect] Successors according to CFG: %bb.2(?%) %bb.2: derived from LLVM BB %1, ADDRESS TAKEN Predecessors according to CFG: %bb.1 ADJCALLSTACKDOWN 0, 0, implicit-def %sp, implicit %sp %x0 = COPY %1; %x1 = COPY %3:_; BL @printf, <regmask %fp %lr %b8 %b9 %b10 %b11 %b12 %b13 %b14 %b15 %d8 %d9 %d10 %d11 %d12 %d13 %d14 %d15 %h8 %h9 %h10 %h11 %h12 %h13 %h14 %h15 %s8 %s9 %s10 %s11 %s12 %s13 %s14 and 63 more...>, implicit-def %lr, implicit %sp, implicit %x0, implicit %x1, implicit-def %w0 %0:_ = COPY %w0; ADJCALLSTACKUP 0, 0, implicit-def %sp, implicit %sp RET_ReallyLR # End machine code for function test. # *** IR Dump After AArch64 Instruction Selection ***: # Machine code for function test: IsSSA, TracksLiveness %bb.3: derived from LLVM BB %0 INLINEASM $nop [sideeffect] [attdialect], !2 B %bb.4 Successors according to CFG: %bb.4(?%) %bb.4: derived from LLVM BB %1, ADDRESS TAKEN Predecessors according to CFG: %bb.3 ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp %0:gpr64 = MOVaddr target-flags(aarch64-page) @.str, target-flags(aarch64-pageoff, aarch64-nc) @.str; GPR64:%0 %1:gpr64 = MOVaddrBA target-flags(aarch64-page) blockaddress(@test, %ir-block.1), target-flags(aarch64-pageoff, aarch64-nc) blockaddress(@test, %ir-block.1); GPR64:%1 %x0 = COPY %0; GPR64:%0 %x1 = COPY %1; GPR64:%1 BL @printf, <regmask %fp %lr %b8 %b9 %b10 %b11 %b12 %b13 %b14 %b15 %d8 %d9 %d10 %d11 %d12 %d13 %d14 %d15 %h8 %h9 %h10 %h11 %h12 %h13 %h14 %h15 %s8 %s9 %s10 %s11 %s12 %s13 %s14 and 63 more...>, implicit-def dead %lr, implicit %sp, implicit %x0, implicit %x1, implicit-def %sp, implicit-def %w0 ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp %2:gpr32all = COPY %w0; GPR32all:%2 RET_ReallyLR # End machine code for function test. So it seems we're running both GlobalISel and DAGISel, and it's actually the DAGISel that introduces the branch... I'm not sure why we fallback here.
It gets even better: I tried running the full OpenMP test suite with -fexperimental-isel on x86 and there are *two* jumps in ompt/synchronization/master.c 403ae6: e8 15 cc ff ff callq 400700 <__kmpc_end_master@plt> 403aeb: e9 00 00 00 00 jmpq 403af0 <.omp_outlined.+0xb0> 403af0: 90 nop 403af1: e9 00 00 00 00 jmpq 403af6 <.omp_outlined.+0xb6> -> this is the position of the label 403af6: 48 8b 05 4b 36 20 00 mov 0x20364b(%rip),%rax # 607148 <ompt_get_thread_data> 403afd: ff d0 callq *%rax 403aff: 48 8b 30 mov (%rax),%rsi 403b02: b9 eb 47 40 00 mov $0x4047eb,%ecx 403b07: 89 cf mov %ecx,%edi 403b09: 48 8d 15 e5 ff ff ff lea -0x1b(%rip),%rdx # 403af5 <.omp_outlined.+0xb5> -> one NOP = 1 byte offset 403b10: 48 8d 0d db ff ff ff lea -0x25(%rip),%rcx # 403af2 <.omp_outlined.+0xb2> -> NOP + MOV (for non-void functions) 403b17: 4c 8d 05 d2 ff ff ff lea -0x2e(%rip),%r8 # 403af0 <.omp_outlined.+0xb0> -> NOP + JMPQ, trying to account for GlobalIsel 403b1e: 4c 8d 0d c8 ff ff ff lea -0x38(%rip),%r9 # 403aed <.omp_outlined.+0xad> -> NOP + MOV + JMPQ What would actually be needed here is JMPQ + NOP + JMPQ = 11 bytes :-(
(In reply to Diana Picus from comment #7) > So it seems we're running both GlobalISel and DAGISel, and it's actually the > DAGISel that introduces the branch... I'm not sure why we fallback here. label.c:1:1: remark: unable to translate constant: i8* [-Rpass-missed=gisel-irtranslator] Next question: Why does the fallback insert these branches and jumps?
In the output Diana provided in comment #7, I see the FailedISel attribute is present after the InstructionSelect pass. Going by the IR Translator message that Jonas mentioned in comment #9, it was most likely added by the IR Translator in which case most of the GlobalISel passes won't have done anything to that function. I don't see anything taking the address of the block so the IR Translator most likely failed on that. The strange thing is that the FailedISel attribute causes the function to be reset before DAGISel runs on it so you should see the same behaviour between DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some state that the ResetMachineFunction pass isn't resetting?
(In reply to daniel_l_sanders from comment #10) > I don't see anything taking the address of the > block so the IR Translator most likely failed on that. That's probably "&&label" which is a GNU extension: It _should_ return the address of the next instruction right behind the label...
(In reply to Jonas Hahnfeld from comment #11) > (In reply to daniel_l_sanders from comment #10) > > I don't see anything taking the address of the > > block so the IR Translator most likely failed on that. > > That's probably "&&label" which is a GNU extension: It _should_ return the > address of the next instruction right behind the label... I mean I don't see it in the MIR. The DAGISel pass has blockaddress() operands referring to %ir-block.1 but GlobalISel's pass doesn't.
(In reply to daniel_l_sanders from comment #10) > The strange thing is that the FailedISel attribute causes the function to be > reset before DAGISel runs on it so you should see the same behaviour between > DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some > state that the ResetMachineFunction pass isn't resetting? I agree with Daniel, when GISel falls back we should get the same output as DAGISel-only. If that's not the case, that's a bug.
(In reply to Quentin Colombet from comment #13) > (In reply to daniel_l_sanders from comment #10) > > The strange thing is that the FailedISel attribute causes the function to be > > reset before DAGISel runs on it so you should see the same behaviour between > > DAGISel-only and GlobalISel that fell back to DAGISel. Could there be some > > state that the ResetMachineFunction pass isn't resetting? > > I agree with Daniel, when GISel falls back we should get the same output as > DAGISel-only. If that's not the case, that's a bug. Okay, do you want to move this bug to GlobalISel and update the information accordingly?
I'll investigate this.
Ok this isn't really a bug, this is a code generation difference between using SelectionDAG and FastISel. GlobalISel is only involved indirectly, since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas before FastISel would handle the block with the inline ASM and not fall back to SDAG. FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that will check if the successor block is the target of the unconditional branch, in which case it will not emit anything. SelectionDAG doesn't do that, so when GlobalISel gives up on the block, instead of trying FastISel, we fall back to SDAG and end up with the extra branch. This is expected behavior in terms of falling back to SDAG, so I think the test needs to be updated to handle this new codegen.
(In reply to Amara Emerson from comment #16) > Ok this isn't really a bug, this is a code generation difference between > using SelectionDAG and FastISel. GlobalISel is only involved indirectly, > since at -O0 GISel replaces FastISel and falls back to SelectionDAG, whereas > before FastISel would handle the block with the inline ASM and not fall back > to SDAG. > > FastISel has some specialized code in FastISel.cpp:fastEmitBranch() that > will check if the successor block is the target of the unconditional branch, > in which case it will not emit anything. SelectionDAG doesn't do that, so > when GlobalISel gives up on the block, instead of trying FastISel, we fall > back to SDAG and end up with the extra branch. Ok, this is the answer to "why do we have branches". But it doesn't explain why GlobalISel falls back at all. > This is expected behavior in terms of falling back to SDAG, so I think the test > needs to be updated to handle this new codegen. As said yesterday I don't think it's possible to do this reliably. And I'd argue that the generated code, while correct, doesn't make any sense.
Fallbacks are part of the design of GlobalISel, and as development continues fallbacks may increase, although we plan to reduce them over time. Why it fell back here I don't know as I'm not at my main machine, but it doesn't really matter since falling back is not a bug. As for the code quality issue, we should have a separate PR for that.
(In reply to Jonas Hahnfeld from comment #17) > But it doesn't explain why GlobalISel falls back at all. I think it's because GlobalISel doesn't handle BlockAddress in IRTranslator::translate(). I tried to hack together some support for this but failed because I'm not familiar with the code at all. So unless somebody else implements this in time for 6.0 we need to decide what to do: 1) Disable OMPT on AArch64 for now. 2) Disable OMPT *tests* on AArch64 for now - not really convincing to me. 3) Workaround test failures by disabling GlobalISel for OpenMP tests. For #3 we could use -fno-experimental-isel, is that available in 6.0? Another option would be compiling the tests with "-O2 -fno-omit-frame-pointer": If I understood the discussions on llvm-dev correctly, that should still use the "old" path with FastISel? But I see at least one test failing on x86_64 that we would need to look into: ompt/tasks/serialized.c complains about frame addresses that don't match...
The -fno-experimental-isel flag is available on the 6.0 branch. I think as a short term workaround however using -O2 would be better as I’d rather not put the isel flag in places permanently if we can avoid it. Also add a comment for why it’s needed. I’ll raise a bug on implementing block address support at some point, so for the next release we can remove the workaround.
(In reply to Amara Emerson from comment #20) > The -fno-experimental-isel flag is available on the 6.0 branch. I think as a > short term workaround however using -O2 would be better as I’d rather not > put the isel flag in places permanently if we can avoid it. Thinking about it, I don't think we should do this kind of hack: We really want to disable GlobalISel so we should make this explicit. I posted https://reviews.llvm.org/D43195, please let me know if that works for you. > [...] so for the next release we can remove the workaround. Probably not: The OpenMP runtime library also supports older versions of compilers, not only the latest. So we need to keep this "workaround" as long as we want to support Clang 6.0.0. We can only drop it for future versions of the compiler...
I don't understand the issue it seems, why do the tests rely closely on code generation of a specific compiler? This is something that can't be seen as stable across any version of the compiler, and in cases like these the tests would be updated to support the particular of the compiler being supported, and also branched for a particular release. Forcing the compiler to be generate code in a specific way despite it being correct is fragile, and in this case its just pure luck that we have a flag to revert to the old behavior. So perhaps as you say the tests need to work with various different compilers, in which case the tests themselves need to be tolerant of the differences. I have no investment in ompt so it's your choice how you choose to deal with this;
(In reply to Amara Emerson from comment #22) > I don't understand the issue it seems, why do the tests rely closely on code > generation of a specific compiler? This is something that can't be seen as > stable across any version of the compiler, and in cases like these the tests > would be updated to support the particular of the compiler being supported, > and also branched for a particular release. Yes, the tests rely on a specific code generation and we do everything possible to make the compiler do what we want (inline assembly for example). This seems to work across all tested versions of GCC, Intel Compiler and Clang on all supported platforms - except the latest Clang 6.0.0 on AArch64 which uses GlobalISel. Some background: We want to test return and frame addresses that are given to a tool during runtime. If you have a better idea how to do this, I think we would be happy to listen. We can't just update the test because the difference with GlobalISel isn't stable either: In some cases it produces one additional branch instruction, in some others there are two and I think if I keep on digging, I'll be able to construct a test where there are three of them. > Forcing the compiler to be generate code in a specific way despite it being > correct is fragile, and in this case its just pure luck that we have a flag > to revert to the old behavior. In your proposed workaround of using optimization it's yet greater luck that GlobalISel is only enabled by default for O0. I'm guessing you want to change that at some point in the future, so if we don't want GlobalISel in the tests we should explicitly say so. > So perhaps as you say the tests need to work with various different > compilers, in which case the tests themselves need to be tolerant of the > differences. I have no investment in ompt so it's your choice how you choose > to deal with this; As I said we have tested various different compilers and GlobalISel actually generates the first assembly that fails these tests. And I think we don't need to discuss that branches to the very next instruction are not the best the compiler can do. I understand that this will be solved in the future, but until we have support for blockaddress I think it's just the best option to disable GlobalISel.
You're really making it sound like GlobalISel is the main villain here, but IIUC those branches are generated by DAGISel. The only "questionable" thing that GlobalISel is doing in this case is that it's falling back directly to DAGISel instead of FastISel. Should we maybe reconsider falling back to FastISel? Naturally, this will come with some extra overhead in cases where FastISel can't handle it either and it needs to fallback again to DAGISel, but maybe preserving the current behaviour is more important?
(In reply to Diana Picus from comment #24) > You're really making it sound like GlobalISel is the main villain here, but > IIUC those branches are generated by DAGISel. The only "questionable" thing > that GlobalISel is doing in this case is that it's falling back directly to > DAGISel instead of FastISel. Should we maybe reconsider falling back to > FastISel? Naturally, this will come with some extra overhead in cases where > FastISel can't handle it either and it needs to fallback again to DAGISel, > but maybe preserving the current behaviour is more important? I didn't want to blame GlobalISel and I acknowledge that the branches are inserted by SelectionDAG. I'm also sure that all people working on it are doing that for a reason (not very familiar with this matter, didn't yet find enough time to carefully listen to the recorded talks about GlobalISel from the past DevMtgs). What I tried to express is that the tests "broke" when switching from FastISel to GlobalISel and that we can temporarily workaround this by disabling GlobalISel. I don't really like that kind of workaround, but I think it's the best thing we can do after you found the problem in rc2. And I surely won't say no when we can drop the flag again from trunk. So please just take the patch as my opinion of what we should do in the short-term and merge to release_60.
Unless Diana thinks differently, if you feel this is the right choice then go ahead. Chances are though it'll be libomp developers that will need to remember to remove these workarounds when they're not needed. As for whether we should be falling back to FastISel, I'm open to the idea but I think we need stronger justification than these tests.
(In reply to Amara Emerson from comment #26) > Unless Diana thinks differently, if you feel this is the right choice then > go ahead. Chances are though it'll be libomp developers that will need to > remember to remove these workarounds when they're not needed. > > As for whether we should be falling back to FastISel, I'm open to the idea > but I think we need stronger justification than these tests. As long as the libomp devs are happy with this workaround, I have nothing against it. I was just proposing alternatives. I agree that this late in the release process we should have better justification for more intrusive changes like changing the fallback or fixing SelectionDAG. Feel free to merge the proposed patch.
FWIW, when we changed the fallback path of GlobalISel to explicitly not used FastISel I said it wasn't a good idea for exactly the problem we have today: - We cannot fix codegen in GlobalISel since we don't support that case yet - Users still get a different codegen than what they are used to and we are not going to fix SDISel to behave like FastISel In other words, I would be in favor of not doing anything special for GISel when we fallback (revert r323369?)
(In reply to Amara Emerson from comment #26) > Unless Diana thinks differently, if you feel this is the right choice then > go ahead. Chances are though it'll be libomp developers that will need to > remember to remove these workarounds when they're not needed. If you subscribe me on the bug to implement blockaddress, I can probably take care of posting a patch to remove the workaround afterwards. (In reply to Diana Picus from comment #27) > As long as the libomp devs are happy with this workaround, I have nothing > against it. I was just proposing alternatives. I agree that this late in the > release process we should have better justification for more intrusive > changes like changing the fallback or fixing SelectionDAG. Feel free to > merge the proposed patch. Could you verify that this indeed works?
(In reply to Jonas Hahnfeld from comment #29) > Could you verify that this indeed works? Yes, I've already started it, but it's still in progress, so I'll most likely let you know tomorrow (Europe time).
(In reply to Diana Picus from comment #30) > (In reply to Jonas Hahnfeld from comment #29) > > Could you verify that this indeed works? > > Yes, I've already started it, but it's still in progress, so I'll most > likely let you know tomorrow (Europe time). Verified and committed in r325218. Hans, can we merge this to release_60?
(In reply to Jonas Hahnfeld from comment #31) > (In reply to Diana Picus from comment #30) > > (In reply to Jonas Hahnfeld from comment #29) > > > Could you verify that this indeed works? > > > > Yes, I've already started it, but it's still in progress, so I'll most > > likely let you know tomorrow (Europe time). > > Verified and committed in r325218. Hans, can we merge this to release_60? Merged in r325230.