When compiling the following code: ============================ void alloc_sock(int *err); int try_send(); int send(int len) { int err; if (len) { err = -1; alloc_sock(&err); err = try_send(); } return -1; } ============================ with $ clang -O2 -g -ftrivial-auto-var-init=pattern -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang -c t.c Clang produces an extra store of $0xaaaaaaaa to |err|, which is introduced by the -ftrivial-auto-var-init and not removed by other optimizations in the pipeline: $ objdump -d t.o 0000000000000000 <send>: 0: 50 push %rax 1: c7 44 24 04 aa aa aa movl $0xaaaaaaaa,0x4(%rsp) 8: aa 9: 85 ff test %edi,%edi b: 74 1d je 2a <send+0x2a> d: c7 44 24 04 ff ff ff movl $0xffffffff,0x4(%rsp) 14: ff 15: 48 8d 7c 24 04 lea 0x4(%rsp),%rdi 1a: e8 00 00 00 00 callq 1f <send+0x1f> 1f: 31 c0 xor %eax,%eax 21: e8 00 00 00 00 callq 26 <send+0x26> 26: 89 44 24 04 mov %eax,0x4(%rsp) 2a: b8 ff ff ff ff mov $0xffffffff,%eax 2f: 59 pop %rcx 30: c3 retq As noted by JF Bastien here: https://reviews.llvm.org/D54604, "Seems like the optimizer should sink the store enough that it appears only once on each path."
Isn't this a job for -dse or some other pass in the IR optimizer rather than a backend problem? define i32 @send(i32 %len) { entry: %err = alloca i32, align 4 store i32 -1431655766, i32* %err, align 4 <--- dead store %tobool = icmp eq i32 %len, 0 br i1 %tobool, label %if.end, label %if.then if.then: store i32 -1, i32* %err, align 4 call void @alloc_sock(i32* nonnull %err) #3 %call = call i32 (...) @try_send() #3 store i32 %call, i32* %err, align 4 br label %if.end if.end: ret i32 -1 } declare void @alloc_sock(i32*) local_unnamed_addr #2 declare i32 @try_send(...) local_unnamed_addr #2
Well, yes. I've illustrated the problem with an assembly excerpt, but this is indeed in IR.
Changing component - not sure if anyone gets auto-cc'd on "scalar optimizations", but it's worth a try. :)
There's been work in the past to do cross-block DSE, but it's currently not in trunk. See https://reviews.llvm.org/D13363 https://reviews.llvm.org/D22568 etc.
Apologies, I thought the code had a diamond and we needed to DSE on one side, and not on the other (some min-cut). This missed optimization is even worst than what I'd understood. For Apple folks, I mirrored this bug in <rdar://problem/47671216>
There's also a patch for partial DSE, which never got merged: https://reviews.llvm.org/D29866 / https://reviews.llvm.org/D29865 . But the store in the given testcase is fully redundant, yes. There's also slightly more recent WIP patch for cross-block DSE: https://reviews.llvm.org/D29624 .
Added more people from CLs mentioned by Eli. Folks, do you think reviving these CLs may help here?
FYI I tried to revive MemorySSA backed DSE and broke down the patches in relatively small chunks: https://reviews.llvm.org/D72146 (and linked) The patch series I posted still does not quite cover this case, but I know what's missing (exploring multiple memorydefs) and that should also be addressed at some point soon.
MemorySSA backed DSE on trunk can remove the extra stores now (needs -mllvm -enable-dse-memoryssa): https://godbolt.org/z/x5YPsd
@fhahn, close and commit testcase?
(In reply to David Bolvansky from comment #11) > @fhahn, close and commit testcase? I think there are already plenty of tests that cover the scenario in tree. The unnecessary store is now removed without extra options, but I was planning on waiting with closing the issues a bit to wait until the new default sticks.
DSE + MemorySSA is the default now (51ff04567b2f8d06b2062bd3ed72eab2e93e4466). The store gets removed on current trunk without extra options: https://godbolt.org/z/avq4zz