40527 – Missed opportunity to remove a dead store

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 40527 - Missed opportunity to remove a dead store

Summary: Missed opportunity to remove a dead store

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Scalar Optimizations (show other bugs)
Version:	trunk
Hardware:	PC Linux

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-01-30 05:56 PST by Alexander Potapenko
Modified:	2020-10-22 06:00 PDT (History)
CC List:	16 users (show)

See Also:	39603 39225 45792
Fixed By Commit(s):	51ff04567b2f8d06b2062bd3ed72eab2e93e4466

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Alexander Potapenko 2019-01-30 05:56:10 PST

When compiling the following code:

============================
  void alloc_sock(int *err);
  int try_send();

  int send(int len) {
    int err;
    if (len) {
      err = -1; 
      alloc_sock(&err);
      err = try_send();
    }
    return -1; 
  }
============================

with 
 $ clang -O2  -g -ftrivial-auto-var-init=pattern -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang -c t.c

Clang produces an extra store of $0xaaaaaaaa to |err|, which is introduced by the -ftrivial-auto-var-init and not removed by other optimizations in the pipeline:

$ objdump -d t.o

0000000000000000 <send>:
   0:	50                   	push   %rax
   1:	c7 44 24 04 aa aa aa 	movl   $0xaaaaaaaa,0x4(%rsp)
   8:	aa 
   9:	85 ff                	test   %edi,%edi
   b:	74 1d                	je     2a <send+0x2a>
   d:	c7 44 24 04 ff ff ff 	movl   $0xffffffff,0x4(%rsp)
  14:	ff 
  15:	48 8d 7c 24 04       	lea    0x4(%rsp),%rdi
  1a:	e8 00 00 00 00       	callq  1f <send+0x1f>
  1f:	31 c0                	xor    %eax,%eax
  21:	e8 00 00 00 00       	callq  26 <send+0x26>
  26:	89 44 24 04          	mov    %eax,0x4(%rsp)
  2a:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
  2f:	59                   	pop    %rcx
  30:	c3                   	retq  

As noted by JF Bastien here: https://reviews.llvm.org/D54604, "Seems like the optimizer should sink the store enough that it appears only once on each path."

Comment 1 Sanjay Patel 2019-01-30 06:10:41 PST

Isn't this a job for -dse or some other pass in the IR optimizer rather than a backend problem?

define i32 @send(i32 %len) {
entry:
  %err = alloca i32, align 4
  store i32 -1431655766, i32* %err, align 4   <--- dead store 
  %tobool = icmp eq i32 %len, 0
  br i1 %tobool, label %if.end, label %if.then

if.then:                                          
  store i32 -1, i32* %err, align 4
  call void @alloc_sock(i32* nonnull %err) #3
  %call = call i32 (...) @try_send() #3
  store i32 %call, i32* %err, align 4
  br label %if.end

if.end:                                           
  ret i32 -1
}

declare void @alloc_sock(i32*) local_unnamed_addr #2
declare i32 @try_send(...) local_unnamed_addr #2

Comment 2 Alexander Potapenko 2019-01-30 06:14:19 PST

Well, yes. I've illustrated the problem with an assembly excerpt, but this is indeed in IR.

Comment 3 Alexander Potapenko 2019-01-30 06:14:19 PST

Well, yes. I've illustrated the problem with an assembly excerpt, but this is indeed in IR.

Comment 4 Sanjay Patel 2019-01-30 06:23:08 PST

Changing component - not sure if anyone gets auto-cc'd on "scalar optimizations", but it's worth a try. :)

Comment 5 Eli Friedman 2019-01-30 09:40:28 PST

There's been work in the past to do cross-block DSE, but it's currently not in trunk. See https://reviews.llvm.org/D13363 https://reviews.llvm.org/D22568 etc.

Comment 6 JF Bastien 2019-01-30 10:16:49 PST

Apologies, I thought the code had a diamond and we needed to DSE on one side, and not on the other (some min-cut). This missed optimization is even worst than what I'd understood.

For Apple folks, I mirrored this bug in <rdar://problem/47671216>

Comment 7 Eli Friedman 2019-01-30 13:44:28 PST

There's also a patch for partial DSE, which never got merged: https://reviews.llvm.org/D29866 / https://reviews.llvm.org/D29865 .  But the store in the given testcase is fully redundant, yes.

There's also slightly more recent WIP patch for cross-block DSE: https://reviews.llvm.org/D29624 .

Comment 8 Alexander Potapenko 2019-02-12 04:35:48 PST

Added more people from CLs mentioned by Eli.
Folks, do you think reviving these CLs may help here?

Comment 9 Florian Hahn 2020-01-13 16:15:35 PST

FYI I tried to revive MemorySSA backed DSE and broke down the patches in relatively small chunks: https://reviews.llvm.org/D72146 (and linked)

The patch series I posted still does not quite cover this case, but I know what's missing (exploring multiple memorydefs) and that should also be addressed at some point soon.

Comment 10 Florian Hahn 2020-08-14 01:30:12 PDT

MemorySSA backed DSE on trunk can remove the extra stores now (needs -mllvm -enable-dse-memoryssa): https://godbolt.org/z/x5YPsd

Comment 11 David Bolvansky 2020-09-11 06:03:00 PDT

@fhahn, close and commit testcase?

Comment 12 Florian Hahn 2020-09-11 06:23:51 PDT

(In reply to David Bolvansky from comment #11)
> @fhahn, close and commit testcase?

I think there are already plenty of tests that cover the scenario in tree. The unnecessary store is now removed without extra options, but I was planning on waiting with closing the issues a bit to wait until the new default sticks.

Comment 13 Florian Hahn 2020-10-22 06:00:05 PDT

DSE + MemorySSA is the default now (51ff04567b2f8d06b2062bd3ed72eab2e93e4466).

The store gets removed on current trunk without extra options:  https://godbolt.org/z/avq4zz