LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 47416 - [x86] Failure to remove all statements in function that doesn't return anything but should
Summary: [x86] Failure to remove all statements in function that doesn't return anythi...
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Scalar Optimizations (show other bugs)
Version: trunk
Hardware: PC Linux
: P enhancement
Assignee: Unassigned LLVM Bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-04 03:39 PDT by Gabriel Ravier
Modified: 2020-09-07 08:12 PDT (History)
5 users (show)

See Also:
Fixed By Commit(s): b22910daab95


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Gabriel Ravier 2020-09-04 03:39:44 PDT
int f(int a, int b)
{
    if ((a < 4 && b == 'r') || (a == 1 && (b == 'h' || b == 'i')) || (a == 2))
        a = b;
}

Compiling this results in this assembly output:

f(int, int):
  cmp edi, 1

It should be possible to remove the `cmp edi, 1` as far as I know. I guess this isn't really by itself considering the behaviour is undefined anyway but this behaviour itself seems rather weird.
Comment 1 Gabriel Ravier 2020-09-04 04:18:40 PDT
It also seems to completely fail to realise that the code has to be unreachable (unless UB occurs). If I add this code (with `f` still visible) :

int g(bool cond, int a, int b)
{
    if (cond)
        return f(a, b);
    return 12;
}

LLVM is incapable of optimizing this to `return 12;` (removing the conditional), instead inlining the broken `f` and jumping to it if `cond` is true. If I change the body of `f` to nothing, it manages to do so properly (and the codegen for `f` emits nothing as expected).
Comment 2 Gabriel Ravier 2020-09-04 04:20:59 PDT
PS: In my second comment, I am *not* saying that behaviour should be defined where `cond == true`. I am saying that the code is not properly optimized based on the fact that `f` cannot be called without invoking UB.
Comment 3 Gabriel Ravier 2020-09-04 04:33:34 PDT
Also, here is a Godbolt comparison with GCC (which is capable of optimizing this better than LLVM) : https://godbolt.org/z/Kbs1f1
Comment 4 Gabriel Ravier 2020-09-04 04:40:38 PDT
Extra note: This doesn't occur when compiling the code as C (https://godbolt.org/z/n5d65h), so this bug seems even odder now.
Comment 5 Sanjay Patel 2020-09-04 08:46:46 PDT
https://reviews.llvm.org/D87149
Comment 6 Sanjay Patel 2020-09-04 08:48:57 PDT
Changing to IR bug - the sooner we can remove dead code, the better (potentially, there's also some clang front-end part of this to account for the C/C++ difference).
Comment 7 Johannes Doerfert 2020-09-04 22:34:22 PDT
To echo here what was aid in a review [0], we should have a pass early on removing all instructions preceding an unreachable if the are known to transfer control. 

For the example https://godbolt.org/z/Kbs1f1 , the Attributor knows `f` is `noreturn` and replaces the return in `g` with unreachable. That results in the "good" code: https://godbolt.org/z/6j36x8

That said, we might want to have a more localized optimization doing this, and finally land the default `willreturn` and `nounwind` annotations for intrinsics.


[0] https://reviews.llvm.org/D87149#2256885
Comment 8 Sanjay Patel 2020-09-07 08:12:37 PDT
Should be fixed with more instcombine zapping:
https://reviews.llvm.org/rGb22910daab95

Depending on target, we may do better than nothing on UB code as shown in the description. For example, on x86, I see a 'ud2' instruction, so that should crash and make the programmer aware of their buggy code even if they ignore the compile-time warning:
47416.cpp:5:1: warning: non-void function does not return a value [-Wreturn-type]