Created attachment 15960 [details] Reproducer This was discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095833.html Description copy-pasted from the thread: """ Let's start with an example that shows that we have a problem (direct copy/paste from the guard intrinsics thread). Say we have: ``` void foo() available_externally { %t0 = load atomic %ptr %t1 = load atomic %ptr if (%t0 != %t1) print("X"); } void main() { foo(); print("Y"); } ``` The possible behaviors of the above program are {print("X"), print("Y")} or {print("Y")}. But if we run opt then we have ``` void foo() available_externally readnone nounwind { ;; After CSE'ing the two loads and folding the condition } void main() { foo(); print("Y"); } ``` and some generic reordering ``` void foo() available_externally readnone nounwind { ;; After CSE'ing the two loads and folding the condition } void main() { print("Y"); foo(); // legal since we're moving a readnone nounwind function that // was guaranteed to execute (hence can't have UB) } ``` If we do not inline @foo(), and instead re-link the call site in @main to some non-optimized copy (or differently optimized copy) of @foo, then it is possible for the program to have the behavior {print("Y"); print ("X")}, which was disallowed in the earlier program. In other words, opt refined the semantics of @foo() (i.e. reduced the set of behaviors it may have) in ways that would make later optimizations invalid if we de-refine the implementation of @foo(). The above example is clearly fabricated, but such cases can come up even if everything is optimized to the same level. E.g. one of the atomic loads in the unrefined implementation of @foo() could have been hidden behind a function call, whose body existed in only one module. That module would then be able to refine @foo() to `ret void` but other modules won't. """
Adding a link to a complete reproducer (also from Sanjoy) that has nothing to do with atomics: https://github.com/sanjoy/comdat-ipo
(In reply to comment #1) > Adding a link to a complete reproducer (also from Sanjoy) that has nothing > to do with atomics: > https://github.com/sanjoy/comdat-ipo FYI: that's the zip file I've attached to this bug.
I vote that we stop inferring readnone/readonly/etc on functions with isWeakForLinker linkage. I expect that in most places where we can infer such attributes, inlining is likely to happen anyway, and be just as profitable. Are we aware of any other IPO soundness issues with other passes that aren't functionattrs? I think IPSCP is fine for ODR linkage functions, they all have to return the same constant regardless of optimization.
Just a note: be careful with isWeakForLinker(), it is not the right one (as I wrote in the email thread). I think the right condition is !isStrongDefinitionForLinker(). (isWeakForLinker() does not catch available_externally)
(In reply to comment #3) > Are we aware of any other IPO soundness issues with other passes that aren't > functionattrs? I think IPSCP is fine for ODR linkage functions, they all > have to return the same constant regardless of optimization. I think both ipsccp and ipconstprop are bad. Consider a variant of the original example: void f(i32* %ptr) available_externally { %v0 = load atomic i32* %ptr %v1 = load atomic i32* %ptr %sub = sub i32 %v0, %v1 call @print(i32 %sub) ret i32 %sub } void caller() { %t = call @f(%ptr) print(%t) } function local opts ==> void f(i32* %ptr) available_externally { call @print(i32 0) ret i32 0 } void caller() { %t = call @f(%ptr) print(%t) } ipsccp or ipconstprop ==> void f(i32* %ptr) available_externally { call @print(i32 0) ret i32 0 } void caller() { %t = call @f(%ptr) print(i32 0) } Now if we link to the unoptimized @f, we can print {<non zero value>, <0>}, when in the original program the two values printed would have had to be the same. I think (haven't tried) we can construct similar cases with undef (no atomics) as well.
(In reply to comment #3) > I vote that we stop inferring readnone/readonly/etc on functions with > isWeakForLinker linkage. I expect that in most places where we can infer > such attributes, inlining is likely to happen anyway, and be just as > profitable. I think we need to be careful here; you could be right, but there seem to be some attributes that we do commonly prove for inline functions, even those too large to inline. We can prove 'nocapture' on the implicit this pointer, for example. readonly seems more common than readnone. I really think we need to benchmark this against privatizing the functions to make an informed decision. > > Are we aware of any other IPO soundness issues with other passes that aren't > functionattrs? I think IPSCP is fine for ODR linkage functions, they all > have to return the same constant regardless of optimization. I suspect that GlobalsAA has the same problem as attribute propagation (in one TU, a function's access to a global could be optimized out, but the linker could pick a version of the function which still accesses the global). [And privatizing just got harder?...]
I see a change was made at r265762 to address this issue. That's causing a lost optimization in variadic cases, that I believe can still safely be optimized. I've made bug 27796 for it.