Created attachment 3061 [details] Patch to check CI->mayHaveSideEffects() when looking at a load. opt -tailcallelim refuses to eliminate a tail call if it's followed by load, even if the call is both nounwind and readonly. An example where it would be safe to do this: ---- define fastcc i32 @_D4test3sumFPiiiZi(i32* %a_arg, i32 %a_len_arg, i32 %start_arg) nounwind readonly { entry: %tmp2 = icmp slt i32 %start_arg, %a_len_arg ; <i1> [#uses=1] br i1 %tmp2, label %else, label %if if: ; preds = %entry ret i32 0 else: ; preds = %entry %tmp4 = sext i32 %start_arg to i64 ; <i64> [#uses=1] %tmp6 = getelementptr i32* %a_arg, i64 %tmp4 ; <i32*> [#uses=1] %tmp10 = add i32 %start_arg, 1 ; <i32> [#uses=1] %tmp11 = tail call fastcc i32 @_D4test3sumFPiiiZi(i32* %a_arg, i32 %a_len_arg, i32 %tmp10) ; <i32> [#uses=1] %tmp12 = load i32* %tmp6 ; <i32> [#uses=1] %tmp13 = add i32 %tmp12, %tmp11 ; <i32> [#uses=1] ret i32 %tmp13 } ---- I'm attaching a small patch that remedies the situation. Note that I used CI->mayHaveSideEffects() instead of CI->onlyReadsMemory() to correctly handle the case where the call may throw and the load may introduce undefined behavior[1]. This way, the transformation does not take place in that case. [1]: For example, if the pointer being loaded from may be null but the call will throw in this case it's not safe to move the load before the call since that would produce undefined behavior instead of an exception.
If the call unwinds then the load is never executed and there is no undefined behaviour: unwinding continues in the caller.
Created attachment 3062 [details] Fixed patch Indeed, but the check in the pass is if it's safe to /move/ the instruction to /before/ the call, so that it can eliminate a tail recursion. It currently does not do this at all if the instruction is a load, but it should be safe if the function does not unwind and does not write to memory. ... and I just noticed I had the check reversed. Attaching corrected patch. For reference, this is the case I was talking about: ---- target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128" target triple = "x86_64-unknown-linux-gnu" define i32 @_Z3fooPiii(i32* %p, i32 %i, i32 %max) readonly { entry: %tmp4 = icmp slt i32 %i, %max ; <i1> [#uses=1] br i1 %tmp4, label %return, label %bb1 bb1: ; preds = %entry %tmp6 = icmp eq i32* %p, null ; <i1> [#uses=1] br i1 %tmp6, label %bb2, label %bb3 bb2: ; preds = %bb1 unwind bb3: ; preds = %bb1 %tmp13 = sext i32 %i to i64 ; <i64> [#uses=1] %tmp17 = add i32 %i, 1 ; <i32> [#uses=1] %tmp20 = call i32 @_Z3fooPiii(i32* %p, i32 %tmp17, i32 %max) ; <i32> [#uses=1] %tmp15 = load i32* %p, align 1 ; <i32> [#uses=1] %tmp21 = add i32 %tmp15, %tmp20 ; <i32> [#uses=1] ret i32 %tmp21 return: ; preds = %entry ret i32 0 } ---- In this case, it's *not* safe to perform the transformation, and so the fixed patch does not.
Isn't it safe to move the load before the call if the load can be proved not to trap?
Sure. But I don't see an easy way to determine that. My proposed change would certainly not be the most precise fix, but it's an improvement over the current situation :). (You could also move the load if it is non-trapping and the call writes to memory, but not *that* memory. However, that would require adding an alias analysis dependency to the pass, and I wasn't sure if that would be accepted)
Instcombine has isSafeToLoadUnconditionally. You could factor that out and use it here. On the other hand, maybe if the load could be safely moved it already would have been by some pass like instcombine?
(In reply to comment #5) > On the other hand, maybe if the > load could be safely moved it already would have been by some > pass like instcombine? I doubt another pass would see the benefit of doing so. The only reason it's profitable is because it allows the elimination of tail recursion.
Created attachment 3068 [details] A more comprehensive patch This patch * moves instcombine's isSafeToLoadUnconditionally to Transforms/Utils/Local.cpp (the best place for it that I could find, but I'm open to suggestions). * tweaks it a bit to ignore readonly calls when scanning backwards, since those aren't allowed to call free (right?). * uses it in tailcallelim to allow more loads to be moved above the call. Passes 'make check', including the new test cases included in the patch.
The patch to move and enhance isSafeToLoadUnconditionally looks great to me, please commit it as a separate change.
Once that is done, please send the proposed patch to llvm-commits where I'll see it. It would be nice to simplify the conditional logic somehow if possible.
Created attachment 3098 [details] Patch with just isSafeToLoadUnconditionally() changes. (In reply to comment #8) > The patch to move and enhance isSafeToLoadUnconditionally looks great to me, > please commit it as a separate change. See new attachment. I don't have commit access, so someone else will have to commit it.
Applied, I also added a testcase. Please make sure any optimizer enhancement includes a small testcase (e.g. r73508).
(In reply to comment #11) > Applied, I also added a testcase. Please make sure any optimizer enhancement > includes a small testcase (e.g. r73508). Sorry, I only had testcases for the tailcallelim change. I didn't really study what instcombine was trying to do there, I just stole a handy function and modified it to make it work for this case :). (And I didn't think about this when I split up the patch) (In reply to comment #9) > Once that is done, please send the proposed patch to llvm-commits where I'll > see it. http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090615/078834.html
Patch applied here, thanks! http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20090615/079051.html