llc -fast -march=x86 on this .ll file: int %foo(int* %a, int %t, int %C) { entry: %a = cast int* %a to uint ; <uint> [#uses=1] br label %cond_true cond_true: ; preds = %cond_true, %entry %iv. = phi uint [ %a, %entry ], [ %iv..inc, %cond_true ] ; <uint> [#uses=2] %t_addr.0.0 = phi int [ %t, %entry ], [ %tmp7, %cond_true ] ; <int> [#uses=2] %iv. = cast uint %iv. to int* ; <int*> [#uses=1] %tmp3 = load int* %iv. ; <int> [#uses=1] %tmp7 = add int %t_addr.0.0, %tmp3 ; <int> [#uses=1] %tmp = setgt int %C, 39 ; <bool> [#uses=1] %iv..inc = add uint %iv., 4 ; <uint> [#uses=1] br bool %tmp, label %bb12, label %cond_true bb12: ; preds = %cond_true ret int %t_addr.0.0 } produces: _foo: subl $4, %esp movl %esi, (%esp) movl 8(%esp), %ecx movl 16(%esp), %edx movl 12(%esp), %esi LBB1_1: #cond_true *** movl %esi, %eax *** movl %eax, %esi addl (%ecx), %esi addl $4, %ecx cmpl $40, %edx jl LBB1_1 #cond_true LBB1_2: #bb12 movl (%esp), %esi addl $4, %esp ret It is safe to say that that is not the fastest way to copy esi into esi. Note that eax is dead. -Chris
Reduced testcase: int %foo(int %t, int %C) { entry: br label %cond_true cond_true: ; preds = %cond_true, %entry %t_addr.0.0 = phi int [ %t, %entry ], [ %tmp7, %cond_true ] ; <int> [#uses=2] %tmp7 = add int %t_addr.0.0, 1 ; <int> [#uses=1] %tmp = setgt int %C, 39 ; <bool> [#uses=1] br bool %tmp, label %bb12, label %cond_true bb12: ; preds = %cond_true ret int %t_addr.0.0 } _foo: movl 8(%esp), %ecx movl 4(%esp), %edx LBB1_1: #cond_true movl %edx, %eax movl %eax, %edx incl %edx cmpl $40, %ecx jl LBB1_1 #cond_true LBB1_2: #bb12 ret
Actually, I was wrong above, eax isn't dead, it's used by the ret outside the loop.
This fix for this one case is pretty trivial (I'm testing it now). One question, should we modify it so that the general case is also caught? Something like this: LBB1_1: #cond_true movl %edx, %eax ;; Only uses of %eax movl %eax, %edx Is transformed into: LBB1_1: #cond_true movl %edx, %eax ;; Only uses of %eax ? -bw
Fixed. After thinking about it some more, doing a check for the "general" case would be at least O(N^2) and hairy. If it's something we want to perform in the future, we should create a new report for it.
This is definitely an improvement, but not enough. The generated code now looks like this: _foo: movl 8(%esp), %ecx movl 4(%esp), %edx LBB1_1: #cond_true movl %edx, %eax incl %edx cmpl $40, %ecx jl LBB1_1 #cond_true LBB1_2: #bb12 ret This is *clearly* better than having two movs, but it would be better to have: _foo: movl 8(%esp), %ecx movl 4(%esp), %eax LBB1_1: #cond_true incl %eax cmpl $40, %ecx jl LBB1_1 #cond_true LBB1_2: #bb12 ret I think that this is a coallescing problem, the PHI live-range isn't getting coallesced with the copy into EAX at the end properly or something. -Chris
Ah! Yes. I see what you mean. Okay, I'll delve more deeply into the coalescer.
No, I'm confused. My suggestion would produce a different result. However, it does seem better to solve this at the interval level if possible. I'll take a quick look and if not, I'll close it.
Fixed. Testcase here: X86/2006-08-21-ExtraMovInst.ll Patches here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060821/037017.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20060821/037018.html -Chris