This code:

  int value(uint64 b1, uint64 b2) {
    int i, j, k;
    int value = 0;
    for (k = 0; k < 2; k++)
      for (i = 0; i < 6; i++)
        for (j = 0; j < 2; j++)
          if ((b2 & 0xf << (j + i * 6)) == 0xf << (j + i * 6))
            value += 1000;
    return value;
  }

generates too many unnecessary spills/reloads when it is rerun through the optimizer (at -O2). See the lines marked "***" below:

  LBB1_1: #bb17.preheader
          movl $15, %eax
  ***     movb 39(%esp), %cl
          movl %eax, %edx
          shll %cl, %edx
  ***     movb 39(%esp), %cl
          incb %cl
          shll %cl, %eax
          movl %edx, %ecx

As a result, the loop runs slower than it does when the optimizations are applied only once.
Created attachment 409: Extracted function
Taking a look.
Fixed. Patch here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20061009/038511.html

The diff of the produced code:

  --- t.s 2006-10-11 18:57:23.000000000 -0700
  +++ t2.s 2006-10-11 19:29:35.000000000 -0700
  @@ -22,7 +22,6 @@
           movb 39(%esp), %cl
           movl %eax, %edx
           shll %cl, %edx
  -        movb 39(%esp), %cl
           incb %cl
           shll %cl, %eax
           movl %edx, %ecx
  @@ -73,7 +72,6 @@
           movb 15(%esp), %cl
           movl %esi, %eax
           shll %cl, %eax
  -        movb 15(%esp), %cl
           incb %cl
           shll %cl, %esi
           movl %eax, %ecx

-Chris
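Each removed line is a reload of a value that is still sitting in %cl. Between the first movb 39(%esp), %cl and the second one, only %edx is written, so the second load is redundant. The C code below is a minimal sketch of that kind of redundant-reload elimination over a toy instruction list. It is not the code from the linked patch. The Inst type, the OP_* kinds, eliminate_redundant_reloads, print_insts and example_loop are all hypothetical. The sketch also makes two simplifying assumptions:

- stack slots are written only by explicit stores;
- register widths are ignored, so a write to %cl counts as a write to %ecx.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy instruction kinds, for illustration only. */
typedef enum { OP_RELOAD, OP_STORE, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int reg;          /* RELOAD/OTHER: register written (-1 if none); STORE: register stored */
    int slot;         /* stack-slot offset for RELOAD/STORE, NO_SLOT otherwise */
    const char *text; /* assembly text, kept only for printing */
} Inst;

#define NUM_REGS 8
#define NO_SLOT (-1)

/* Toy register numbers; ECX also stands for %cl (widths are ignored). */
enum { EAX, ECX, EDX };

/* Drops reloads whose target register is already known to hold the same
   stack slot's value. Compacts insts in place and returns the new count.
   Assumption: slots are only modified by OP_STORE instructions. */
int eliminate_redundant_reloads(Inst *insts, int n) {
    int holds[NUM_REGS]; /* holds[r]: slot whose value register r mirrors */
    int out = 0;

    for (int r = 0; r < NUM_REGS; r++)
        holds[r] = NO_SLOT;

    for (int i = 0; i < n; i++) {
        Inst in = insts[i];
        assert(in.reg < NUM_REGS);

        switch (in.kind) {
        case OP_RELOAD:
            assert(in.reg >= 0);
            if (holds[in.reg] == in.slot)
                continue; /* value already in the register: skip this reload */
            holds[in.reg] = in.slot;
            break;
        case OP_STORE:
            assert(in.reg >= 0);
            /* Registers that mirrored the old slot contents are now stale. */
            for (int r = 0; r < NUM_REGS; r++)
                if (holds[r] == in.slot)
                    holds[r] = NO_SLOT;
            holds[in.reg] = in.slot; /* the stored register matches the slot */
            break;
        case OP_OTHER:
            if (in.reg >= 0)
                holds[in.reg] = NO_SLOT; /* register overwritten */
            break;
        }
        insts[out++] = in;
    }
    return out;
}

void print_insts(const Inst *insts, int n) {
    for (int i = 0; i < n; i++)
        printf("\t%s\n", insts[i].text);
}

/* The LBB1_1 sequence from the report, modeled in the toy IR. */
Inst example_loop[] = {
    { OP_OTHER,  EAX, NO_SLOT, "movl $15, %eax" },
    { OP_RELOAD, ECX, 39,      "movb 39(%esp), %cl" },
    { OP_OTHER,  EDX, NO_SLOT, "movl %eax, %edx" },
    { OP_OTHER,  EDX, NO_SLOT, "shll %cl, %edx" },
    { OP_RELOAD, ECX, 39,      "movb 39(%esp), %cl" },
    { OP_OTHER,  ECX, NO_SLOT, "incb %cl" },
    { OP_OTHER,  EAX, NO_SLOT, "shll %cl, %eax" },
    { OP_OTHER,  ECX, NO_SLOT, "movl %edx, %ecx" },
};
```

Usage: call eliminate_redundant_reloads(example_loop, 8), then pass the returned count to print_insts. This prints the sequence without the second movb 39(%esp), %cl, mirroring the first hunk of the diff above. Tracking only one slot per register is deliberately conservative: losing information can keep a redundant reload, but never removes a needed one.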
Rockin'!
This sped up the fourinarow test on x86. The gap between the times is now smaller (~0.5s instead of ~0.9s).