bottom-up register-reduction scheduling actually increases register pressure #1447

Closed
sunfishcode opened this issue Jan 4, 2007 · 7 comments
Labels: bugzilla (Issues migrated from bugzilla), code-quality, llvm:codegen

Comments

@sunfishcode
Member

Bugzilla Link: 1075
Resolution: FIXED
Resolved on: Feb 22, 2010 12:44
Version: trunk
OS: Linux
CC: @lattner

Extended Description

A high Sethi-Ullman number for an operation indicates it should be scheduled
earlier rather than later -- from a top-down perspective. However, the bottom-up
scheduler currently translates Sethi-Ullman numbers directly into priority
values, so such operations end up being scheduled earlier from a bottom-up
perspective. The result is that register pressure is actually increased instead
of reduced. The effect is somewhat hidden because the special-case values are
also inverted -- for example, a store is given a low Sethi-Ullman number, but it
ends up having the desired effect. In larger test cases, fewer nodes use the
special cases, and the effect is much more visible.

I've experimented a little with having the priority function take the inverse of
the Sethi-Ullman value and inverting the special-case values, and I've seen
spill counts drop significantly in large test cases on x86-64.
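
A minimal sketch of that inversion (illustrative only; the SUnit struct, the
priority functions, and the MaxSU cap below are simplified stand-ins, not
LLVM's actual SelectionDAG scheduler code):

#include <algorithm>

// Illustrative stand-in for a schedulable node; not LLVM's SUnit.
struct SUnit {
  unsigned SethiUllman; // Sethi-Ullman number of this node
};

// Current behavior: priority == Sethi-Ullman number. The bottom-up
// scheduler pops the highest-priority node first and places it at the
// *end* of the final schedule, so high-SU nodes land late -- the
// opposite of what register reduction wants.
unsigned directPriority(const SUnit &SU) {
  return SU.SethiUllman;
}

// The experiment: invert the value (and, correspondingly, the
// special-case values) so high-SU nodes are popped last bottom-up
// and therefore end up early in the final top-down order.
unsigned invertedPriority(const SUnit &SU) {
  const unsigned MaxSU = 1 << 16; // assumed cap, illustration only
  return MaxSU - std::min(SU.SethiUllman, MaxSU);
}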

Here's a simple case that shows the problem:

float %foo(float %x) {
%tmp1 = mul float %x, 3.000000e+00
%tmp3 = mul float %x, 5.000000e+00
%tmp5 = mul float %x, 7.000000e+00
%tmp7 = mul float %x, 1.100000e+01
%tmp10 = add float %tmp1, %tmp3
%tmp12 = add float %tmp10, %tmp5
%tmp14 = add float %tmp12, %tmp7
ret float %tmp14
}

With -march=x86-64, the list-burr (default) schedule places all the multiplies
before all the adds. With the change mentioned above, the multiplies are
intermixed with the adds, so the register pressure is lower.
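
To make the pressure difference concrete, here's a toy live-value count over
the two orders (a hypothetical model of the IR above, assuming each value dies
at its last use; it is not tied to any LLVM API):

#include <algorithm>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

struct Inst {
  std::string def;               // value defined
  std::vector<std::string> uses; // values read
};

// Peak number of simultaneously live values for a given order.
unsigned maxLive(const std::vector<Inst> &Sched) {
  std::set<std::string> Live = {"x"}; // %x is live on entry
  unsigned Max = 1;
  for (size_t i = 0; i < Sched.size(); ++i) {
    Live.insert(Sched[i].def);
    Max = std::max<unsigned>(Max, Live.size());
    for (const std::string &U : Sched[i].uses) {
      bool UsedLater = false;
      for (size_t j = i + 1; j < Sched.size(); ++j)
        for (const std::string &V : Sched[j].uses)
          UsedLater |= (V == U);
      if (!UsedLater)
        Live.erase(U); // this was the last use
    }
  }
  return Max;
}

int main() {
  // list-burr order: all multiplies, then all adds.
  std::vector<Inst> MulsFirst = {
      {"t1", {"x"}}, {"t3", {"x"}}, {"t5", {"x"}}, {"t7", {"x"}},
      {"t10", {"t1", "t3"}}, {"t12", {"t10", "t5"}}, {"t14", {"t12", "t7"}}};
  // Inverted-priority order: multiplies intermixed with the adds.
  std::vector<Inst> Intermixed = {
      {"t1", {"x"}}, {"t3", {"x"}}, {"t10", {"t1", "t3"}},
      {"t5", {"x"}}, {"t12", {"t10", "t5"}},
      {"t7", {"x"}}, {"t14", {"t12", "t7"}}};
  std::printf("muls first: %u live at peak\n", maxLive(MulsFirst));  // 5
  std::printf("intermixed: %u live at peak\n", maxLive(Intermixed)); // 4
}

On this toy model the all-multiplies-first order peaks at five live values
versus four when intermixed; in larger DAGs with more independent subtrees the
gap grows, which matches the spill-count drops seen on the big test cases.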

@llvmbot
Collaborator

llvmbot commented Jan 8, 2007

I am going to take a closer look at this. But I don't see the problem you are
describing with the given example:

*** Final schedule ***
SU(0): 0x7b048a0: ch = EntryToken
SU(11): 0x7b055a0: f32,ch = MOVSSrm 0x7b04eb0, 0x7b05c20, 0x7b04e50, 0x7b053e0,
0x7b048a0
SU(9): 0x7b04fe0: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b057c0, 0x7b048a0
SU(10): 0x7b04e00: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b05540, 0x7b048a0
SU(1): 0x7b05090: f32 = ADDSSrr 0x7b04e00, 0x7b04fe0
SU(8): 0x7b05de0: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b05a80, 0x7b048a0
SU(2): 0x7b05110: f32 = ADDSSrr 0x7b05090, 0x7b05de0
SU(7): 0x7b06100: f32,ch = MULSSrm 0x7b055a0, 0x7b04e50, 0x7b05c20, 0x7b04e50,
0x7b05d80, 0x7b048a0
SU(3): 0x7b05190: f32 = ADDSSrr 0x7b05110, 0x7b06100
SU(6): 0x7b05440: ch = MOVSSmr 0x7b04f50, 0x7b05c20, 0x7b04e50, 0x7b053e0,
0x7b05190, 0x7b048a0
SU(5): 0x7b06240: f64,ch = FpLD32m 0x7b04f50, 0x7b05c20, 0x7b04e50, 0x7b053e0,
0x7b05440
SU(4): 0x7b04ba0: ch = RET 0x7b05200, 0x7b05200:1
0x7b05200: ch,flag = FpSETRESULT 0x7b06240, 0x7b06240:1

foo:
subl $4, %esp
movss 8(%esp), %xmm0
movaps %xmm0, %xmm1
mulss .LCPI1_0, %xmm1
movaps %xmm0, %xmm2
mulss .LCPI1_1, %xmm2
addss %xmm1, %xmm2
movaps %xmm0, %xmm1
mulss .LCPI1_2, %xmm1
addss %xmm1, %xmm2
mulss .LCPI1_3, %xmm0
addss %xmm0, %xmm2
movss %xmm2, (%esp)
flds (%esp)
addl $4, %esp
ret

Multiplies are intermixed with the adds. Dan, are you working with ToT (top of tree)?

@llvmbot
Collaborator

llvmbot commented Jan 8, 2007

Hmmm... I see there are some issues. Hopefully I can fix the general cases
without breaking the special ones.

@sunfishcode
Member Author

I used -march=x86-64 above; the code you posted looks like 32-bit code. Offhand
I don't know why it schedules differently on x86 and x86-64; maybe the special
cases are applying differently.

Here's a case that shows the problem on both x86 and x86-64. In this test, the
div instructions should precede the mul instructions according to pure
Sethi-Ullman numbering (see the labeling sketch after the IR below).

float %foo(float* %a) {
entry:
%tmp = load float* %a
%tmp4 = getelementptr float* %a, int 1
%tmp5 = load float* %tmp4
%tmp6 = mul float %tmp, %tmp5
%tmp8 = getelementptr float* %a, int 2
%tmp9 = load float* %tmp8
%tmp11 = getelementptr float* %a, int 3
%tmp12 = load float* %tmp11
%tmp13 = mul float %tmp9, %tmp12
%tmp14 = mul float %tmp6, %tmp13
%tmp16 = getelementptr float* %a, int 4
%tmp17 = load float* %tmp16
%tmp19 = getelementptr float* %a, int 5
%tmp20 = load float* %tmp19
%tmp21 = fdiv float %tmp17, %tmp20
%tmp23 = getelementptr float* %a, int 6
%tmp24 = load float* %tmp23
%tmp26 = getelementptr float* %a, int 7
%tmp27 = load float* %tmp26
%tmp28 = fdiv float %tmp24, %tmp27
%tmp29 = fdiv float %tmp21, %tmp28
%tmp31 = getelementptr float* %a, int 8
%tmp32 = load float* %tmp31
%tmp34 = getelementptr float* %a, int 9
%tmp35 = load float* %tmp34
%tmp36 = fdiv float %tmp32, %tmp35
%tmp38 = getelementptr float* %a, int 10
%tmp39 = load float* %tmp38
%tmp41 = getelementptr float* %a, int 11
%tmp42 = load float* %tmp41
%tmp43 = fdiv float %tmp39, %tmp42
%tmp44 = fdiv float %tmp36, %tmp43
%tmp45 = fdiv float %tmp29, %tmp44
%tmp46 = add float %tmp14, %tmp45
ret float %tmp46
}
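
For reference, the textbook Sethi-Ullman label (the classic rule; LLVM's
variant differs in details) is 1 for a leaf and, for an interior node with
child labels l and r, max(l, r) when they differ and l + 1 when they are
equal. The fdiv tree above combines eight loads pairwise (depth 3) while the
mul tree combines four (depth 2), so the fdiv side gets the higher label. A
small sketch of the computation (hypothetical helpers, C++14):

#include <algorithm>
#include <cstdio>
#include <memory>

struct Node {
  std::unique_ptr<Node> lhs, rhs; // both null for a leaf (a load)
};

// Classic Sethi-Ullman label: registers needed to evaluate the tree
// without spilling.
unsigned suLabel(const Node &N) {
  if (!N.lhs) return 1; // a leaf occupies one register
  unsigned L = suLabel(*N.lhs), R = suLabel(*N.rhs);
  return L == R ? L + 1 : std::max(L, R);
}

// Perfectly balanced binary tree with the given number of levels.
std::unique_ptr<Node> balanced(unsigned depth) {
  auto N = std::make_unique<Node>();
  if (depth) {
    N->lhs = balanced(depth - 1);
    N->rhs = balanced(depth - 1);
  }
  return N;
}

int main() {
  std::printf("mul tree (4 loads): %u\n", suLabel(*balanced(2)));  // 3
  std::printf("fdiv tree (8 loads): %u\n", suLabel(*balanced(3))); // 4
}

With labels 4 versus 3, pure top-down Sethi-Ullman ordering evaluates the fdiv
subtree first, which is why a correctly inverted bottom-up priority should
place the divs before the muls here.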

@llvmbot
Collaborator

llvmbot commented Jan 9, 2007

Yeah, the x86 and x86-64 cases scheduled differently because of
calling-convention differences (the x86 case has an extra store).

I have fixed this. Doing some tests now.

@llvmbot
Collaborator

llvmbot commented Jan 9, 2007

Fixed.

@lattner
Collaborator

lattner commented Jan 9, 2007

Thanks Evan, please add a test for this. :)

@lattner
Collaborator

lattner commented Jan 9, 2007

ah, you did, it's Regression/CodeGen/X86/2007-01-08-InstrSched.ll

Thanks :)

llvmbot transferred this issue from llvm/llvm-bugzilla-archive on Dec 3, 2021
This issue was closed.