Details
Summary: [linscan] Coallescing physregs with virtregs should allow...
Bug#: 711
Product: libraries
Component: Common Code Generator Code
Status: RESOLVED
Resolution: FIXED
Hardware: All
OS: All
Version: 1.0
Priority: P2
Severity: normal
Target Milestone: 2.0

Keywords: compile-fail
Dependency: 699
People
Reporter: Chris Lattner <clattner@apple.com>
Assigned To: Unassigned LLVM Bugs <unassignedbugs@nondot.org>

Attachments
Failed bytecode (244.93 KB, application/octet-stream)
2007-03-20 11:41, Anton Korobeynikov

Description:   Opened: 2006-02-23 01:50
This problem is causing fastcc on x86 to fail in some corner cases.  Here's a
testcase:

---
target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
%.str_28 = external global [24 x sbyte]         ; <[24 x sbyte]*> [#uses=1]

implementation   ; Functions:

fastcc void %apply_stencil_op_to_pixels(int* %tmp.414, uint %n, int* %x, int* %y, uint %oper, ubyte* %mask) {
entry:
        switch uint %oper, label %label.5 [
                 uint 5386, label %label.6
        ]

label.5:                ; preds = %entry
        br label %no_exit.8

no_exit.8:              ; preds = %then.17, %label.5
        %tmp.409 = getelementptr ubyte* %mask, uint 0           ; <ubyte*> [#uses=1]
        %tmp.410 = load ubyte* %tmp.409         ; <ubyte> [#uses=0]
        br label %then.17

then.17:                ; preds = %no_exit.8
        %tmp.415 = load int* %tmp.414
        %tmp.417 = load ubyte** null            ; <ubyte*> [#uses=1]
        %tmp.425 = getelementptr int* %y, uint 0                ; <int*> [#uses=1]
        %tmp.426 = load int* %tmp.425           ; <int> [#uses=1]
        %tmp.437 = getelementptr ubyte* %tmp.417, int %tmp.426          ; <ubyte*> [#uses=2]
        %tmp.440 = load ubyte* %tmp.437         ; <ubyte> [#uses=1]
        store ubyte %tmp.440, ubyte* %tmp.437
        %tmp.405101 = setlt uint 0, %n          ; <bool> [#uses=1]
        br bool %tmp.405101, label %no_exit.8, label %UnifiedReturnBlock

label.6:                ; preds = %entry
        %tmp.4.i = tail call uint %fwrite( sbyte* getelementptr ([24 x sbyte]* %.str_28, int 0, int 0), uint 23, uint 1, sbyte* null )           ; <uint> [#uses=0]
        ret void

UnifiedReturnBlock:             ; preds = %then.17, %no_exit.8
        ret void
}

declare int %fprintf(%struct._IO_FILE*, sbyte*, ...)

declare uint %fwrite(sbyte*, uint, uint, sbyte*)
---

Compiled with 'llc -enable-x86-fastcc', this crashes llc.

This is due to the coalescer coalescing virtregs with both EAX and EDX, which
makes them unavailable to satisfy spills, causing the RA to run out of
registers.  We want to coalesce physregs when possible, but we cannot pin them
in the spiller: we have to be able to uncoalesce them.

This is almost certainly related to Bug 699.

-Chris
------- Comment #1 From Anton Korobeynikov 2007-01-29 11:52:35 -------
Now that the inreg patch has landed, this bug seems more important. For
example, it currently breaks Qt:

Consider a regparm(3) function compiled in PIC mode on x86/Linux. In general,
eax, ebx, ecx and edx are all in use at function entry. The register allocator
apparently can't handle this corner case: llc -debug shows that it simply runs
out of registers and then goes into an infinite loop. Compiling the same
function in non-PIC mode (which frees ebx) lets the register allocator handle
this case correctly.

A cheap workaround is to lower regparm(3) to regparm(2). But it seems many
libraries use stdcall + regparm(3) as a "fast" CC, so it would definitely be
better to support this. At the very least, looping forever is not good :)

I can provide an additional .ll file that causes llc to loop.
------- Comment #2 From Chris Lattner 2007-01-29 12:14:23 -------
Evan, can you think of a reasonably simple way to work around this problem in
the short-term?

-Chris
------- Comment #3 From Evan Cheng 2007-01-30 19:13:04 -------
I have to spend some time (which I don't have right now) to figure out a fix. Do
you suppose it's easy for the coalescer to detect that a liverange has been
coalesced to more than one physical register?
------- Comment #4 From Chris Lattner 2007-01-30 23:15:04 -------
I don't think that's the issue.  Consider a two register machine with code like
this:

vreg1 = r1
vreg2 = r2

vreg3 = some operation

use vreg1
use vreg2

Right now, the coalescer will coalesce vreg1 with r1 because the live ranges
don't overlap.  Then it coalesces vreg2 with r2, because those live ranges
don't overlap.

Then the regalloc part starts, and tries to allocate vreg3.  However, there are
no regs to allocate vreg3 to, and badness ensues.
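
The failure can be reproduced in a toy model. This is only a sketch under
stated assumptions, not LLVM's linear-scan code: live ranges are closed
intervals over instruction indices (0 = "vreg1 = r1" ... 4 = "use vreg2"), and
the helper names overlaps and free_regs are made up for illustration.

```python
# Toy model of the two-register example above. Hypothetical helpers, not LLVM APIs.

def overlaps(a, b):
    """True if the closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def free_regs(phys_regs, pinned, interval):
    """Return the physregs not held by a coalesced live range overlapping `interval`."""
    busy = {reg for reg, live in pinned if overlaps(live, interval)}
    return [reg for reg in phys_regs if reg not in busy]

phys_regs = ["r1", "r2"]

# State after coalescing: vreg1 was merged into r1 (live over 0..3) and
# vreg2 into r2 (live over 1..4), so both physregs are pinned.
pinned = [("r1", (0, 3)), ("r2", (1, 4))]

# vreg3 is defined at instruction 2 and needs a register there.
vreg3 = (2, 2)

candidates = free_regs(phys_regs, pinned, vreg3)
print(candidates)  # prints []
```

With a third register in phys_regs the same query would return it, which is
why this only shows up when the pinned ranges cover every allocatable register.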

I think the right way to solve this is to detect badness and break long
liveranges tied to physregs (like vreg1/2 above).  The problem with this is
book-keeping: we'd need to remember where these things came from to know
how/where to break them.

-Chris
------- Comment #5 From Chris Lattner 2007-01-30 23:16:32 -------
Oh yeah, in case it's not obvious, desired code in this example is something
like:

r1 = r1
r2 = r2
spill r2 -> ss#1

r2 = some operation

use r1
restore ss#1 -> r2
use r2

or something.
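
A rough sketch of that rewrite, as a toy only: it assumes the new value is dead
before the victim's next use, that one stack slot is enough, and that the
victim register has already been chosen. The function name break_live_range
and the (kind, reg) instruction encoding are hypothetical, not anything in
LLVM.

```python
# Toy rewriter for the example above. Hypothetical encoding, not LLVM code.

def break_live_range(insts, victim, slot="ss#1"):
    """Spill `victim` around the first unallocated def and restore it before its next use.

    insts is a list of (kind, reg) pairs where kind is "copy", "def" or "use".
    """
    out = []
    spilled = False
    for kind, reg in insts:
        if kind == "def" and not spilled:
            # No register is free here: save the victim and reuse it.
            out.append(f"spill {victim} -> {slot}")
            out.append(f"{victim} = some operation")
            spilled = True
        elif kind == "use" and reg == victim and spilled:
            # The old value is needed again: reload it first.
            out.append(f"restore {slot} -> {victim}")
            out.append(f"use {victim}")
            spilled = False
        elif kind == "copy":
            # Coalesced copies have become identity copies.
            out.append(f"{reg} = {reg}")
        else:
            out.append(f"{kind} {reg}")
    return out

# The example after coalescing vreg1 -> r1 and vreg2 -> r2; vreg3 is unallocated.
program = [("copy", "r1"), ("copy", "r2"), ("def", "vreg3"),
           ("use", "r1"), ("use", "r2")]

for line in break_live_range(program, victim="r2"):
    print(line)
```

Picking r2 as the victim reproduces the sequence above. The hard part in
practice is the book-keeping mentioned in comment #4, which this sketch skips.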
------- Comment #6 From Anton Korobeynikov 2007-03-20 11:41:06 -------
Created an attachment (id=709)
Failed bytecode

This is sample bytecode from Qt, which causes llc to cycle. Compile it with llc
-relocation-model=pic. The "bad" function is
_Z9_decOctetPPcP10QByteArrayP9ErrorInfo.
------- Comment #7 From Chris Lattner 2007-05-17 13:59:04 -------
Evan fixed this a while back.  Qt doesn't require any regparm hacks to build
with 2.0.
