This problem is causing fastcc on x86 to fail in some corner cases. Here's a testcase:

---
target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
%.str_28 = external global [24 x sbyte]        ; <[24 x sbyte]*> [#uses=1]

implementation   ; Functions:

fastcc void %apply_stencil_op_to_pixels(int* %tmp.414, uint %n, int* %x, int* %y, uint %oper, ubyte* %mask) {
entry:
        switch uint %oper, label %label.5 [
                 uint 5386, label %label.6
        ]

label.5:                ; preds = %entry
        br label %no_exit.8

no_exit.8:              ; preds = %then.17, %label.5
        %tmp.409 = getelementptr ubyte* %mask, uint 0           ; <ubyte*> [#uses=1]
        %tmp.410 = load ubyte* %tmp.409         ; <ubyte> [#uses=0]
        br label %then.17

then.17:                ; preds = %no_exit.8
        %tmp.415 = load int* %tmp.414
        %tmp.417 = load ubyte** null            ; <ubyte*> [#uses=1]
        %tmp.425 = getelementptr int* %y, uint 0                ; <int*> [#uses=1]
        %tmp.426 = load int* %tmp.425           ; <int> [#uses=1]
        %tmp.437 = getelementptr ubyte* %tmp.417, int %tmp.426          ; <ubyte*> [#uses=2]
        %tmp.440 = load ubyte* %tmp.437         ; <ubyte> [#uses=1]
        store ubyte %tmp.440, ubyte* %tmp.437
        %tmp.405101 = setlt uint 0, %n          ; <bool> [#uses=1]
        br bool %tmp.405101, label %no_exit.8, label %UnifiedReturnBlock

label.6:                ; preds = %entry
        %tmp.4.i = tail call uint %fwrite( sbyte* getelementptr ([24 x sbyte]* %.str_28, int 0, int 0), uint 23, uint 1, sbyte* null )          ; <uint> [#uses=0]
        ret void

UnifiedReturnBlock:             ; preds = %then.17, %no_exit.8
        ret void
}

declare int %fprintf(%struct._IO_FILE*, sbyte*, ...)

declare uint %fwrite(sbyte*, uint, uint, sbyte*)
---

Compiled with 'llc -enable-x86-fastcc', this crashes llc. This is due to the coalescer coalescing virtregs with both EAX and EDX, which makes them unavailable to satisfy spills, causing the RA to run out of registers.

We want to coalesce physregs when possible, but we cannot pin them in the spiller: we have to be able to uncoalesce them.

This is almost certainly related to Bug 699.

-Chris
Now that the inreg patch has landed, this bug seems more important. For example, it currently breaks Qt.

Consider a regparm(3) function compiled in PIC mode on x86/Linux. In general, eax, ebx, ecx and edx are all in use at function entry. It seems the register allocator can't handle this corner case: llc -debug shows that it simply runs out of registers and then goes into an infinite loop. Compiling the same function in non-PIC mode (which frees ebx) lets the register allocator handle the case correctly.

A cheap workaround is to lower regparm(3) to regparm(2). However, many libraries seem to use stdcall + regparm(3) as a "fast" CC, and it would definitely be better to support this. At the very least, looping forever is not good :)

I can provide an additional .ll file that causes llc to loop.
Evan, can you think of a reasonably simple way to work around this problem in the short-term? -Chris
I have to spend some time (which I don't have right now) to figure out a fix. Do you suppose it's easy for the coalescer to detect that a liverange has been coalesced to more than one physical register?
I don't think that's the issue. Consider a two register machine with code like this:

  vreg1 = r1
  vreg2 = r2
  vreg3 = some operation
  use vreg1
  use vreg2

Right now, the coalescer will coalesce vreg1 with r1 because the live ranges don't overlap. Then it coalesces vreg2 with r2, because those live ranges don't overlap. Then the regalloc part starts, and tries to allocate vreg3. However, there are no regs to allocate vreg3 to, and badness ensues.

I think the right way to solve this is to detect badness and break long liveranges tied to physregs (like vreg1/2 above). The problem with this is book-keeping: we'd need to remember where these things came from to know how/where to break them.

-Chris
Oh yeah, in case it's not obvious, desired code in this example is something like:

  r1 = r1
  r2 = r2
  spill r2 -> ss#1
  r2 = some operation
  use r1
  restore ss#1 -> r2
  use r2

or something.
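For anyone following along, here is a rough Python sketch of the two-register example above. It is only a toy model, not LLVM's allocator: the instruction numbering (0 to 4), the `free_regs_at` helper, and the live-range tables are all made up for illustration. It checks which physregs are free at the point where vreg3 is defined, first with both physregs pinned across the whole region, then with r2's range broken by the spill/restore.

```python
# Toy model of the two-register machine; every name here is hypothetical.
# Instruction indices:
#   0: vreg1 = r1
#   1: vreg2 = r2
#   2: vreg3 = some operation
#   3: use vreg1
#   4: use vreg2
PHYS_REGS = ["r1", "r2"]


def free_regs_at(point, occupied):
    """Return the physregs with no live range covering `point`.

    `occupied` maps a physreg name to a list of inclusive (start, end) ranges.
    """
    busy = {
        reg
        for reg, ranges in occupied.items()
        if any(start <= point <= end for start, end in ranges)
    }
    return [reg for reg in PHYS_REGS if reg not in busy]


# After coalescing, vreg1 is r1 and vreg2 is r2, so both physregs stay
# live from function entry through their last use.
coalesced = {"r1": [(0, 3)], "r2": [(0, 4)]}

# Breaking r2's range: spill after instruction 1, restore just before 4.
split = {"r1": [(0, 3)], "r2": [(0, 1), (4, 4)]}

no_regs = free_regs_at(2, coalesced)   # nothing left for vreg3
after_split = free_regs_at(2, split)   # r2 is free again at instruction 2

print(no_regs, after_split)
```

With the coalesced ranges nothing is free at instruction 2; once r2's range is split around the spill, r2 becomes available for vreg3. The real book-keeping problem (remembering where a coalesced range came from so it can be broken) is not modeled here.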
Created attachment 709: Failed bytecode

This is sample bytecode from Qt that causes llc to loop forever. Compile it with llc -relocation-model=pic. The "bad" function is _Z9_decOctetPPcP10QByteArrayP9ErrorInfo.
Evan fixed this a while back. Qt doesn't require any regparm hacks to build with 2.0.