Please try the following with the attached unstripped.bc: opt -strip -o stripped.bc unstripped.bc llvm-dis stripped.bc gccas stripped.ll => stripped.ll:4104: Reference to an invalid definition: #5 of type 'void () *' The offending function now looks like this: int %_Z20test_inline_funcdiffPFvvEPFjjE(void ()*, uint (uint)*) { cast void ()* %5 to int ; <int>:0 [#uses=1] cast uint (uint)* %2 to int ; <int>:1 [#uses=1] sub int %0, %1 ; <int>:2 [#uses=1] ret int %2 } ~Markus
Created attachment 279 [details] unstripped.bc
Here is a reduced testcase: --- ; RUN: llvm-as < %s | opt -strip | llvm-dis | llvm-as int %foo(int()* %P) { %X = call int %P() ret int %X } internal int %XX() { ret int 1 } --- This looks like a bug in the ll writer. -Chris
It looks like this is going to take some significant restructuring. I'm going to defer it to 1.7. Sorry :( -Chris
The only way to fix this reasonably is to give global symbols a different prefix (e.g. $X instead of %X). This would also remove hacks needed to support Assembler/2004-12-05-LocalGlobalSymtabConflict.ll
Initial support for llvm-upgrade has been implemented with this patch: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20061204/040812.html When more details are available, the upgrade will be implemented.
Idea: Continue to use % for all global names (vars, constants, types, functions). Make use of % illegal for register names inside functions where $ might be more appropriate (more assemblerish).
I actually am gravitating towards the %% idea. It seems that most simple global names can use %%foo, which has the added advantage that they stand out a bit. Complex global names (cases where we use "foo bar" instead of %), can use %"..". This would allow us to continue the use of any character in the name. Reasonable? -Chris
Yup, makes sense that way too, but I'd still vote for not allowing % as a register name. That is %% or %"..." is always global. "..." or $... is local. That way its even more distinct.
The problem with disallowing stuff is that it means we'd have to check it somewhere, slowing things down. Is there a problem with: %%G and %"G" being global and %L and "L" being local? In the asmprinter, if % is used in a local symbol, you could always use "%" like we do today. Today, this is valid syntax: "%" = global int 4 -Chris
> The problem with disallowing stuff is that it means we'd have to check it > somewhere, slowing things down. Huh? All this means is turning the Name production into a GlobalName and a LocalName production. Same for OptAssign -> GlobalOptAssign and LocalOptAssign. You're worried about the parser doing extra checking? > Is there a problem with: > %%G and %"G" being global and > %L and "L" being local? No, there's nothing "wrong" with it. Parsing wise its not ambiguous, but when humans read it, its much more clear with: %G and %"G" being global and $L and $"L" being local. Simple rule: $ means local, % means global. > In the asmprinter, if % is used in a local symbol, you could always use > "%" like we do today. Sure. > Today, this is valid syntax: > "%" = global int 4 Sure, I'm just trying to simplify things so that when someone reads the assembly its always clear: $=local, %=global. Period, no special cases. But, its not a big deal. Implement what you like.
Ok, I misunderstood what you said. I have no inherent preference for the prefix we use. The disadvantage of using $ as a prefix is that $ is a valid (and commonly used on macos) character in some identifiers. Using $ as a prefix would force those to print with "" syntax instead of the simple prefix syntax.
It doesn't have to be $. My whole point was that % and %% might confused by some humans. Using % for globals and something else for locals would alleviate that confusion. Anyway, I reiterate, implement what you like. Its not that big a deal. Reid.
What do you think of @global and %local? -Chris
Works for me.
Another hard case: define i32 @foo(i32()* %P) { %X = call i32 %P() %Y = call i32 @XX() ret i32 %X } define internal i32 @XX() { ret i32 1 }
The .ll file printer + asmparser bits are implemented, I'll attach a patch soon. Unfortunately I can't commit these until llvm-upgrade is implemented, as doing so would massacre llvm/test. Reid volenteered to do the llvm-upgrade part.
Created attachment 557 [details] vmcore + asmparser patch Hold your breath, here it is.
Mine.
Are module asm blocks considered global ? I'm assuming they are.
Never mind, silly question. They don't have names :)
This is an even harder test case (for llvm-upgrade): int %foo(int()* %P) { %XX = call int %P() %Y = call uint %XX() ret int %XX } internal uint "XX"() { ret uint 1 }
This is basically going to take a rewrite of llvm-upgrade. However, I'm not even sure that will help because the ambiguity that PR645 is solving is still present in llvm-ugprade's input. Consider this: void %func(int* %X) { entry: br next more: %ptr = getelementptr long* %X, int 0 %ptr2 = getelementptr int* %Y, int 0 br done next: %Y = getelementptr int* %X, int 0 br more done: ret void } %X = internal constant long 5 When we hit the more block we have two problems. First, we have no idea that long* %X is forward referenced. It could be either local or global. Furthermore there's a %X local symbol of a different type. Second, LLVM allows instructions to be forward referenced too. This is illustrated in the next instruction in which case %Y is local but it hasn't been seen yet by the parser. This prevents use from using a rule like: "if you haven't seen it yet, its global" because it might be a forward referenced local. These issues require llvm-upgrade to keep a model of each function until the entire function is finished parsing. Then we have to go back and "Fixup" any references that were not defined by the function. llvm-upgrade wasn't designed to do this. It emits instructions as soon as they are read. Consequently, its going to be a while before this can get resolved.
I assume you should use the same approach that asmparser used to. Basically, if you see a name you: 1. Check to see if it's already defined, if so, use it. 2. If not, see if we have a placeholder for it already, if so, use it. 3. If not, create a new placeholder. Add it to the list of unresolved names. 4. When done parsing each function, scan through the list of unresolved names. If any instructions match an unresolved name, resolve it. 5. When done parsing the module, scan through the list of unresolved names, any unresolved names get resolved to globals. If something isn't resolvable, it's an error. -Chris
Nope, it doesn't. Until now it was unnecessary. I decoupled llvm-upgrade from LLVM so that it is not affected by IR changes or any "new" rules. For the most part it just passes strings through on the assumption that the input is valid and the number of changes required are minimal. With this change, I'm going to do exactly what you just described. Hence, the rewrite.
ok
Note: when this lands, we should remove Function::renameLocalSymbols and all uses. This fix obsoletes that hack. -Chris
Okay. This should land this week if I can get through the llvm-upgrade portion of it. I'll make the change now so that it gets included with the commit.
no worries, we can make the change after it lands. I just didn't want to forget about it. :)
This has now been implemented via joint effort. The llvm-upgrade changes are in this patch: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070122/043254.html