LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 645 - ambiguity parsing local vs global symbols
Summary: ambiguity parsing local vs global symbols
Status: RESOLVED FIXED
Alias: None
Product: libraries
Classification: Unclassified
Component: Core LLVM classes (show other bugs)
Version: 1.0
Hardware: All All
: P normal
Assignee: Reid Spencer
URL:
Keywords: compile-fail
Depends on:
Blocks: 761
  Show dependency tree
 
Reported: 2005-10-31 07:22 PST by Markus F.X.J. Oberhumer
Modified: 2010-02-22 12:50 PST (History)
3 users (show)

See Also:
Fixed By Commit(s):


Attachments
unstripped.bc (36.68 KB, application/octet-stream)
2005-10-31 07:23 PST, Markus F.X.J. Oberhumer
Details
vmcore + asmparser patch (31.55 KB, patch)
2007-01-11 01:04 PST, Chris Lattner
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Markus F.X.J. Oberhumer 2005-10-31 07:22:44 PST
Please try the following with the attached unstripped.bc:

opt -strip -o stripped.bc unstripped.bc
llvm-dis stripped.bc
gccas stripped.ll

=> stripped.ll:4104: Reference to an invalid definition: #5 of type 'void () *'

The offending function now looks like this:

int %_Z20test_inline_funcdiffPFvvEPFjjE(void ()*, uint (uint)*) {
        cast void ()* %5 to int         ; <int>:0 [#uses=1]
        cast uint (uint)* %2 to int             ; <int>:1 [#uses=1]
        sub int %0, %1          ; <int>:2 [#uses=1]
        ret int %2
}

~Markus
Comment 1 Markus F.X.J. Oberhumer 2005-10-31 07:23:25 PST
Created attachment 279 [details]
unstripped.bc
Comment 2 Chris Lattner 2005-10-31 12:48:05 PST
Here is a reduced testcase:

---
; RUN: llvm-as < %s | opt -strip | llvm-dis | llvm-as
int %foo(int()* %P) {
        %X = call int %P()
        ret int %X
}
internal int %XX() {
        ret int 1
}
---

This looks like a bug in the ll writer.

-Chris
Comment 3 Chris Lattner 2005-10-31 12:53:55 PST
It looks like this is going to take some significant restructuring.  I'm going to defer it to 1.7.  Sorry :(

-Chris
Comment 4 Chris Lattner 2006-12-05 11:58:56 PST
The only way to fix this reasonably is to give global symbols a different prefix (e.g. $X instead of %X).  This 
would also remove hacks needed to support Assembler/2004-12-05-LocalGlobalSymtabConflict.ll
Comment 5 Reid Spencer 2006-12-05 13:20:51 PST
Initial support for llvm-upgrade has been implemented with this patch:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20061204/040812.html

When more details are available, the upgrade will be implemented.
Comment 6 Reid Spencer 2006-12-05 16:17:34 PST
Idea:

Continue to use % for all global names (vars, constants, types, functions). Make
use of % illegal for register names inside functions where $ might be more
appropriate (more assemblerish). 

Comment 7 Chris Lattner 2006-12-05 16:20:29 PST
I actually am gravitating towards the %% idea.

It seems that most simple global names can use %%foo, which has the added advantage that they stand 
out a bit. 

Complex global names (cases where we use "foo bar" instead of %), can use %"..".  This would allow us 
to continue the use of any character in the name.

Reasonable?

-Chris
Comment 8 Reid Spencer 2006-12-05 16:31:15 PST
Yup, makes sense that way too, but I'd still vote for not allowing % as a
register name. That is %% or %"..." is always global.  "..." or $... is local.
That way its even more distinct.
Comment 9 Chris Lattner 2006-12-05 17:12:10 PST
The problem with disallowing stuff is that it means we'd have to check it somewhere, slowing things 
down.

Is there a problem with:

%%G and %"G" being global and
%L and "L" being local?

In the asmprinter, if % is used in a local symbol, you could always use "%" like we do today.

Today, this is valid syntax:
"%" = global int 4

-Chris

Comment 10 Reid Spencer 2006-12-06 00:38:31 PST
> The problem with disallowing stuff is that it means we'd have to check it
> somewhere, slowing things down.

Huh? All this means is turning the Name production into a GlobalName and a
LocalName production. Same for OptAssign -> GlobalOptAssign and LocalOptAssign.

You're worried about the parser doing extra checking?

> Is there a problem with:
> %%G and %"G" being global and
> %L and "L" being local?

No, there's nothing "wrong" with it. Parsing wise its not ambiguous, but when
humans read it, its much more clear with:

%G and %"G" being global and
$L and $"L" being local. 

Simple rule: $ means local, % means global.


> In the asmprinter, if % is used in a local symbol, you could always use 
> "%" like we do today.

Sure.

> Today, this is valid syntax:
> "%" = global int 4

Sure, I'm just trying to simplify things so that when someone reads the assembly
its always clear: $=local, %=global. Period, no special cases.

But, its not a big deal. Implement what you like.
Comment 11 Chris Lattner 2006-12-10 14:59:47 PST
Ok, I misunderstood what you said.  I have no inherent preference for the prefix we use.  The 
disadvantage of using $ as a prefix is that $ is a valid (and commonly used on macos) character in some 
identifiers.  Using $ as a prefix would force those to print with "" syntax instead of the simple prefix 
syntax.
Comment 12 Reid Spencer 2006-12-10 15:11:30 PST
It doesn't have to be $. My whole point was that % and %% might confused by some
humans. Using % for globals and something else for locals would alleviate that
confusion.

Anyway, I reiterate, implement what you like. Its not that big a deal.

Reid.
Comment 13 Chris Lattner 2006-12-10 15:16:19 PST
What do you think of @global and %local?

-Chris
Comment 14 Reid Spencer 2006-12-10 15:49:33 PST
Works for me.
Comment 15 Chris Lattner 2007-01-11 00:23:05 PST
Another hard case:
define i32 @foo(i32()* %P) {
        %X = call i32 %P()
        %Y = call i32 @XX()
        ret i32 %X
}
define internal i32 @XX() {
        ret i32 1
}
Comment 16 Chris Lattner 2007-01-11 00:38:17 PST
The .ll file printer + asmparser bits are implemented, I'll attach a patch soon.  Unfortunately I can't commit 
these until llvm-upgrade is implemented, as doing so would massacre llvm/test.  Reid volenteered to do 
the llvm-upgrade part.
Comment 17 Chris Lattner 2007-01-11 01:04:47 PST
Created attachment 557 [details]
vmcore + asmparser patch

Hold your breath, here it is.
Comment 18 Reid Spencer 2007-01-12 12:36:28 PST
Mine.
Comment 19 Reid Spencer 2007-01-12 12:58:39 PST
Are module asm blocks considered global ? I'm assuming they are.
Comment 20 Reid Spencer 2007-01-12 13:07:32 PST
Never mind, silly question. They don't have names :)
Comment 21 Reid Spencer 2007-01-12 13:49:27 PST
This is an even harder test case (for llvm-upgrade):

int %foo(int()* %P) {
        %XX = call int %P()
        %Y = call uint %XX()
        ret int %XX
}

internal uint "XX"() {
        ret uint 1
}

Comment 22 Reid Spencer 2007-01-12 14:38:57 PST
This is basically going to take a rewrite of llvm-upgrade. However, I'm not even
sure that will help because the ambiguity that PR645 is solving is still present
in llvm-ugprade's input. Consider this:

void %func(int* %X) {
entry:
    br next

more:
    %ptr = getelementptr long* %X, int 0
    %ptr2 = getelementptr int* %Y, int 0
    br done

next:
    %Y = getelementptr int* %X, int 0
    br more

done:
   ret void
}

%X = internal constant long 5

When we hit the more block we have two problems. First, we have no idea that
long* %X is forward referenced. It could be either local or global. Furthermore
there's a %X local symbol of a different type.  Second, LLVM allows instructions
to be forward referenced too. This is illustrated in the next instruction in
which case %Y is local but it hasn't been seen yet by the parser. This prevents
use from using a rule like: "if you haven't seen it yet, its global" because it
might be a forward referenced local. 

These issues require llvm-upgrade to keep a model of each function until the
entire function is finished parsing. Then we have to go back and "Fixup" any
references that were not defined by the function. llvm-upgrade wasn't designed
to do this. It emits instructions as soon as they are read. 

Consequently, its going to be a while before this can get resolved.
Comment 23 Chris Lattner 2007-01-12 15:35:08 PST
I assume you should use the same approach that asmparser used to.  Basically, if you see a name you:

1. Check to see if it's already defined, if so, use it.
2. If not, see if we have a placeholder for it already, if so, use it.
3. If not, create a new placeholder.  Add it to the list of unresolved names.
4. When done parsing each function, scan through the list of unresolved names.  If any instructions match 
an unresolved name, resolve it.
5. When done parsing the module, scan through the list of unresolved names, any unresolved names get 
resolved to globals.  If something isn't resolvable, it's an error.

-Chris
Comment 24 Reid Spencer 2007-01-12 15:56:30 PST
Nope, it doesn't. Until now it was unnecessary. I decoupled llvm-upgrade from
LLVM so that it is not affected by IR changes or any "new" rules. For the most
part it just passes strings through on the assumption that the input is valid
and the number of changes required are minimal.

With this change, I'm going to do exactly what you just described. Hence, the
rewrite.
Comment 25 Chris Lattner 2007-01-12 16:04:38 PST
ok
Comment 26 Chris Lattner 2007-01-25 11:37:34 PST
Note: when this lands, we should remove Function::renameLocalSymbols and all uses.  This fix obsoletes 
that hack.

-Chris
Comment 27 Reid Spencer 2007-01-25 12:03:15 PST
Okay. This should land this week if I can get through the llvm-upgrade portion
of it. I'll make the change now so that it gets included with the commit.
Comment 28 Chris Lattner 2007-01-25 12:04:35 PST
no worries, we can make the change after it lands.  I just didn't want to forget about it. :)
Comment 29 Reid Spencer 2007-01-26 02:32:19 PST
This has now been implemented via joint effort. The llvm-upgrade changes are in
this patch:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070122/043254.html