645 – ambiguity parsing local vs global symbols

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 645 - ambiguity parsing local vs global symbols

Summary: ambiguity parsing local vs global symbols

Status:	RESOLVED FIXED

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Core LLVM classes (show other bugs)
Version:	1.0
Hardware:	All All

Importance:	P normal
Assignee:	Reid Spencer

URL:
Keywords:	compile-fail

Depends on:
Blocks:	761
	Show dependency tree

Reported:	2005-10-31 07:22 PST by Markus F.X.J. Oberhumer
Modified:	2010-02-22 12:50 PST (History)
CC List:	3 users (show)

See Also:
Fixed By Commit(s):

Attachments
unstripped.bc (36.68 KB, application/octet-stream) 2005-10-31 07:23 PST, Markus F.X.J. Oberhumer	Details
vmcore + asmparser patch (31.55 KB, patch) 2007-01-11 01:04 PST, Chris Lattner	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Markus F.X.J. Oberhumer 2005-10-31 07:22:44 PST

Please try the following with the attached unstripped.bc:

opt -strip -o stripped.bc unstripped.bc
llvm-dis stripped.bc
gccas stripped.ll

=> stripped.ll:4104: Reference to an invalid definition: #5 of type 'void () *'

The offending function now looks like this:

int %_Z20test_inline_funcdiffPFvvEPFjjE(void ()*, uint (uint)*) {
        cast void ()* %5 to int         ; <int>:0 [#uses=1]
        cast uint (uint)* %2 to int             ; <int>:1 [#uses=1]
        sub int %0, %1          ; <int>:2 [#uses=1]
        ret int %2
}

~Markus

Comment 1 Markus F.X.J. Oberhumer 2005-10-31 07:23:25 PST

Created attachment 279 [details]
unstripped.bc

Comment 2 Chris Lattner 2005-10-31 12:48:05 PST

Here is a reduced testcase:

---
; RUN: llvm-as < %s | opt -strip | llvm-dis | llvm-as
int %foo(int()* %P) {
        %X = call int %P()
        ret int %X
}
internal int %XX() {
        ret int 1
}
---

This looks like a bug in the ll writer.

-Chris

Comment 3 Chris Lattner 2005-10-31 12:53:55 PST

It looks like this is going to take some significant restructuring.  I'm going to defer it to 1.7.  Sorry :(

-Chris

Comment 4 Chris Lattner 2006-12-05 11:58:56 PST

The only way to fix this reasonably is to give global symbols a different prefix (e.g. $X instead of %X).  This 
would also remove hacks needed to support Assembler/2004-12-05-LocalGlobalSymtabConflict.ll

Comment 5 Reid Spencer 2006-12-05 13:20:51 PST

Initial support for llvm-upgrade has been implemented with this patch:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20061204/040812.html

When more details are available, the upgrade will be implemented.

Comment 6 Reid Spencer 2006-12-05 16:17:34 PST

Idea:

Continue to use % for all global names (vars, constants, types, functions). Make
use of % illegal for register names inside functions where $ might be more
appropriate (more assemblerish).

Comment 7 Chris Lattner 2006-12-05 16:20:29 PST

I actually am gravitating towards the %% idea.

It seems that most simple global names can use %%foo, which has the added advantage that they stand 
out a bit. 

Complex global names (cases where we use "foo bar" instead of %), can use %"..".  This would allow us 
to continue the use of any character in the name.

Reasonable?

-Chris

Comment 8 Reid Spencer 2006-12-05 16:31:15 PST

Yup, makes sense that way too, but I'd still vote for not allowing % as a
register name. That is %% or %"..." is always global.  "..." or $... is local.
That way its even more distinct.

Comment 9 Chris Lattner 2006-12-05 17:12:10 PST

The problem with disallowing stuff is that it means we'd have to check it somewhere, slowing things 
down.

Is there a problem with:

%%G and %"G" being global and
%L and "L" being local?

In the asmprinter, if % is used in a local symbol, you could always use "%" like we do today.

Today, this is valid syntax:
"%" = global int 4

-Chris

Comment 10 Reid Spencer 2006-12-06 00:38:31 PST

> The problem with disallowing stuff is that it means we'd have to check it
> somewhere, slowing things down.

Huh? All this means is turning the Name production into a GlobalName and a
LocalName production. Same for OptAssign -> GlobalOptAssign and LocalOptAssign.

You're worried about the parser doing extra checking?

> Is there a problem with:
> %%G and %"G" being global and
> %L and "L" being local?

No, there's nothing "wrong" with it. Parsing wise its not ambiguous, but when
humans read it, its much more clear with:

%G and %"G" being global and
$L and $"L" being local. 

Simple rule: $ means local, % means global.


> In the asmprinter, if % is used in a local symbol, you could always use 
> "%" like we do today.

Sure.

> Today, this is valid syntax:
> "%" = global int 4

Sure, I'm just trying to simplify things so that when someone reads the assembly
its always clear: $=local, %=global. Period, no special cases.

But, its not a big deal. Implement what you like.

Comment 11 Chris Lattner 2006-12-10 14:59:47 PST

Ok, I misunderstood what you said.  I have no inherent preference for the prefix we use.  The 
disadvantage of using $ as a prefix is that $ is a valid (and commonly used on macos) character in some 
identifiers.  Using $ as a prefix would force those to print with "" syntax instead of the simple prefix 
syntax.

Comment 12 Reid Spencer 2006-12-10 15:11:30 PST

It doesn't have to be $. My whole point was that % and %% might confused by some
humans. Using % for globals and something else for locals would alleviate that
confusion.

Anyway, I reiterate, implement what you like. Its not that big a deal.

Reid.

Comment 13 Chris Lattner 2006-12-10 15:16:19 PST

What do you think of @global and %local?

-Chris

Comment 14 Reid Spencer 2006-12-10 15:49:33 PST

Works for me.

Comment 15 Chris Lattner 2007-01-11 00:23:05 PST

Another hard case:
define i32 @foo(i32()* %P) {
        %X = call i32 %P()
        %Y = call i32 @XX()
        ret i32 %X
}
define internal i32 @XX() {
        ret i32 1
}

Comment 16 Chris Lattner 2007-01-11 00:38:17 PST

The .ll file printer + asmparser bits are implemented, I'll attach a patch soon.  Unfortunately I can't commit 
these until llvm-upgrade is implemented, as doing so would massacre llvm/test.  Reid volenteered to do 
the llvm-upgrade part.

Comment 17 Chris Lattner 2007-01-11 01:04:47 PST

Created attachment 557 [details]
vmcore + asmparser patch

Hold your breath, here it is.

Comment 18 Reid Spencer 2007-01-12 12:36:28 PST

Mine.

Comment 19 Reid Spencer 2007-01-12 12:58:39 PST

Are module asm blocks considered global ? I'm assuming they are.

Comment 20 Reid Spencer 2007-01-12 13:07:32 PST

Never mind, silly question. They don't have names :)

Comment 21 Reid Spencer 2007-01-12 13:49:27 PST

This is an even harder test case (for llvm-upgrade):

int %foo(int()* %P) {
        %XX = call int %P()
        %Y = call uint %XX()
        ret int %XX
}

internal uint "XX"() {
        ret uint 1
}

Comment 22 Reid Spencer 2007-01-12 14:38:57 PST

This is basically going to take a rewrite of llvm-upgrade. However, I'm not even
sure that will help because the ambiguity that PR645 is solving is still present
in llvm-ugprade's input. Consider this:

void %func(int* %X) {
entry:
    br next

more:
    %ptr = getelementptr long* %X, int 0
    %ptr2 = getelementptr int* %Y, int 0
    br done

next:
    %Y = getelementptr int* %X, int 0
    br more

done:
   ret void
}

%X = internal constant long 5

When we hit the more block we have two problems. First, we have no idea that
long* %X is forward referenced. It could be either local or global. Furthermore
there's a %X local symbol of a different type.  Second, LLVM allows instructions
to be forward referenced too. This is illustrated in the next instruction in
which case %Y is local but it hasn't been seen yet by the parser. This prevents
use from using a rule like: "if you haven't seen it yet, its global" because it
might be a forward referenced local. 

These issues require llvm-upgrade to keep a model of each function until the
entire function is finished parsing. Then we have to go back and "Fixup" any
references that were not defined by the function. llvm-upgrade wasn't designed
to do this. It emits instructions as soon as they are read. 

Consequently, its going to be a while before this can get resolved.

Comment 23 Chris Lattner 2007-01-12 15:35:08 PST

I assume you should use the same approach that asmparser used to.  Basically, if you see a name you:

1. Check to see if it's already defined, if so, use it.
2. If not, see if we have a placeholder for it already, if so, use it.
3. If not, create a new placeholder.  Add it to the list of unresolved names.
4. When done parsing each function, scan through the list of unresolved names.  If any instructions match 
an unresolved name, resolve it.
5. When done parsing the module, scan through the list of unresolved names, any unresolved names get 
resolved to globals.  If something isn't resolvable, it's an error.

-Chris

Comment 24 Reid Spencer 2007-01-12 15:56:30 PST

Nope, it doesn't. Until now it was unnecessary. I decoupled llvm-upgrade from
LLVM so that it is not affected by IR changes or any "new" rules. For the most
part it just passes strings through on the assumption that the input is valid
and the number of changes required are minimal.

With this change, I'm going to do exactly what you just described. Hence, the
rewrite.

Comment 25 Chris Lattner 2007-01-12 16:04:38 PST

ok

Comment 26 Chris Lattner 2007-01-25 11:37:34 PST

Note: when this lands, we should remove Function::renameLocalSymbols and all uses.  This fix obsoletes 
that hack.

-Chris

Comment 27 Reid Spencer 2007-01-25 12:03:15 PST

Okay. This should land this week if I can get through the llvm-upgrade portion
of it. I'll make the change now so that it gets included with the commit.

Comment 28 Chris Lattner 2007-01-25 12:04:35 PST

no worries, we can make the change after it lands.  I just didn't want to forget about it. :)

Comment 29 Reid Spencer 2007-01-26 02:32:19 PST

This has now been implemented via joint effort. The llvm-upgrade changes are in
this patch:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070122/043254.html