12193 – Add support for 'outreg' parameter attribute

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 12193 - Add support for 'outreg' parameter attribute

Summary: Add support for 'outreg' parameter attribute

Status:	NEW

Alias:	None

Product:	libraries
Classification:	Unclassified
Component:	Common Code Generator Code (show other bugs)
Version:	trunk
Hardware:	PC All

Importance:	P enhancement
Assignee:	Unassigned LLVM Bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2012-03-05 20:48 PST by Daniel Dunbar
Modified:	2012-03-06 12:59 PST (History)
CC List:	6 users (show)

See Also:
Fixed By Commit(s):

Attachments
Related patch adding support for not using byval on small structs for x86_64. (8.00 KB, patch) 2012-03-06 11:47 PST, Daniel Dunbar	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Daniel Dunbar 2012-03-05 20:48:33 PST

We currently have support for 'inreg' as a parameter attribute on calls.

There are situations in which front ends would benefit from having an 'outreg' parameter attribute available.

For example, in Clang we occasionally would like to lower structs which should be passed on the stack into scalar LLVM values (in order to avoid byval, which tends to pessimize codeine).

On x86_64, for example, we currently are unable to (easily) do this in situations in which the argument to be passed on the stack occurs before all integer registers are exhausted, because turning it into a scalar value would result in the argument occupying an integer register. We could reorder the arguments to deal with this, but that would be much more complicated for the frontend.

Comment 1 Daniel Dunbar 2012-03-05 21:11:47 PST

Maybe 'onstack' is a more reasonable name.

Comment 2 Rafael Ávila de Espíndola 2012-03-06 07:35:41 PST

It would be really nice if the caller could see a value and the callee could see memory, as that is what will happen in practice.

Consider for example

-------------------------------------
struct foo {
  long a;  long b; char c;
};
void h(struct foo *x);
__attribute__((noinline)) void f(struct foo x) {
  h(&x);
}
void g(long a, long b, char c) {
  struct foo x = {a, b, c};
  f(x);
}
------------------------------------

for the function f getting a byval argument is perfect. It knows the value is in memory and is compiled to

------------------------------------
define void @f(%struct.foo* byval align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
  tail call void @h(%struct.foo* %x) nounwind optsize
  ret void
}
------------------------------------

For the function g byval is bad because it thinks it has to create a stack location, but codegen will copy it anyway. We produce:

--------------------------------------------
define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
  %x = alloca %struct.foo, align 8
  %a1 = getelementptr inbounds %struct.foo* %x, i64 0, i32 0
  store i64 %a, i64* %a1, align 8, !tbaa !0
  %b2 = getelementptr inbounds %struct.foo* %x, i64 0, i32 1
  store i64 %b, i64* %b2, align 8, !tbaa !0
  %c3 = getelementptr inbounds %struct.foo* %x, i64 0, i32 2
  store i8 %c, i8* %c3, align 8, !tbaa !1
  call void @f(%struct.foo* byval align 8 %x) optsize
  ret void
}
---------------------------------------------

The IL for f could remain something like

-----------------------------------------------
define void @f(%struct.foo* onstack align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
  tail call void @h(%struct.foo* %x) nounwind optsize
  ret void
}
-----------------------------------------------

and g become

---------------------------------------
define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize
  ret void
}
--------------------------------------

That is, for onstack arguments, a pointer in the calle is matched by a value in the caller and codegen is the one responsible for creating the address when it produces the call frame. This should let it do a better job than what we currently do for g:


_g:                                     ## @g
        subq    $56, %rsp
        movq    %rdi, 32(%rsp)
        movq    %rsi, 40(%rsp)
        movb    %dl, 48(%rsp)
        movq    48(%rsp), %rax
        movq    %rax, 16(%rsp)
        movq    32(%rsp), %rax
        movq    40(%rsp), %rcx
        movq    %rcx, 8(%rsp)
        movq    %rax, (%rsp)
        callq   _f
        addq    $56, %rsp
        ret

Comment 3 Duncan Sands 2012-03-06 07:49:31 PST

Hi Rafael, suppose the example was this:

void g(long a, long b, char c) {
  struct foo x = {a, b, c};
  f(x);
  do_something_with(x); <- or address of x, doesn't matter
}

Then making a copy of x is obligatory when calling f, because f is allowed
to write to the memory passed to it, but such changes are not allowed show up
in x.

So in order to do your proposed optimization, you first have to check that
it is safe to pass the address of x directly to f, by analyzing the code
following the call to check that it doesn't read from x.

If that check passes, then it is OK to pass the address of x even if there
were loads of complicated accesses to x before the call to f.  However your
syntax
  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize
would only handle simple cases in which it is clear what is in x at the
moment of the call.

Instead, couldn't you just drop the 'byval' attribute on the call, and then
codegen should just pass the address of x straight through?

Comment 4 Duncan Sands 2012-03-06 08:06:17 PST

PS: The analysis would also have to prove that &x doesn't escape before the call
to f, otherwise f might read from x (such reads aren't allowed to see any writes
that f makes to x).

Comment 5 Rafael Ávila de Espíndola 2012-03-06 08:09:34 PST

> Then making a copy of x is obligatory when calling f, because f is allowed
> to write to the memory passed to it, but such changes are not allowed show up
> in x.

Yes, Codegen would still do the copy, like it does todays for simple types like i64. The problem is just the representation at the IL level.

And yes, if we known that the function never modifies its argument, we can optimize the code to just drop the byval/onstack attribute. In the example I wrote that optimization was *not* done, there is still an implicit copy done by the codegen.

Cheers,
Rafael

Comment 6 Duncan Sands 2012-03-06 08:13:54 PST

Just to be clear, you don't need to know that *f* doesn't modify the argument.
It is OK if it modifies the argument as long x isn't used after the call to f.
So your testcase function g is fine because x isn't used after the call to f.

I'm not sure why you want a new IR construct when just dropping the byval on
the call should do the right thing in your testcase.  Maybe you have some more
complicated testcases in mind?

Comment 7 Rafael Ávila de Espíndola 2012-03-06 08:16:28 PST

(In reply to comment #6)
> Just to be clear, you don't need to know that *f* doesn't modify the argument.
> It is OK if it modifies the argument as long x isn't used after the call to f.
> So your testcase function g is fine because x isn't used after the call to f.

No, it is fine because I am defining outreg as causing codegen to create a copy!

> I'm not sure why you want a new IR construct when just dropping the byval on
> the call should do the right thing in your testcase.  Maybe you have some more
> complicated testcases in mind?

No, I am trying to handle the general case. As I noted, the example I gave is *not* optimized on any assumptions about f.

Comment 8 Duncan Sands 2012-03-06 08:21:50 PST

Are we talking at cross-purposes?

As far as I can see all that is missing for your testcase is an IR level
optimization that drops the 'byval' in the call for examples like the call
to f in your function g.  I don't see why outreg is needed.

Comment 9 Rafael Ávila de Espíndola 2012-03-06 08:30:11 PST

(In reply to comment #8)
> Are we talking at cross-purposes?
> 
> As far as I can see all that is missing for your testcase is an IR level
> optimization that drops the 'byval' in the call for examples like the call
> to f in your function g.  I don't see why outreg is needed.

Because I used a simple example. I am not interested on what is the best codgen for this small testcase given knowledge of what f does, but on a more general problem.  Let me give a more complex one

---------------------------------------
struct foo {
  long a;
  long b;
  char c;
};
int should_modify(void);
void h(struct foo *x);
__attribute__((noinline)) void f(struct foo x) {
  if (should_modify())
    x.b = 42;
  h(&x);
}
void g(long a, long b, char c) {
  struct foo x = {a, b, c};
  f(x);
  f(x);
}
--------------------------------------

In here the codegen for f would stays very similar to what it is today:

-------------------------------------------
define void @f(%struct.foo* onstack align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
  %call = tail call i32 @should_modify() nounwind optsize
  %tobool = icmp eq i32 %call, 0
  br i1 %tobool, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  %b = getelementptr inbounds %struct.foo* %x, i64 0, i32 1
  store i64 42, i64* %b, align 8, !tbaa !0
  br label %if.end

if.end:                                           ; preds = %entry, %if.then
  tail call void @h(%struct.foo* %x) nounwind optsize
  ret void
}
------------------------------------------------

but g would become

----------------------------
define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does a copy in here
  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does a copy in here
  ret void
}
-----------------------

I am precisely interested in the cases where we don't know what f does. In fact, I included f's body just to show that the IL for it should stays similar to what it is today. The Il for g would stay the same even if you replaced f's definition with a declaration.

Comment 10 Duncan Sands 2012-03-06 08:38:41 PST

I think you only avoid one copy (the first one):

define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
a copy in here

^ I agree you would avoid a copy here.  But the call to f may modify the part of
the stack you put %a, %b, %c in.  Thus you will need to set the stack up again.

  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
a copy in here

^ I.e. you will need to store %a, %b, %c to stack again before doing this call.

  ret void
}


My proposed optimization would also save a copy in this case (the second one).

Comment 11 Rafael Ávila de Espíndola 2012-03-06 08:53:42 PST

(In reply to comment #10)
> I think you only avoid one copy (the first one):
> 
> define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
> entry:
>   call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
> a copy in here
> 
> ^ I agree you would avoid a copy here.  But the call to f may modify the part
> of
> the stack you put %a, %b, %c in.  Thus you will need to set the stack up again.

I would *not* avoid a copy. Codegen is doing the copy as it does today with byval, this avoids us having two copies, on at the IL level and one at codegen.
 
>   call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
> a copy in here
> 
> ^ I.e. you will need to store %a, %b, %c to stack again before doing this call.
> 
>   ret void
> }
> 
> 
> My proposed optimization would also save a copy in this case (the second one).

The problem is not a specific case. Add 500 calls to f in g if you want. Only the last one will be a special case if you do not know what f does. Lets keep this bug about the general case, not particular optimizations.

Comment 12 Duncan Sands 2012-03-06 09:04:06 PST

> The problem is not a specific case. Add 500 calls to f in g if you want. Only
> the last one will be a special case if you do not know what f does. Lets keep
> this bug about the general case, not particular optimizations.

This is probably what I'm not getting.  As far as I can see you are only dealing
with a special case: the case in which you know at compile time what is in the
memory at the moment of the call.

Also, as far as I can see, you are only ever saving one copy: right now there
is a copy of %a, %b, %c into an alloca; then codegen generates a bunch of copies
from the alloca, one for each byval call.  You save the copy into the alloca.
If you are actually saving more copies, please explain how, since I don't see
this.

I'm also saving only one copy, but a different one: no matter how many 500
calls to f and g etc, codegen doesn't have to do the last copy.

So if I'm correct that you are only saving one copy, and only doing it in a
special case, is it worth introducing this new IR construct when using my
approach you can probably in the same situations also save one copy without
any new IR construct?

Comment 13 Rafael Ávila de Espíndola 2012-03-06 09:18:51 PST

> This is probably what I'm not getting.  As far as I can see you are only
> dealing
> with a special case: the case in which you know at compile time what is in the
> memory at the moment of the call.

no. I am just letting the caller uses values and know that the codegen will produce the copy to a stack space. A possible assembly for g is

------------------------------
_g:
	subq	$56, %rsp
# move values to callee saved regs
        movq	%rdx, %rbx
        movq	%rsi, %r12
	movq	%rdi, %r13

# first copy, set up by codegen
        movq	%rbx, 16(%rsp)
        movq	%r12, 8(%rsp)
	movq	%r13, (%rsp)
	callq	_f

# second copy, set up by codegen because we don't know if f modified the stack
# or not the the caller (_g) has no copy available for us to use (its copy is
# in regs).
        movq	%rbx, 16(%rsp)
        movq	%r12, 8(%rsp)
	movq	%r13, (%rsp)
        callq	_f
	addq	$56, %rsp
	ret

------------------------------------------
 
> Also, as far as I can see, you are only ever saving one copy: right now there
> is a copy of %a, %b, %c into an alloca; then codegen generates a bunch of
> copies
> from the alloca, one for each byval call.  You save the copy into the alloca.
> If you are actually saving more copies, please explain how, since I don't see
> this.

It saves having the two copies on the stack. Currently when we pass a struct by value we always get one because of the alloca and one because of codegen. 

> I'm also saving only one copy, but a different one: no matter how many 500
> calls to f and g etc, codegen doesn't have to do the last copy.

Changing only codegen, how would you avoid codegen doing the copy? The callers copy is in an alloca. If I complicate f's signature just a bit (say "int f(int a, struct foo x)" codegen would have to be very clever to decide where that alloca would live for it to be in a valid frame according to the abi.

> So if I'm correct that you are only saving one copy, and only doing it in a
> special case, is it worth introducing this new IR construct when using my
> approach you can probably in the same situations also save one copy without
> any new IR construct?

It is not a special case. The above codegen avoids the copy for any f. It also saves stack space and lets the caller uses only values, wich are easier to handle in analysis than addresses.

Comment 14 Duncan Sands 2012-03-06 09:28:26 PST

> no. I am just letting the caller uses values and know that the codegen will
> produce the copy to a stack space. A possible assembly for g is
>
> ------------------------------
> _g:
>     subq    $56, %rsp
> # move values to callee saved regs
>         movq    %rdx, %rbx
>         movq    %rsi, %r12
>     movq    %rdi, %r13

How do you know what the values are?  Here it is easy because you are
just passing through %a, %b, %c.  What about this:

void g() {
  struct foo x;
  init_x(&x);
  f(x);
}

?

As far as I can see you can't handle it: you need to know what "the values" are.
That's why I'm saying that you only seem to be handling a special case.

> Changing only codegen, how would you avoid codegen doing the copy? The callers
> copy is in an alloca. If I complicate f's signature just a bit (say "int f(int
> a, struct foo x)" codegen would have to be very clever to decide where that
> alloca would live for it to be in a valid frame according to the abi.

OK, I can buy this.

Comment 15 Rafael Ávila de Espíndola 2012-03-06 09:39:03 PST

> How do you know what the values are?  Here it is easy because you are
> just passing through %a, %b, %c.  What about this:
> 
> void g() {
>   struct foo x;
>   init_x(&x);
>   f(x);
> }
> 
> ?
> 
> As far as I can see you can't handle it: you need to know what "the values"
> are.
> That's why I'm saying that you only seem to be handling a special case.

That is not one of the cases we use byval. As far as I can tell the Il can handle that perfectly today, both the caller and callee need a pointer and that is what they get. In fact, even for

void g() {
  int x;
  init_x(&x);
  f(x);
}

we get (and should get) an alloca. Being more specific, the cases I find a onstack attribute interesting are the ones we currently need to use a byval. It might also be handy for cases where we don't currently use a byval but the ABI puts the values in memory anyway, but that is a smaller concern I think.

Comment 16 Duncan Sands 2012-03-06 09:44:00 PST

struct foo {
  long a;  long b; char c;
};

extern void init_x(struct foo *);
extern void f(struct foo);

void g() {
 struct foo x;
 init_x(&x);
 f(x);
}

->


define void @g() nounwind uwtable {
entry:
  %x = alloca %struct.foo, align 8
  call void @init_x(%struct.foo* %x) nounwind
  call void @f(%struct.foo* byval align 8 %x) nounwind
  ret void
}

Notice the byval on the call to @f.  In theory it should be possible to set
aside the stack frame for the call to @f before calling @init_x, and have
@init_x initialize it.  Then there would be no need for any copying.

Comment 17 Daniel Dunbar 2012-03-06 11:47:42 PST

Created attachment 8144 [details]
Related patch adding support for not using byval on small structs for x86_64.

Added a patch to Clang which does a simple optimization to avoid using byval when trying to pass some small structs on the stack.

The patch includes comments about how we would like to use 'onstack' to be able to make the patch more general.

Comment 18 Rafael Ávila de Espíndola 2012-03-06 12:59:58 PST

> define void @g() nounwind uwtable {
> entry:
>   %x = alloca %struct.foo, align 8
>   call void @init_x(%struct.foo* %x) nounwind
>   call void @f(%struct.foo* byval align 8 %x) nounwind
>   ret void
> }
> 
> Notice the byval on the call to @f.  In theory it should be possible to set
> aside the stack frame for the call to @f before calling @init_x, and have
> @init_x initialize it.  Then there would be no need for any copying.

Yes, sorry, I misunderstood your example. It is a case the IL would get uglier. I would propose

--------------------------------
define void @g() nounwind uwtable optsize ssp {
entry:
  %x = alloca %struct.foo, align 8
  call void @init_x(%struct.foo* %x) nounwind optsize
  %0 = load %struct.foo* %x, align 8
  call void @f(%0) nounwind optsize
  ret void
}
declare void @init_x(%struct.foo*) optsize
declare void @f(%struct.foo* onstack) optsize
--------------------------------

I should still be possible to do the optimization you describe, but I agree the matching now has to account for the load. On the other side, if g itself has uses of x they could be done via the value %0.

The example makes me realize that it is more precise to say codegen would be responsible for storing the value in the call frame, not doing a copy.