Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for 'outreg' parameter attribute #12565

Open
llvmbot opened this issue Mar 6, 2012 · 18 comments
Open

Add support for 'outreg' parameter attribute #12565

llvmbot opened this issue Mar 6, 2012 · 18 comments
Labels
bugzilla Issues migrated from bugzilla llvm:codegen

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 6, 2012

Bugzilla Link 12193
Version trunk
OS All
Reporter LLVM Bugzilla Contributor
CC @asl,@sunfishcode

Extended Description

We currently have support for 'inreg' as a parameter attribute on calls.

There are situations in which front ends would benefit from having an 'outreg' parameter attribute available.

For example, in Clang we occasionally would like to lower structs which should be passed on the stack into scalar LLVM values (in order to avoid byval, which tends to pessimize codeine).

On x86_64, for example, we currently are unable to (easily) do this in situations in which the argument to be passed on the stack occurs before all integer registers are exhausted, because turning it into a scalar value would result in the argument occupying an integer register. We could reorder the arguments to deal with this, but that would be much more complicated for the frontend.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Maybe 'onstack' is a more reasonable name.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

It would be really nice if the caller could see a value and the callee could see memory, as that is what will happen in practice.

Consider for example


struct foo {
  long a;  long b; char c;
};
void h(struct foo *x);
__attribute__((noinline)) void f(struct foo x) {
  h(&x);
}
void g(long a, long b, char c) {
  struct foo x = {a, b, c};
  f(x);
}

for the function f getting a byval argument is perfect. It knows the value is in memory and is compiled to


define void @f(%struct.foo* byval align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
  tail call void @h(%struct.foo* %x) nounwind optsize
  ret void
}

For the function g byval is bad because it thinks it has to create a stack location, but codegen will copy it anyway. We produce:


define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
  %x = alloca %struct.foo, align 8
  %a1 = getelementptr inbounds %struct.foo* %x, i64 0, i32 0
  store i64 %a, i64* %a1, align 8, !tbaa !​0
  %b2 = getelementptr inbounds %struct.foo* %x, i64 0, i32 1
  store i64 %b, i64* %b2, align 8, !tbaa !​0
  %c3 = getelementptr inbounds %struct.foo* %x, i64 0, i32 2
  store i8 %c, i8* %c3, align 8, !tbaa !​1
  call void @f(%struct.foo* byval align 8 %x) optsize
  ret void
}

The IL for f could remain something like


define void @f(%struct.foo* onstack align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
  tail call void @h(%struct.foo* %x) nounwind optsize
  ret void
}

and g become


define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
  call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize
  ret void
}

That is, for onstack arguments, a pointer in the calle is matched by a value in the caller and codegen is the one responsible for creating the address when it produces the call frame. This should let it do a better job than what we currently do for g:

_g: ## @​g
subq $56, %rsp
movq %rdi, 32(%rsp)
movq %rsi, 40(%rsp)
movb %dl, 48(%rsp)
movq 48(%rsp), %rax
movq %rax, 16(%rsp)
movq 32(%rsp), %rax
movq 40(%rsp), %rcx
movq %rcx, 8(%rsp)
movq %rax, (%rsp)
callq _f
addq $56, %rsp
ret

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Hi Rafael, suppose the example was this:

void g(long a, long b, char c) {
struct foo x = {a, b, c};
f(x);
do_something_with(x); <- or address of x, doesn't matter
}

Then making a copy of x is obligatory when calling f, because f is allowed
to write to the memory passed to it, but such changes are not allowed show up
in x.

So in order to do your proposed optimization, you first have to check that
it is safe to pass the address of x directly to f, by analyzing the code
following the call to check that it doesn't read from x.

If that check passes, then it is OK to pass the address of x even if there
were loads of complicated accesses to x before the call to f. However your
syntax
call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize
would only handle simple cases in which it is clear what is in x at the
moment of the call.

Instead, couldn't you just drop the 'byval' attribute on the call, and then
codegen should just pass the address of x straight through?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

PS: The analysis would also have to prove that &x doesn't escape before the call
to f, otherwise f might read from x (such reads aren't allowed to see any writes
that f makes to x).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Then making a copy of x is obligatory when calling f, because f is allowed
to write to the memory passed to it, but such changes are not allowed show up
in x.

Yes, Codegen would still do the copy, like it does todays for simple types like i64. The problem is just the representation at the IL level.

And yes, if we known that the function never modifies its argument, we can optimize the code to just drop the byval/onstack attribute. In the example I wrote that optimization was not done, there is still an implicit copy done by the codegen.

Cheers,
Rafael

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Just to be clear, you don't need to know that f doesn't modify the argument.
It is OK if it modifies the argument as long x isn't used after the call to f.
So your testcase function g is fine because x isn't used after the call to f.

I'm not sure why you want a new IR construct when just dropping the byval on
the call should do the right thing in your testcase. Maybe you have some more
complicated testcases in mind?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Just to be clear, you don't need to know that f doesn't modify the argument.
It is OK if it modifies the argument as long x isn't used after the call to f.
So your testcase function g is fine because x isn't used after the call to f.

No, it is fine because I am defining outreg as causing codegen to create a copy!

I'm not sure why you want a new IR construct when just dropping the byval on
the call should do the right thing in your testcase. Maybe you have some more
complicated testcases in mind?

No, I am trying to handle the general case. As I noted, the example I gave is not optimized on any assumptions about f.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Are we talking at cross-purposes?

As far as I can see all that is missing for your testcase is an IR level
optimization that drops the 'byval' in the call for examples like the call
to f in your function g. I don't see why outreg is needed.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Are we talking at cross-purposes?

As far as I can see all that is missing for your testcase is an IR level
optimization that drops the 'byval' in the call for examples like the call
to f in your function g. I don't see why outreg is needed.

Because I used a simple example. I am not interested on what is the best codgen for this small testcase given knowledge of what f does, but on a more general problem. Let me give a more complex one


struct foo {
long a;
long b;
char c;
};
int should_modify(void);
void h(struct foo *x);
attribute((noinline)) void f(struct foo x) {
if (should_modify())
x.b = 42;
h(&x);
}
void g(long a, long b, char c) {
struct foo x = {a, b, c};
f(x);
f(x);
}

In here the codegen for f would stays very similar to what it is today:


define void @​f(%struct.foo* onstack align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
%call = tail call i32 @​should_modify() nounwind optsize
%tobool = icmp eq i32 %call, 0
br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry
%b = getelementptr inbounds %struct.foo* %x, i64 0, i32 1
store i64 42, i64* %b, align 8, !tbaa !​0
br label %if.end

if.end: ; preds = %entry, %if.then
tail call void @​h(%struct.foo* %x) nounwind optsize
ret void
}

but g would become


define void @​g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does a copy in here
call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does a copy in here
ret void
}

I am precisely interested in the cases where we don't know what f does. In fact, I included f's body just to show that the IL for it should stays similar to what it is today. The Il for g would stay the same even if you replaced f's definition with a declaration.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

I think you only avoid one copy (the first one):

define void @​g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
a copy in here

^ I agree you would avoid a copy here. But the call to f may modify the part of
the stack you put %a, %b, %c in. Thus you will need to set the stack up again.

call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
a copy in here

^ I.e. you will need to store %a, %b, %c to stack again before doing this call.

ret void
}

My proposed optimization would also save a copy in this case (the second one).

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

I think you only avoid one copy (the first one):

define void @​g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
a copy in here

^ I agree you would avoid a copy here. But the call to f may modify the part
of
the stack you put %a, %b, %c in. Thus you will need to set the stack up again.

I would not avoid a copy. Codegen is doing the copy as it does today with byval, this avoids us having two copies, on at the IL level and one at codegen.

call void @​f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does
a copy in here

^ I.e. you will need to store %a, %b, %c to stack again before doing this call.

ret void
}

My proposed optimization would also save a copy in this case (the second one).

The problem is not a specific case. Add 500 calls to f in g if you want. Only the last one will be a special case if you do not know what f does. Lets keep this bug about the general case, not particular optimizations.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

The problem is not a specific case. Add 500 calls to f in g if you want. Only
the last one will be a special case if you do not know what f does. Lets keep
this bug about the general case, not particular optimizations.

This is probably what I'm not getting. As far as I can see you are only dealing
with a special case: the case in which you know at compile time what is in the
memory at the moment of the call.

Also, as far as I can see, you are only ever saving one copy: right now there
is a copy of %a, %b, %c into an alloca; then codegen generates a bunch of copies
from the alloca, one for each byval call. You save the copy into the alloca.
If you are actually saving more copies, please explain how, since I don't see
this.

I'm also saving only one copy, but a different one: no matter how many 500
calls to f and g etc, codegen doesn't have to do the last copy.

So if I'm correct that you are only saving one copy, and only doing it in a
special case, is it worth introducing this new IR construct when using my
approach you can probably in the same situations also save one copy without
any new IR construct?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

This is probably what I'm not getting. As far as I can see you are only
dealing
with a special case: the case in which you know at compile time what is in the
memory at the moment of the call.

no. I am just letting the caller uses values and know that the codegen will produce the copy to a stack space. A possible assembly for g is


_g:
subq $56, %rsp

move values to callee saved regs

    movq	%rdx, %rbx
    movq	%rsi, %r12
movq	%rdi, %r13

first copy, set up by codegen

    movq	%rbx, 16(%rsp)
    movq	%r12, 8(%rsp)
movq	%r13, (%rsp)
callq	_f

second copy, set up by codegen because we don't know if f modified the stack

or not the the caller (_g) has no copy available for us to use (its copy is

in regs).

    movq	%rbx, 16(%rsp)
    movq	%r12, 8(%rsp)
movq	%r13, (%rsp)
    callq	_f
addq	$56, %rsp
ret

Also, as far as I can see, you are only ever saving one copy: right now there
is a copy of %a, %b, %c into an alloca; then codegen generates a bunch of
copies
from the alloca, one for each byval call. You save the copy into the alloca.
If you are actually saving more copies, please explain how, since I don't see
this.

It saves having the two copies on the stack. Currently when we pass a struct by value we always get one because of the alloca and one because of codegen.

I'm also saving only one copy, but a different one: no matter how many 500
calls to f and g etc, codegen doesn't have to do the last copy.

Changing only codegen, how would you avoid codegen doing the copy? The callers copy is in an alloca. If I complicate f's signature just a bit (say "int f(int a, struct foo x)" codegen would have to be very clever to decide where that alloca would live for it to be in a valid frame according to the abi.

So if I'm correct that you are only saving one copy, and only doing it in a
special case, is it worth introducing this new IR construct when using my
approach you can probably in the same situations also save one copy without
any new IR construct?

It is not a special case. The above codegen avoids the copy for any f. It also saves stack space and lets the caller uses only values, wich are easier to handle in analysis than addresses.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

no. I am just letting the caller uses values and know that the codegen will
produce the copy to a stack space. A possible assembly for g is


_g:
subq $56, %rsp

move values to callee saved regs

    movq    %rdx, %rbx
    movq    %rsi, %r12
movq    %rdi, %r13

How do you know what the values are? Here it is easy because you are
just passing through %a, %b, %c. What about this:

void g() {
struct foo x;
init_x(&x);
f(x);
}

?

As far as I can see you can't handle it: you need to know what "the values" are.
That's why I'm saying that you only seem to be handling a special case.

Changing only codegen, how would you avoid codegen doing the copy? The callers
copy is in an alloca. If I complicate f's signature just a bit (say "int f(int
a, struct foo x)" codegen would have to be very clever to decide where that
alloca would live for it to be in a valid frame according to the abi.

OK, I can buy this.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

How do you know what the values are? Here it is easy because you are
just passing through %a, %b, %c. What about this:

void g() {
struct foo x;
init_x(&x);
f(x);
}

?

As far as I can see you can't handle it: you need to know what "the values"
are.
That's why I'm saying that you only seem to be handling a special case.

That is not one of the cases we use byval. As far as I can tell the Il can handle that perfectly today, both the caller and callee need a pointer and that is what they get. In fact, even for

void g() {
int x;
init_x(&x);
f(x);
}

we get (and should get) an alloca. Being more specific, the cases I find a onstack attribute interesting are the ones we currently need to use a byval. It might also be handy for cases where we don't currently use a byval but the ABI puts the values in memory anyway, but that is a smaller concern I think.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

struct foo {
long a; long b; char c;
};

extern void init_x(struct foo *);
extern void f(struct foo);

void g() {
struct foo x;
init_x(&x);
f(x);
}

->

define void @​g() nounwind uwtable {
entry:
%x = alloca %struct.foo, align 8
call void @​init_x(%struct.foo* %x) nounwind
call void @​f(%struct.foo* byval align 8 %x) nounwind
ret void
}

Notice the byval on the call to @​f. In theory it should be possible to set
aside the stack frame for the call to @​f before calling @​init_x, and have
@​init_x initialize it. Then there would be no need for any copying.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

Related patch adding support for not using byval on small structs for x86_64.
Added a patch to Clang which does a simple optimization to avoid using byval when trying to pass some small structs on the stack.

The patch includes comments about how we would like to use 'onstack' to be able to make the patch more general.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Mar 6, 2012

define void @​g() nounwind uwtable {
entry:
%x = alloca %struct.foo, align 8
call void @​init_x(%struct.foo* %x) nounwind
call void @​f(%struct.foo* byval align 8 %x) nounwind
ret void
}

Notice the byval on the call to @​f. In theory it should be possible to set
aside the stack frame for the call to @​f before calling @​init_x, and have
@​init_x initialize it. Then there would be no need for any copying.

Yes, sorry, I misunderstood your example. It is a case the IL would get uglier. I would propose


define void @​g() nounwind uwtable optsize ssp {
entry:
%x = alloca %struct.foo, align 8
call void @​init_x(%struct.foo* %x) nounwind optsize
%0 = load %struct.foo* %x, align 8
call void @​f(%0) nounwind optsize
ret void
}
declare void @​init_x(%struct.foo*) optsize
declare void @​f(%struct.foo* onstack) optsize

I should still be possible to do the optimization you describe, but I agree the matching now has to account for the load. On the other side, if g itself has uses of x they could be done via the value %0.

The example makes me realize that it is more precise to say codegen would be responsible for storing the value in the call frame, not doing a copy.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:codegen
Projects
None yet
Development

No branches or pull requests

1 participant