Add support for 'outreg' parameter attribute #12565
Comments
Maybe 'onstack' is a more reasonable name. |
It would be really nice if the caller could see a value and the callee could see memory, as that is what will happen in practice. Consider for example
For the function f, getting a byval argument is perfect. It knows the value is in memory and is compiled to:
define void @f(%struct.foo* byval align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
tail call void @h(%struct.foo* %x) nounwind optsize
ret void
}
For the function g, byval is bad because it thinks it has to create a stack location, but codegen will copy it anyway. We produce:
define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp {
entry:
%x = alloca %struct.foo, align 8
%a1 = getelementptr inbounds %struct.foo* %x, i64 0, i32 0
store i64 %a, i64* %a1, align 8, !tbaa !0
%b2 = getelementptr inbounds %struct.foo* %x, i64 0, i32 1
store i64 %b, i64* %b2, align 8, !tbaa !0
%c3 = getelementptr inbounds %struct.foo* %x, i64 0, i32 2
store i8 %c, i8* %c3, align 8, !tbaa !1
call void @f(%struct.foo* byval align 8 %x) optsize
ret void
}
The IL for f could remain something like:
define void @f(%struct.foo* onstack align 8 %x) nounwind uwtable optsize noinline ssp {
entry:
tail call void @h(%struct.foo* %x) nounwind optsize
ret void
}
and g become
That is, for onstack arguments, a pointer in the callee is matched by a value in the caller, and codegen is the one responsible for creating the address when it produces the call frame. This should let it do a better job than what we currently do for g:
_g: ## @g |
Hi Rafael, suppose the example was this: void g(long a, long b, char c) { Then making a copy of x is obligatory when calling f, because f is allowed So in order to do your proposed optimization, you first have to check that If that check passes, then it is OK to pass the address of x even if there Instead, couldn't you just drop the 'byval' attribute on the call, and then |
PS: The analysis would also have to prove that &x doesn't escape before the call |
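The point about the copy being obligatory can be illustrated in plain C. This is only a sketch: the layout of struct foo is an assumption inferred from the IR above (two i64 fields and an i8), and the bodies of f and g are hypothetical, since the thread does not show them. It shows that f may modify its by-value argument, so every call needs a fresh copy and the caller's x must stay unchanged.

```c
#include <assert.h>

/* Assumed layout, inferred from the IR in this thread (i64, i64, i8). */
struct foo {
    long a;
    long b;
    char c;
};

/* Hypothetical callee: it owns its copy of x, so it is free to modify it. */
long f(struct foo x) {
    x.a += 100;
    return x.a;
}

/* Hypothetical caller: passes the same x by value twice.
 * Both calls must observe the original contents of x, which is why
 * a copy has to be made for each call unless f is known not to write. */
long g(long a, long b, char c) {
    struct foo x = { a, b, c };
    long first = f(x);
    long second = f(x);
    /* If a call had clobbered the caller's x, first and second would differ. */
    return (first == second) ? x.a : -1;
}
```

With these definitions, g(1, 2, 'c') returns 1: each call to f sees a == 1, and the caller's x is untouched afterwards.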
Yes, codegen would still do the copy, like it does today for simple types like i64. The problem is just the representation at the IL level. And yes, if we know that the function never modifies its argument, we can optimize the code to just drop the byval/onstack attribute. In the example I wrote, that optimization was not done; there is still an implicit copy done by codegen. Cheers, |
Just to be clear, you don't need to know that f doesn't modify the argument. I'm not sure why you want a new IR construct when just dropping the byval on |
No, it is fine because I am defining outreg as causing codegen to create a copy!
No, I am trying to handle the general case. As I noted, the example I gave is not optimized on any assumptions about f. |
Are we talking at cross-purposes? As far as I can see all that is missing for your testcase is an IR level |
Because I used a simple example. I am not interested in what the best codegen is for this small testcase given knowledge of what f does, but in a more general problem. Let me give a more complex one: struct foo {
|
I think you only avoid one copy (the first one): define void @g(i64 %a, i64 %b, i8 signext %c) nounwind uwtable optsize ssp { ^ I agree you would avoid a copy here. But the call to f may modify the part of call void @f(%struct.foo onstack align 8 {%a, %b, %c }) optsize ;codegen does ^ I.e. you will need to store %a, %b, %c to stack again before doing this call. ret void My proposed optimization would also save a copy in this case (the second one). |
I would not avoid a copy. Codegen is doing the copy as it does today with byval; this avoids us having two copies, one at the IL level and one at codegen.
The problem is not a specific case. Add 500 calls to f in g if you want. Only the last one will be a special case if you do not know what f does. Let's keep this bug about the general case, not particular optimizations. |
This is probably what I'm not getting. As far as I can see you are only dealing Also, as far as I can see, you are only ever saving one copy: right now there I'm also saving only one copy, but a different one: no matter how many 500 So if I'm correct that you are only saving one copy, and only doing it in a |
No. I am just letting the caller use values and know that codegen will produce the copy to a stack space. A possible assembly for g is:
_g: move values to callee saved regs
first copy, set up by codegen
second copy, set up by codegen because we don't know if f modified the stack or not, and the caller (_g) has no copy available for us to use (its copy is in regs).
It saves having the two copies on the stack. Currently when we pass a struct by value we always get one because of the alloca and one because of codegen.
Changing only codegen, how would you avoid codegen doing the copy? The caller's copy is in an alloca. If I complicate f's signature just a bit (say "int f(int a, struct foo x)"), codegen would have to be very clever to decide where that alloca would live for it to be in a valid frame according to the ABI.
It is not a special case. The above codegen avoids the copy for any f. It also saves stack space and lets the caller use only values, which are easier to handle in analysis than addresses. |
How do you know what the values are? Here it is easy because you are void g() { ? As far as I can see you can't handle it: you need to know what "the values" are.
OK, I can buy this. |
That is not one of the cases where we use byval. As far as I can tell the IL can handle that perfectly today: both the caller and callee need a pointer and that is what they get. In fact, even for void g() { we get (and should get) an alloca. Being more specific, the cases where I find an onstack attribute interesting are the ones where we currently need to use a byval. It might also be handy for cases where we don't currently use a byval but the ABI puts the values in memory anyway, but that is a smaller concern I think. |
struct foo { extern void init_x(struct foo *); void g() { -> define void @g() nounwind uwtable { Notice the byval on the call to @f. In theory it should be possible to set |
Related patch adding support for not using byval on small structs for x86_64. The patch includes comments about how we would like to use 'onstack' to be able to make the patch more general. |
Yes, sorry, I misunderstood your example. It is a case where the IL would get uglier. I would propose: define void @g() nounwind uwtable optsize ssp {
|
Extended Description
We currently have support for 'inreg' as a parameter attribute on calls.
There are situations in which front ends would benefit from having an 'outreg' parameter attribute available.
For example, in Clang we would occasionally like to lower structs which should be passed on the stack into scalar LLVM values (in order to avoid byval, which tends to pessimize codegen).
On x86_64, for example, we currently are unable to (easily) do this in situations in which the argument to be passed on the stack occurs before all integer registers are exhausted, because turning it into a scalar value would result in the argument occupying an integer register. We could reorder the arguments to deal with this, but that would be much more complicated for the frontend.
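The register-occupancy problem described above can be sketched at the C level. This assumes the System V x86_64 calling convention, under which a struct larger than 16 bytes is passed in memory; struct big and both function names are hypothetical and do not come from the original report. The two functions compute the same result, but flattening the struct into scalars changes which registers the trailing arguments land in, which is why a front end cannot simply perform this lowering without something like 'outreg'.

```c
#include <assert.h>

/* Hypothetical 24-byte struct: passed in memory (on the stack)
 * under the assumed System V x86_64 ABI. */
struct big {
    long a;
    long b;
    long c;
};

/* s is passed on the stack, so p and q still get the first two
 * integer argument registers (rdi, rsi) under the assumed ABI. */
long takes_struct(struct big s, long p, long q) {
    return s.a + 2 * s.b + 3 * s.c + p - q;
}

/* Naive scalar lowering: a, b, c now occupy rdi, rsi, rdx, pushing
 * p and q into rcx and r8. The value computed is the same, but the
 * calling convention no longer matches takes_struct. */
long takes_scalars(long a, long b, long c, long p, long q) {
    return a + 2 * b + 3 * c + p - q;
}
```

A parameter attribute that forces a scalar onto the stack would let the front end emit the scalar form while keeping the register assignment of the struct form.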