Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pass-by-reference arg shows incorrect debuginfo value within loop after LICM #37030

Open
gregbedwell opened this issue Jun 4, 2018 · 8 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@gregbedwell
Copy link
Collaborator

Bugzilla Link 37682
Version trunk
OS Windows NT
CC @adrian-prantl,@dwblaikie,@FlameTop,@jmorse,@pogo59,@vedantk

Extended Description

Versions:

$ clang --version
clang version 7.0.0 (trunk 333874) (llvm/trunk 333894)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: e:\build\Release\bin

$ lldb --version
lldb version 7.0.0 (https://git.llvm.org/git/lldb.git/ revision 0fbc22d3ed13e6825ddd28e9fdf50e8c7887850f)
  clang revision 333874
  llvm revision 333894

Repro case:

$ cat tests\fibonacci\test.cpp
__attribute__((__noinline__))
void Fibonacci(int terms, int& total)
{
  int first = 0;
  int second = 1;

  for (int i = 0; i < terms; ++i)
  {
    int next = first + second; // DexWatch('total', '*(int*)$rdx', '$r10')
    total += first;            // DexWatch('total', '*(int*)$rdx', '$r10')
    first = second;            // DexWatch('total', '*(int*)$rdx', '$r10')
    second = next;             // DexWatch('total', '*(int*)$rdx', '$r10')
  }
}

int main()
{
  int total = 0;
  Fibonacci(6, total);
  return total;
}

Trace output from DExTer ( http://llvm.org/devmtg/2018-04/slides/Bedwell-Measuring_the_User_Debugging_Experience.pdf ) with -opt-bisect-limit=31. Each line represents stepping once in the debugger. The values in the {} are the results of evaluating those expressions in the debugger while paused at that step.

We can see that variable total which is a function argument that has been passed by reference lives in the memory pointed at by $rdx, so the values match. Ignore $r10 for now.

$ py -3 dexter.py test --builder clang --debugger lldb tests\fibonacci -v --cflags="-O2 -g -mllvm -opt-bisect-limit=31" --ldflags="-g"

## BEGIN ##
[1, "main", "test.cpp", 18, 7, "BREAKPOINT", "FUNC", {}]
[2, "main", "test.cpp", 19, 3, "BREAKPOINT", "FORWARD", {}]
.   [3, "void Fibonacci(int,int &)", "test.cpp", 7, 21, "BREAKPOINT", "FUNC", {}]
.   [4, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [5, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [6, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [7, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [8, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "2"}]
.   [9, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [10, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "1", "*(int*)$rdx": "1", "$r10": "2"}]
.   [11, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "1", "*(int*)$rdx": "1", "$r10": "3"}]
.   [12, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [13, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "2", "*(int*)$rdx": "2", "$r10": "3"}]
.   [14, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "2", "*(int*)$rdx": "2", "$r10": "5"}]
.   [15, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [16, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "4", "*(int*)$rdx": "4", "$r10": "5"}]
.   [17, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "4", "*(int*)$rdx": "4", "$r10": "8"}]
.   [18, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [19, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "7", "*(int*)$rdx": "7", "$r10": "8"}]
.   [20, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "7", "*(int*)$rdx": "7", "$r10": "13"}]
.   [21, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [22, "void Fibonacci(int,int &)", "test.cpp", 14, 1, "BREAKPOINT", "FORWARD", {}]
[23, "main", "test.cpp", 20, 10, "BREAKPOINT", "FUNC", {}]
## END (23 steps) ##

Trace output from DExTer with -opt-bisect-limit=32 (BISECT: running pass (32) Loop Invariant Code Motion on loop).

Within the loop, variable 'total' now lives in $r10 and is only stored back to the address in $rdx after the loop terminates. The debug info for total still appears to be pointing at address of $rdx throughout though, so we get an incorrect value of 0.

$ py -3 dexter.py test --builder clang --debugger lldb tests\fibonacci -v --cflags="-O2 -g -mllvm -opt-bisect-limit=32" --ldflags="-g"

## BEGIN ##
[1, "main", "test.cpp", 18, 7, "BREAKPOINT", "FUNC", {}]
[2, "main", "test.cpp", 19, 3, "BREAKPOINT", "FORWARD", {}]
.   [3, "void Fibonacci(int,int &)", "test.cpp", 7, 21, "BREAKPOINT", "FUNC", {}]
.   [4, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [5, "void Fibonacci(int,int &)", "test.cpp", 7, 3, "STEP", "BACKWARD", {}]
.   [6, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [7, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [8, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [9, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [10, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "0"}]
.   [11, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [12, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [13, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "1"}]
.   [14, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [15, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "2"}]
.   [16, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "2"}]
.   [17, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [18, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "4"}]
.   [19, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "4"}]
.   [20, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [21, "void Fibonacci(int,int &)", "test.cpp", 9, 22, "BREAKPOINT", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "7"}]
.   [22, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "7"}]
.   [23, "void Fibonacci(int,int &)", "test.cpp", 7, 30, "STEP", "BACKWARD", {}]
.   [24, "void Fibonacci(int,int &)", "test.cpp", 10, 11, "STEP", "FORWARD", {"total": "0", "*(int*)$rdx": "0", "$r10": "12"}]
.   [25, "void Fibonacci(int,int &)", "test.cpp", 14, 1, "BREAKPOINT", "FORWARD", {}]
[26, "main", "test.cpp", 20, 10, "BREAKPOINT", "FUNC", {}]
## END (26 steps) ##

Even though "total" comes in as a reference I think it would be fair to set the location of total to the register throughout the loop. At the very least I'd expect to see a "variable not visible due to optimization" type message.

@vedantk
Copy link
Collaborator

vedantk commented Jun 5, 2018

I don't think it's feasible to assert that the debugger show updated values for 'total' within this loop, because it amounts to a demand that LICM never happen. As applied to this loop, the whole point of LICM is to sink the store.

Even though "total" comes in as a reference I think it would be fair to set the location of total to the register throughout the loop.

I think this would be misleading. Inspecting "total" should show the user the value pointed-to by the "total" pointer. This is the only way print a correct value when "total" is aliased.

At the very least I'd expect to see a "variable not visible due to optimization" type message.

This might create scenarios where the variable is never presented as available, even though the debugger is perfectly able to inspect it, because hoisting/sinking occurs frequently.

I also think this would be misleading, because users expect any inspection of "total" to show the value pointed-to by the "total" pointer.

@gregbedwell
Copy link
Collaborator Author

That seems eminently reasonable so I think we'll have to resolve this as it is, but it still leaves me uneasy. Putting myself in the shoes of some user who has written this loop and is trying to debug it, but doesn't know/care anything about compiler optimizations, the debugger is just plain wrong.

This is an unfortunate case where we're comparing pretty unfavourably with the Microsoft compiler in terms of quality of the optimized debugging experience. Looking into it more closely it seems the reason that they do so well is because they have a store inside the loop (tested with VS2017 at /Ox).

It's a shame that there's no way in DWARF (... I assume ... paging Paul ;) ) to represent this sort of situation where we know where the value according to the semantics of the user code being stepped through, but that it has not yet been stored to the location of the pointer. Feature request for a future standard?

I'll put this in the bucket of things to assess the impact of disabling for -Og mode. For reference, GCC 5.4.0 has the same -O2 behaviour as clang here, but the value of total is visible on every iteration with -Og.

@adrian-prantl
Copy link
Collaborator

It's a shame that there's no way in DWARF (... I assume ... paging Paul ;) ) to represent this sort of situation where we know where the value according to the semantics of the user code being stepped through, but that it has not yet been stored to the location of the pointer.

Drive-by comment without having read the context: DWARF has DW_OP_implicit_pointer:

An optimizing compiler may eliminate a pointer, while still retaining the value that
the pointer addressed. DW_OP_implicit_pointer allows a producer to describe this
value.
The DW_OP_implicit_pointer operation specifies that the object is a pointer
that cannot be represented as a real pointer, even though the value it would
point to can be described.

@pogo59
Copy link
Collaborator

pogo59 commented Jun 6, 2018

Within the loop, variable 'total' now lives in $r10 and is only stored
back to the address in $rdx after the loop terminates.

That's not really stating the situation accurately. 'total' is a reference,
not an int; it's just that syntactically there's no difference within the
loop. If you declared 'total' as 'int *' I don't think the confusion would
occur.

The value of the pointed-to variable is being computed in the loop, and
not actually stored until it terminates. There are lots of cases where we
might be computing something and not actually update a variable until later.

As Adrian points out, in DWARF v5 we could describe this more accurately.
In earlier versions, we cannot, because there's no way to describe a
pointer to a register.

FTR, I am sympathetic to the idea that '-Og' should avoid code-motion
optimizations. Not just for cases like this, but because it messes with
single-stepping something awful.

@gregbedwell
Copy link
Collaborator Author

Within the loop, variable 'total' now lives in $r10 and is only stored
back to the address in $rdx after the loop terminates.

That's not really stating the situation accurately. 'total' is a reference,
not an int; it's just that syntactically there's no difference within the
loop. If you declared 'total' as 'int *' I don't think the confusion would
occur.

Right. I'm trying to come at this problem by putting aside all of my knowledge of compiler internals and just think about the experience of the average user who doesn't care to drop down to stepping through disassembly and just expects source level stepping to work. In this case, from the user point of view I can happily step through the loop and see some of my variables being updated, but not this one in particular. With my other compiler for my other platform I can see all my variables being updated on each iteration. Therefore I prefer to debug on that other platform.

I guess my question at this stage is whether:
a) would it be desirable
b) would it be feasible
to have the debugger report something like "the actual value of the reference is 0, but I've currently got a value of 7 sitting in a register waiting to be stored there"?

@pogo59
Copy link
Collaborator

pogo59 commented Jun 6, 2018

I guess my question at this stage is whether:
a) would it be desirable
b) would it be feasible
to have the debugger report something like "the actual value of the
reference is 0, but I've currently got a value of 7 sitting in a register
waiting to be stored there"?

It would obviously improve the debugging experience in this kind of situation,
but it would take more expressive power than DWARF has to describe it.

@gregbedwell
Copy link
Collaborator Author

Even though "total" comes in as a reference I think it would be fair to
set the location of total to the register throughout the loop.

I think this would be misleading. Inspecting "total" should show the user
the value pointed-to by the "total" pointer. This is the only way print a
correct value when "total" is aliased.

I also think this would be misleading, because users expect any inspection
of "total" to show the value pointed-to by the "total" pointer.

That's not really stating the situation accurately. 'total' is a
reference, not an int; it's just that syntactically there's no difference
within the loop. If you declared 'total' as 'int *' I don't think the
confusion would occur.

I've been thinking about this recently, and the more I think about it, the more I'm not sure that I agree with this entirely. My concern comes down to that fact that regardless of the underlying compiler implementation it's a reference, not a pointer:

"It is neither a pointer to the object, nor a copy of the object. It is the object." - https://isocpp.org/wiki/faq/references#overview-refs

So, from a pure user perspective, I still think that there's an argument to be made that for a few instructions within the loop the location of 'total' (and any aliases of it) is a register, not an address. If total had come into the function as int* I'd agree entirely with the above points (but still arguing in favour of a way to represent 'pending' values to the debugger).

It may well be that we have to accept this as a limitation of the LLVM representation, but I'm leaning towards disagreeing with the assessment that it would be incorrect behaviour to temporarily bind the debug location of total to the register, as opposed to reasonable behaviour that may be not feasible to properly model.

@pogo59
Copy link
Collaborator

pogo59 commented Aug 30, 2018

Hmmm.

If 'total' was a global 'int' variable instead of an 'int&' parameter,
and the generated code was effectively the same (e.g. when compiled as
PIC), I think there would be less disagreement about the appropriateness
of identifying the location as in the register, for the course of those
few instructions.

So, I'm becoming more convinced that Greg has a point, and the problem
is mainly that we can't describe it correctly prior to DWARF v5.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants