LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 34136 - Improve optimized debug info for escaped variables that live in memory
Summary: Improve optimized debug info for escaped variables that live in memory
Status: NEW
Alias: None
Product: libraries
Classification: Unclassified
Component: DebugInfo
Version: trunk
Hardware: PC Windows NT
Importance: P enhancement
Assignee: Unassigned LLVM Bugs
Reported: 2017-08-09 15:17 PDT by Reid Kleckner
Modified: 2021-01-11 10:28 PST
CC List: 21 users

Description Reid Kleckner 2017-08-09 15:17:56 PDT
Consider:

int getint(void);
void alias(char *);
void crash(void);
void f() {
  int x = getint();
  int y = getint();
  int z = getint();
  alias((char*)&x);
  alias((char*)&y);
  alias((char*)&z);
  crash();
}

InstCombine runs LowerDbgDeclare on this code, and we end up with very poor debug info. We basically say that x, y, and z live in EAX after each call to getint, and they are immediately clobbered, never to be available again.

Those casts to char* also prevent the transform from adding more dbg.values that would extend the range.

This was reduced from http://crbug.com/753934
Comment 1 Reid Kleckner 2017-08-09 15:31:24 PDT
IR before instcombine:

; Function Attrs: nounwind uwtable
define void @f() #0 !dbg !11 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  %0 = bitcast i32* %x to i8*, !dbg !19
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #4, !dbg !19
  call void @llvm.dbg.declare(metadata i32* %x, metadata !15, metadata !20), !dbg !21
  %call = call i32 @getint(), !dbg !22
  store i32 %call, i32* %x, align 4, !dbg !21, !tbaa !23
  %1 = bitcast i32* %y to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) #4, !dbg !27
  call void @llvm.dbg.declare(metadata i32* %y, metadata !17, metadata !20), !dbg !28
  %call1 = call i32 @getint(), !dbg !29
  store i32 %call1, i32* %y, align 4, !dbg !28, !tbaa !23
  %2 = bitcast i32* %z to i8*, !dbg !30
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %2) #4, !dbg !30
  call void @llvm.dbg.declare(metadata i32* %z, metadata !18, metadata !20), !dbg !31
  %call2 = call i32 @getint(), !dbg !32
  store i32 %call2, i32* %z, align 4, !dbg !31, !tbaa !23
  %3 = bitcast i32* %x to i8*, !dbg !33
  call void @alias(i8* %3), !dbg !34
  %4 = bitcast i32* %y to i8*, !dbg !35
  call void @alias(i8* %4), !dbg !36
  %5 = bitcast i32* %z to i8*, !dbg !37
  call void @alias(i8* %5), !dbg !38
  call void @crash(), !dbg !39
  %6 = bitcast i32* %z to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %6) #4, !dbg !40
  %7 = bitcast i32* %y to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %7) #4, !dbg !40
  %8 = bitcast i32* %x to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %8) #4, !dbg !40
  ret void, !dbg !40
}

IR after instcombine:

define void @f() #0 !dbg !11 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  %0 = bitcast i32* %x to i8*, !dbg !19
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %0) #4, !dbg !19
  %call = call i32 @getint() #4, !dbg !20
  call void @llvm.dbg.value(metadata i32 %call, metadata !15, metadata !21), !dbg !22
  store i32 %call, i32* %x, align 4, !dbg !22, !tbaa !23
  %1 = bitcast i32* %y to i8*, !dbg !27
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1) #4, !dbg !27
  %call1 = call i32 @getint() #4, !dbg !28
  call void @llvm.dbg.value(metadata i32 %call1, metadata !17, metadata !21), !dbg !29
  store i32 %call1, i32* %y, align 4, !dbg !29, !tbaa !23
  %2 = bitcast i32* %z to i8*, !dbg !30
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %2) #4, !dbg !30
  %call2 = call i32 @getint() #4, !dbg !31
  call void @llvm.dbg.value(metadata i32 %call2, metadata !18, metadata !21), !dbg !32
  store i32 %call2, i32* %z, align 4, !dbg !32, !tbaa !23
  %3 = bitcast i32* %x to i8*, !dbg !33
  call void @alias(i8* %3) #4, !dbg !34
  %4 = bitcast i32* %y to i8*, !dbg !35
  call void @alias(i8* %4) #4, !dbg !36
  %5 = bitcast i32* %z to i8*, !dbg !37
  call void @alias(i8* %5) #4, !dbg !38
  call void @crash() #4, !dbg !39
  %6 = bitcast i32* %z to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %6) #4, !dbg !40
  %7 = bitcast i32* %y to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %7) #4, !dbg !40
  %8 = bitcast i32* %x to i8*, !dbg !40
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %8) #4, !dbg !40
  ret void, !dbg !40
}

Note that we only have 3 dbg.value intrinsics. These ultimately turn into MIR DEBUG_VALUE instructions that point at EAX:

        #DEBUG_VALUE: f:x <- %EAX
        .cv_loc 0 1 5 7                 # store.c:5:7
        movl    %eax, 52(%rsp)
        .cv_loc 0 1 6 11                # store.c:6:11
        callq   getint
.Ltmp1:
        #DEBUG_VALUE: f:y <- %EAX
        .cv_loc 0 1 6 7                 # store.c:6:7
        movl    %eax, 48(%rsp)
        .cv_loc 0 1 7 11                # store.c:7:11
        callq   getint
.Ltmp2:
        #DEBUG_VALUE: f:z <- %EAX
        .cv_loc 0 1 7 7                 # store.c:7:7
        movl    %eax, 44(%rsp)
        leaq    52(%rsp), %rcx
        .cv_loc 0 1 8 3                 # store.c:8:3
        callq   alias

They are all immediately clobbered, so that's no good.

Even if we drop dbg.declare, we should have something to indicate that the variable will live in memory. The dbg.value could use a DW_OP_deref DIExpression after the store, for example.
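
For illustration, such a deref'd dbg.value right after the store might look like this (a minimal sketch: the metadata numbers are made up, and writing the DIExpression inline vs. as a numbered node is cosmetic):

  store i32 %call, i32* %x, align 4, !dbg !22
  ; sketch: x lives in memory at %x again; DW_OP_deref tells the consumer
  ; to read the variable's value through the pointer
  call void @llvm.dbg.value(metadata i32* %x, metadata !15, metadata !DIExpression(DW_OP_deref)), !dbg !22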
Comment 2 Reid Kleckner 2017-08-10 10:20:00 PDT
LowerDbgDeclare gives up on any use that isn't a direct use of the entire alloca. Aggregates are typically built up one field at a time, so we can often lose all debug info for them. Consider this example:

struct Foo { int x, y; };
int getint(void);
void escape(char *);
void f() {
  struct Foo o;
  o.x = getint();
  o.y = getint();
  escape((char *)&o);
}
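
For illustration, the relevant IR looks roughly like this (a sketch; value names and metadata numbers are made up). The stores go through GEPs rather than through the alloca itself, so LowerDbgDeclare sees no direct use it can rewrite:

  %o = alloca %struct.Foo, align 4
  call void @llvm.dbg.declare(metadata %struct.Foo* %o, metadata !16, metadata !DIExpression()), !dbg !20
  %call = call i32 @getint(), !dbg !21
  ; the store's pointer operand is the GEP, not %o itself
  %x = getelementptr inbounds %struct.Foo, %struct.Foo* %o, i32 0, i32 0, !dbg !21
  store i32 %call, i32* %x, align 4, !dbg !21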

I'm starting to feel more and more like LowerDbgDeclare is just a bad idea. We should keep dbg.declare around as long as the variable's alloca is still alive. Even if instcombine, GVN, or some other pass deletes or moves some stores to a local variable, I'd much rather tell the user the variable is in memory with a stale value than that we don't have any info at all.
Comment 3 David Blaikie 2017-08-10 10:22:13 PDT
(In reply to Reid Kleckner from comment #2)
> I'd much rather tell the user the variable is in
> memory with a stale value than that we don't have any info at all.

There's no way to communicate to the user that the value is potentially stale though - so that seems likely to /really/ confuse a user & lead them way down the garden path, potentially.
Comment 4 Adrian Prantl 2017-08-10 10:25:01 PDT
We probably want both.

My favorite example is this:

void g(int*);
void f(int i) {
  g(&i); // alloca is live until here
  if (i == 0) { // end of alloca live range
     ...  // i is now a constant, but without lowering to dbg.value we can't capture that and would still point to the stack slot that may even have been reused by now.
  }
}
Comment 5 Reid Kleckner 2017-08-10 11:05:06 PDT
(In reply to David Blaikie from comment #3)
> (In reply to Reid Kleckner from comment #2)
> > I'd much rather tell the user the variable is in
> > memory with a stale value than that we don't have any info at all.
> 
> There's no way to communicate to the user that the value is potentially
> stale though - so that seems likely to /really/ confuse a user & lead them
> way down the garden path, potentially.

I don't think I agree with that perspective. In the tradeoff between giving the user more information that might be potentially misleading vs. no information at all, I'd prefer to give more information. Even if LLVM moves some stores around and the debugger prints uninitialized garbage for my local variables, I can step around until it gets initialized and then stare at the assembly to figure out what happened. That's a pretty normal workflow when debugging optimized code, IMO.


(In reply to Adrian Prantl from comment #4)
> We probably want both.
> 
> My favorite example is this:
> 
> void g(int*);
> void f(int i) {
>   g(&i); // alloca is live until here
>   if (i == 0) { // end of alloca live range
>      ...  // i is now a constant, but without lowering to dbg.value we can't
> capture that and would still point to the stack slot that may even have been
> reused by now.
>   }
> }

I think we could fix this by modifying stack coloring and GVN. For the changes to stack coloring, consider this simpler example:

void g(int *restrict);
int getint(void);
void f(int i) {
  g(&i);
  int j = getint();
  g(&j);
}

If stack coloring reuses i's slot for j, maybe it could update the debug info to shrink the live range of i. I don't think we can do this today, because our frame index table uses the labels inserted for the scope that ultimately come from the DILocations stapled on MachineInstrs.

GVN could also help in the original example by inserting llvm.dbg.value calls recording i == 0 when it does that predication. Again, we'd have to learn how to mix dbg.declare with dbg.value, but I think we'll need to do something like that in the long run.
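
For illustration, the dbg.value GVN would insert in the taken block might look like this (purely hypothetical; the block label and metadata numbers are made up):

if.then:
  ; hypothetical: GVN has proven i == 0 on this path, so record the constant
  call void @llvm.dbg.value(metadata i32 0, metadata !16, metadata !DIExpression()), !dbg !25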
Comment 6 Adrian Prantl 2017-08-10 11:08:10 PDT
> I don't think I agree with that perspective. In the tradeoff between giving the user more information that might be potentially misleading vs. no information at all, I'd prefer to give more information. 

I strongly disagree with this statement. If what the debugger reports cannot be trusted  in *some* cases, then everything the debugger presents to the user becomes worthless, because the user won't be able to tell the difference between information that is correct and information that may be correct.
Comment 7 David Blaikie 2017-08-10 12:06:30 PDT
(In reply to Adrian Prantl from comment #6)
> > I don't think I agree with that perspective. In the tradeoff between giving the user more information that might be potentially misleading vs. no information at all, I'd prefer to give more information. 
> 
> I strongly disagree with this statement. If what the debugger reports cannot
> be trusted  in *some* cases, then everything the debugger presents to the
> user becomes worthless, because the user won't be able to tell the
> difference between information that is correct and information that may be
> correct.

Yeah - while there might be cases where the answer is unclear (due to optimizations, code gets shuffled around, there may be uncertainty around what the value of a variable is at this instruction) & I can understand debating whether there's a "good enough" answer in some of those cases.

But in cases where it's objectively incorrect - the value's moved into a register, the register is mutated to reflect the new value but the memory location still holds the old value - I don't think it's a good idea to consider that anything other than a bug to be fixed.

It's not so much the user getting "uninitialized garbage" they can easily identify, ignore, & move on from. But about them getting answers that are close enough to be plausible & then spending a lot more time trying to understand a problem that doesn't exist because the debugger told them it did.
Comment 8 Reid Kleckner 2017-08-14 10:09:37 PDT
(In reply to Adrian Prantl from comment #6)
> I strongly disagree with this statement. If what the debugger reports cannot
> be trusted  in *some* cases, then everything the debugger presents to the
> user becomes worthless, because the user won't be able to tell the
> difference between information that is correct and information that may be
> correct.

I think there are cases where we will have to accept stale variable values. I don't think we should treat it as an absolute goal to never give incorrect values. Otherwise we burden *every* LLVM pass that has the ability to delete memory writes with the responsibility of updating debug info. I think 90% of memory updates are eliminated by a few core passes: GVN and instcombine. If we can teach them to emit dbg.value when deleting stores to static allocas with debug info, we will have a reasonably good debug experience. There is a long tail of passes like memcpyopt that delete stores that I don't expect to preserve debug info. We can invent a way for them to say "I deleted this dead memcpy, so the value of the variable is unavailable at this point, and becomes available again here (dbg.value+undef?)", but I don't think we'll be able to maintain that for all passes.

It's also worth considering that we delete dead stores to heap objects, so the user can already observe impossible program states in the debugger. We can't just put the blinders on and say "sorry we optimized the program, nothing is certain". The user is already debugging an optimized program, they shouldn't trust anything it says, and our existing user base seems to already know that.
Comment 9 David Blaikie 2017-08-14 10:23:14 PDT
(In reply to Reid Kleckner from comment #8)
> The user is already debugging an optimized program,
> they shouldn't trust anything it says, and our existing user base seems to
> already know that.

What sort of feedback from users is available on this?

& there's also a risk that the more unreliable these things get, the less users trust the answers they get, the less they use it because it's untrustworthy.

I imagine there are fairly few cases where writes to heap values are optimized away in a way that's especially user visible. Function call boundaries happen often and are a pretty strong impediment to alias analysis for heap values, etc.
Comment 10 Zachary Turner 2017-08-14 10:40:48 PDT
(In reply to Adrian Prantl from comment #6)
> > I don't think I agree with that perspective. In the tradeoff between giving the user more information that might be potentially misleading vs. no information at all, I'd prefer to give more information. 
> 
> I strongly disagree with this statement. If what the debugger reports cannot
> be trusted  in *some* cases, then everything the debugger presents to the
> user becomes worthless, because the user won't be able to tell the
> difference between information that is correct and information that may be
> correct.

I strongly disagree with this statement.  I think it's an overly idealistic view of debugging optimized code.  Nobody *wants* to debug optimized code, because it's a pain. And the reason it's a pain is *because* the original source code doesn't map well to the generated machine code.  At the same time, the compiler has more information than the user does about the transformations that happened, and for the compiler to say "I can't be 100% sure that this information is 100% accurate, therefore I'm not even going to *try* to help you" seems like the wrong approach.  

Generating debug info is not like generating binary code, where it is either provably correct or provably incorrect.  The entire point of debug info is to help the user as much as possible.  By generating nothing, we are not helping anyone, and we may even be making it impossible for a user to track down a bug.  By generating something, with the caveat that it may be stale, we at least point the user in a direction.  If it's wrong, then they're no worse off than when they started.  But if it's right, then we saved them a ton of time.

Anyone debugging optimized code is going to be going back and forth between the assembly and source view anyway, and should be able to confirm that the debugger is reporting accurate information.
Comment 11 Reid Kleckner 2017-08-14 10:49:50 PDT
(In reply to David Blaikie from comment #9)
> (In reply to Reid Kleckner from comment #8)
> > The user is already debugging an optimized program,
> > they shouldn't trust anything it says, and our existing user base seems to
> > already know that.
> 
> What sort of feedback from users is available on this?
> 
> & there's also a risk that the more unreliable these things get, the less
> users trust the answers they get, the less they use it because it's
> untrustworthy.

Every build system I've ever encountered has at least two configurations: debug and release, even if they don't go by those names. The ubiquity of those configurations reflects the doubts that users have about the quality of optimized debug info.

> I imagine there are fairly few cases where writes to heap values are
> optimized away in a way that's especially user visible. Function call
> boundaries happen often and are a pretty strong impediment to alias analysis
> for heap values, etc.

This logic also applies to escaped locals to some extent. Locals that cannot be promoted to SSA are modeled as "read/write" at every external function call. Alias analysis, of course, is a lot stronger, so things like GVN do fire in practice.
Comment 12 David Blaikie 2017-08-14 11:21:52 PDT
(In reply to Zachary Turner from comment #10)
>  If it's wrong, then they're no worse off than when they started.

This is the bit where I think there's a lot of/strong disagreement.

If the debugger tells someone the wrong answer & the user doesn't know it's wrong - it may cost them a lot of time going down the wrong debugging path, trying to understand why that answer is happening in their program (when it isn't).

So, yes, I do think it's more costly to give the wrong answer than no answer. How much more costly (how many wrong answers would be worth it to get some right answers instead of no answers)? Not sure.

I think there is a valid tradeoff there, especially as an intermediate state, where we say "yeah, this technique gives the wrong answer sometimes but gets us a whole lot more right answers (rather than no answer) - /and/ we have a path forward to address the wrong answers it creates (& that they are bugs and should be treated as such)".
Comment 13 David Blaikie 2017-08-14 11:27:08 PDT
(In reply to David Blaikie from comment #12)
> (In reply to Zachary Turner from comment #10)
> >  If it's wrong, then they're no worse off than when they started.
> 
> This is the bit where I think there's a lot of/strong disagreement.
> 
> If the debugger tells someone the wrong answer & the user doesn't know it's
> wrong - it may cost them a lot of time going down the wrong debugging path,
> trying to understand why that answer is happening in their program (when it
> isn't).
> 
> So, yes, I do think it's more costly to give the wrong answer than no
> answer. How much more costly (how many wrong answers would be worth it to
> get some right answers instead of no answers)? Not sure.
> 
> I think there is a valid tradeoff there, especially as an intermediate
> state, where we say "yeah, this technique gives the wrong answer sometimes
> but gets us a whole lot more right answers (rather than no answer) - /and/
> we have a path forward to address the wrong answers it creates (& that they
> are bugs and should be treated as such)".

(In reply to Reid Kleckner from comment #11)
> (In reply to David Blaikie from comment #9)
> > (In reply to Reid Kleckner from comment #8)
> > > The user is already debugging an optimized program,
> > > they shouldn't trust anything it says, and our existing user base seems to
> > > already know that.
> > 
> > What sort of feedback from users is available on this?
> > 
> > & there's also a risk that the more unreliable these things get, the less
> > users trust the answers they get, the less they use it because it's
> > untrustworthy.
> 
> Every build system I've ever encountered has at least two configurations:
> debug and release, even if they don't go by those names. The ubiquity of
> those configurations reflects the doubts that users have about the quality
> of optimized debug info.

I think it's reasonable to expect that debugging optimized programs will be harder - I don't think that necessarily means expecting the answers to be wrong. Code reordering (especially now that we're dropping debug locations when code moves between blocks) can be confusing, values may be optimized away, etc. Yeah, it's harder & interacting with code that behaves more like the source code is easier.

I don't think that necessarily implies users are fine with/expect the debugger to give answers that are wrong.

> > I imagine there are fairly few cases where writes to heap values are
> > optimized away in a way that's especially user visible. Function call
> > boundaries happen often and are a pretty strong impediment to alias analysis
> > for heap values, etc.
> 
> This logic also applies to escaped locals to some extent. Locals that cannot
> be promoted to SSA are modeled as "read/write" at every external function
> call. Alias analysis, of course, is a lot stronger, so things like GVN do
> fire in practice.

Right - and there it's pretty noticeable as you're saying and would be great to improve.

I think there's substantial overlap in goals/desires here, feels like this is turning into a philosophical/principled stand when maybe it doesn't need to be? There's a technical tradeoff here - in both extremes ("everything's optimized away, we don't know where anything is" and "every variable has the erroneous value 3") it wouldn't be any good for a user. And somewhere there are practical tradeoffs around accuracy and usability for users - and a place from which improvements on both sides (removing erroneous values, adding info about values that were previously unknown) would be valuable.

I'm not sure exactly how to make that tradeoff judgement, but I think it's probably more helpful to look at it from that perspective.
Comment 14 Sigurður Ásgeirsson 2017-08-14 11:35:35 PDT
(In reply to David Blaikie from comment #9)
> (In reply to Reid Kleckner from comment #8)
> > The user is already debugging an optimized program,
> > they shouldn't trust anything it says, and our existing user base seems to
> > already know that.
> 
> What sort of feedback from users is available on this?

I'm a long-time user of debugging tools on Microsoft Visual C++ optimized binaries. I can say that we who do this know not to trust what the debugger presents, and often I find e.g. this==nullptr and other anomalies in debugging.
Note that even with perfect debugging information, the debugger will not always be able to unwind the stack correctly. (Think 3rd party modules with no symbols, JIT code with no unwind info, malware with bad intent, etc.).
So, no matter the fidelity of the symbols, it will always be necessary to exercise caution when debugging Windows crashes.
Comment 15 Hal Finkel 2017-08-14 11:38:34 PDT
(In reply to David Blaikie from comment #13)
> (In reply to David Blaikie from comment #12)
> > (In reply to Zachary Turner from comment #10)
> > >  If it's wrong, then they're no worse off than when they started.
> > 
> > This is the bit where I think there's a lot of/strong disagreement.
> > 
> > If the debugger tells someone the wrong answer & the user doesn't know it's
> > wrong - it may cost them a lot of time going down the wrong debugging path,
> > trying to understand why that answer is happening in their program (when it
> > isn't).
> > 
> > So, yes, I do think it's more costly to give the wrong answer than no
> > answer. How much more costly (how many wrong answers would be worth it to
> > get some right answers instead of no answers)? Not sure.
> > 
> > I think there is a valid tradeoff there, especially as an intermediate
> > state, where we say "yeah, this technique gives the wrong answer sometimes
> > but gets us a whole lot more right answers (rather than no answer) - /and/
> > we have a path forward to address the wrong answers it creates (& that they
> > are bugs and should be treated as such)".
> 
> (In reply to Reid Kleckner from comment #11)
> > (In reply to David Blaikie from comment #9)
> > > (In reply to Reid Kleckner from comment #8)
> > > > The user is already debugging an optimized program,
> > > > they shouldn't trust anything it says, and our existing user base seems to
> > > > already know that.
> > > 
> > > What sort of feedback from users is available on this?
> > > 
> > > & there's also a risk that the more unreliable these things get, the less
> > > users trust the answers they get, the less they use it because it's
> > > untrustworthy.
> > 
> > Every build system I've ever encountered has at least two configurations:
> > debug and release, even if they don't go by those names. The ubiquity of
> > those configurations reflects the doubts that users have about the quality
> > of optimized debug info.
> 
> I think it's reasonable to expect that debugging optimized programs will be
> harder - I don't think that necessarily means expecting the answers to be
> wrong. Code reordering (especially now that we're dropping debug locations
> when code moves between blocks) can be confusing, values may be optimized
> away, etc. Yeah, it's harder & interacting with code that behaves more like
> the source code is easier.
> 
> I don't think that necessarily implies users are fine with/expect the
> debugger to give answers that are wrong.

This is consistent with my experience as well. Users get really annoyed when the debugger provides wrong values (even more annoyed than when the debugger provides no values).
Comment 16 Jim Ingham 2017-08-14 11:57:17 PDT
I really disagree with the statement:  

If it's wrong, then they're no worse off than when they started.  But if it's right, then we saved them a ton of time.

It would be too exhausting to use a debugger if you have to keep in the forefront of your mind that every value you look at might very well be wrong. So you put that aside and trust it, and then when you're in the middle of chasing down some problem and some variable has a seemingly impossible value, you spend hours trying to figure out how it could possibly be that value till you remember that the debugger is a piece of crap and has been lying to you again.  You end up much worse off than when you started: you've lost your train of thought on the wild goose chase the debugger led you on, and you get to start again realizing that your tools are untrustworthy.

This is not a theoretical opinion on my part.

One difference between gdb & lldb is that when gdb can't unwind a register, it just attributes it to the last known value in a younger frame.  The end result is gdb would confidently declare a value for some variable (usually an argument in the "bt" list) which would be totally bogus.  That caused many many wasted hours over the years, and even people who had been bitten before would fall into the trap of trusting these values because you just can't use the thing if you have to always distrust it.  

I have been on the receiving end when people in this situation have been very forcefully conveying to me that they were much worse off for our having lied to them than if we had just told them we didn't know.

In lldb, if we can't unwind a register we report it as unknown.  People fairly easily understand that not everything is recoverable in optimized code debugging.  From many years supporting folks using the two approaches it is clear lldb's was the right choice.

Also, if lldb knows the places where the values are correct, we can help the user propagate those values to the places they need to stop.  For instance, we can present to them the places they can stop where the value IS trustworthy.  We can even do things like put hidden stops at known good locations and record info behind the user's back to present when they later stop.  Or we can get lldb to do some assembly analysis to try to figure out where the value moved from its last good location, automating the work folks have to do by hand when the debug info isn't helpful.

OTOH, if all we know is "declared locations are sometimes right" you'd never know when to do this work, and anything you do will be colored by the uncertainty.

If we added a DW_AT_probable_location and only converted it to DW_AT_location when clang is confident of its analysis, that would be okay.  Then you could tell people what to expect.  But forcing people to live constantly under "the values of Damocles" seems an awfully bad way to construct a tools environment.
Comment 17 Jason Molenda 2017-08-14 14:07:10 PDT
To throw two cents in, from the debugger's perspective it's always better to say "I don't know" rather than "this is the value" and give the wrong value.  The former is incredibly annoying to anyone working on optimized code -- I work with several groups that have to always build optimized and debugging that code is way harder than it should be today, I fully understand where the pain is coming from.  But thinking about the failure modes makes it clear what the right answer is.  

If I say "I cannot give you the value of this variable", the dev will either find the value in another stack frame, or they'll resort to reading assembly, find where it is actually still live, and get it.  They will curse the debugger for making their lives so hard (I'd prefer they curse the compiler, but that's not what they do, alas.)

If I hand out a value that is wrong, the developer can waste DAYS of time working on the assumption that this code somehow got this impossible value.  I've gotten those bug reports -- people who, literally, wasted days because the debugger gave out a value that was not correct.  It's so much worse of a failure mode, we've always considered this unacceptable when we've encountered these situations in lldb.  The more you work with users of the debugger and their bug reports, the more you realize the importance of getting this right.

The debugger must be conservative in these situations, as frustrating as that is.  The debug info must be conservative in these situations.  Or if the compiler wants to emit variable locations that are possibly correct, add a different tag in the DWARF for it and debuggers like lldb will be free to ignore it.  

Any debug info that claims a variable is accessible at a location, and that location does not hold the variable's value, is buggy debug info.
Comment 18 Nico Weber 2017-08-15 10:14:58 PDT
One thing where this is hurting us a lot in chrome/win land at the moment is minidumps. Minidumps aren't really a thing on non-Windows, which might be why expectations here are so different.

On Windows, when chrome (or what have you) crashes, you can download a minidump from the crash server, load that into your debugger, and then use that to inspect the state of variables. Users are aware that they look at optimized debug info. Someone told me "Misleading information is mitigated by the fact you can ask the debugger where it's getting its information (dv /v) and if you see stuff in a register, then you know to start distrusting the values."

With clang, when users ask for the values of things while debugging minidumps, they currently almost always get "no answer" instead of "likely answer is X". This is very different from how things used to work for them, and this makes people unhappy too.

Given the different expectations, would adding this behavior behind a (default-off) flag be a short-term option? Then we can improve optimized debug info asynchronously and see if it's possible to get it to be good enough without having to sometimes show invalid information.
Comment 19 David Blaikie 2017-08-15 10:36:59 PDT
(In reply to Nico Weber from comment #18)
> One thing where this is hurting us a lot in chrome/win land at the moment is
> minidumps. Minidumps aren't really a thing on non-Windows, which might be
> why expectations here are so different.
> 
> On Windows, when chrome (or what have you) crashes, you can download a
> minidump from the crash server, load that into your debugger, and then use
> that to inspect the state of variables. Users are aware that they look at
> optimized debug info. Someone told me "Misleading information is mitigated
> by the fact you can ask the debugger where's it's getting its information
> (dv /v) and if you see stuff in a register, then you know to start
> distrusting the values."

Interestingly, that seems like the opposite of the kind of erroneous information the direction discussed here would produce.

Here the issue is that LLVM isn't describing the memory location backing a variable once it copies the value into a register. By adding the stack memory location, the likely mistake is that the value may be stale, because a dead store has been eliminated and the value was /only/ in that register (or nowhere at all).

> With clang, when users ask for the values of things while debugging
> minidumps, they currently almost always get "no answer" instead of "likely
> answer is X". This is very different from how things used to work for them,
> and this makes people unhappy too.

*nod* It's clearly a regression for MSVC users switching to Clang, no doubt. And it's clear LLVM can do better - yeah, when "f(&x)" is called, Clang should be describing the location of 'x' without any doubt.

Though I wonder how many false positives ("the value of X is Y" when it's really Z) MSVC has compared to what the proposed direction/fixes have in Clang. Might well be that the users also don't expect as many false positives as this would create in Clang.

> Given the different expectations, would adding this behavior behind a
> (default-off) flag be a short-term option? Then we can improve optimized
> debug info asynchronously and see if it's possible to get it to be good
> enough without having to sometimes show invalid information.

Maybe - though the divergence does make me a touch uncomfortable, it does amount to sort of an A/B test with different implementations and users.

I'm not sure exactly where the tradeoffs are of the proposed fixes & what the path to addressing any gaps would be.

Adrian: Where're you at, philosophically, with all this? Are you of the opinion that no false positive regression is acceptable? Or are some acceptable (with the acknowledgement that they are bugs that should be fixed and should be identified, tracked, have a likely path to resolution, etc) to reduce false negatives ("no location" when there's really an obvious, unambiguous, unavoidable location that should be reported in the DWARF)?

If you feel like no false positive regression is acceptable, then yeah - I guess forking the default here with a flag might be the path forward - then as the gaps in the analysis are filled (tracking when a value moves back into memory), eventually the flag can be removed/ignored as the only thing it would be enabling would be the false positives - reintroducing the "unknown values" when it's truly unknown.
Comment 20 Hal Finkel 2017-08-15 10:44:23 PDT
(In reply to Nico Weber from comment #18)
> One thing where this is hurting us a lot in chrome/win land at the moment is
> minidumps. Minidumps aren't really a thing on non-Windows, which might be
> why expectations here are so different.
> 
> On Windows, when chrome (or what have you) crashes, you can download a
> minidump from the crash server, load that into your debugger, and then use
> that to inspect the state of variables. Users are aware that they look at
> optimized debug info. Someone told me "Misleading information is mitigated
> by the fact you can ask the debugger where it's getting its information
> (dv /v) and if you see stuff in a register, then you know to start
> distrusting the values."

Why? I thought that minidumps have registers and stack, but lack other memory contents.

> 
> With clang, when users ask for the values of things while debugging
> minidumps, they currently almost always get "no answer" instead of "likely
> answer is X". This is very different from how things used to work for them,
> and this makes people unhappy too.

Heh. So we've trained users to expect broken behavior and now they want it back ;) [this is unfortunately common.]

> 
> Given the different expectations, would adding this behavior behind a
> (default-off) flag be a short-term option? Then we can improve optimized
> debug info asynchronously and see if it's possible to get it to be good
> enough without having to sometimes show invalid information.
Comment 21 Jason Molenda 2017-08-15 10:50:46 PDT
(In reply to Nico Weber from comment #18)

> Given the different expectations, would adding this behavior behind a
> (default-off) flag be a short-term option? Then we can improve optimized
> debug info asynchronously and see if it's possible to get it to be good
> enough without having to sometimes show invalid information.


I am real uncomfortable with this being an option on the producer of debug info (clang).  Now the person who builds a program is deciding how the debugger behaves -- making an assumption about what the debug developer is going to see.  The person debugging a binary may have no idea how the program was compiled.  The person who is compiling the program may not understand the meaning of the clang flag ("-fpossibly-wrong-but-maybe-really-helpful-debug-info?  Yes please!") themselves.

If we want to emit possibly-correct location information in parallel to known-correct location information in the DWARF in a separate list, and add an option to the consumer (lldb) to use possibly-correct location information (with default-off), I'd be OK with that approach.  Then the person enabling the use of this in lldb is responsible for their own results.

I don't think MSVC's behavior is a feature to be emulated, though.  We've had many cases like this over the years, and in lldb we consider showing incorrect values to the user as our worst class of bugs.
Comment 22 Nico Weber 2017-08-15 11:53:49 PDT
(In reply to Jason Molenda from comment #21)
> (In reply to Nico Weber from comment #18)
> 
> > Given the different expectations, would adding this behavior behind a
> > (default-off) flag be a short-term option? Then we can improve optimized
> > debug info asynchronously and see if it's possible to get it to be good
> > enough without having to sometimes show invalid information.
> 
> 
> I am real uncomfortable with this being an option on the producer of debug
> info (clang).  Now the person who builds a program is deciding how the
> debugger behaves -- making an assumption about what the debug developer is
> going to see.  The person debugging a binary may have no idea how the
> program was compiled.  The person who is compiling the program may not
> understand the meaning of the clang flag
> ("-fpossibly-wrong-but-maybe-really-helpful-debug-info?  Yes please!")
> themselves.

We could make it CodeView-only and not touch the DWARF bits. That should address lldb concerns.

I agree with dblaikie that having two codepaths here long-term seems undesirable, but coming up with a grand unified scheme is going to take time, and ideally we wouldn't have to block switching chrome/win to clang on this work.
Comment 23 Adrian Prantl 2017-08-15 13:37:06 PDT
> Adrian: Where're you at, philosophically, with all this. Are you of the opinion that no false positive regression is acceptable? Or are some acceptable (with the acknowledgement that they are bugs that should be fixed and should be identified, tracked, have a likely path to resolution, etc) to reduce false negatives ("no location" when there's really an obvious, unambiguous, unavoidable location that should be reported in the DWARF)?

As I said earlier in the thread I strongly believe that we should not knowingly introduce new false positives (unless there is a way to mark them in the debug info as not trustworthy and debuggers will actually surface that to the user). If the user can't trust *some* of the variable values, *every* value reported by the debugger becomes not trustworthy.

I also think that this discussion is misguided, because I am convinced that there is a lot of lower-hanging fruit to pick that does not involve watering down the debug info with not-to-be-trusted information. We have lots of documented PRs of optimizations dropping dbg.values that can be fixed without introducing any uncertainty and I think we should focus our attention there.

I am also talking with Reid to work out a plan to fix the issue described in this PR without introducing new false positives — this will need a little more design but it is worth taking the time to solve this correctly. We have a lot of the necessary components already in place.
Comment 24 Reid Kleckner 2017-08-15 18:12:43 PDT
Keno wrote an alternative proposal, essentially trying to improve on the inference that instcombine does today: https://gist.github.com/Keno/480b8057df1b7c63c321

I'm proposing that we remove instcombine's conversion of dbg.declare and push that responsibility onto the passes that delete stores.
Comment 25 Reid Kleckner 2017-08-17 15:25:47 PDT
Adrian and I sat down and worked out a long term plan to improve things in this area without knowingly introducing any inaccuracies in variable info. For a long time, we've discussed unifying dbg.declare and dbg.value. We can't mix dbg.declare and dbg.value, so to fix this bug, we should standardize everything on dbg.value, and then we don't need instcombine or any other pass to lossily convert dbg.declare to dbg.value. The steps are:

- Add a flag to clang to always emit dbg.value+DW_OP_deref for local variables instead of dbg.declare. Maybe call it -gdbg-value? Document the new scheme in SourceLevelDebugging.rst.

- Teach all passes that eliminate dead stores (primarily DSE & instcombine) to insert dbg.value calls when deleting stores to allocas with a dominating dbg.value+deref. This logic would be factored into a shared utility, like salvageDebugInfo. One subtlety is that we need to insert dbg.value+deref calls at the store that kills the store being eliminated, indicating that the value once again lives in memory (see the sketch after this list).

- Test the flag, flip the default when it's ready, remove the flag in the next release.
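
For illustration, here is a sketch of the second step on a variable x described by dbg.value+DW_OP_deref on its alloca (hypothetical IR; value names and metadata numbers are made up):

  ; before: the first store is dead, killed by the second
  store i32 %v1, i32* %x, align 4, !dbg !20
  store i32 %v2, i32* %x, align 4, !dbg !21

  ; after deleting the dead store, the pass inserts a dbg.value carrying the
  ; deleted value, and a dbg.value+deref at the killing store to show that
  ; the variable lives in memory again
  call void @llvm.dbg.value(metadata i32 %v1, metadata !15, metadata !DIExpression()), !dbg !20
  store i32 %v2, i32* %x, align 4, !dbg !21
  call void @llvm.dbg.value(metadata i32* %x, metadata !15, metadata !DIExpression(DW_OP_deref)), !dbg !21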

In the long run, this will make it so we start printing variable locations as assembly comments at -O0, which is a nice side benefit.
Comment 26 Adrian Prantl 2017-08-17 15:52:52 PDT
I think we also had on the list:

- Add support for memory locations in LiveDebugValues

We also identified two nice-to-have features (these are things that currently don't work either):

- insert a dbg.value(undef) at the point where a lifetime.end marker is generated (or directly in the stack coloring pass); see the sketch after this list

- let clang generate allocas for indirectly passed function arguments (e.g., sret)
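
For illustration, the lifetime.end nice-to-have might look like this (hypothetical; metadata numbers are made up):

  call void @llvm.lifetime.end.p0i8(i64 4, i8* %0), !dbg !20
  ; hypothetical: terminate the variable's location, since its slot may be reused
  call void @llvm.dbg.value(metadata i32 undef, metadata !15, metadata !DIExpression()), !dbg !20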


We also came to the conclusion that — given the input comes from a well-behaved frontend, and we shall document what exactly that means* — it should never be necessary to insert dbg.values for load or call instructions.

*) A well-behaved frontend produces an alloca for a variable and generates a store to the alloca each time the variable is modified. It further generates a dbg.value describing the variable in the alloca as the one true location for the variable. This contract is what allows us to pay attention to dead store elimination (cf. Reid's second point).

By describing allocas with dbg.value we have the opportunity to get a more precise start location for variable liveness, but we need to be careful not to regress the DWARF representation at -O0 to a location list, since this would prevent the debugger from letting the user assign new values to the variable (implicit vs. explicit location description). We could represent stack-allocated variables at -O0 with frame index location + a DW_AT_start_scope attribute to achieve this.
Comment 27 Hal Finkel 2017-08-17 15:58:26 PDT
> instead of dbg.declare. Maybe call it -gdbg-value? Document the new scheme in SourceLevelDebugging.rst.

So you want to give users the option to try it?

I'd rather not introduce temporary user-facing options. Could this just be a -Xclang (-cc1) option?
Comment 28 Reid Kleckner 2017-08-17 18:15:09 PDT
(In reply to Adrian Prantl from comment #26)
> - Add support for memory locations in LiveDebugValues

Yep, I realized I forgot to post that while discussing this in person with Zach.

> We also identified two nice-to-have features (these are things that
> currently don't work either):
> 
> - insert a dbg.value(undef) at the point where a lifetime.end marker is
> generated (or directly in the stack coloring pass)

It's true, stack coloring needs to add DBG_VALUE MIR instructions on lifetime end. I haven't actually been able to craft a C test case that forces confusing stack colorings to occur, so I don't think it's very high priority. (I tried hard! :)

> - let clang generate allocas for indirectly passed function arguments (e.g.,
> sret)

This works, but I filed https://bugs.llvm.org/show_bug.cgi?id=34227 as an alternative to that.

> By describing allocas with dbg.value we have the opportunity to get a more
> precise start location for variable liveness, but we need to be careful not
> to regress the DWARF representation at -O0 to a location list, since this
> would prevent the debugger from letting the user assign new values to the
> variable (implicit vs. explicit location description). We could represent
> stack-allocated variables at -O0 with frame index location + a
> DW_AT_start_scope attribute to achieve this.

Indeed, preserving the -O0 experience is important.

(In reply to Hal Finkel from comment #27)
> So you want to give users the option to try it?

Yes. We don't have a lot of good tools for evaluating debug info quality beyond deploying changes to users and giving them flags to flip. =/ Adrian said he's trying to upstream his tool to calculate DW_AT_location coverage, but it's still an approximation of quality.

> I'd rather not introduce temporary user-facing options. Could this just be a
> -Xclang (-cc1) option?

I'd prefer it if users didn't have to know about -Xclang, but I don't care that much.
Comment 29 Adrian Prantl 2017-08-18 09:53:10 PDT
> I'd rather not introduce temporary user-facing options. Could this just be a
> -Xclang (-cc1) option?


> I'd prefer it if users didn't have to know about -Xclang, but I don't care that much.

Since the point of the option is to enable incremental development of the feature until it is ready to become the default, users ideally will never have to interact with the option, because they won't want to turn it on until it's ready for prime time. Driver options are expensive compared to cc1 options, because we typically have to support them indefinitely, so making this (temporary!) option a cc1 option seems right to me.
Comment 30 Paul Robinson 2017-08-21 13:14:37 PDT
Sorry for being late to the party, I've been away a couple of weeks.

There's always a tension between providing "truthful" info and
"potentially useful" info.  IMO the way to build users' trust in
the value of debug info (especially optimized) is to provide truthful 
info.  Half-truths make the debugger into an unreliable tool.
(Personally I use debuggers only as a last resort.)

P.S. Using instructions to convey debug info is a poor design, period.
It causes all kinds of other problems because people writing passes
rarely think about the effect on debug info, and we've fixed a bunch
of cases where instructions/uses/etc were counted inappropriately.
I'd rather be thinking about a design that eliminates all dbg.* 
instructions.
Comment 31 Reid Kleckner 2017-08-21 13:27:44 PDT
(In reply to Paul Robinson from comment #30)
> P.S. Using instructions to convey debug info is a poor design, period.
> It causes all kinds of other problems because people writing passes
> rarely think about the effect on debug info, and we've fixed a bunch
> of cases where instructions/uses/etc were counted inappropriately.
> I'd rather be thinking about a design that eliminates all dbg.* 
> instructions.

I agree, but fundamentally we need to represent some placeholder in the instruction stream. I think without radically changing the MI and IR basic block representations, that means we represent variable tracking info with elements in the linked list, regardless of how we name them. The best we can do is probably to just add filtering instruction iterators that skip dbg instructions.

The use-list issues seem like they have been fixed since 2014, when we added ValueAsMetadata / MetadataAsValue, right? I guess this might still be an issue in MIR.
Comment 32 Paul Robinson 2017-08-21 13:43:44 PDT
Filtering iterators helps.  We still have cases of -g affecting codegen
but all the really egregious cases have been taken care of, at this point.

Eliminating dbg.* instructions is probably a topic for a BoF or something.
I'll just point out that when I first looked at dbg.declare, it seemed
that it added exactly one item of info to what alloca already had, and
that really didn't seem worth having a whole separate instruction.
Another kind of metadata on alloca should have worked just as well.
Comment 33 Chris Jackson 2020-07-20 06:07:09 PDT
Some experimental improvements to LowerDbgDeclare brought
this issue to our attention recently, initially while looking at issue
https://bugs.llvm.org/show_bug.cgi?id=45667.

Comments #25 and #26 on this bug (34136) outline a decent plan to
begin improving debug info for the escaped variables.

We intend to start addressing this issue. Is anyone currently working on it,
and are there any objections to our now implementing improvements?
Comment 34 Hans Wennborg 2020-07-20 06:37:27 PDT
> We intend to start addressing this issue. Is anyone currently working on it,
> and are there any objections to our now implementing improvements?

Sounds great. I'm not aware of anyone working on it from our side.
Comment 35 Orlando Cazalet-Hyams 2021-01-11 10:28:43 PST
I have just reported an issue which seems relevant to this thread:  https://llvm.org/PR48719.

The bug report shows how we can end up with incorrect debug info for an escaped variable when promoting or partially promoting it introduces PHI instructions and SimplifyCFG then squashes the PHI block and its predecessors together.