43530 – Single-stepping through clang-cl code randomly drops into assembly language

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Status:	RESOLVED FIXED

Alias:	None

Product:	clang
Classification:	Unclassified
Component:	-New Bugs
Version:	unspecified
Hardware:	PC Windows NT

Importance:	P enhancement
Assignee:	Unassigned Clang Bugs

Duplicates (1):	44300

Reported:	2019-10-01 13:20 PDT by Bruce Dawson
Modified:	2020-01-03 16:29 PST
CC List:	8 users

Description Bruce Dawson 2019-10-01 13:20:44 PDT

While debugging either compiler_proxy.exe or chrome.exe I have noticed that sometimes single-stepping through plain C++ code will suddenly drop me into the disassembly window. The exact details depend on user settings and whether you are debugging a locally built binary or one from the build machines. Some people have auto-switch to disassembly disabled so they just see a warning page saying:

    cpu.cc not found

    You need to find cpu.cc to view the source for the current call stack frame

Viewing details gives me this:

    Locating source for 'c:\src\chromium3\src\base\cpu.cc'. Checksum: MD5 {f2 8b 7a f0 cb f 9d f3 29 47 b3 b6 9c c0 2b 33}
    The file 'c:\src\chromium3\src\base\cpu.cc' exists.
    Determining whether the checksum matches for the following locations:
    1: c:\src\chromium3\src\base\cpu.cc Checksum: MD5 {f2 8b 7a f0 cb f 9d f3 29 47 b3 b6 9c c0 2b 33} Checksum matches.
    The debugger found source in the following locations:
    1: c:\src\chromium3\src\base\cpu.cc Checksum: MD5 {f2 8b 7a f0 cb f 9d f3 29 47 b3 b6 9c c0 2b 33}
    The debugger will use the source at location 1.

There is a "Browse and find cpu.cc" link, but it doesn't work.

It is odd that the debugger says that the source matches and that it will use it, but then it doesn't. It *might* be a debugger bug, but it seems suspicious that the debugger happily steps through code and then switches to assembly language. It feels like there is a discontinuity in the source mappings. When debugging VC++-generated code I have not seen this behavior.

If I use Ctrl+O to load the specified file it loads but doesn't show any "execution is here" cursor. If I use Ctrl+F11 to switch to assembly mode it asks me to disambiguate and then takes me here (the cmp instruction):

  static_assert(kParameterSize * sizeof(cpu_info) + 1 == base::size(cpu_string),
                "cpu_string has wrong size");

  if (max_parameter >= kParameterEnd) {
001A3C49 3D 04 00 00 80       cmp         eax,80000004h  
001A3C4E 7C 68                jl          base::CPU::Initialize+238h (01A3CB8h)  
// Copyright (c) 2012 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.


Again, no cursor showing that execution is there. Right-clicking and selecting Show Next Statement takes me here - on the first mov instruction:

  return PENTIUM;
}

}  // namespace base

001A3C50 89 C7                mov         edi,eax  
    size_t i = 0;
    for (int parameter = kParameterStart; parameter <= kParameterEnd;
         ++parameter) {
      __cpuid(cpu_info, parameter);
001A3C52 B8 02 00 00 80       mov         eax,80000002h  


If I single-step to the next instruction then Ctrl+F11 works and I can go back to source code with an execution cursor and all is well.

So, it looks like there is a source-to-assembly discontinuity on the instruction at address 001A3C50.

The easiest way to reproduce this is to set a breakpoint on line 255 of base\cpu.cc then single step (F10). That's it.

I reproduced this at Chromium hash 8a9aaaa8c338d22a97442e87518f23a661bef002. I don't know how stable this exact repro is. Here are the gn args I used:

is_component_build = false
is_debug = false
target_cpu = "x86"
enable_nacl = false
dcheck_always_on = true
use_goma = true
blink_symbol_level = 1

Comment 1 Reid Kleckner 2019-10-09 16:15:51 PDT

I was able to reproduce this with base_unittests and will dig into it a bit.

Comment 2 Reid Kleckner 2019-10-09 16:30:21 PDT

It looks like line zero upsets Visual Studio, which is unfortunate, since the DWARF folks have been tirelessly working to use line zero in more places when we don't know the current source location. This is the compiler-generated assembly for that bit of code:

	.cv_loc	20 1 255 0              # ../../base/cpu.cc:255:0
	cmp	eax, -2147483644
	jl	.LBB2_20
.Ltmp87:
# %bb.18:
	.cv_loc	20 1 0 0                # ../../base/cpu.cc:0:0
	mov	edi, eax
.Ltmp88:
	.cv_loc	20 1 259 0              # ../../base/cpu.cc:259:0
	mov	eax, -2147483646
	xor	ecx, ecx
	#APP
	cpuid
	#NO_APP

I was able to make VS do the same thing by artificially setting the line number to zero like this:

#include <stdio.h>
int main() {
  printf("%d\n", __LINE__);
#line 0
  printf("%d\n", __LINE__);
#line 7
  printf("%d\n", __LINE__);
  printf("%d\n", __LINE__);
}

I tried compiling that sample with MSVC, but it elides the "line zero" locations. The line table with VS looks like this:

C:\src\llvm-project\build\t.cpp (MD5: 951FC80309A6E680E37501997E65A1DF)
  0001:000003A4-000003F0, line/addr entries = 5
     2 000003A4      3 000003A8      7 000003C7      8 000003D8      9 000003E9

So, prologue, first printf, then third printf, then fourth, then epilogue.

Comment 3 David Blaikie 2019-10-10 11:45:03 PDT

(In reply to Reid Kleckner from comment #2)
> It looks like line zero upsets Visual Studio, which is unfortunate, since
> the DWARF folks have been tirelessly working to use line zero in more places
> when we don't know the current source location.

I don't think that's necessarily a problem/in tension. If CodeView has no way to represent ambiguous/unlocated instructions - then the line zero can be ignored & the previous instruction's location can be propagated as is the behavior for some instructions that come from no particular location (I think the frontend creates some instructions with no location (not location zero, but no location at all) - so this is a supported flow/not a bug that's going to be fixed/etc)

Comment 4 Reid Kleckner 2019-10-10 12:02:53 PDT

I committed r374267 which avoids putting line zero in cv line tables.

(In reply to David Blaikie from comment #3)
> I don't think that's necessarily a problem/in tension. If CodeView has no
> way to represent ambiguous/unlocated instructions - then the line zero can
> be ignored & the previous instruction's location can be propagated as is the
> behavior for some instructions that come from no particular location (I
> think the frontend creates some instructions with no location (not location
> zero, but no location at all) - so this is a supported flow/not a bug that's
> going to be fixed/etc)

Well, the more location-less instructions we have, the more important it becomes to implement some kind of data flow analysis to backfill source locations, when it might have been easier to keep them around during optimization. This is what I wrote in the commit message of r374267:

    The fix is incomplete, because it's possible to have a basic block with
    no source locations at all. In this case, we don't emit a .cv_loc, but
    that will result in wrong stepping behavior in the debugger if the
    layout predecessor of the location-less BB has an unrelated source
    location. We could try harder to find a valid location that dominates or
    post-dominates the current BB, but in general it's a dataflow problem,
    and one still might not exist. I left a FIXME about this.

I think what I described there is a problem for DWARF as well. This is a sketch of what the situation would look like in assembly:

bbA:
  .loc 1
  jmp shared
bbB:
  .loc 2
  jmp shared
bbC:
  .loc 3
  ret
bbD:
  .loc 4
  ret
shared:
  # no loc
  cmp
  jcc bbC
  jmp bbD

In this case, the location-less block 'shared' would pick up the location from its layout predecessor, bbD, which depends on what block placement decides to do. Layout tries to achieve as much fallthrough as possible, so I guess in practice most line tables end up being smooth enough that we don't notice. Anyway, there's a FIXME in the code for that case, but I think in practice there are very few location-less BBs.

Comment 5 David Blaikie 2019-10-10 12:12:22 PDT

(In reply to Reid Kleckner from comment #4)
> I committed r374267 which avoids putting line zero in cv line tables.
> 
> (In reply to David Blaikie from comment #3)
> > I don't think that's necessarily a problem/in tension. If CodeView has no
> > way to represent ambiguous/unlocated instructions - then the line zero can
> > be ignored & the previous instruction's location can be propagated as is the
> > behavior for some instructions that come from no particular location (I
> > think the frontend creates some instructions with no location (not location
> > zero, but no location at all) - so this is a supported flow/not a bug that's
> > going to be fixed/etc)
> 
> Well, the more location-less instructions we have, the more important it
> becomes to implement some kind of data flow analysis to backfill source
> locations, when it might have been easier to keep them around during
> optimization.

Yep - agreed on all counts.

Comment 6 Bruce Dawson 2019-10-10 12:56:19 PDT

Is the VS behavior a code-view limitation or a UI limitation?

Either way it seems like there might be a need for a VS bug. The VS message on this was extremely confusing, presumably because it had found the right file but fell down on the line number and didn't know how to express that.

I'm not sure if we should file a bug saying that their error message is rubbish, or that they should support #line 0 data, or if this is just pointless since clang-cl is fixed.

Thanks for the fix. This was quite painful when stepping through goma code.

Comment 7 Reid Kleckner 2020-01-03 16:29:19 PST

*** Bug 44300 has been marked as a duplicate of this bug. ***