llvm-gsymutil is failing with "child range not contained in parent" error #46501

petrhosek · 2020-08-13T18:55:22Z


Bugzilla Link	47157
Version	trunk
OS	All
CC	@dwblaikie,@JDevlieghere,@walkerkd,@pogo59,@frobtech,@thanm

Extended Description

I haven't yet managed to create a minimal reproducer, so here is a full binary for now until I manage to reduce it: https://storage.googleapis.com/fuchsia-build/basic_fuzzer_test.lzma

To reproduce the issue, simply run:

$ llvm-gsymutil --convert=basic_fuzzer_test --out-file=basic_fuzzer_test.gsym
...
llvm-gsymutil: error: basic_fuzzer_test: child range not contained in parent

This binary is somewhat unusual in that it's a Go fuzzer (compiled with the Go compiler) that calls into C++ test harness (compiled with Clang), so the binary includes an entire Go runtime and the generated DWARF looks somewhat different from DWARF generated by Clang.

The text was updated successfully, but these errors were encountered:

dwblaikie · 2020-08-13T19:50:43Z

running llvm-dwarfdump --verify on this file gives a more informative error related to the "child range not contained in parent" including dumping the relevant DIEs. The first example the verifier prints is:

0x00145adb: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x000000000014199a "runtime.lockOSThread")
DW_AT_ranges (0x0001b790
[0x000000000007dc1f, 0x000000000007dc2e)
[0x000000000007dc2e, 0x000000000007dc34)
[0x000000000007dc43, 0x000000000007dc44))
DW_AT_call_file ("./runtime/cgocall.go")
DW_AT_call_line (190)

0x00145aea: DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x00000000001419b3 "runtime.dolockOSThread")
DW_AT_ranges (0x0001b7e0
[0x000000000007dc34, 0x000000000007dc43)
[0x000000000007dc47, 0x000000000007dc4e)
[0x000000000007dc4e, 0x000000000007dc52))
DW_AT_call_file ("./runtime/proc.go")
DW_AT_call_line (3675)

And that certainly looks like bogus DWARF to me. The first range in the runtime.doLockOSThread is exactly not contained in the outer, runtime.lockOSThread range list, I think.

llvmbot · 2020-08-13T21:10:23Z

The main questions is should llvm-gsymutil be returning an error, or just a warning, when it encounters invalid DWARF? I am fine with either. There are many LTO binaries that have this issue, and it would be a shame to not generate a gsym if the DWARF has any errors.

dwblaikie · 2020-08-13T21:13:50Z

I don't know of any aspect of LLVM's LTO implementation that could lead to this DWARF being emitted, FWIW. The way scopes are represented makes it pretty impossible to have a nested scope that doesn't have a nested address range, to the best of my knowledge.

llvmbot · 2020-08-13T21:53:14Z

Which LTO? Monolithic LTO of Thin LTO? Or Both?

dwblaikie · 2020-08-13T22:06:10Z

Either - it's baked into the IR representation to the best of understanding of it.

Any given IR instruction may have a debug location, that debug location has a chain of scopes - we only put them inside each other in the DIE tree when they're in that chain, and their address ranges are derived from that scope chain - I don't know of any way a child scope could include an address/range that is not included in its parent scope, due to the nature of this implementation strategy.

(there was a bug recently about instruction bundles (used on Hexagon and Sparc at least) - that results in a correct line table, but missing scopes in the DIE tree - but that wouldn't result in children with addresses not included in their parent so far as I can imagine)

Essentially this applies to any IR (single compilation, Thin or full LTO, JIT, weird hand-crafted IR, etc) - the nature of the way scoping is represented makes it, to the best of my knowledge, fairly impossible (always the possibility of bugs, to be sure - but I have a hard time imagining the shape of the bugs necessary to allow it in this situation) to have a child scope include addresses that are not in the parent scope.

petrhosek · 2020-08-13T23:22:05Z

This looks a like an issue in the Go compiler, so it's something we need to file a bug for with Go maintainers.

Whether llvm-gsymutil should just print a warning or fail, that's a great question. I'm not sure if there's any precedent among LLVM tools, but it'd be nice to make that behavior configurable, for example by having a -strict mode (other suggestions are welcome).

The way I discovered this issue (as well as #46494) is by trying to enable llvm-gsymutil in our build as a post-processing step. Currently, that fails the build because of these issues, and while I hope that we can address those, it may take some time, especially if we need to fix other compilers, and so I'd like to make these errors non-fatal in the meantime even if it means that we aren't going to produce GSYM files for some binaries. We could achieve the same by wrapping llvm-gsymutil in a script that discards the exit code, but having a flag would be nicer.

llvmbot · 2020-08-14T00:10:59Z

This brings up an issue that we have had for a while: compilers and linkers generating DWARF with no checks on the validity of the produced DWARF. llvm-gsymutil is catching these kinds of issues when it parses the DWARF and bringing the issues to our attention. The functionality behind the "llvm-dwarfdump --verify" is available in:

bool DWARFContext::verify(raw_ostream &OS, DIDumpOptions DumpOpts);

So it is easy for tools that are based on llvm/clang sources to integrate this into debug builds of the drivers for the compiler and linker. We will probably need to make some modifications to the verification code to ensure it handles all files (unlinked .o files, linked executables and dynamic libraries) correctly, but it is a great way to verify the DWARF is correct before we pass it on to other tools.

So sounds like this bug should probably be switched over to the Go compiler to fix this DWARF issue?

petrhosek · 2020-08-14T00:54:23Z

This was already filed as an issue against Go over a year ago but so far there hasn't been any response: golang/go#33188

petrhosek · 2020-08-14T00:56:45Z

For reference, this is a workaround implemented by delve: go-delve/delve#1807

thanm · 2020-08-14T19:06:45Z

A colleague of mine is asking for clarification on whether this particular oddity is definitely illegal according to the DWARF spec. Is there a spot in the specification one could point to that disallows this particular insanity? Thanks.

dwblaikie · 2020-08-14T19:29:26Z

A colleague of mine is asking for clarification on whether this particular
oddity is definitely illegal according to the DWARF spec. Is there a spot
in the specification one could point to that disallows this particular
insanity? Thanks.

Don't think so, no. Believe it falls under the whole "DWARF is a permissive standard" and there are certainly some non-trivial cases where it could make sense. A C++ lambda or other local type with member functions defined inline (couldn't be defined anywhere else) could be represented as a truly nested type (so you'd have subprogram { structure_type { subprogram { } }) but it might not make sense to describe the local type's member functions as being within the range outer subprogram - they are distinct functions.

dwblaikie · 2020-08-14T19:30:32Z

(that case might not be warned on by llvm-dwarfdump --verify, though... maybe - I mean, I don't think GCC or LLVM produce that sort of DWARF anyway so any implementation there might be accidental)

llvmbot · 2020-08-14T23:04:46Z

The other thing to think about is what clients that use this DWARF will expect. For example if you want to symbolicate an address to get the function name and scope correctly, clients will typically check the DW_TAG_compile_unit's ranges and see if the address falls within those ranges and if the address does, we search all of the DIEs. Most code I have seen will, when it encounters a DW_TAG_subprogram DIEs, just check that address range of that DIE and skip to the sibling of that DW_TAG_subprogram and continuing to search. Accelerator tables in DWARF also just point you to a compile unit that you must linearly search. LLVM code will fail to do the lookup on this DWARF because the code does exactly this:

DWARFContext::DIEsForAddress DWARFContext::getDIEsForAddress(uint64_t Address) {
  DIEsForAddress Result;

  DWARFCompileUnit *CU = getCompileUnitForAddress(Address);
  if (!CU)
    return Result;

  Result.CompileUnit = CU;
  Result.FunctionDIE = CU->getSubroutineForAddress(Address);

  std::vector<DWARFDie> Worklist;
  Worklist.push_back(Result.FunctionDIE);
  while (!Worklist.empty()) {
    DWARFDie DIE = Worklist.back();
    Worklist.pop_back();

    if (!DIE.isValid())
      continue;

    if (DIE.getTag() == DW_TAG_lexical_block &&
        DIE.addressRangeContainsAddress(Address)) {
      Result.BlockDIE = DIE;
      break;
    }

    for (auto Child : DIE)
      Worklist.push_back(Child);
  }

  return Result;
}

So all lookups will fail if your DWARF is as it currently emitted for any addresses in "runtime.dolockOSThread".

pogo59 · 2020-08-17T14:13:14Z

The other thing to think about is what clients that use this DWARF will
expect. For example if you want to symbolicate an address to get the
function name and scope correctly, clients will typically check the
DW_TAG_compile_unit's ranges and see if the address falls within those
ranges and if the address does, we search all of the DIEs. Most code I have
seen will, when it encounters a DW_TAG_subprogram DIEs, just check that
address range of that DIE and skip to the sibling of that DW_TAG_subprogram
and continuing to search.

Hmmm that's making an assumption about nested subprograms that isn't valid
for all languages. The skip-to-sibling works for C because it doesn't have
nested functions, nor did C++ until relatively recently.
But for something like Pascal nested procedures, I'd expect the DIE tree to
reflect the lexical parentage, while the generated-code range of the inner
procedure would not be contained within the generated-code range of its parent.
The local class and lambda cases in C++ should follow a similar pattern IMO.

The situation is different for inlined subprograms, which obviously should
be contained within the caller's code range. But that's different from the
lexical-parent case.

dwblaikie · 2020-08-17T17:14:27Z

The other thing to think about is what clients that use this DWARF will
expect. For example if you want to symbolicate an address to get the
function name and scope correctly, clients will typically check the
DW_TAG_compile_unit's ranges and see if the address falls within those
ranges and if the address does, we search all of the DIEs. Most code I have
seen will, when it encounters a DW_TAG_subprogram DIEs, just check that
address range of that DIE and skip to the sibling of that DW_TAG_subprogram
and continuing to search.

Hmmm that's making an assumption about nested subprograms that isn't valid
for all languages. The skip-to-sibling works for C because it doesn't have
nested functions, nor did C++ until relatively recently.
But for something like Pascal nested procedures, I'd expect the DIE tree to
reflect the lexical parentage, while the generated-code range of the inner
procedure would not be contained within the generated-code range of its
parent.
The local class and lambda cases in C++ should follow a similar pattern IMO.

Yeah, but for now they don't. GCC and LLVM always put their subprogram definitions outside any nesting like that (I think GCC goes further and puts subprogram definitions always as direct children of the compile_unit - won't even define a subprogram in a namespace - instead it'll declare it in the namespace and define it at the root/CU scope), so at least some consumers (including llvm-symbolizer, for instance) rely on this - it's faster, means you can skip whole subtrees when searching for "which DWARF describes this instruction address". It doesn't handle all possible DWARF - but given how permissive DWARF is, I'm not sure that's a practical goal, generally.

(I wonder what GCC's DWARF for the Pascal example you gave is - I'm guessing they do something like they do with C++ Examples and declare the nested function inside the outer function, but define the nested function at the CU/root scope of the DIE tree)

petrhosek · 2020-10-09T22:58:03Z

I wonder if we could adopt the same solution delve did in go-delve/delve#1807 for llvm-gsymutil to make it more tolerant? While it'd be ideal to address this in Go, it might take a while until that happens, and even longer until all binaries are rebuilt using the new compiler.

frobtech · 2020-10-11T01:15:35Z

DWARF is explicitly a loose specification and it's always wise for DWARF consumers to be flexible in their expectations of the input. Rather than demanding the input have a certain shape, always do the best you can discern the most useful interpretation of the input you get. When the input has an unexpected shape that makes it hard to be confident what interpretation is really most useful, then warn about the data but rarely should such a case be a hard error as invalid input data.

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llvm-gsymutil is failing with "child range not contained in parent" error #46501

llvm-gsymutil is failing with "child range not contained in parent" error #46501

petrhosek commented Aug 13, 2020

dwblaikie commented Aug 13, 2020

llvmbot commented Aug 13, 2020

dwblaikie commented Aug 13, 2020

llvmbot commented Aug 13, 2020

dwblaikie commented Aug 13, 2020

petrhosek commented Aug 13, 2020 •

edited

llvmbot commented Aug 14, 2020

petrhosek commented Aug 14, 2020

petrhosek commented Aug 14, 2020

thanm commented Aug 14, 2020

dwblaikie commented Aug 14, 2020

dwblaikie commented Aug 14, 2020

llvmbot commented Aug 14, 2020

pogo59 commented Aug 17, 2020

dwblaikie commented Aug 17, 2020

petrhosek commented Oct 9, 2020

frobtech commented Oct 11, 2020

llvm-gsymutil is failing with "child range not contained in parent" error #46501

llvm-gsymutil is failing with "child range not contained in parent" error #46501

Comments

petrhosek commented Aug 13, 2020

Extended Description

dwblaikie commented Aug 13, 2020

llvmbot commented Aug 13, 2020

dwblaikie commented Aug 13, 2020

llvmbot commented Aug 13, 2020

dwblaikie commented Aug 13, 2020

petrhosek commented Aug 13, 2020 • edited

llvmbot commented Aug 14, 2020

petrhosek commented Aug 14, 2020

petrhosek commented Aug 14, 2020

thanm commented Aug 14, 2020

dwblaikie commented Aug 14, 2020

dwblaikie commented Aug 14, 2020

llvmbot commented Aug 14, 2020

pogo59 commented Aug 17, 2020

dwblaikie commented Aug 17, 2020

petrhosek commented Oct 9, 2020

frobtech commented Oct 11, 2020

petrhosek commented Aug 13, 2020 •

edited