
Bug 48714 - ld64.lld.darwinnew produces invalid debug info, causing lldb to err with "N_SO in symbol with UID 145239 has invalid sibling in debug map, please file a bug and attach the binary listed in this error"
Status: RESOLVED FIXED
Alias: None
Product: lld
Classification: Unclassified
Component: MachO
Version: unspecified
Hardware: PC Linux
Importance: P enhancement
Assignee: Jez Ng
Blocks: 49459
Reported: 2021-01-11 08:10 PST by Nico Weber
Modified: 2021-04-07 09:19 PDT
Fixed By Commit(s): rG982e3c05108b606701d99d43098331357d9dd0ca


Description Nico Weber 2021-01-11 08:10:48 PST
Known issue, but I figured I'd file a bug so that I can link to it. Possible repro, extracted from https://bugs.llvm.org/show_bug.cgi?id=48657#c5 (there are likely way smaller repros):

1. Download https://drive.google.com/file/d/1thKfcfKUMhyJ22HRSjorIKZjH42k3bnZ/view?usp=sharing (warning: large, 1GB compressed, 5.3GB unzipped -- but it links very quickly, less than a second with both linkers).

2. Link as usual (`ld64.lld.darwinnew @response.txt`)

3. Run it, for example like so: `lldb -- ./mksnapshot --turbo_instruction_scheduling --target_os=mac --target_arch=x64 --embedded_src embedded.S --embedded_variant Default --random-seed 314159265 --startup_blob snapshot_blob.bin --native-code-counters --verify-heap`

When loading the lld-linked binary into lldb, it prints many lines looking like

error: mksnapshot N_SO in symbol with UID 145239 has invalid sibling in debug map, please file a bug and attach the binary listed in this error

This doesn't happen with ld.


(Once this is fixed, debug info isn't terribly useful without the actual source files somewhere. Due to https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html one has to run `settings set target.source-map ../.. actual/local/path/to/src` in lldb even if src files are available locally somewhere.)
Comment 1 Jez Ng 2021-04-06 16:39:15 PDT
Seems like fixing the function size calculation didn't address this problem. I will investigate this soonish.
Comment 2 Jez Ng 2021-04-06 16:40:12 PDT
Subscribing Greg Clayton in case he has ideas.
Comment 3 Greg Clayton 2021-04-06 17:39:39 PDT
So LLDB, when parsing a symbol table, will look for N_SO symbols and try to match up an N_SO symbol with a name (source path) with the N_SO symbol without a name.

So for this binary:

$ dsymutil -s a.out 
----------------------------------------------------------------------
Symbol table for: 'a.out' (x86_64)
----------------------------------------------------------------------
Index    n_strx   n_type             n_sect n_desc n_value
======== -------- ------------------ ------ ------ ----------------
[     0] 00000035 0e (     SECT    ) 08     0000   0000000100008008 '__dyld_private'
[     1] 00000044 64 (N_SO         ) 00     0000   0000000000000000 '/Users/gclayton/Documents/src/args/'
[     2] 00000068 64 (N_SO         ) 00     0000   0000000000000000 'main.cpp'
[     3] 00000071 66 (N_OSO        ) 03     0001   0000000060664660 '/Users/gclayton/Documents/src/args/main.o'
[     4] 00000001 2e (N_BNSYM      ) 01     0000   0000000100003ef0
[     5] 0000009b 24 (N_FUN        ) 01     0000   0000000100003ef0 '_main'
[     6] 00000001 24 (N_FUN        ) 00     0000   0000000000000084
[     7] 00000001 4e (N_ENSYM      ) 01     0000   0000000000000084
[     8] 00000001 64 (N_SO         ) 01     0000   0000000000000000
[     9] 00000002 0f (     SECT EXT) 01     0010   0000000100000000 '__mh_execute_header'
[    10] 00000016 0f (     SECT EXT) 01     0000   0000000100003ef0 '_main'
[    11] 0000001c 01 (     UNDF EXT) 00     0200   0000000000000000 '_printf'
[    12] 00000024 01 (     UNDF EXT) 00     0200   0000000000000000 'dyld_stub_binder'

LLDB will simplify the symbol table like so:

(lldb) image dump symtab a.out
Symtab, file = /Users/gclayton/Documents/src/args/a.out, num_symbols = 7:
               Debug symbol
               |Synthetic symbol
               ||Externally Visible
               |||
Index   UserID DSX Type            File Address/Value Load Address       Size               Flags      Name
------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
[    0]      1 D   SourceFile      0x0000000000000000                    Sibling -> [    3] 0x00640000 /Users/gclayton/Documents/src/args/main.cpp
[    1]      3 D   ObjectFile      0x0000000060664660                    0x0000000000000000 0x00660001 /Users/gclayton/Documents/src/args/main.o
[    2]      5 D X Code            0x0000000100003ef0                    0x0000000000000084 0x000f0000 main
[    3]      0     Data            0x0000000100008008                    0x0000000000000008 0x000e0000 _dyld_private
[    4]      9   X Data            0x0000000100000000                    0x0000000000003ef0 0x000f0010 _mh_execute_header
[    5]     11     Trampoline      0x0000000100003f74                    0x0000000000000006 0x00010200 printf
[    6]     12   X Undefined       0x0000000000000000                    0x0000000000000000 0x00010200 dyld_stub_binder


Note that the UserID column refers to the original symbol index. The mach-o symbol table has many duplicate entries for the same symbol (for example, '_main' is described by symbols 5, 6 and 10), but LLDB will make only a single symbol for it in the symbol table that LLDB uses.

We see that the first symbol represents the N_SO:
[    0]      1 D   SourceFile      0x0000000000000000                    Sibling -> [    3] 0x00640000 /Users/gclayton/Documents/src/args/main.cpp

It points to a sibling symbol (as we see from "Sibling -> [    3]"), which is the first symbol that doesn't belong to the N_SO. This lets us simplify the symbol table that LLDB uses while still maintaining the original scoping: all symbols between the first N_SO with a path and the closing N_SO with no name form a scope, and everything inside it belongs to the source file.
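
The pairing rule described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not LLDB's actual implementation: the `Sym` tuple and `find_unclosed_n_so` are hypothetical names, and a run of consecutive named N_SO entries (directory, then file name, as in the dump above) is treated as a single opening that is reported by its first index.

```python
from typing import List, NamedTuple, Optional

N_SO = 0x64  # stab type for source-file entries, as shown in the n_type column


class Sym(NamedTuple):
    index: int   # original symbol table index (what LLDB shows as UserID)
    n_type: int
    name: str    # "" when the entry has no name


def find_unclosed_n_so(symbols: List[Sym]) -> List[int]:
    """Return indexes of named N_SO entries that never get a closing unnamed N_SO."""
    unclosed: List[int] = []
    open_index: Optional[int] = None  # N_SO that opened the current scope
    prev_named_so = False             # was the previous symbol a named N_SO?
    for sym in symbols:
        named_so = sym.n_type == N_SO and sym.name != ""
        if named_so:
            if open_index is not None and not prev_named_so:
                # A new source file starts before the old scope was closed.
                unclosed.append(open_index)
                open_index = None
            if open_index is None:
                open_index = sym.index
        elif sym.n_type == N_SO:
            open_index = None  # an unnamed N_SO closes the scope
        prev_named_so = named_so
    if open_index is not None:
        unclosed.append(open_index)
    return unclosed
```

With the symbols from the dump above this returns an empty list. If the closing entry's name were anything other than the empty string, it would count as a new opening and both scopes would be reported.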

So a few things could cause this:
- you have an N_SO in your executable with a name that isn't followed by an N_SO with no name (LLD bug)
- you have a valid N_SO in your executable with a name and you have an N_SO with no name, but it is the last symbol in the symbol table and LLDB is complaining (LLDB bug)

The error message points to the N_SO with a UID of 145239. So if you dump the symbol table of the binary that causes this error using "dsymutil -s /path/to/binary" and look for the symbol with index 145239, that should point you to the N_SO symbol that doesn't have an N_SO symbol with no name after it. Or it might be that the N_SO symbol is the last symbol in the symbol table; this would be very rare, as all local symbols usually come first in the mach-o symbol table, followed by exported symbols and then by undefined symbols.
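
For a table with well over a hundred thousand entries, a small script can pull out the entry with a given index. The following is a rough sketch that assumes the row layout shown in the "dsymutil -s" dump above; `ROW`, `Row`, `parse_row` and `find_row` are hypothetical names, and the regular expression has only been matched against rows of that shape.

```python
import re
from typing import Iterable, NamedTuple, Optional

# One row of `dsymutil -s` output, e.g.
# [     8] 00000001 64 (N_SO         ) 01     0000   0000000000000000
# The quoted name at the end is absent when the string is empty.
ROW = re.compile(
    r"^\[\s*(\d+)\]\s+([0-9a-f]{8})\s+([0-9a-f]{2})\s+\(([^)]*)\)\s+"
    r"([0-9a-f]{2})\s+([0-9a-f]{4})\s+([0-9a-f]{16})(?:\s+'(.*)')?\s*$"
)


class Row(NamedTuple):
    index: int
    n_strx: int
    n_type: int
    name: str


def parse_row(line: str) -> Optional[Row]:
    """Parse one symbol row; return None for headers and separators."""
    m = ROW.match(line)
    if m is None:
        return None
    return Row(int(m.group(1)), int(m.group(2), 16),
               int(m.group(3), 16), m.group(8) or "")


def find_row(lines: Iterable[str], index: int) -> Optional[Row]:
    """Return the row with the given symbol index, if present."""
    for line in lines:
        row = parse_row(line)
        if row is not None and row.index == index:
            return row
    return None
```

Saving the dump to a file and printing `repr(find_row(open_file, 145239))` shows the name with quotes, so an empty name and a whitespace-only name cannot be confused.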
Comment 4 Jez Ng 2021-04-06 18:23:57 PDT
Ohhh. Our intended N_SOs with no name had a string index of zero, but because of D89639, it meant that they were pointing to a string with a single space, rather than an empty string. This was not at all obvious with llvm-nm because it doesn't quote strings, so a single space is indistinguishable from the empty string. But it's a lot more obvious with `dsymutil -s` since that *does* quote its strings.
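
To make the string index issue concrete, here is a small Python sketch of a string table lookup. The table contents are an assumption, chosen to be consistent with the n_strx values in the dump from comment 3 (1 for the unnamed entries, 2 for '__mh_execute_header', 0x16 for '_main'); `strtab_lookup` is a hypothetical helper, not lld or LLDB code.

```python
def strtab_lookup(strtab: bytes, n_strx: int) -> str:
    """Return the NUL-terminated string that starts at offset n_strx."""
    end = strtab.index(b"\x00", n_strx)
    return strtab[n_strx:end].decode("utf-8")


# Assumed layout: a single space at offset 0, its NUL terminator at offset 1,
# then the symbol names.
strtab = b" \x00__mh_execute_header\x00_main\x00"

space_name = strtab_lookup(strtab, 0)  # a one-character string: " "
empty_name = strtab_lookup(strtab, 1)  # the empty string: ""

# Printed without quotes the two look alike; repr() shows the difference.
print(repr(space_name), repr(empty_name))
```

In this layout the empty string lives at offset 1, so an N_SO that is meant to have no name needs n_strx = 1 rather than 0.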

Thanks Greg!!
Comment 5 Greg Clayton 2021-04-06 20:58:29 PDT
The reason I originally wrote "dsymutil -s" in the first dsymutil was so I could see exactly what was in the mach-o symbol table and the quotes helped way back when. Glad you figured it out from my explanation!