[linux] debug information generated by clang much larger than gcc's, making linking clang objects 20% slower than gcc #7926

Closed
llvmbot opened this issue Jul 2, 2010 · 80 comments
Labels
bugzilla Issues migrated from bugzilla clang Clang issues not falling into any other category

Comments

@llvmbot
Collaborator

llvmbot commented Jul 2, 2010

Bugzilla Link 7554
Resolution FIXED
Resolved on Nov 07, 2018 00:22
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @asl,@chandlerc,@lattner,@dwblaikie,@echristo,@zmodem,@nico

Extended Description

I am looking into why gcc Chrome is ~600mb while clang builds of Chrome are 1.7gb. (Both with debugging symbols.) I've attached the preprocessed version of one of the files that differs the most between the two compilers.

With the same input file, I can generate two output files:
g++ -pthread -O0 -g -c -o profile.gcc profile.ii
../llvm/Release/bin/clang++ -pthread -O0 -g -c -o profile.gcc profile.ii

Their sizes differ significantly:
-rw-r--r-- 1 evanm evanm 4709032 2010-07-01 15:30 profile.clang
-rw-r--r-- 1 evanm evanm 2570352 2010-07-01 15:30 profile.gcc

gzipped file attached.

llvm$ svn info . tools/clang/ | grep Revision
Revision: 107405
Revision: 107405

@llvmbot
Collaborator Author

llvmbot commented Jul 2, 2010

Er, that second command line should use "-o profile.clang"

../llvm/Release/bin/clang++ -pthread -O0 -g -c -o profile.clang profile.ii

-rw-r--r-- 1 evanm evanm 4835656 2010-07-01 15:44 profile.clang

@llvmbot
Collaborator Author

llvmbot commented Jul 2, 2010

g++ -v?

Is the size due to debug info? What are the sizes with -g0?

@llvmbot
Collaborator Author

llvmbot commented Jul 2, 2010

preprocessed source

@llvmbot
Collaborator Author

llvmbot commented Jul 2, 2010

$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3

@llvmbot
Collaborator Author

llvmbot commented Jul 2, 2010

With -g0, they are pretty close:

$ ls -l profile.clang profile.gcc
-rw-r--r-- 1 evanm evanm 649592 2010-07-01 15:55 profile.clang
-rw-r--r-- 1 evanm evanm 615512 2010-07-01 15:55 profile.gcc

@lattner
Collaborator

lattner commented Jul 5, 2010

Devang, this looks like an example of where Clang is producing much more bloated debug info for c++ code than GCC is. Can you investigate?

@llvmbot
Collaborator Author

llvmbot commented Jul 6, 2010

My g++-4.2 does not compile the attached preprocessed source file. It'd be great if you could attach an example that I can compile using g++-4.2 for comparison. Otherwise, it'd be easier to analyze the size bloat with a smaller test case.

@llvmbot
Collaborator Author

llvmbot commented Jul 31, 2010

Attached a hacked up version of the source which will build with gcc-4.2 (although the code wouldn't work properly, it should be fine for investigating debug info).

On Darwin, Clang is actually doing better than GCC -- it seems like gcc-4.4 has made improvements in this area.

ddunbar@giles:tmp$ xclang -v
Apple clang version 2.0 (trunk 109497)
Target: x86_64-apple-darwin10
Thread model: posix
ddunbar@giles:tmp$ xclang -g -c -o t.clang.o t.ii
ddunbar@giles:tmp$ gcc -g -c -o t.gcc.o t.ii
ddunbar@giles:tmp$ ls -l t.*.o
-rw-r--r-- 1 ddunbar wheel 4463664 Jul 31 13:18 t.clang.o
-rw-r--r-- 1 ddunbar wheel 5065984 Jul 31 13:18 t.gcc.o
ddunbar@giles:tmp$

I think we are going to need a more reduced test case to investigate more thoroughly...

@llvmbot
Collaborator Author

llvmbot commented Jul 31, 2010

@llvmbot
Collaborator Author

llvmbot commented Jul 31, 2010

In building Chrome with clang vs gcc and comparing files pair-wise, the size difference distribution was all over the place: some were smaller with clang, some larger. The summed result, though, is a binary roughly 2x larger.

I grabbed this file because it exhibited the largest difference, so I had hoped it would have made the problem more obvious (e.g. maybe it's just one huge symbol, or maybe each symbol is just a little bit larger). I can provide any of the other files, but I'm not sure whether they'll help much.

I guess what I'm saying is: I'd like to help, please give me advice on how.

@llvmbot
Collaborator Author

llvmbot commented Jul 31, 2010

I agree with your approach, starting with the file with the biggest difference is a good place. The problem is that we can't reproduce the issue on Darwin with GCC 4.2. If you attach the .s files generated with GCC 4.4 and Clang on your platform, Devang might be able to tell which bits of debug info GCC is leaving out.

Another approach would be to see if the size difference manifests on Darwin using GCC 4.2. If so, and you can grab the .o file with the biggest difference there, we can probably do a better job of investigating.

p.s. Does Chrome built with Clang work?

@llvmbot
Collaborator Author

llvmbot commented Aug 1, 2010

Adding Nico, who has successfully built on Mac.

Nico: to find files with large differences, I did something like
find gcc_output -name '*.o' | xargs ls -l | awk '{...}' > gcc-sizes
to get a list of 'filename size' pairs, and then the same thing for clang, and then used the coreutils 'join' to merge those lists, and then another awk pass to find the largest difference. Dunno if GNU coreutils are available on Mac, but I'm sure you can figure something out.
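
For anyone without GNU join, the pairing step described above could also be sketched in Python. This is only an illustration, not the script that was actually used: the directory names `gcc_output` and `clang_output` and both helper names are assumptions.

```python
import os


def object_sizes(root):
    """Map each .o file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".o"):
                full = os.path.join(dirpath, name)
                sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def largest_growth(gcc_sizes, clang_sizes, count=10):
    """Return (bytes_bigger, path) for files present in both builds,
    largest clang-over-gcc growth first."""
    common = gcc_sizes.keys() & clang_sizes.keys()
    diffs = [(clang_sizes[p] - gcc_sizes[p], p) for p in common]
    diffs.sort(key=lambda pair: (-pair[0], pair[1]))
    return diffs[:count]


if __name__ == "__main__":
    # Hypothetical output directories; substitute the real build trees.
    gcc = object_sizes("gcc_output")
    clang = object_sizes("clang_output")
    for delta, path in largest_growth(gcc, clang):
        print(f"{delta // 1024} kB bigger: {path}")
```

Files that exist in only one of the two trees are skipped, which is the same effect an inner join on the filename would have.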

I might be able to try a build on an Ubuntu Hardy machine (also gcc 4.2), but my recollection is that there are other problems there like
http://llvm.org/bugs/show_bug.cgi?id=6379

Chrome on Clang: yeah, it runs! On-page images come out slightly corrupted (gonna be annoying to track down) and there's one last v8 patch that is some template trickiness that they want me to perftest on gcc/MSVC to make sure I don't regress there, and I haven't gotten around to it.

@nico
Contributor

nico commented Aug 2, 2010

(Chrome/Mac on Clang: Main binary builds and runs in Debug, but fails to link in Release. unit_tests, the binary which contains most of chrome's automated tests, builds in Debug but reports many test failures when built with clang and eventually crashes.)

@nico
Contributor

nico commented Sep 28, 2010

FWIW, on OS X they're in the same ballpark, clang's ~11% bigger:

hummer:src thakis$ ls -l clang/sym/Debug/Chromium\ Framework.framework/Versions/Current/Chromium\ Framework
-rwxr-xr-x 1 thakis eng 168898096 Sep 22 14:16 clang/sym/Debug/Chromium Framework.framework/Versions/Current/Chromium Framework
hummer:src thakis$ ls -l xcodebuild/Debug/Chromium\ Framework.framework/Versions/Current/Chromium\ Framework
-rwxr-xr-x 1 thakis eng 150708884 Sep 23 12:36 xcodebuild/Debug/Chromium Framework.framework/Versions/Current/Chromium Framework

@llvmbot
Collaborator Author

llvmbot commented Sep 28, 2010

Here are the ten largest sections and their from Linux Chrome. (Ignore the first number in each line, that's just the section number).

gcc 4.4.3:
34 .debug_info 332565787
37 .debug_pubnames 64351219
40 .debug_str 44044988
13 .text 36689692
36 .debug_loc 2637699
17 .rodata 24766708
35 .debug_line 19909279
41 .debug_frame 15068712
33 .debug_abbrev 8697456
39 .debug_ranges 4837592

clang:
31 .debug_info 2007297046
35 .debug_line 60710971
11 .text 50932856
30 .debug_frame 30143604
12 .rodata 20520661
25 .eh_frame 1464953
37 .debug_pubnames 6682424
32 .debug_abbrev 6138321
15 .eh_frame_hdr 2925708
21 .data 1863576

Generated with:
objdump -h ./out/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg' | sort -k3 -n | tail -10 | tac
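
The same ranking can be sketched in Python for readers who find the perl substitution hard to follow. It assumes the usual GNU `objdump -h` layout (decimal index, section name, hexadecimal size); the function names are made up for this illustration.

```python
import subprocess


def parse_section_sizes(objdump_text):
    """Extract (section_name, size_in_bytes) from `objdump -h` output.

    Assumes each section row starts with a decimal index, then the
    section name, then the size in hexadecimal.
    """
    sections = []
    for line in objdump_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            try:
                sections.append((fields[1], int(fields[2], 16)))
            except ValueError:
                continue  # not a section row after all
    return sections


def largest_sections(objdump_text, count=10):
    """Return the `count` largest sections, biggest first."""
    ordered = sorted(parse_section_sizes(objdump_text),
                     key=lambda item: item[1], reverse=True)
    return ordered[:count]


def largest_sections_of(binary_path, count=10):
    """Run objdump on a binary (path supplied by the caller) and rank it."""
    text = subprocess.run(["objdump", "-h", binary_path], check=True,
                          capture_output=True, text=True).stdout
    return largest_sections(text, count)
```

Calling `largest_sections_of("./out/Debug/chrome")` would then give name and size pairs comparable to the lists above, without the leading section index.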

@llvmbot
Collaborator Author

llvmbot commented Sep 28, 2010

Er, I meant:
"Here are the ten largest sections and their size from Linux Chrome."

ls -lh says gcc produces 589M binary, while clang produces 2.1G binary.

@zmodem
Collaborator

zmodem commented Jan 12, 2011

I've been trying to find a reduced test case for this, basically taking the preprocessed profile.cc, ripping out stuff while it still compiles and still generates a significantly larger .o file with Clang than with GCC.

I'm not sure the reason for the size difference here is the same as the reason for the size difference across all of Chrome, but hopefully it will be helpful in some way.

Compiling the attached file like this:

clang++ -g -c /tmp/reduction.ii -o /tmp/a.o.clang

And comparing with GCC (4.2.1 on Darwin, 4.4.3 on Linux) using the same flags yields these file sizes (in bytes):

On x86_64-unknown-linux-gnu:
Clang: 4136
GCC: 2448

On x86_64-apple-darwin10:
Clang: 3224
GCC: 3528

(This was using Clang built from r123315.)

So it seems that Clang generates a larger .o file than GCC on Linux. On Darwin it seems about the same. Not using the -g option makes them come out about the same size.

@zmodem
Collaborator

zmodem commented Jan 12, 2011

@llvmbot
Collaborator Author

llvmbot commented Jan 12, 2011

4k seems small enough to objdump -D the files and look for clues.

@nico
Contributor

nico commented Jan 18, 2011

chandlerc says that gcc uses a version of dwarf debugging information that has been tuned to be small (dwarf 2?), while clang uses the old and bloaty version (dwarf 1?).

@nico
Contributor

nico commented Jan 19, 2011

The debug information causes big .o files, which in turn makes ld very slow. This bug makes chrome builds with clang/linux 5x slower than gcc. For now, we've disabled debug information generation on the clang/linux bots because of this. See http://code.google.com/p/chromium/issues/detail?id=70000

@llvmbot
Collaborator Author

llvmbot commented Jan 19, 2011

I know of two things that can be done to improve clang's debug info size:

  • Enable .cfi_* support. This should help on OS X too.
  • Create comdats. This would make it possible for GNU ld and gold to drop duplicated debug info.

There might be more advantages in dwarf4, but I am not sure.

@chandlerc
Member

I suspect a large chunk of this problem will be solved by turning on the .cfi_* sections Rafael. That should in particular help the non-type debug information size. The type debug info size is what would be helped most by dwarf4 (from my faded memory...)

@llvmbot
Collaborator Author

llvmbot commented Jan 19, 2011

Regarding "comment 20", it is not true. llvm emits dwarf3.

Someone with access to linux machine should take a very small test case and see why debug info is bloated 5x (as per claims) on linux but not on darwin.

@llvmbot
Collaborator Author

llvmbot commented Jan 19, 2011

> Someone with access to linux machine should take a very small test case and see
> why debug info is bloated 5x (as per claims) on linux but not on darwin.

Comments #8 and #11 indicate that gcc 4.2 (the Darwin one) has behavior similar to clang, but current gcc versions are the ones that produce the problem.

@llvmbot
Collaborator Author

llvmbot commented Jan 19, 2011

> > Someone with access to linux machine should take a very small test case and see
> > why debug info is bloated 5x (as per claims) on linux but not on darwin.
>
> Comments #8 and #11 indicate that gcc 4.2 (the Darwin one) has behavior similar
> to clang, but current gcc versions are the ones that produce the problem.

That was even more confusing, sorry.

  • gcc 4.2 has behavior similar to clang, with large outputs
  • current gcc has much smaller debug output

@llvmbot
Collaborator Author

llvmbot commented Jan 19, 2011

Aha.. thanks for clarification.

@nico
Contributor

nico commented Jan 22, 2011

Evan: I'm not sure that's completely right. On the chromium waterfall, the linux/clang bot was way slower than the clang/mac bot before I disabled debug information generation. I believe both bots have similar hardware.

@llvmbot
Collaborator Author

llvmbot commented Jan 22, 2011

One big difference is that on darwin most of the debug info is not copied from the .o files, so it is not hit as hard by having large debug info.

Are you using gold on linux btw?

@nico
Contributor

nico commented Jan 22, 2011

Yes, we use gold.

@nico
Contributor

nico commented Feb 7, 2012

Updating the title, since it's no longer 5x as slow.

Here's some up-to-date data: (gcc 4.4, clang from last week – 149419 – with memcpy() codegen bug, chromium r120581).

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ ls -l out_clang/Debug/chrome
-rwxr-x--- 1 thakis eng 1398968296 2012-02-07 11:50 out_clang/Debug/chrome

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ ls -l out_gcc/Debug/chrome
-rwxr-x--- 1 thakis eng 1220239880 2012-02-07 13:43 out_gcc/Debug/chrome

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h ./out_clang/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
33 .debug_info 971787546
40 .debug_str 85365496
39 .debug_pubtypes 82737857
13 .text 76172952
37 .debug_line 50219067
17 .rodata 18496976
19 .eh_frame 17648412
34 .debug_abbrev 8310060
9 .rela.dyn 5944992
20 .eh_frame_hdr 4633348

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h ./out_gcc/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
34 .debug_info 652246456
37 .debug_pubnames 111541924
39 .debug_str 78688819
36 .debug_loc 77704123
13 .text 69931528
35 .debug_line 39035120
17 .rodata 34763456
19 .eh_frame 16709788
40 .debug_ranges 16457872
9 .rela.dyn 14992176

@nico
Contributor

nico commented Feb 7, 2012

@nico
Contributor

nico commented Feb 7, 2012

Preprocessed example file.
Output of the previous script:

3949 kB bigger: out_clang/Debug/obj/chrome/common/common.logging_chrome.o
3647 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebCore/WebCore.gyp/gen/webkit/webcore_bindings.SVGElementFactory.o
3612 kB bigger: out_clang/Debug/obj/chrome/browser/profiles/browser.profile_impl.o
3214 kB bigger: out_clang/Debug/obj/chrome/browser/automation/browser.testing_automation_provider.o
2975 kB bigger: out_clang/Debug/obj/chrome/browser/ui/browser.browser_init.o
2885 kB bigger: out_clang/Debug/obj/chrome/browser/sync/browser.profile_sync_components_factory_impl.o
2882 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebKit/chromium/src/webkit.WebViewImpl.o
2772 kB bigger: out_clang/Debug/obj/chrome/browser/browser.chrome_browser_main.o
2724 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebKit/chromium/src/webkit.WebFrameImpl.o
2717 kB bigger: out_clang/Debug/obj/chrome/browser/browser.chrome_content_browser_client.o

So logging_chrome.cc ( http://code.google.com/codesearch#OAMlx_jo-ck/src/chrome/common/logging_chrome.cc&exact_package=chromium&q=logging_chrome.cc ) is almost 4 MB bigger when built with clang.

Here's the breakdown for that file:

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h out_clang/Debug/obj/chrome/common/common.logging_chrome.o | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
2885 .debug_str 1701546
2878 .debug_info 1111490
2884 .debug_pubtypes 208317
5767 .eh_frame 118168
2882 .debug_line 85517
2887 .text.startup 49914
2875 .text 3462
2879 .debug_abbrev 2641
2877 .bss 1391
4308 .text.ZNSt6vectorIPN9__gnu_cxx15_Hashtable_nodeISt4pairIKjPFvPSsPKN3IPC7MessageES4_EEEESaISD_EE14_M_fill_insertENS0_17__normal_iteratorIPSD_SF_EEmRKSD 1114
thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h out_gcc/Debug/obj/chrome/common/common.logging_chrome.o | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
1517 .debug_info 322155
3039 .debug_str 279041
3036 .debug_pubnames 264787
3035 .debug_loc 116052
3042 .eh_frame 42816
1518 .debug_line 42710
3038 .debug_ranges 24464
3037 .debug_aranges 24256
1513 .text 20197
1516 .debug_abbrev 2759

I'm attaching the (clang-)preprocessed output of logging_chrome.cc.

@echristo
Contributor

echristo commented Feb 8, 2012

Interesting. Thanks for the additional information!

@nico
Contributor

nico commented Feb 8, 2012

Here are the current chrome build numbers, from the dupe:

[reply] [-] Description Nico Weber 2012-02-07 13:32:21 CST
object files generated by clang tend to take ~20% longer to link than the ones
generated by gcc, due to chubby/different debug information.

I did incremental builds (touch one file, measure rebuild time – this measures
almost exclusively ld time) of the chrome binary in several scenarios. In debug
builds, the object files generated by clang take up to 6 seconds longer to link
into a final binary (36s instead of 30s, or, with a shared library build, 31s
instead of 27s). In release builds, there's no big difference.

So while compiling with clang is faster, linking the resulting object files
currently takes longer.

Raw numbers (each is the min of 3 runs):

Chrome incremental build times (ninja, gold on by default, debug
builds, gcc4.4, chromeclang)
gcc
touch file in net (net/base/mime_util.cc) 29.4s
touch file in browser (c/b/u/g/chrome_gtk_frame.cc) 32.8s

clang
touch file in net 35.9s
touch file in browser 36.3s

gcc component build (libv8.so instead of libv8.a etc)
touch file in net 26.5s
touch file in browser 8.7s

clang component build
touch file in net 30.8s
touch file in browser 10.6s

(release builds)
gcc
touch file in net 6.4s
touch file in browser 5.9s

clang
touch file in net 6.3s
touch file in browser 6.2s

gcc component
touch file in net 6.7s
touch file in browser 3.1s

clang component
touch file in net 6.6s
touch file in browser 3.2s

…and with that, I'm done spamming this bug for a while.
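
As a quick cross-check of the "~20%" figure, the debug-build timings above can be turned into relative slowdowns. This is a small sketch; the helper name and the dictionary labels are made up for the illustration, and the numbers are copied from the raw data in this comment.

```python
def percent_slower(baseline_seconds, candidate_seconds):
    """How much longer the candidate takes, as a percentage of baseline."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


# (gcc seconds, clang seconds), copied from the debug-build numbers above.
debug_timings = {
    "net": (29.4, 35.9),
    "browser": (32.8, 36.3),
    "net, component": (26.5, 30.8),
    "browser, component": (8.7, 10.6),
}

for label, (gcc, clang) in debug_timings.items():
    print(f"{label}: clang is {percent_slower(gcc, clang):.0f}% slower")
```

On these figures the clang debug links come out roughly 11% to 22% slower depending on the scenario, which is consistent with the headline estimate.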

@echristo
Contributor

Hrm. logging_chrome.ii is preprocessed on linux and not on the mac. I'll probably need someone with access to gcc on linux to do a bit of analysis of which types are included and which aren't - and we can try to come up with a "why" and "how" from there.

@echristo
Contributor

FWIW I've got the -wi output of logging_chrome.ii from linux. The formatting is... annoying so it's slow going looking through it. It's obviously much larger though.
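
One way to make such a dump easier to compare is to tally the DIE tags on each side. The sketch below is an assumption-laden illustration: it presumes the dump comes from something like `readelf -wi` (the comment does not name the tool) and that DIE header lines look like `<1><2d>: Abbrev Number: 2 (DW_TAG_base_type)`. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches DIE header lines such as:
#  <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
_DIE_HEADER = re.compile(
    r"^\s*<\d+><[0-9a-f]+>: Abbrev Number: \d+ \((DW_TAG_\w+)\)")


def count_tags(dump_text):
    """Count how many DIEs of each DW_TAG_* kind appear in a dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = _DIE_HEADER.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def tag_differences(clang_dump, gcc_dump):
    """Return (clang_count - gcc_count, tag), largest absolute gap first."""
    clang_counts = count_tags(clang_dump)
    gcc_counts = count_tags(gcc_dump)
    tags = set(clang_counts) | set(gcc_counts)
    diffs = [(clang_counts[t] - gcc_counts[t], t) for t in tags]
    return sorted(diffs, key=lambda d: (-abs(d[0]), d[1]))
```

A large positive gap for a tag such as DW_TAG_subprogram or DW_TAG_structure_type would point at which kind of entity one compiler describes and the other omits.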

@nico
Contributor

nico commented Mar 9, 2012

I sent echristo a mach-o version of logging_chrome.o (built with clang, and with a local build of gcc4.6). If anybody else wants that, let me know. (It's too big to attach it.)

@echristo
Contributor

Not looking at this currently.

@nico
Contributor

nico commented Jul 31, 2013

I checked if this has improved. It hasn't. clang's debug binary output is now 2GB (due to chrome growing over time I suppose). gcc4.6's has grown a lot, it's now 1.76GB (still 12% smaller than clang's).

Linking 'chrome' in debug mode after touching a single file takes 1m42s with gcc (warm cache), but 7m11s with clang, more than 4x as long by now.

@echristo
Contributor

First approximation guess would be relocations, but I'll need to get something set up to look at it. I'll move this up my priority list.

@echristo
Contributor

echristo commented Aug 2, 2013

As another side note I'm not quite sure what's going on in the debug information in particular that's making linking so slow. It'll be worth looking into.

@dwblaikie
Collaborator

First semi-random example of differences between Clang & GCC:

int func(void (*)());
int func(int);

struct foo {
  enum { ID = 42 };
  static void bar();
};

int i = func(foo::bar); // neither emit 'foo'
// int j = func(foo::ID); // both emit 'foo'

struct bar {
  bar() {
    func(foo::bar); // neither emit 'foo'
    // func(foo::ID); // Clang emits 'foo', GCC does not
  }
} b;

This affects all the "ChromeUtilityHostMsg" objects that use a registration system not unlike foo+bar here. See "ChromeUtilityHostMsg_ParseJSON_Succeeded" for example. GCC doesn't produce any debug info for that type, even though the ctor called from the global initializer for g_LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded references the type (both the ID enum, and the Log builder function)

(side note: why do you have the extra LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded class? Why not have a generic registration type with ctor parameters: LoggerRegisterHelper g_LoggerRegisterHelper(ChromeUtilityHostMsg_ParseJSON_Succeeded::ID, ChromeUtilityHostMsg_ParseJSON_Succeeded::Log); ? that would reduce Clang's debug info (you'd drop one out of every pair of these types), though it'd increase GCC's because it would now actually start emitting the debug info for ChromeUtilityHostMsg_ParseJSON_Succeeded)

@dwblaikie
Collaborator

Another issue we noticed is that Clang produces debug info for implicit special members that are ODR used, but never codegen'd (because they were frontend inlined*, or used in inline functions that were never called/codegen'd, etc). eg:

struct foo {
  int i;
};

void func(foo*);

int main() {
  foo f;
  func(&f);
}

Clang produces a description of 'foo' with one member 'i' and one subprogram 'foo' (the ctor) for which there is no definition. GCC only describes 'i' and does not describe a ctor 'foo'.

Both compilers, given code such as:

void func(foo&(foo::*)(const foo&));
...
func(&foo::operator=);

produce debug info describing the type 'foo' with a subprogram 'operator=', the implicit default copy assignment operator.

As a minor note - when Clang (either rightly or wrongly) emits the declarations of these implicit members, it provides file and line numbers for those declarations. GCC does not.

Also, Clang produces "DW_AT_accessibility" on all members, even with the default accessibility of the type.

* perhaps we could use the inline debug info metadata description from the frontend to describe these inlinings to produce more complete debug info

@dwblaikie
Collaborator

Similar to the previous point about special members, Clang emits the declaration of all template specializations, even those that are never codegen'd as in this example:

struct foo {
  template<typename T>
  void func() {
  }
};

inline void func() {
  foo().func<int>();
}

int main() {
  foo f;
  f.func<char>();
}

Since the free function 'func' is never codegen'd (as it is never called), func<int> is never generated either. Clang still emits a debug info description for both func<int> and func<char>; GCC only emits func<char>.

This issue and the implicit special members can both be addressed in a similar/unified way - don't emit those declarations when emitting the type, but add them after the fact if we end up codegening them.

@dwblaikie
Collaborator

A probable GCC bug produces small debug info for stream-related code such as:

#include <fstream>

int main() {
  std::ifstream f;
  return f.bad();
}

The total object file size from GCC is nearly 1/4th that of Clang, but that's because GCC only produced debug info describing the /declaration/ of basic_ifstream. It didn't describe any of the members (including the 'bad()' function called there). Clang does describe this type (& its dependencies) in full, which seems appropriate.

I haven't narrowed this down fully, but it seems to be related to libstdc++'s use of extern template as follows:

extern template class basic_ifstream<char>;

if that line is not present, the full type information is emitted just like Clang. This might seem sort of reasonable, if the full definition of basic_ifstream was emitted in the debug-built binaries of libstdc++, but so far as I can tell, it is not (I only see another declaration there). But perhaps I've done something wrong in that investigation.

If I can come up with a standalone repro, I'll post those details.

@dwblaikie
Collaborator

A little more detail on the fstream issue. I've simplified it down to:

struct a {
};

template<typename T>
struct b : virtual a {
  void func() {
  }
};

extern template class b<int>;

int main() {
  b<int> x;
  x.func();
}

In GCC's DWARF: 'b<int>' is emitted as a declaration but it contains the DW_TAG_subprograms of 'b<int>'s ctor and dtor (if those are explicit, rather than implicit, it does not include them).

If the "extern template class b<int>" is followed by "template class b<int>" then the debug info for 'func' is emitted (I'm not sure what/where/why the problem exists for this in libstdc++'s debug symbol builds - I may've looked at the wrong libs or in the wrong way, or linked to the wrong ones, etc...)

@llvmbot
Collaborator Author

llvmbot commented Aug 9, 2013

Referring to comment 67, looking at gcc 4.6 behaviour, gcc does ship all the debug info for an extern template where it's instantiated, not where it's used. So this program:

#include <fstream>

int main() {
  std::ifstream f;
  return f.bad();
}

builds a binary that does not have debug info for ifstream, but its debug info can be found in the libstdc++ .so instead.

@dwblaikie
Collaborator

> First semi-random example of differences between Clang & GCC:
>
> int func(void (*)());
> int func(int);
>
> struct foo {
>   enum { ID = 42 };
>   static void bar();
> };
>
> int i = func(foo::bar); // neither emit 'foo'
> // int j = func(foo::ID); // both emit 'foo'
>
> struct bar {
>   bar() {
>     func(foo::bar); // neither emit 'foo'
>     // func(foo::ID); // Clang emits 'foo', GCC does not
>   }
> } b;
>
> This affects all the "ChromeUtilityHostMsg" objects that use a registration
> system not unlike foo+bar here. See
> "ChromeUtilityHostMsg_ParseJSON_Succeeded" for example. GCC doesn't produce
> any debug info for that type, even though the ctor called from the global
> initializer for
> g_LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded references
> the type (both the ID enum, and the Log builder function)
>
> (side note: why do you have the extra
> LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded class? Why not
> have a generic registration type with ctor parameters: LoggerRegisterHelper
> g_LoggerRegisterHelper(ChromeUtilityHostMsg_ParseJSON_Succeeded::ID,
> ChromeUtilityHostMsg_ParseJSON_Succeeded::Log); ? that would reduce Clang's
> debug info (you'd drop one out of every pair of these types), though it'd
> increase GCC's because it would now actually start emitting the debug info
> for ChromeUtilityHostMsg_ParseJSON_Succeeded)

Worse than this - even if you switch out the registration system for something more like what I described, something else seems to be getting in the way:

struct base {
  virtual void func() {
  }
};
struct foo: base {
  enum { ID };
};
struct reg {
  reg(int);
};
reg r(foo::ID);

GCC emits a declaration for 'foo', not a definition (so GCC's debug info doesn't mention 'base' at all). Removing the virtual function from 'base' makes this go away: the full definition of 'foo' is emitted, including the DW_TAG_inheritance referring to 'base', and the complete 'base' definition.

Given the absence of any key function here, I'm not sure what GCC's deal is. These are polymorphic classes that may be seen in no other TU than this one where their code must be emitted (due to the presence of virtual functions). I don't know how GCC could rely on these types to be emitted in any other TU - and if not, this seems like another GCC bug.

Perhaps someone from GCC can explain how/why this is correct & what the logic is here & we can implement it, but for now I'm suspicious.

I'd like to find a way around this bug or optimization so I can compare Clang v GCC debug info sans this issue & see how much it contributes to the problems with this TU (my theory is that it's the major issue/difference in this TU - the dependencies brought in via the base class & parameter types of these registration objects are substantial & there are many of them) but I'm not entirely sure how to do that. I'll keep experimenting.

@dwblaikie
Collaborator

So - a theory to explain Comments 64, 67-70:

GCC is emitting type information only when the vtable (& other virtual 'stuff' - such as virtual base handling) is emitted for any type that has a vtable.

That's why this looks weird for the explicit template instantiation case - it's not specifically targeting that case, it just gets it for free because the type then ends up with a key function (at least one out of line virtual function) & thus the vtable isn't emitted at the call site.

This is a fairly sound idea (one we've considered implementing before) though there are some trivial ways it can fail (whether or not such code is likely to occur in the wild is open for debate):

struct foo {
  virtual ~foo();
  static void func();
};

int main() {
  foo::func();
}

In this case, the virtual functions of 'foo' may never be defined - since no instance of 'foo' is ever constructed, this is valid. Yet the referenced static member will still be uncallable from the debugger because GCC emits no mention of the 'foo' type at all.

@dwblaikie
Collaborator

Just while I've got this here, I've prototyped the "only emit debug info for class definitions along with the vtable for any type with a vtable" idea and here are some numbers:

with the original logging TU:

Strings with baseline Clang: 30919
Strings with improved Clang: 7413
Strings with GCC 4.7: 6601

If we change the registration system so GCC doesn't win by avoiding the intermediate registration types (due to the ID reference being ignored by GCC):

Strings with improved Clang: 6068
Strings with GCC 4.7: 5560

So we get pretty close.

@dwblaikie
Collaborator

I'm going to consider this resolved by r188576, as it reduces the debug info for the example down to roughly GCC's size (section sizes below).

If you have other particularly bad examples, please file new bugs/data & I'll investigate.

GCC 4.7:
3034 .debug_str 359705
3028 .debug_info 349411
3030 .debug_loc 146400
3033 .debug_line 43328
3032 .debug_ranges 24592
3031 .debug_aranges 24224
3029 .debug_abbrev 2854

Clang (old):
2885 .debug_str 1807241
2878 .debug_info 1121994
2882 .debug_line 88887
2879 .debug_abbrev 2445

Clang (r188576):
2878 .debug_info 356859
2885 .debug_str 331528
2882 .debug_line 82402
2879 .debug_abbrev 1258

@nico
Contributor

nico commented Oct 16, 2013

I finally had the opportunity to check this.

With current trunk chromium, gcc 4.6, and clang r192635, a (statically linked) debug build of chrome is 1.9GB with gcc and 1.3GB with clang. The incremental build time after touching a file in the ui/ directory is 3m55s with clang and 4m02s with gcc. So this looks much better now, thanks!

@dwblaikie
Collaborator

Awesome! thanks for all your (everyone on this bug) help with repros, reductions, etc. Sorry it was a bit of a while coming.

@echristo
Contributor

Woot.

@nico
Contributor

nico commented Oct 18, 2013

As a follow-up (probably not related to this bug): I also measure incremental build times with a component build, and clang is ~10% slower than gcc 4.6 for a full build of chrome (24min vs 22min), and still a good deal slower in incremental builds (1.6s vs 1.2s). So generally I feel that more perf work is needed on linux.

@echristo
Contributor

OK, interesting. If you ever get any information for it let me know, otherwise I'll see what I can work up as I get to it.

@echristo
Contributor

As an update, a lot of things have been improved and we're now seeing build and link times (including incremental) lower than gcc's. Thanks again everyone!

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.