LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filed before November 26, 2021. Use GitHub to submit LLVM bugs.

Bug 7554 - [linux] debug information generated by clang much larger than gcc's, making linking clang objects 20% slower than gcc
Status: RESOLVED FIXED
Alias: None
Product: clang
Classification: Unclassified
Component: -New Bugs
Version: trunk
Hardware: PC Linux
Importance: P normal
Assignee: Unassigned Clang Bugs
URL:
Keywords:
Duplicates: 11941
Depends on:
Blocks:
 
Reported: 2010-07-01 17:35 PDT by Evan Martin
Modified: 2018-11-07 00:22 PST
CC List: 18 users

See Also:
Fixed By Commit(s):


Attachments
preprocessed source (624.70 KB, application/x-gzip)
2010-07-01 17:51 PDT, Evan Martin

Hacked up profile.ii, which builds with gcc-4.2 (632.92 KB, application/octet-stream)
2010-07-31 15:23 PDT, Daniel Dunbar

Heavily reduced profile.ii (1.08 KB, application/octet-stream)
2011-01-12 10:39 PST, Hans Wennborg

clang readelf output (17.68 KB, text/plain)
2011-04-26 13:00 PDT, Roman Divacky

gcc readelf output (5.03 KB, text/plain)
2011-04-26 13:00 PDT, Roman Divacky

Script to find .o files where clang's output is much bigger. (537 bytes, text/x-python-script)
2012-02-07 15:54 PST, Nico Weber

Preprocessed example file. (504.07 KB, application/zip)
2012-02-07 15:59 PST, Nico Weber

Description Evan Martin 2010-07-01 17:35:02 PDT
I am looking into why gcc Chrome is ~600 MB while clang builds of Chrome are 1.7 GB.  (Both with debugging symbols.)  I've attached the preprocessed version of one of the files that differs the most between the two compilers.

With the same input file, I can generate two output files:
g++ -pthread -O0 -g -c -o profile.gcc profile.ii
../llvm/Release/bin/clang++ -pthread -O0 -g -c -o profile.gcc profile.ii

Their sizes differ significantly:
-rw-r--r-- 1 evanm evanm 4709032 2010-07-01 15:30 profile.clang
-rw-r--r-- 1 evanm evanm 2570352 2010-07-01 15:30 profile.gcc

gzipped file attached.



llvm$ svn info . tools/clang/ | grep Revision
Revision: 107405
Revision: 107405
Comment 1 Evan Martin 2010-07-01 17:40:53 PDT
Er, that second command line should use "-o profile.clang" 

../llvm/Release/bin/clang++ -pthread -O0 -g -c -o profile.clang profile.ii

-rw-r--r-- 1 evanm evanm 4835656 2010-07-01 15:44 profile.clang
Comment 2 Nick Lewycky 2010-07-01 17:42:08 PDT
g++ -v?

Is the size due to debug info? What are the sizes with -g0?
Comment 3 Evan Martin 2010-07-01 17:51:03 PDT
Created attachment 5156
preprocessed source
Comment 4 Evan Martin 2010-07-01 17:51:15 PDT
$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Comment 5 Evan Martin 2010-07-01 17:52:27 PDT
With -g0, they are pretty close:

$ ls -l profile.clang profile.gcc
-rw-r--r-- 1 evanm evanm 649592 2010-07-01 15:55 profile.clang
-rw-r--r-- 1 evanm evanm 615512 2010-07-01 15:55 profile.gcc
Comment 6 Chris Lattner 2010-07-04 17:30:48 PDT
Devang, this looks like an example of where Clang is producing much more bloated debug info for c++ code than GCC is.  Can you investigate?
Comment 7 Devang Patel 2010-07-06 13:27:47 PDT
My g++-4.2 does not compile attached preprocessed source file. It'd be great if you could attach an example that I can compile using g++-4.2 for comparison. Otherwise, it'd be easier to analyze size bloat if there is a smaller test case.
Comment 8 Daniel Dunbar 2010-07-31 15:20:42 PDT
Attached a hacked up version of the source which will build with gcc-4.2 (although the code wouldn't work properly, it should be fine for investigating debug info).

On Darwin, Clang is actually doing better than GCC -- it seems like gcc-4.4 has made improvements in this area.
--
ddunbar@giles:tmp$ xclang -v
Apple clang version 2.0 (trunk 109497)
Target: x86_64-apple-darwin10
Thread model: posix
ddunbar@giles:tmp$ xclang -g -c -o t.clang.o t.ii
ddunbar@giles:tmp$ gcc -g -c -o t.gcc.o t.ii
ddunbar@giles:tmp$ ls -l t.*.o
-rw-r--r--  1 ddunbar  wheel  4463664 Jul 31 13:18 t.clang.o
-rw-r--r--  1 ddunbar  wheel  5065984 Jul 31 13:18 t.gcc.o
ddunbar@giles:tmp$ 
--

I think we are going to need a more reduced test case to investigate more thoroughly...
Comment 9 Daniel Dunbar 2010-07-31 15:23:31 PDT
Created attachment 5304
Hacked up profile.ii, which builds with gcc-4.2
Comment 10 Evan Martin 2010-07-31 15:39:02 PDT
In building Chrome with clang vs gcc and comparing files pair-wise, the size difference distribution was all over -- some were smaller with clang, some larger.  The summed result, though, is a binary about 2x larger.

I grabbed this file because it exhibited the largest difference, so I had hoped it would have made the problem more obvious (e.g. maybe it's just one huge symbol, or maybe each symbol is just a little bit larger).  I can provide any of the other files, but I'm not sure whether they'll help much.

I guess what I'm saying is: I'd like to help, please give me advice on how.
Comment 11 Daniel Dunbar 2010-07-31 16:38:41 PDT
I agree with your approach; starting with the file with the biggest difference is a good idea. The problem is that we can't reproduce the issue on Darwin with GCC 4.2. If you attach the .s files generated with GCC 4.4 and Clang on your platform, Devang might be able to tell which bits of debug info GCC is leaving out.

Another approach would be to see if the size difference manifests on Darwin using GCC 4.2. If so, and you can grab the .o file with the biggest difference there, we can probably do a better job of investigating.

p.s. Does Chrome built with Clang work?
Comment 12 Evan Martin 2010-07-31 17:46:51 PDT
Adding Nico, who has successfully built on Mac.

Nico: to find files with large differences, I did something like
  find gcc_output -name '*.o' | xargs ls -l | awk '{...}' > gcc-sizes
to get a list of 'filename size' pairs, and then the same thing for clang, and then used the coreutils 'join' to merge those lists, and then another awk pass to find the largest difference.  Dunno if GNU coreutils are available on Mac, but I'm sure you can figure something out.
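For anyone without GNU join handy, the same comparison can be sketched in C++. This assumes each list has already been reduced to "path size" lines, with paths made relative to the build directory so the gcc and clang lists line up; parseSizes and largestGrowth are hypothetical helper names, not part of any Chrome tooling.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using SizeMap = std::map<std::string, long long>;
using Growth = std::pair<std::string, long long>;

// Parse "path size" pairs, one per line (the output of the ls | awk step).
SizeMap parseSizes(const std::string& text) {
  SizeMap sizes;
  std::istringstream in(text);
  std::string path;
  long long size = 0;
  while (in >> path >> size) sizes[path] = size;
  return sizes;
}

// Like `join` on the path column followed by an awk pass computing
// clang - gcc: keep objects present in both builds, largest growth first.
std::vector<Growth> largestGrowth(const SizeMap& gcc, const SizeMap& clang,
                                  std::size_t n) {
  std::vector<Growth> diffs;
  for (const auto& entry : clang) {
    auto it = gcc.find(entry.first);
    if (it != gcc.end()) diffs.emplace_back(entry.first, entry.second - it->second);
  }
  std::sort(diffs.begin(), diffs.end(),
            [](const Growth& a, const Growth& b) { return a.second > b.second; });
  if (diffs.size() > n) diffs.resize(n);
  return diffs;
}
```

Objects that exist in only one build are dropped, just as join drops unmatched lines by default.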

I might be able to try a build on an Ubuntu Hardy machine (also gcc 4.2), but my recollection is that there are other problems there like
  http://llvm.org/bugs/show_bug.cgi?id=6379


Chrome on Clang: yeah, it runs!  On-page images come out slightly corrupted (gonna be annoying to track down) and there's one last v8 patch that is some template trickiness that they want me to perftest on gcc/MSVC to make sure I don't regress there, and I haven't gotten around to it.
Comment 13 Nico Weber 2010-08-02 01:05:06 PDT
(Chrome/Mac on Clang: Main binary builds and runs in Debug, but fails to link in Release. unit_tests, the binary which contains most of chrome's automated tests, builds in Debug but reports many test failures when built with clang and eventually crashes.)
Comment 14 Nico Weber 2010-09-28 13:56:10 PDT
FWIW, on OS X they're in the same ballpark, clang's ~11% bigger:

hummer:src thakis$ ls -l clang/sym/Debug/Chromium\ Framework.framework/Versions/Current/Chromium\ Framework 
-rwxr-xr-x  1 thakis  eng  168898096 Sep 22 14:16 clang/sym/Debug/Chromium Framework.framework/Versions/Current/Chromium Framework
hummer:src thakis$ ls -l xcodebuild/Debug/Chromium\ Framework.framework/Versions/Current/Chromium\ Framework 
-rwxr-xr-x  1 thakis  eng  150708884 Sep 23 12:36 xcodebuild/Debug/Chromium Framework.framework/Versions/Current/Chromium Framework
Comment 15 Evan Martin 2010-09-28 14:01:56 PDT
Here are the ten largest sections and their from Linux Chrome.  (Ignore the first number in each line, that's just the section number).

gcc 4.4.3:
 34 .debug_info   332565787
 37 .debug_pubnames 64351219
 40 .debug_str    44044988
 13 .text         36689692
 36 .debug_loc    26376994
 17 .rodata       24766708
 35 .debug_line   19909279
 41 .debug_frame  15068712
 33 .debug_abbrev 8697456
 39 .debug_ranges 4837592

clang:
 31 .debug_info   2007297046
 35 .debug_line   60710971
 11 .text         50932856
 30 .debug_frame  30143604
 12 .rodata       20520661
 25 .eh_frame     14649532
 37 .debug_pubnames 6682424
 32 .debug_abbrev 6138321
 15 .eh_frame_hdr 2925708
 21 .data         1863576

Generated with:
objdump -h ./out/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg' | sort -k3 -n | tail -10 | tac
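The perl one-liner turns the first 8-digit hex field on each line (the Size column of objdump -h) into decimal so that sort can order the sections numerically. A C++ sketch of the same pipeline, assuming the usual objdump -h layout where each section row starts with Idx, Name, Size; largestSections is a hypothetical name:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Section {
  std::string name;
  unsigned long long size;
};

// Section rows start with a decimal index, then the name, then the size
// in hex. Header lines and the "CONTENTS, ALLOC, ..." flag lines are skipped.
std::vector<Section> largestSections(const std::string& objdumpOutput,
                                     std::size_t n) {
  std::vector<Section> sections;
  std::istringstream lines(objdumpOutput);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string idx, name, sizeHex;
    if (!(fields >> idx >> name >> sizeHex)) continue;
    bool numericIdx = std::all_of(idx.begin(), idx.end(), [](unsigned char ch) {
      return std::isdigit(ch) != 0;
    });
    if (!numericIdx) continue;
    sections.push_back({name, std::stoull(sizeHex, nullptr, 16)});
  }
  // Equivalent of `sort -k3 -n | tail -N | tac`: biggest sections first.
  std::sort(sections.begin(), sections.end(),
            [](const Section& a, const Section& b) { return a.size > b.size; });
  if (sections.size() > n) sections.resize(n);
  return sections;
}
```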
Comment 16 Evan Martin 2010-09-28 14:03:05 PDT
Er, I meant:
"Here are the ten largest sections and their *size* from Linux Chrome."

ls -lh says gcc produces a 589M binary, while clang produces a 2.1G binary.
Comment 17 Hans Wennborg 2011-01-12 10:38:33 PST
I've been trying to find a reduced test case for this, basically taking the preprocessed profile.cc, ripping out stuff while it still compiles and still generates a significantly larger .o file with Clang than with GCC.

I'm not sure the reasons for the difference in size here are the same as for the difference in size for all of Chrome, but hopefully it could be helpful in some way.


Compiling the attached file like this:

clang++ -g -c /tmp/reduction.ii -o /tmp/a.o.clang

And comparing with GCC (4.2.1 on Darwin, 4.4.3 on Linux) using the same flags yields these file sizes (in bytes):

On x86_64-unknown-linux-gnu:
Clang: 4136
GCC:   2448

On x86_64-apple-darwin10:
Clang: 3224
GCC:   3528

(This was using Clang built from r123315.)

So it seems that Clang generates a larger .o file than GCC on Linux. On Darwin it seems about the same. Not using the -g option makes them come out about the same size.
Comment 18 Hans Wennborg 2011-01-12 10:39:02 PST
Created attachment 5997
Heavily reduced profile.ii
Comment 19 Evan Martin 2011-01-12 10:47:12 PST
4k seems small enough to objdump -D the files and look for clues.
Comment 20 Nico Weber 2011-01-18 11:10:56 PST
chandlerc says that gcc uses a version of dwarf debugging information that has been tuned to be small (dwarf 2?), while clang uses the old and bloaty version (dwarf 1?).
Comment 21 Nico Weber 2011-01-18 23:43:57 PST
The debug information causes big .o files, which in turn makes ld very slow. This bug makes chrome builds with clang/linux 5x slower than gcc. For now, we've disabled debug information generation on the clang/linux bots because of this. See http://code.google.com/p/chromium/issues/detail?id=70000
Comment 22 Rafael Ávila de Espíndola 2011-01-19 07:39:20 PST
I know of two things that can be done to improve clang's debug info size:

* Enable .cfi_* support. This should help on OS X too.
* Create comdats. This would make it possible for gnu ld and gold to drop duplicated debug info.

There might be more advantages in dwarf4, but I am not sure.
Comment 23 Chandler Carruth 2011-01-19 10:06:17 PST
I suspect a large chunk of this problem will be solved by turning on the .cfi_* sections, Rafael. That should in particular help the non-type debug information size. The type debug info size is what would be helped most by dwarf4 (from my faded memory...)
Comment 24 Devang Patel 2011-01-19 11:34:00 PST
Regarding "comment 20", it is not true. llvm emits dwarf3.

Someone with access to linux machine should take a very small test case and see why debug info is bloated 5x (as per claims) on linux but not on darwin.
Comment 25 Evan Martin 2011-01-19 11:37:27 PST
(In reply to comment #24)
> Someone with access to linux machine should take a very small test case and see
> why debug info is bloated 5x (as per claims) on linux but not on darwin.

Comments #8 and #11 indicate that gcc 4.2 (the Darwin one) has behavior similar to clang, but current gcc versions are the ones that produce the problem.
Comment 26 Evan Martin 2011-01-19 11:38:24 PST
(In reply to comment #25)
> (In reply to comment #24)
> > Someone with access to linux machine should take a very small test case and see
> > why debug info is bloated 5x (as per claims) on linux but not on darwin.
> 
> Comments #8 and #11 indicate that gcc 4.2 (the Darwin one) has behavior similar
> to clang, but current gcc versions are the ones that produce the problem.

That was even *more* confusing, sorry.
- gcc 4.2 has behavior similar to clang, with large outputs
- current gcc has much smaller debug output
Comment 27 Devang Patel 2011-01-19 12:09:22 PST
Aha.. thanks for clarification.
Comment 28 Nico Weber 2011-01-22 00:19:23 PST
Evan: I'm not sure that's completely right. On the chromium waterfall, the linux/clang bot was way slower than the clang/mac bot before I disabled debug information generation. I believe both bots have similar hardware.
Comment 29 Rafael Ávila de Espíndola 2011-01-22 08:29:55 PST
One big difference is that on darwin most of the debug info is not copied from the .o files, so it is not hit as hard by having large debug info.

Are you using gold on linux btw?
Comment 30 Nico Weber 2011-01-22 13:17:44 PST
Yes, we use gold.
Comment 31 Anton Korobeynikov 2011-02-08 04:20:19 PST
(In reply to comment #23)
> I suspect a large chunk of this problem will be solved by turning on the .cfi_*
> sections Rafael. That should in particular help the non-type debug information
> size. The type debug info size is what would be helped most by dwarf4 (from my
> faded memory...)
One can try to enable .cfi stuff right now. I must admit I haven't tested the emission thoroughly.
Basically, in order to enable .cfi emission it's sufficient to change the exceptiontype inside x86 mcasminfo from DwarfTable to DwarfCFI and recompile llvm + clang.
Comment 32 Roman Divacky 2011-04-26 12:59:44 PDT
I compiled the reduction.ii with clang++ and g++47 and dumped the dwarf tables via "readelf -w"; it looks like clang++ is emitting DW_TAGs for subprograms that g++47 is not. Formal parameters and their types must be emitted too, thus the debug info is much bigger.

It seems to me that g++47 is not emitting debug info for unused stuff; no idea how it knows what's unused, as the cmdline I used is:

clang++|g++47 -c -O0 -g2 reduction.ii && ls -l reduction.o



the readelf -w dumps are attached
Comment 33 Roman Divacky 2011-04-26 13:00:18 PDT
Created attachment 6504
clang readelf output
Comment 34 Roman Divacky 2011-04-26 13:00:50 PDT
Created attachment 6505
gcc readelf output
Comment 35 Devang Patel 2011-04-29 12:10:40 PDT
Roman, thanks for analysis and test case reduction. It seems gcc does not emit Profile member GetWebDataService() at all. This means it emits incomplete type info for class Profile. Unless there is a guarantee to have and select complete type info for class Profile at link time, skipping member info may not be a good idea IMO.
Comment 36 Chris Lattner 2011-04-29 12:32:31 PDT
If that method is used in some other translation unit, then it will be emitted there, right?  This seems exactly the same as not emitting debug info for unused "static inline" C functions.
Comment 37 Devang Patel 2011-04-29 12:39:41 PDT
IMO, users expect this to print all Profile members:

(gdb) ptype Profile

It is possible that not all Profile members are used in an application. And it is quite possible, I think quite normal, that each translation unit uses a different subset of Profile members. The debug info for class Profile should be either complete or just a forward decl such that it is resolved to complete type info in the end.

Is it the case that gcc is omitting GetWebDataService() just because it is an abstract virtual member function?
Comment 38 Chris Lattner 2011-04-29 12:40:53 PDT
The same argument applies to static and inline functions. I agree that there is a size/quality tradeoff here, but generating MUCH MUCH larger debug info is a bug, not a feature. :)

-Chris
Comment 39 Devang Patel 2011-04-29 12:53:27 PDT
IMO whenever size/quality tradeoff is made, it should never mislead users.

Two points why this is not same as static function tradeoff:

1) In case of static and inline function, we do not expect that it is reasonable to ask "list me all functions you got" and get correct answer.  However, in case of a class, it is reasonable to ask and expect correct answer for "list me all class members".  That's what "ptype" command does in gdb.

2) If the compiler is going to emit potentially distinct type info for a class in each translation unit then, unless there is a link time step to unify them in a complete type info, the debugger may present you random type info for your class depending upon where you hit the breakpoint in project. I do not think it is user friendly.
Comment 40 Rafael Ávila de Espíndola 2011-04-29 13:24:15 PDT
If you really want to emit full types for each translation unit, you might want to consider

http://wiki.dwarfstd.org/index.php?title=COMDAT_Type_Sections

But even then, if we still get a 5x slowdown I don't think it is worth it.
Comment 41 Devang Patel 2011-04-29 13:34:59 PDT
Rafael, we are debating the scope of "complete" in "emitting complete debug info for types that are used". If two incompatible descriptions of a single class are emitted in two separate translation units, their type signatures won't match when the linker tries to eliminate duplicates.

BTW, the doc you linked mentions -femit-struct-debug-baseonly implemented by GNU compilers. Do you know what it is exactly, how widely it is used, whether it is on by default, etc.?

However, we should be able to avoid debug info for class Expensive in the following artificial example.
---
class Expensive {
public:
  int a,b,c,d,e;
};

class A {
public:
  int foo()  {
    Expensive e1;
    return e1.a;
  }
  static int i;
};

int A::i = 0;
---
Comment 42 Evan Martin 2011-04-29 14:08:39 PDT
I can't comment on the technical debate here but I thought I'd add that if it can't be made to Just Work, there is an escape hatch in the form of the -g flag.  You could include more or less info depending on the -g level.  From man gcc:

       -glevel
           Request debugging information and also use level to specify how much information.  The default level is 2.

           Level 0 produces no debug information at all.  Thus, -g0 negates -g.

           Level 1 produces minimal information, enough for making backtraces in parts of the program that you don't plan to
           debug.  This includes descriptions of functions and external variables, but no information about local variables
           and no line numbers.

           Level 3 includes extra information, such as all the macro definitions present in the program.  Some debuggers
           support macro expansion when you use -g3.
Comment 43 Rafael Ávila de Espíndola 2011-04-29 14:22:04 PDT
I perfectly understand that "complete" here means "emitting complete debug info
for types that are used". That implies that it should be the same in each TU. If we really must do it, then we should at least put it in a comdat so that only the .o files are big, not the final output.

I have never seen -femit-struct-debug-baseonly  being used.

To give a perspective on how bad the problem is, a debug build of clang compiled with gcc 4.4 is 332 MB, one compiled with clang is 670 MB and one compiled with gcc 4.6 -gdwarf-4 is 309 MB.
Comment 44 Chris Lattner 2011-04-29 19:41:52 PDT
Devang, I agree that we don't want to just throw out all information, but there is clearly something wrong here if we're generating 5x more debug information than GCC.  The debug experience with GCC is "good enough" in this case, so it seems that there is something we can do.  If there is another way to recover this hit in a different/better way, then I'm certainly in favor of it, but we shouldn't just ignore such a huge compile time and size regression from GCC.
Comment 45 Bob Wilson 2011-04-30 14:08:42 PDT
Do we know for sure that recent versions of gcc are emitting incomplete debug info for some types?  If so, can someone with access to those compilers look into how they decide what to leave out?  Perhaps more importantly, how much of the overall size difference is due to that?

It's hard for me to believe that a 5x difference is due to that alone.  Such a large difference seems more likely to involve use of comdat sections or some similar big win.

It would be great if someone outside Apple could take the lead on this.  We're comparing against our gcc-4.2 compiler, which isn't that much different than clang.
Comment 46 Rafael Ávila de Espíndola 2011-04-30 14:27:33 PDT
Very new versions of gcc produce complete info in a comdat if using -gdwarf-4. That way the linker can keep only one copy and is very fast at doing so.

Roman's test suggests that when not using comdat they emit debug info only for the bits that are needed for that translation unit.
Comment 47 Devang Patel 2011-05-02 11:10:35 PDT
Chris, I agree with you. I think this example is just an outlier. But as Bob said, more thorough investigation is needed. If I am reading Rafael's numbers correctly, then -gdwarf-4 is also a distraction at the moment.

clang : 670  MB
gcc-4.4 : 332 MB
gcc-4.6 -gdwarf-4 : 309 MB

At the moment, the 670 MB -> 332 MB drop is more important to understand than the roughly 7% drop (332 MB -> 309 MB) gained by using -gdwarf-4 in gcc.
Comment 48 Rafael Ávila de Espíndola 2011-05-02 12:58:23 PDT
-gdwarf-4 enables the use of comdats as explained in the wiki link. It might actually be easier to implement this in LLVM since it is already producing complete types: just put them in a comdat so that the linker can drop duplicates.
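A toy model of why COMDAT type units help the link: each object file carries type units identified by the 8-byte type signature that DWARF 4 defines, and the linker keeps only the first copy of each. The signatures and sizes below are made up, and real linkers deduplicate by COMDAT group name rather than by parsing debug info, so treat this as an illustration only:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// One type unit in an object file: its type signature and its size in bytes.
struct TypeUnit {
  std::uint64_t signature;
  std::uint64_t bytes;
};

// Debug-info bytes copied into the output if every type unit is kept.
std::uint64_t bytesWithoutDedup(const std::vector<std::vector<TypeUnit>>& objects) {
  std::uint64_t total = 0;
  for (const auto& obj : objects)
    for (const auto& tu : obj) total += tu.bytes;
  return total;
}

// Bytes kept when only the first type unit per signature survives,
// which is what putting each type unit in its own COMDAT group allows.
std::uint64_t bytesWithComdatDedup(const std::vector<std::vector<TypeUnit>>& objects) {
  std::set<std::uint64_t> seen;
  std::uint64_t total = 0;
  for (const auto& obj : objects)
    for (const auto& tu : obj)
      if (seen.insert(tu.signature).second) total += tu.bytes;
  return total;
}
```

The model also reflects Devang's concern from comment 41: if two translation units describe the same class differently, they get different signatures and both copies survive.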
Comment 49 Chris Lattner 2011-05-13 02:00:20 PDT
Cloned this to rdar://9432568 for tracking.
Comment 50 russell.power 2011-08-11 17:38:53 PDT
Just chiming in here; it seems a possible explanation for the varying of sizes between cases is that gcc is using indirect strings in the debug_info and llvm is not:

readelf --debug-dump profile-gcc.o | grep 'DW_AT_name' | grep -c indirect 
15398
readelf --debug-dump profile-gcc.o | grep 'DW_AT_name' | grep -vc indirect
2048
readelf --debug-dump profile-clang.o | grep 'DW_AT_name' | grep -c indirect
48270
readelf --debug-dump profile-clang.o | grep 'DW_AT_name' | grep -vc indirect
0

If the number of duplicated symbol references is small, or the names are short, then using direct names isn't a big deal, but obviously it can become so.

I'm just guessing here though, as I'm not sure how dwarf debugging is structured; but grepping over the object files seems to confirm significantly more duplication in the llvm code.

readelf -p '.debug_info' profile-clang.o >| debug-clang.dump
readelf -p '.debug_info' profile-gcc.o >| debug-gcc.dump

grep -ac new_allocator debug-{clang,gcc}.dump
debug-clang.dump:3198
debug-gcc.dump:0
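To make the direct-vs-indirect trade-off concrete: with 32-bit DWARF, a DW_FORM_string name is stored inline, with its NUL terminator, at every use, while a DW_FORM_strp name costs a 4-byte offset per use plus one copy in .debug_str (readelf prints these as "(indirect string, offset: ...)"). This sketch assumes .debug_str holds each distinct string once per object file:

```cpp
#include <cstddef>

// Bytes spent on one name that appears `uses` times in .debug_info.

// DW_FORM_string: the NUL-terminated name is repeated inline at every use.
std::size_t inlineStringBytes(std::size_t nameLength, std::size_t uses) {
  return uses * (nameLength + 1);
}

// DW_FORM_strp: each use is a 4-byte offset into .debug_str,
// which stores the name (plus NUL) once.
std::size_t indirectStringBytes(std::size_t nameLength, std::size_t uses) {
  return uses * 4 + (nameLength + 1);
}
```

So a 3-character name used once is cheaper inline (4 bytes vs 8), while a 13-character name such as new_allocator used 100 times costs 1400 bytes inline but only 414 via .debug_str.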
Comment 51 Nico Weber 2012-02-07 15:25:49 PST
*** Bug 11941 has been marked as a duplicate of this bug. ***
Comment 52 Nico Weber 2012-02-07 15:43:25 PST
Updating the title, since it's no longer 5x as slow.

Here's some up-to-date data: (gcc 4.4, clang from last week – 149419 – with memcpy() codegen bug, chromium r120581).


thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ ls -l out_clang/Debug/chrome
-rwxr-x--- 1 thakis eng 1398968296 2012-02-07 11:50 out_clang/Debug/chrome

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ ls -l out_gcc/Debug/chrome
-rwxr-x--- 1 thakis eng 1220239880 2012-02-07 13:43 out_gcc/Debug/chrome

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h ./out_clang/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
 33 .debug_info   971787546
 40 .debug_str    85365496
 39 .debug_pubtypes 82737857
 13 .text         76172952
 37 .debug_line   50219067
 17 .rodata       18496976
 19 .eh_frame     17648412
 34 .debug_abbrev 8310060
  9 .rela.dyn     5944992
 20 .eh_frame_hdr 4633348

thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h ./out_gcc/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
 34 .debug_info   652246456
 37 .debug_pubnames 111541924
 39 .debug_str    78688819
 36 .debug_loc    77704123
 13 .text         69931528
 35 .debug_line   39035120
 17 .rodata       34763456
 19 .eh_frame     16709788
 40 .debug_ranges 16457872
  9 .rela.dyn     14992176
Comment 53 Nico Weber 2012-02-07 15:54:43 PST
Created attachment 8015
Script to find .o files where clang's output is much bigger.
Comment 54 Nico Weber 2012-02-07 15:59:31 PST
Created attachment 8016
Preprocessed example file.

Output of the previous script:

3949 kB bigger: out_clang/Debug/obj/chrome/common/common.logging_chrome.o 
3647 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebCore/WebCore.gyp/gen/webkit/webcore_bindings.SVGElementFactory.o 
3612 kB bigger: out_clang/Debug/obj/chrome/browser/profiles/browser.profile_impl.o 
3214 kB bigger: out_clang/Debug/obj/chrome/browser/automation/browser.testing_automation_provider.o 
2975 kB bigger: out_clang/Debug/obj/chrome/browser/ui/browser.browser_init.o 
2885 kB bigger: out_clang/Debug/obj/chrome/browser/sync/browser.profile_sync_components_factory_impl.o 
2882 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebKit/chromium/src/webkit.WebViewImpl.o 
2772 kB bigger: out_clang/Debug/obj/chrome/browser/browser.chrome_browser_main.o 
2724 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebKit/chromium/src/webkit.WebFrameImpl.o 
2717 kB bigger: out_clang/Debug/obj/chrome/browser/browser.chrome_content_browser_client.o 

So logging_chrome.cc ( http://code.google.com/codesearch#OAMlx_jo-ck/src/chrome/common/logging_chrome.cc&exact_package=chromium&q=logging_chrome.cc ) is almost 4 MB bigger when built with clang.


Here's the breakdown for that file:


thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h out_clang/Debug/obj/chrome/common/common.logging_chrome.o | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
2885 .debug_str    1701546
2878 .debug_info   1111490
2884 .debug_pubtypes 208317
5767 .eh_frame     118168
2882 .debug_line   85517
2887 .text.startup 49914
2875 .text         3462
2879 .debug_abbrev 2641
2877 .bss          1391
4308 .text._ZNSt6vectorIPN9__gnu_cxx15_Hashtable_nodeISt4pairIKjPFvPSsPKN3IPC7MessageES4_EEEESaISD_EE14_M_fill_insertENS0_17__normal_iteratorIPSD_SF_EEmRKSD_ 1114
thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h out_gcc/Debug/obj/chrome/common/common.logging_chrome.o | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
1517 .debug_info   322155
3039 .debug_str    279041
3036 .debug_pubnames 264787
3035 .debug_loc    116052
3042 .eh_frame     42816
1518 .debug_line   42710
3038 .debug_ranges 24464
3037 .debug_aranges 24256
1513 .text         20197
1516 .debug_abbrev 2759


I'm attaching the (clang-)preprocessed output of logging_chrome.cc.
Comment 55 Eric Christopher 2012-02-07 16:08:04 PST
Interesting. Thanks for the additional information!
Comment 56 Nico Weber 2012-02-07 16:14:19 PST
Here are the current chrome build numbers, from the dupe:


Description Nico Weber 2012-02-07 13:32:21 CST
object files generated by clang tend to take ~20% longer to link than the ones
generated by gcc, due to chubby/different debug information.

I did incremental builds (touch one file, measure rebuild time – this measures
almost exclusively ld time) of the chrome binary in several scenarios. In debug
builds, the object files generated by clang take up to 6 seconds longer to link
into a final binary (36s instead of 30s, or, with a shared library build, 31s
instead of 27s). In release builds, there's no big difference.

So while compiling with clang is faster, linking the resulting object files
currently takes longer.

Raw numbers (each is the min of 3 runs):

Chrome incremental build times (ninja, gold on by default, debug
builds, gcc4.4, chromeclang)
gcc
touch file in net (net/base/mime_util.cc) 29.4s
touch file in browser (c/b/u/g/chrome_gtk_frame.cc) 32.8s

clang
touch file in net 35.9s
touch file in browser 36.3s

gcc component build (libv8.so instead of libv8.a etc)
touch file in net 26.5s
touch file in browser 8.7s

clang component build
touch file in net 30.8s
touch file in browser 10.6s

(release builds)
gcc
touch file in net 6.4s
touch file in browser 5.9s

clang
touch file in net 6.3s
touch file in browser 6.2s

gcc component
touch file in net 6.7s
touch file in browser 3.1s

clang component
touch file in net 6.6s
touch file in browser 3.2s
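A quick check of the "~20% longer" figure against the raw debug numbers above; the timings are copied from the comment, and slowdownPercent is simply (clang - gcc) / gcc expressed as a percentage:

```cpp
#include <string>
#include <vector>

// Debug-build incremental link times from the comment (seconds, min of 3 runs).
struct LinkTiming {
  std::string scenario;
  double gccSeconds;
  double clangSeconds;
};

// Percentage by which linking the clang-built objects is slower than gcc's.
double slowdownPercent(double gccSeconds, double clangSeconds) {
  return (clangSeconds - gccSeconds) / gccSeconds * 100.0;
}

const std::vector<LinkTiming> kDebugTimings = {
    {"touch file in net", 29.4, 35.9},
    {"touch file in browser", 32.8, 36.3},
    {"component build, touch file in net", 26.5, 30.8},
    {"component build, touch file in browser", 8.7, 10.6},
};
```

For the four debug scenarios this gives roughly 22%, 11%, 16% and 22%, in line with the title; the release numbers differ by at most about 5% in either direction.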


…and with that, I'm done spamming this bug for a while.
Comment 57 Eric Christopher 2012-02-20 18:22:45 PST
Hrm. logging_chrome.ii is preprocessed on linux and not on the mac. I'll probably need someone with access to gcc on linux to do a bit of analysis of which types are included and which aren't - and we can try to come up with a "why" and "how" from there.
Comment 58 Eric Christopher 2012-02-20 19:33:03 PST
FWIW I've got the -wi output of logging_chrome.ii from linux. The formatting is... annoying so it's slow going looking through it. It's obviously much larger though.
Comment 59 Nico Weber 2012-03-08 17:19:26 PST
I sent echristo a mach-o version of logging_chrome.o (built with clang, and with a local build of gcc4.6). If anybody else wants that, let me know. (It's too big to attach it.)
Comment 60 Eric Christopher 2012-10-25 13:24:39 PDT
Not looking at this currently.
Comment 61 Nico Weber 2013-07-30 17:41:58 PDT
I checked if this has improved. It hasn't. clang's debug binary output is now 2GB (due to chrome growing over time I suppose). gcc4.6's has grown a lot too; it's now 1.76GB (still 12% smaller than clang's).

Linking 'chrome' in debug mode after touching a single file takes 1m42s with gcc (warm cache), but 7m11s with clang, more than four times as long by now.
Comment 62 Eric Christopher 2013-07-30 20:34:30 PDT
First approximation guess would be relocations, but I'll need to get something set up to look at it. I'll move this up my priority list.
Comment 63 Eric Christopher 2013-08-02 13:14:05 PDT
As another side note I'm not quite sure what's going on in the debug information in particular that's making linking so slow. It'll be worth looking into.
Comment 64 David Blaikie 2013-08-08 12:22:20 PDT
First semi-random example of differences between Clang & GCC:

int func(void (*)());
int func(int);

struct foo {
  enum { ID = 42 };
  static void bar();
};

int i = func(foo::bar); // neither emit 'foo'
// int j = func(foo::ID); // both emit 'foo'

struct bar {
  bar() {
    func(foo::bar); // neither emit 'foo'
    // func(foo::ID); // Clang emits 'foo', GCC does not
  }
} b;

This affects all the "ChromeUtilityHostMsg" objects that use a registration system not unlike foo+bar here. See "ChromeUtilityHostMsg_ParseJSON_Succeeded" for example. GCC doesn't produce any debug info for that type, even though the ctor called from the global initializer for g_LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded references the type (both the ID enum, and the Log builder function)

(side note: why do you have the extra LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded class? Why not have a generic registration type with ctor parameters: LoggerRegisterHelper g_LoggerRegisterHelper(ChromeUtilityHostMsg_ParseJSON_Succeeded::ID, ChromeUtilityHostMsg_ParseJSON_Succeeded::Log); ? that would reduce Clang's debug info (you'd drop one out of every pair of these types), though it'd increase GCC's because it would now actually start emitting the debug info for ChromeUtilityHostMsg_ParseJSON_Succeeded)
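A sketch of the generic registration type suggested in the side note. The names LogFunction, LoggerRegistry and ExampleMsg are hypothetical stand-ins for Chromium's real IPC logging types, which this thread does not show. A single LoggerRegisterHelper class serves every message, so the per-message helper classes (and their debug info) go away:

```cpp
#include <map>
#include <string>

// Hypothetical logger signature; the real Chromium one differs.
using LogFunction = void (*)(std::string* out);

// Hypothetical global registry keyed by message ID (function-local static
// avoids static initialization order problems).
inline std::map<int, LogFunction>& LoggerRegistry() {
  static std::map<int, LogFunction> registry;
  return registry;
}

// One generic helper: each global instance registers a message's ID and
// Log function via constructor parameters.
struct LoggerRegisterHelper {
  LoggerRegisterHelper(int id, LogFunction log) { LoggerRegistry()[id] = log; }
};

// Simplified stand-in for a message type such as
// ChromeUtilityHostMsg_ParseJSON_Succeeded.
struct ExampleMsg {
  enum { ID = 42 };
  static void Log(std::string* out) { *out = "ExampleMsg"; }
};

LoggerRegisterHelper g_LoggerRegisterHelperExampleMsg(ExampleMsg::ID, &ExampleMsg::Log);
```

As noted above, this trades one helper type per message for references to the message type itself, which is why it would shrink Clang's output but could grow GCC's.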
Comment 65 David Blaikie 2013-08-08 12:48:20 PDT
Another issue we noticed is that Clang produces debug info for implicit special members that are ODR used, but never codegen'd (because they were frontend inlined*, or used in inline functions that were never called/codegen'd, etc). eg:

struct foo {
  int i;
};

void func(foo*);

int main() {
  foo f;
  func(&f);
}

Clang produces a description of 'foo' with one member 'i' and one subprogram 'foo' (the ctor) for which there is no definition. GCC only describes 'i' and does not describe a ctor 'foo'.

Both compilers, given code such as:

void func(foo&(foo::*)(const foo&));
...
func(&foo::operator=);

produce debug info describing the type 'foo' with a subprogram 'operator=', the implicit default copy assignment operator.

As a minor note - when Clang (either rightly or wrongly) emits the declarations of these implicit members, it provides file and line numbers for those declarations. GCC does not.

Also, Clang produces "DW_AT_accessibility" on all members, even those whose accessibility is the default for the type (e.g. public members of a struct).

* perhaps we could use the inline debug info metadata description from the frontend to describe these inlinings to produce more complete debug info
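The DW_AT_accessibility point can be sketched as a small rule, assuming the DWARF (v2 through v4) convention that a member entry without DW_AT_accessibility is private inside DW_TAG_class_type and public inside DW_TAG_structure_type or DW_TAG_union_type. This is a toy model with made-up names, not Clang's or LLVM's actual code.

```cpp
// Toy model: which member entries need an explicit DW_AT_accessibility?
enum class Access { Public, Protected, Private };
enum class ContainerTag { ClassType, StructureType, UnionType };

// Accessibility a consumer assumes when the attribute is absent.
Access defaultAccess(ContainerTag tag) {
  return tag == ContainerTag::ClassType ? Access::Private : Access::Public;
}

// Emit the attribute only when it differs from the implied default.
bool needsAccessibilityAttr(ContainerTag tag, Access memberAccess) {
  return memberAccess != defaultAccess(tag);
}
```

For the 'struct foo' above, member 'i' is public inside a DW_TAG_structure_type, so under this rule it would carry no DW_AT_accessibility at all.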
Comment 66 David Blaikie 2013-08-08 12:50:40 PDT
Similar to the previous point about special members, Clang emits the declaration of all template specializations, even those that are never codegen'd as in this example:

struct foo {
  template<typename T>
  void func() {
  }
};

inline void func() {
  foo().func<int>();
}

int main() {
  foo f;
  f.func<float>();
}

Since the inline free function 'func' is never codegen'd (as it is never called), func<int> is never generated either - yet Clang still emits a debug info description for both foo::func<int> and foo::func<float>. GCC only emits func<float>.

This issue and the implicit special members can both be addressed in a similar/unified way - don't emit those declarations when emitting the type, but add them after the fact if we end up codegening them.
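The deferred approach proposed above can be sketched as a toy model: the type description starts with only the explicitly declared members, and implicit special members or member template specializations are attached only when codegen actually emits them. All names here are hypothetical; this is not Clang's CGDebugInfo code.

```cpp
#include <string>
#include <vector>

// A member as seen when the type is first described.
struct MemberDecl {
  std::string name;
  bool implicitOrSpecialization;  // implicit special member or template specialization
};

// The debug info description of a type: just its member subprogram names here.
struct TypeDescription {
  std::string name;
  std::vector<std::string> subprograms;
};

// Describe the type, skipping members that may never be codegen'd.
TypeDescription describeType(const std::string& name,
                             const std::vector<MemberDecl>& members) {
  TypeDescription desc{name, {}};
  for (const MemberDecl& m : members)
    if (!m.implicitOrSpecialization)
      desc.subprograms.push_back(m.name);
  return desc;
}

// Called when codegen emits a member's definition: attach it after the fact.
void onMemberCodegen(TypeDescription& desc, const std::string& member) {
  for (const std::string& s : desc.subprograms)
    if (s == member) return;  // already described
  desc.subprograms.push_back(member);
}
```

For the comment 66 example, 'foo' would start with no subprograms and gain only func<float> once main's call to it is codegen'd; func<int> never appears.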
Comment 67 David Blaikie 2013-08-08 13:01:01 PDT
A probable GCC bug results in unusually small debug info for stream-related code such as:

#include <fstream>

int main() {
  std::ifstream f;
  return f.bad();
}

The total object file size from GCC is nearly 1/4th that of Clang, but that's because GCC only produced debug info describing the /declaration/ of basic_ifstream<char>. It didn't describe any of the members (including the 'bad()' function called there). Clang does describe this type (& its dependencies) in full, which seems appropriate.

I haven't narrowed this down fully, but it seems to be related to libstdc++'s use of extern template as follows:

extern template class basic_ifstream<char>;

If that line is not present, the full type information is emitted just like Clang's. This might seem sort of reasonable if the full definition of basic_ifstream<char> were emitted in the debug-built binaries of libstdc++, but so far as I can tell, it is not (I only see another declaration there). But perhaps I've done something wrong in that investigation.

If I can come up with a standalone repro, I'll post those details.
Comment 68 David Blaikie 2013-08-08 16:54:46 PDT
A little more detail on the fstream issue. I've simplified it down to:

struct a {
};

template<typename T>
struct b : virtual a {
  void func() {
  }
};

extern template class b<int>;

int main() {
  b<int> x;
  x.func();
}

In GCC's DWARF: 'b' is emitted as a declaration but it contains the DW_TAG_subprograms of 'b's ctor and dtor (if those are explicit, rather than implicit, it does not include them).

If the "extern template class b<int>" is preceded by "template class b<int>", then the debug info for 'func' is emitted (I'm not sure what/where/why the problem exists for this in libstdc++'s debug symbol builds - I may've looked at the wrong libs or in the wrong way, or linked to the wrong ones, etc...)
Comment 69 Nick Lewycky 2013-08-08 17:06:38 PDT
Referring to comment 67: looking at gcc 4.6's behaviour, gcc does ship all the debug info for an extern template, but where it's instantiated, not where it's used. So this program:

#include <fstream>

int main() {
  std::ifstream f;
  return f.bad();
}

builds a binary that does not have debug info for ifstream, but its debug info can be found in the libstdc++ .so instead.
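The arrangement described here can be sketched by collapsing the "header", the user code and the "library" into one file, using a variant of the comment 68 types where 'func' returns a value so the call is observable. The layout is hypothetical; in libstdc++ the explicit instantiation definition lives in the library's own sources, not next to user code. Standard C++ requires that, when both appear in one TU, the explicit instantiation definition follows the declaration.

```cpp
// "Header" part: the template plus an explicit instantiation declaration,
// which tells every including TU not to instantiate b<int>'s members itself.
struct a {};

template <typename T>
struct b : virtual a {
  int func() { return 1; }
};

extern template struct b<int>;

// "User" part: uses b<int>, but relies on someone else providing its members.
int use() {
  b<int> x;
  return x.func();
}

// "Library" part: the explicit instantiation definition. Under the behaviour
// described above, this is the TU where GCC emits the full debug info for b<int>.
template struct b<int>;
```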
Comment 70 David Blaikie 2013-08-12 01:23:13 PDT
(In reply to comment #64)
> First semi-random example of differences between Clang & GCC:
> 
> int func(void (*)());
> int func(int);
> 
> struct foo {
>   enum { ID = 42 };
>   static void bar();
> };
> 
> int i = func(foo::bar); // neither emit 'foo'
> // int j = func(foo::ID); // both emit 'foo'
> 
> struct bar {
>   bar() {
>     func(foo::bar); // neither emit 'foo'
>     // func(foo::ID); // Clang emits 'foo', GCC does not
>   }
> } b;
> 
> This affects all the "ChromeUtilityHostMsg" objects that use a registration
> system not unlike foo+bar here. See
> "ChromeUtilityHostMsg_ParseJSON_Succeeded" for example. GCC doesn't produce
> any debug info for that type, even though the ctor called from the global
> initializer for
> g_LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded references
> the type (both the ID enum, and the Log builder function)
> 
> (side note: why do you have the extra
> LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded class? Why not
> have a generic registration type with ctor parameters: LoggerRegisterHelper
> g_LoggerRegisterHelper(ChromeUtilityHostMsg_ParseJSON_Succeeded::ID,
> ChromeUtilityHostMsg_ParseJSON_Succeeded::Log); ? that would reduce Clang's
> debug info (you'd drop one out of every pair of these types), though it'd
> increase GCC's because it would now actually start emitting the debug info
> for ChromeUtilityHostMsg_ParseJSON_Succeeded)

Worse than this - even if you switch out the registration system for something more like what I described, something else seems to be getting in the way:

struct base {
  virtual void func() {
  }
};
struct foo: base {
  enum { ID };
};
struct reg {
  reg(int);
};
reg r(foo::ID);

GCC emits a declaration for 'foo', not a definition (so GCC's debug info doesn't mention 'base' at all). Removing the virtual function from 'base' avoids this: the full definition of 'foo' is then emitted, including the DW_AT_inheritance referring to 'base', and the complete 'base' definition.

Given the absence of any key function here, I'm not sure what GCC's deal is. These are polymorphic classes that may be seen in no other TU than this one where their code must be emitted (due to the presence of virtual functions). I don't know how GCC could rely on these types to be emitted in any other TU - and if not, this seems like another GCC bug.

Perhaps someone from GCC can explain how/why this is correct & what the logic is here & we can implement it, but for now I'm suspicious.

I'd like to find a way around this bug or optimization so I can compare Clang v GCC debug info sans this issue & see how much it contributes to the problems with this TU (my theory is that it's the major issue/difference in this TU - the dependencies brought in via the base class & parameter types of these registration objects are substantial & there are many of them) but I'm not entirely sure how to do that. I'll keep experimenting.
Comment 71 David Blaikie 2013-08-12 11:58:44 PDT
So - a theory to explain Comments 64, 67-70:

For any type that has a vtable, GCC emits the type's full definition only in the TU where the vtable (& other virtual 'stuff', such as virtual base handling) is emitted.

That's why this looks weird for the explicit template instantiation case - it's not specifically targeting that case, it just gets it for free because the type then ends up with a key function (at least one out of line virtual function) & thus the vtable isn't emitted at the call site.

This is a fairly sound idea (one we've considered implementing before) though there are some trivial ways it can fail (whether or not such code is likely to occur in the wild is open for debate):

struct foo {
  virtual ~foo();
  static void func();
};

int main() {
  foo::func();
}

In this case, the virtual functions of 'foo' may never be defined - since no instance of 'foo' is ever constructed, this is valid. Yet the referenced static member will then be uncallable from the debugger, because GCC emits no mention of the 'foo' type at all.
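The theory above can be written down as a toy decision rule, assuming the usual Itanium C++ ABI vtable placement: a class with a key function (its first non-pure virtual function not defined inline) gets its vtable only in the TU defining that function, otherwise in every TU that uses the vtable. The struct and function names are hypothetical, and the model ignores explicit instantiation declarations, which also suppress the vtable.

```cpp
// Toy model of "emit the class definition along with the vtable".
struct ClassInfo {
  bool isDynamic;               // has a vtable (virtual functions or virtual bases)
  bool hasKeyFunction;          // has a non-pure virtual function not defined inline
  bool keyFunctionDefinedHere;  // this TU defines that key function
  bool vtableUsedHere;          // this TU needs the vtable (e.g. constructs an object)
};

// Does this TU emit the vtable?
bool vtableEmittedHere(const ClassInfo& c) {
  if (!c.isDynamic) return false;
  return c.hasKeyFunction ? c.keyFunctionDefinedHere : c.vtableUsedHere;
}

// Emit a full class definition in the debug info, or only a declaration?
bool emitFullDefinition(const ClassInfo& c, bool typeReferencedHere) {
  if (!typeReferencedHere) return false;
  if (!c.isDynamic) return true;  // non-polymorphic types: emit wherever referenced
  return vtableEmittedHere(c);    // polymorphic types: tie the definition to the vtable
}
```

Both the comment 70 'foo' (no key function, never constructed) and the comment 71 'foo' (key function ~foo never defined) come out as declarations only, which matches the GCC output described in those comments and shows how the comment 71 case loses the static member.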
Comment 72 David Blaikie 2013-08-14 20:09:43 PDT
Just while I've got this here, I've prototyped the "only emit debug info for class definitions along with the vtable for any type with a vtable" idea and here are some numbers:

with the original logging TU:

Strings with baseline Clang: 30919
Strings with improved Clang:  7413
Strings with        GCC 4.7:  6601

If we change the registration system so GCC doesn't win by avoiding the intermediate registration types (due to the ID reference being ignored by GCC):

Strings with improved Clang:  6068
Strings with        GCC 4.7:  5560

So we get pretty close.
Comment 73 David Blaikie 2013-08-16 15:52:45 PDT
I'm going to consider this resolved by r188576, as it reduces the debug info for the example down to roughly GCC's size (section sizes below).

If you have other particularly bad examples, please file new bugs/data & I'll investigate.

GCC 4.7:
3034 .debug_str    359705
3028 .debug_info   349411
3030 .debug_loc    146400
3033 .debug_line   43328
3032 .debug_ranges 24592
3031 .debug_aranges 24224
3029 .debug_abbrev 2854

Clang (old):
2885 .debug_str    1807241
2878 .debug_info   1121994
2882 .debug_line   88887
2879 .debug_abbrev 2445

Clang (r188576):
2878 .debug_info   356859
2885 .debug_str    331528
2882 .debug_line   82402
2879 .debug_abbrev 1258
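To compare like with like, the arithmetic below sums only .debug_info and .debug_str, the two dominant sections reported for all three builds (the GCC listing also includes .debug_loc, .debug_ranges and .debug_aranges, which the Clang listings do not). The numbers are copied from the listings above; the struct is just a container for this calculation.

```cpp
#include <cstdint>

// .debug_info and .debug_str sizes from the listings above.
struct DebugSizes {
  std::int64_t info;
  std::int64_t str;
};

constexpr DebugSizes kGcc47{349411, 359705};
constexpr DebugSizes kClangOld{1121994, 1807241};
constexpr DebugSizes kClangR188576{356859, 331528};

constexpr std::int64_t infoPlusStr(DebugSizes s) { return s.info + s.str; }

static_assert(infoPlusStr(kGcc47) == 709116, "GCC 4.7");
static_assert(infoPlusStr(kClangOld) == 2929235, "Clang (old)");
static_assert(infoPlusStr(kClangR188576) == 688387, "Clang (r188576)");
```

That is roughly a 4.3x reduction from old Clang to r188576, leaving Clang about 3% below GCC 4.7 on these two sections.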
Comment 74 Nico Weber 2013-10-16 14:52:59 PDT
I finally had the opportunity to check this.

With current trunk chromium, gcc 4.6, and clang r192635, a (statically linked) debug build of chrome is 1.9GB with gcc and 1.3GB with clang. Incremental build time after touching a file in the ui/ directory is 3m55s with clang and 4m02s with gcc. So this looks much better now, thanks!
Comment 75 David Blaikie 2013-10-16 14:55:06 PDT
Awesome! thanks for all your (everyone on this bug) help with repros, reductions, etc. Sorry it was a bit of a while coming.
Comment 76 Eric Christopher 2013-10-16 14:57:33 PDT
Woot.
Comment 77 Nico Weber 2013-10-18 14:51:20 PDT
As a follow-up (probably not related to this bug): I also measured full and incremental build times with a component build. Clang is ~10% slower than gcc 4.6 for a full build of chrome (24min vs 22min), and still a good deal slower in incremental builds (1.6s vs 1.2s). So generally I feel that more perf work is needed on linux.
Comment 78 Eric Christopher 2013-10-18 18:20:26 PDT
OK, interesting. If you ever get any information for it let me know, otherwise I'll see what I can work up as I get to it.
Comment 79 Eric Christopher 2014-03-21 17:04:42 PDT
As an update, a lot of things have been improved and we're now seeing build and link times (including incremental) below gcc's. Thanks again everyone!