I am looking into why gcc builds of Chrome are ~600mb while clang builds of Chrome are 1.7gb. (Both with debugging symbols.) I've attached the preprocessed version of one of the files that differs the most between the two compilers. With the same input file, I can generate two output files:
    g++ -pthread -O0 -g -c -o profile.gcc profile.ii
    ../llvm/Release/bin/clang++ -pthread -O0 -g -c -o profile.gcc profile.ii
Their sizes differ significantly:
    -rw-r--r-- 1 evanm evanm 4709032 2010-07-01 15:30 profile.clang
    -rw-r--r-- 1 evanm evanm 2570352 2010-07-01 15:30 profile.gcc
gzipped file attached.
    llvm$ svn info . tools/clang/ | grep Revision
    Revision: 107405
    Revision: 107405
Er, that second command line should use "-o profile.clang":
    ../llvm/Release/bin/clang++ -pthread -O0 -g -c -o profile.clang profile.ii
    -rw-r--r-- 1 evanm evanm 4835656 2010-07-01 15:44 profile.clang
g++ -v? Is the size due to debug info? What are the sizes with -g0?
Created attachment 5156: preprocessed source
    $ gcc --version
    gcc (Ubuntu 4.4.3-4ubuntu5) 4.4.3
With -g0, they are pretty close:
    $ ls -l profile.clang profile.gcc
    -rw-r--r-- 1 evanm evanm 649592 2010-07-01 15:55 profile.clang
    -rw-r--r-- 1 evanm evanm 615512 2010-07-01 15:55 profile.gcc
Devang, this looks like an example of where Clang is producing much more bloated debug info for c++ code than GCC is. Can you investigate?
My g++-4.2 does not compile attached preprocessed source file. It'd be great if you could attach an example that I can compile using g++-4.2 for comparison. Otherwise, it'd be easier to analyze size bloat if there is a smaller test case.
Attached a hacked up version of the source which will build with gcc-4.2 (although the code wouldn't work properly, it should be fine for investigating debug info). On Darwin, Clang is actually doing better than GCC -- it seems like gcc-4.4 has made improvements in this area.
--
ddunbar@giles:tmp$ xclang -v
Apple clang version 2.0 (trunk 109497)
Target: x86_64-apple-darwin10
Thread model: posix
ddunbar@giles:tmp$ xclang -g -c -o t.clang.o t.ii
ddunbar@giles:tmp$ gcc -g -c -o t.gcc.o t.ii
ddunbar@giles:tmp$ ls -l t.*.o
-rw-r--r-- 1 ddunbar wheel 4463664 Jul 31 13:18 t.clang.o
-rw-r--r-- 1 ddunbar wheel 5065984 Jul 31 13:18 t.gcc.o
ddunbar@giles:tmp$
--
I think we are going to need a more reduced test case to investigate more thoroughly...
Created attachment 5304: Hacked up profile.ii, which builds with gcc-4.2
In building Chrome with clang vs gcc and comparing files pair-wise, the size difference distribution was all over -- some were smaller with clang, some larger. The summed result though is a binary larger by a factor of 2. I grabbed this file because it exhibited the largest difference, so I had hoped it would have made the problem more obvious (e.g. maybe it's just one huge symbol, or maybe each symbol is just a little bit larger). I can provide any of the other files, but I'm not sure whether they'll help much. I guess what I'm saying is: I'd like to help, please give me advice on how.
I agree with your approach, starting with the file with the biggest difference is a good place. The problem is that we can't reproduce the issue on Darwin with GCC 4.2. If you attach the .s files generated with GCC 4.4 and Clang on your platform, Devang might be able to tell which bits of debug info GCC is leaving out. Another approach would be to see if the size difference manifests on Darwin using GCC 4.2. If so, and you can grab the .o file with the biggest difference there, we can probably do a better job of investigating. p.s. Does Chrome built with Clang work?
Adding Nico, who has successfully built on Mac. Nico: to find files with large differences, I did something like
    find gcc_output -name '*.o' | xargs ls -l | awk '{...}' > gcc-sizes
to get a list of 'filename size' pairs, and then the same thing for clang, and then used the coreutils 'join' to merge those lists, and then another awk pass to find the largest difference. Dunno if GNU coreutils are available on Mac, but I'm sure you can figure something out.
I might be able to try a build on an Ubuntu Hardy machine (also gcc 4.2), but my recollection is that there are other problems there like http://llvm.org/bugs/show_bug.cgi?id=6379
Chrome on Clang: yeah, it runs! On-page images come out slightly corrupted (gonna be annoying to track down) and there's one last v8 patch that is some template trickiness that they want me to perftest on gcc/MSVC to make sure I don't regress there, and I haven't gotten around to it.
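For anyone without GNU coreutils handy, the find/join/awk steps described above can be sketched in Python. This is only an illustration, not the script that was actually used: the function names `object_sizes` and `largest_differences` are made up here, and it assumes both build trees use the same relative paths for their .o files.

```python
import os


def object_sizes(root):
    """Map each .o file under root (path relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".o"):
                full = os.path.join(dirpath, name)
                sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def largest_differences(gcc_sizes, clang_sizes, top=10):
    """Return (clang_size - gcc_size, path) for files in both builds, largest first."""
    # Only paths present in both trees can be compared (the 'join' step).
    common = gcc_sizes.keys() & clang_sizes.keys()
    diffs = [(clang_sizes[path] - gcc_sizes[path], path) for path in common]
    diffs.sort(reverse=True)
    return diffs[:top]
```

Calling `largest_differences(object_sizes("gcc_output"), object_sizes("clang_output"))` would then list the object files where clang's output exceeds gcc's by the most bytes; the directory names are placeholders.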
(Chrome/Mac on Clang: Main binary builds and runs in Debug, but fails to link in Release. unit_tests, the binary which contains most of chrome's automated tests, builds in Debug but reports many test failures when built with clang and eventually crashes.)
FWIW, on OS X they're in the same ballpark, clang's ~11% bigger:
    hummer:src thakis$ ls -l clang/sym/Debug/Chromium\ Framework.framework/Versions/Current/Chromium\ Framework
    -rwxr-xr-x 1 thakis eng 168898096 Sep 22 14:16 clang/sym/Debug/Chromium Framework.framework/Versions/Current/Chromium Framework
    hummer:src thakis$ ls -l xcodebuild/Debug/Chromium\ Framework.framework/Versions/Current/Chromium\ Framework
    -rwxr-xr-x 1 thakis eng 150708884 Sep 23 12:36 xcodebuild/Debug/Chromium Framework.framework/Versions/Current/Chromium Framework
Here are the ten largest sections and their from Linux Chrome. (Ignore the first number in each line, that's just the section number).
gcc 4.4.3:
    34 .debug_info      332565787
    37 .debug_pubnames   64351219
    40 .debug_str        44044988
    13 .text             36689692
    36 .debug_loc        26376994
    17 .rodata           24766708
    35 .debug_line       19909279
    41 .debug_frame      15068712
    33 .debug_abbrev      8697456
    39 .debug_ranges      4837592
clang:
    31 .debug_info     2007297046
    35 .debug_line       60710971
    11 .text             50932856
    30 .debug_frame      30143604
    12 .rodata           20520661
    25 .eh_frame         14649532
    37 .debug_pubnames    6682424
    32 .debug_abbrev      6138321
    15 .eh_frame_hdr      2925708
    21 .data              1863576
Generated with:
    objdump -h ./out/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg' | sort -k3 -n | tail -10 | tac
Er, I meant: "Here are the ten largest sections and their *size* from Linux Chrome." ls -lh says gcc produces 589M binary, while clang produces 2.1G binary.
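For reference, the objdump/perl/sort pipeline above can be approximated in Python. This is a rough sketch rather than an exact port: `largest_sections` is a hypothetical helper, and it assumes the usual `objdump -h` layout in which each section row starts with an index, then the section name, then the size in hex.

```python
import re

# Matches rows such as " 34 .debug_info   13d2a91b  0000000000000000 ...".
# The continuation rows ("CONTENTS, ALLOC, ...") and the header do not match.
SECTION_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+([0-9a-fA-F]{8,})\s")


def largest_sections(objdump_text, top=10):
    """Return (size_in_bytes, section_name) pairs, largest first."""
    sections = []
    for line in objdump_text.splitlines():
        match = SECTION_LINE.match(line)
        if match:
            # The size column is hexadecimal, as in the perl hex($1) call.
            sections.append((int(match.group(3), 16), match.group(2)))
    sections.sort(reverse=True)
    return sections[:top]
```

The input text could come from something like `subprocess.run(["objdump", "-h", path], capture_output=True, text=True).stdout`, where `path` is the binary to inspect.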
I've been trying to find a reduced test case for this, basically taking the preprocessed profile.cc, ripping out stuff while it still compiles and still generates a significantly larger .o file with Clang than with GCC. I'm not sure the reasons for the difference in size here is the same as for the difference in size for all of Chrome, but hopefully it could be helpful in some way.
Compiling the attached file like this:
    clang++ -g -c /tmp/reduction.ii -o /tmp/a.o.clang
And comparing with GCC (4.2.1 on Darwin, 4.4.3 on Linux) using the same flags yields these file sizes (in bytes):
    On x86_64-unknown-linux-gnu:
        Clang: 4136
        GCC: 2448
    On x86_64-apple-darwin10:
        Clang: 3224
        GCC: 3528
(This was using Clang built from r123315.)
So it seems that Clang generates a larger .o file than GCC on Linux. On Darwin it seems about the same. Not using the -g option makes them come out about the same size.
Created attachment 5997: Heavily reduced profile.ii
4k seems small enough to objdump -D the files and look for clues.
chandlerc says that gcc uses a version of dwarf debugging information that has been tuned to be small (dwarf 2?), while clang uses the old and bloaty version (dwarf 1?).
The debug information causes big .o files, which in turn makes ld very slow. This bug makes chrome builds with clang/linux 5x slower than gcc. For now, we've disabled debug information generation on the clang/linux bots because of this. See http://code.google.com/p/chromium/issues/detail?id=70000
I know of two things that can be done to improve clang's debug info size:
* Enable .cfi_* support. This should help on OS X too.
* Create comdats. This would make it possible for gnu ld and gold to drop duplicated debug info.
There might be more advantages in dwarf4, but I am not sure.
I suspect a large chunk of this problem will be solved by turning on the .cfi_* sections Rafael. That should in particular help the non-type debug information size. The type debug info size is what would be helped most by dwarf4 (from my faded memory...)
Regarding "comment 20", it is not true. llvm emits dwarf3. Someone with access to linux machine should take a very small test case and see why debug info is bloated 5x (as per claims) on linux but not on darwin.
(In reply to comment #24)
> Someone with access to linux machine should take a very small test case and see
> why debug info is bloated 5x (as per claims) on linux but not on darwin.
Comments #8 and #11 indicate that gcc 4.2 (the Darwin one) has behavior similar to clang, but current gcc versions are the ones that produce the problem.
(In reply to comment #25)
> (In reply to comment #24)
> > Someone with access to linux machine should take a very small test case and see
> > why debug info is bloated 5x (as per claims) on linux but not on darwin.
>
> Comments #8 and #11 indicate that gcc 4.2 (the Darwin one) has behavior similar
> to clang, but current gcc versions are the ones that produce the problem.
That was even *more* confusing, sorry.
- gcc 4.2 has behavior similar to clang, with large outputs
- current gcc has much smaller debug output
Aha.. thanks for clarification.
Evan: I'm not sure that's completely right. On the chromium waterfall, the linux/clang bot was way slower than the clang/mac bot before I disabled debug information generation. I believe both bots have similar hardware.
One big difference is that on darwin most of the debug info is not copied from the .o files, so it is not hit as hard by having large debug info. Are you using gold on linux btw?
Yes, we use gold.
(In reply to comment #23)
> I suspect a large chunk of this problem will be solved by turning on the .cfi_*
> sections Rafael. That should in particular help the non-type debug information
> size. The type debug info size is what would be helped most by dwarf4 (from my
> faded memory...)
One can try to enable .cfi stuff right now. I must admit I haven't tested the emission thoroughly. Basically, in order to enable .cfi emission it's sufficient to change the exceptiontype inside x86 mcasminfo from DwarfTable to DwarfCFI and recompile llvm + clang.
I compiled the reduction.ii with clang++ and g++47 and dumped the dwarf tables via "readelf -w". It looks like clang++ is emitting DW_TAGs for subprograms that g++47 is not. Formal parameters and their types must be emitted too, thus the debug info is much bigger. It seems to me that g++47 is not emitting debug info for unused stuff; no idea how it knows what's unused, as the cmdline I used is:
    clang++|g++47 -c -O0 -g2 reduction.ii && ls -l reduction.o
The readelf -w dumps are attached.
Created attachment 6504: clang readelf output
Created attachment 6505: gcc readelf output
Roman, thanks for analysis and test case reduction. It seems gcc does not emit Profile member GetWebDataService() at all. This means it emits incomplete type info for class Profile. Unless there is a guarantee to have and select complete type info for class Profile at link time, skipping member info may not be a good idea IMO.
If that method is used in some other translation unit, then it will be emitted there, right? This seems exactly the same as not emitting debug info for unused "static inline" C functions.
IMO the debugger should print all Profile members for
    (gdb) ptype Profile
It is possible that not all Profile members are used in an application. And it is quite possible, I think quite normal, that each translation unit uses a different subset of Profile members. The debug info for class Profile should be either complete or just a forward decl such that it is resolved to complete type info in the end. Is it the case that gcc is omitting GetWebDataService() just because it is an abstract virtual member function?
The same argument applies to static and inline functions. I agree that there is a size/quality tradeoff here, but generating MUCH MUCH larger debug info is a bug, not a feature. :) -Chris
IMO whenever a size/quality tradeoff is made, it should never mislead users. Two points why this is not the same as the static function tradeoff:
1) In case of static and inline functions, we do not expect that it is reasonable to ask "list me all functions you got" and get a correct answer. However, in case of a class, it is reasonable to ask and expect a correct answer for "list me all class members". That's what the "ptype" command does in gdb.
2) If the compiler is going to emit potentially distinct type info for a class in each translation unit then, unless there is a link time step to unify them in a complete type info, the debugger may present you random type info for your class depending upon where you hit the breakpoint in the project. I do not think it is user friendly.
If you really want to emit full types for each translation unit, you might want to consider http://wiki.dwarfstd.org/index.php?title=COMDAT_Type_Sections But even then, if we still get a 5x slowdown I don't think it is worth it.
Rafael, we are debating scope of "complete" in "emitting complete debug info for types that are used". If two incompatible type infos for a single class are emitted in two separate translation units then their type signatures won't match if the linker wants to eliminate duplicates. BTW, the doc you linked mentions -femit-struct-debug-baseonly implemented by GNU compilers. Do you know what it is exactly, how widely it is used, is it on by default etc.? However, we should be able to avoid debug info for class Expensive in the following artificial example, though.
---
class Expensive {
public:
  int a,b,c,d,e;
};
class A {
public:
  int foo() { Expensive e1; return e1.a; }
  static int i;
};
int A::i = 0;
---
I can't comment on the technical debate here but I thought I'd add that if it can't be made to Just Work, there is an escape hatch in the form of the -g flag. You could include more or less info depending on the -g level. From man gcc:
    -glevel
        Request debugging information and also use level to specify how much information. The default level is 2.
        Level 0 produces no debug information at all. Thus, -g0 negates -g.
        Level 1 produces minimal information, enough for making backtraces in parts of the program that you don't plan to debug. This includes descriptions of functions and external variables, but no information about local variables and no line numbers.
        Level 3 includes extra information, such as all the macro definitions present in the program. Some debuggers support macro expansion when you use -g3.
I perfectly understand that complete in here is "emitting complete debug info for types that are used". That implies that it should be the same in each TU. If we really must do it, then we should at least put it in a comdat so that only the .o files are big, not the final output. I have never seen -femit-struct-debug-baseonly being used. To give a perspective on how bad the problem is, a debug build of clang compiled with gcc 4.4 is 332 MB, one compiled with clang is 670 MB and one compiled with gcc 4.6 -gdwarf-4 is 309 MB.
Devang, I agree that we don't want to just throw out all information, but there is clearly something wrong here if we're generating 5x more debug information than GCC. The debug experience with GCC is "good enough" in this case, so it seems that there is something we can do. If there is another way to recover this hit in a different/better way, then I'm certainly in favor of it, but we shouldn't just ignore such a huge compile time and size regression from GCC.
Do we know for sure that recent versions of gcc are emitting incomplete debug info for some types? If so, can someone with access to those compilers look into how they decide what to leave out? Perhaps more importantly, how much of the overall size difference is due to that? It's hard for me to believe that a 5x difference is due to that alone. Such a large difference seems more likely to involve use of comdat sections or some similar big win. It would be great if someone outside Apple could take the lead on this. We're comparing against our gcc-4.2 compiler, which isn't that much different than clang.
Very new versions of gcc produce complete info in a comdat if using -gdwarf-4. That way the linker can keep only one copy and is very fast at doing so. Roman's test suggests that when not using comdat they emit debug info only for the bits that are needed for that translation unit.
Chris, I agree with you. I think this example is just an outlier. But as Bob said, more thorough investigation is needed. If I am reading Rafael's numbers correctly, then -gdwarf-4 is also a distraction at the moment.
    clang             : 670 MB
    gcc-4.4           : 332 MB
    gcc-4.6 -gdwarf-4 : 309 MB
At the moment, the 670 MB -> 332 MB drop is more important to understand than the 10% drop gained by using -gdwarf-4 in gcc.
-gdwarf-4 enables the use of comdats as explained in the wiki link. It might actually be easier to implement this in LLVM since it is producing complete types, just putting them in a comdat so that the linker can drop duplicates.
Cloned this to rdar://9432568 for tracking.
Just chiming in here; it seems a possible explanation for the varying of sizes between cases is that gcc is using indirect strings in the debug_info and llvm is not:
    readelf --debug-dump profile-gcc.o | grep 'DW_AT_name' | grep -c indirect
    15398
    readelf --debug-dump profile-gcc.o | grep 'DW_AT_name' | grep -vc indirect
    2048
    readelf --debug-dump profile-clang.o | grep 'DW_AT_name' | grep -c indirect
    48270
    readelf --debug-dump profile-clang.o | grep 'DW_AT_name' | grep -vc indirect
    0
If the number of duplicated symbol references is small, or the names are short, then using direct names isn't a big deal, but obviously it can become so. I'm just guessing here though, as I'm not sure how dwarf debugging is structured; but grepping over the object files seems to confirm significantly more duplication in the llvm code.
    readelf -p '.debug_info' profile-clang.o >| debug-clang.dump
    readelf -p '.debug_info' profile-gcc.o >| debug-gcc.dump
    grep -ac new_allocator debug-{clang,gcc}.dump
    debug-clang.dump:3198
    debug-gcc.dump:0
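The two grep counts per object file can also be produced in one pass. The sketch below is an assumption-laden illustration: `count_name_forms` is a made-up name, and it relies on readelf printing "(indirect string, offset: ...)" for names stored in .debug_str, which is the same property the grep for "indirect" depends on.

```python
def count_name_forms(readelf_dump):
    """Count DW_AT_name attributes that are indirect strings vs. stored inline.

    Returns a tuple (indirect, inline), matching `grep -c indirect` and
    `grep -vc indirect` applied to the DW_AT_name lines.
    """
    indirect = 0
    inline = 0
    for line in readelf_dump.splitlines():
        if "DW_AT_name" not in line:
            continue
        if "indirect" in line:
            indirect += 1
        else:
            inline += 1
    return indirect, inline
```

Summing the two numbers gives the total count of DW_AT_name attributes; with the figures quoted above that is 15398 + 2048 = 17446 for gcc and 48270 for clang.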
*** Bug 11941 has been marked as a duplicate of this bug. ***
Updating the title, since it's no longer 5x as slow. Here's some up-to-date data: (gcc 4.4, clang from last week – 149419 – with memcpy() codegen bug, chromium r120581).
    thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ ls -l out_clang/Debug/chrome
    -rwxr-x--- 1 thakis eng 1398968296 2012-02-07 11:50 out_clang/Debug/chrome
    thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ ls -l out_gcc/Debug/chrome
    -rwxr-x--- 1 thakis eng 1220239880 2012-02-07 13:43 out_gcc/Debug/chrome
    thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h ./out_clang/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
    33 .debug_info      971787546
    40 .debug_str        85365496
    39 .debug_pubtypes   82737857
    13 .text             76172952
    37 .debug_line       50219067
    17 .rodata           18496976
    19 .eh_frame         17648412
    34 .debug_abbrev      8310060
     9 .rela.dyn          5944992
    20 .eh_frame_hdr      4633348
    thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h ./out_gcc/Debug/chrome | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
    34 .debug_info      652246456
    37 .debug_pubnames  111541924
    39 .debug_str        78688819
    36 .debug_loc        77704123
    13 .text             69931528
    35 .debug_line       39035120
    17 .rodata           34763456
    19 .eh_frame         16709788
    40 .debug_ranges     16457872
     9 .rela.dyn         14992176
Created attachment 8015: Script to find .o files where clang's output is much bigger.
Created attachment 8016: Preprocessed example file.
Output of the previous script:
    3949 kB bigger: out_clang/Debug/obj/chrome/common/common.logging_chrome.o
    3647 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebCore/WebCore.gyp/gen/webkit/webcore_bindings.SVGElementFactory.o
    3612 kB bigger: out_clang/Debug/obj/chrome/browser/profiles/browser.profile_impl.o
    3214 kB bigger: out_clang/Debug/obj/chrome/browser/automation/browser.testing_automation_provider.o
    2975 kB bigger: out_clang/Debug/obj/chrome/browser/ui/browser.browser_init.o
    2885 kB bigger: out_clang/Debug/obj/chrome/browser/sync/browser.profile_sync_components_factory_impl.o
    2882 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebKit/chromium/src/webkit.WebViewImpl.o
    2772 kB bigger: out_clang/Debug/obj/chrome/browser/browser.chrome_browser_main.o
    2724 kB bigger: out_clang/Debug/obj/third_party/WebKit/Source/WebKit/chromium/src/webkit.WebFrameImpl.o
    2717 kB bigger: out_clang/Debug/obj/chrome/browser/browser.chrome_content_browser_client.o
So logging_chrome.cc ( http://code.google.com/codesearch#OAMlx_jo-ck/src/chrome/common/logging_chrome.cc&exact_package=chromium&q=logging_chrome.cc ) is almost 4 MB bigger when built with clang.
Here's the breakdown for that file:
    thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h out_clang/Debug/obj/chrome/common/common.logging_chrome.o | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
    2885 .debug_str 1701546
    2878 .debug_info 1111490
    2884 .debug_pubtypes 208317
    5767 .eh_frame 118168
    2882 .debug_line 85517
    2887 .text.startup 49914
    2875 .text 3462
    2879 .debug_abbrev 2641
    2877 .bss 1391
    4308 .text._ZNSt6vectorIPN9__gnu_cxx15_Hashtable_nodeISt4pairIKjPFvPSsPKN3IPC7MessageES4_EEEESaISD_EE14_M_fill_insertENS0_17__normal_iteratorIPSD_SF_EEmRKSD_ 1114
    thakis@yearofthelinuxdesktop:/usr/local/google/chrome/src$ objdump -h out_gcc/Debug/obj/chrome/common/common.logging_chrome.o | perl -pe 's/([0-9a-f]{8}).*/hex($1)/eg'| sort -k3 -n | tail -10 | tac
    1517 .debug_info 322155
    3039 .debug_str 279041
    3036 .debug_pubnames 264787
    3035 .debug_loc 116052
    3042 .eh_frame 42816
    1518 .debug_line 42710
    3038 .debug_ranges 24464
    3037 .debug_aranges 24256
    1513 .text 20197
    1516 .debug_abbrev 2759
I'm attaching the (clang-)preprocessed output of logging_chrome.cc.
Interesting. Thanks for the additional information!
Here are the current chrome build numbers, from the dupe:
Description, Nico Weber, 2012-02-07 13:32:21 CST:
object files generated by clang tend to take ~20% longer to link than the ones generated by gcc, due to chubby/different debug information. I did incremental builds (touch one file, measure rebuild time – this measures almost exclusively ld time) of the chrome binary in several scenarios. In debug builds, the object files generated by clang take up to 6 seconds longer to link into a final binary (36s instead of 30s, or, with a shared library build, 31s instead of 27s). In release builds, there's no big difference. So while compiling with clang is faster, linking the resulting object files currently takes longer.
Raw numbers (each is the min of 3 runs):
Chrome incremental build times (ninja, gold on by default, debug builds, gcc4.4, chromeclang)
    gcc
        touch file in net (net/base/mime_util.cc) 29.4s
        touch file in browser (c/b/u/g/chrome_gtk_frame.cc) 32.8s
    clang
        touch file in net 35.9s
        touch file in browser 36.3s
    gcc component build (libv8.so instead of libv8.a etc)
        touch file in net 26.5s
        touch file in browser 8.7s
    clang component build
        touch file in net 30.8s
        touch file in browser 10.6s
(release builds)
    gcc
        touch file in net 6.4s
        touch file in browser 5.9s
    clang
        touch file in net 6.3s
        touch file in browser 6.2s
    gcc component
        touch file in net 6.7s
        touch file in browser 3.1s
    clang component
        touch file in net 6.6s
        touch file in browser 3.2s
…and with that, I'm done spamming this bug for a while.
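To see how the "~20% longer" summary relates to the raw debug-build timings, the per-scenario slowdown can be worked out directly. This is just arithmetic on the numbers quoted above; the dictionary keys and the helper name `slowdown_percent` are labels invented for this sketch.

```python
# (gcc_seconds, clang_seconds) pairs copied from the debug-build timings above.
DEBUG_LINK_TIMES = {
    "static, touch file in net": (29.4, 35.9),
    "static, touch file in browser": (32.8, 36.3),
    "component, touch file in net": (26.5, 30.8),
    "component, touch file in browser": (8.7, 10.6),
}


def slowdown_percent(gcc_seconds, clang_seconds):
    """Percentage by which the clang rebuild takes longer than the gcc one."""
    return (clang_seconds / gcc_seconds - 1.0) * 100.0


for scenario, (gcc_s, clang_s) in DEBUG_LINK_TIMES.items():
    print(f"{scenario}: {slowdown_percent(gcc_s, clang_s):.1f}% longer with clang")
```

The same helper can be applied to the release-build rows, where the two compilers come out within a few tenths of a second of each other.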
Hrm. logging_chrome.ii is preprocessed on linux and not on the mac. I'll probably need someone with access to gcc on linux to do a bit of analysis of which types are included and which aren't - and we can try to come up with a "why" and "how" from there.
FWIW I've got the -wi output of logging_chrome.ii from linux. The formatting is... annoying so it's slow going looking through it. It's obviously much larger though.
I sent echristo a mach-o version of logging_chrome.o (built with clang, and with a local build of gcc4.6). If anybody else wants that, let me know. (It's too big to attach it.)
Not looking at this currently.
I checked if this has improved. It hasn't. clang's debug binary output is now 2GB (due to chrome growing over time I suppose). gcc4.6's has grown a lot, it's now 1.76GB (still 12% smaller than clang's). Linking 'chrome' in debug mode after touching a single file takes 1m42s with gcc (warm cache), but 7m11s with clang, which is over 4x as long by now.
First approximation guess would be relocations, but I'll need to get something set up to look at it. I'll move this up my priority list.
As another side note I'm not quite sure what's going on in the debug information in particular that's making linking so slow. It'll be worth looking into.
First semi-random example of differences between Clang & GCC:
    int func(void (*)());
    int func(int);
    struct foo {
      enum { ID = 42 };
      static void bar();
    };
    int i = func(foo::bar); // neither emit 'foo'
    // int j = func(foo::ID); // both emit 'foo'
    struct bar {
      bar() {
        func(foo::bar); // neither emit 'foo'
        // func(foo::ID); // Clang emits 'foo', GCC does not
      }
    } b;
This affects all the "ChromeUtilityHostMsg" objects that use a registration system not unlike foo+bar here. See "ChromeUtilityHostMsg_ParseJSON_Succeeded" for example. GCC doesn't produce any debug info for that type, even though the ctor called from the global initializer for g_LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded references the type (both the ID enum, and the Log builder function)
(side note: why do you have the extra LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded class? Why not have a generic registration type with ctor parameters: LoggerRegisterHelper g_LoggerRegisterHelper(ChromeUtilityHostMsg_ParseJSON_Succeeded::ID, ChromeUtilityHostMsg_ParseJSON_Succeeded::Log); ? that would reduce Clang's debug info (you'd drop one out of every pair of these types), though it'd increase GCC's because it would now actually start emitting the debug info for ChromeUtilityHostMsg_ParseJSON_Succeeded)
Another issue we noticed is that Clang produces debug info for implicit special members that are ODR used, but never codegen'd (because they were frontend inlined*, or used in inline functions that were never called/codegen'd, etc). eg:
    struct foo { int i; };
    void func(foo*);
    int main() {
      foo f;
      func(&f);
    }
Clang produces a description of 'foo' with one member 'i' and one subprogram 'foo' (the ctor) for which there is no definition. GCC only describes 'i' and does not describe a ctor 'foo'. Both compilers, given code such as:
    void func(foo&(foo::*)(const foo&));
    ...
    func(&foo::operator=);
produce debug info describing the type 'foo' with a subprogram 'operator=', the implicit default copy assignment operator.
As a minor note - when Clang (either rightly or wrongly) emits the declarations of these implicit members, it provides file and line numbers for those declarations. GCC does not. Also, Clang produces "DW_AT_accessibility" on all members, even with the default accessibility of the type.
* perhaps we could use the inline debug info metadata description from the frontend to describe these inlinings to produce more complete debug info
Similar to the previous point about special members, Clang emits the declaration of all template specializations, even those that are never codegen'd as in this example:
    struct foo {
      template<typename T>
      void func() { }
    };
    inline void func() { foo().func<int>(); }
    int main() {
      foo f;
      f.func<float>();
    }
Since 'func' is never codegen'd (as it is never called), func<int> is never generated either - Clang still emits a debug info description for both func<int> and func<float>. GCC only emits func<float>. This issue and the implicit special members can both be addressed in a similar/unified way - don't emit those declarations when emitting the type, but add them after the fact if we end up codegening them.
A probable GCC bug is producing small debug info when it comes to stream-related code such as:
    #include <fstream>
    int main() {
      std::ifstream f;
      return f.bad();
    }
The total object file size from GCC is nearly 1/4th that of Clang, but that's because GCC only produced debug info describing the /declaration/ of basic_ifstream<char>. It didn't describe any of the members (including the 'bad()' function called there). Clang does describe this type (& its dependencies) in full, which seems appropriate. I haven't narrowed this down fully, but it seems to be related to libstdc++'s use of extern template as follows:
    extern template class basic_ifstream<char>;
If that line is not present, the full type information is emitted just like Clang. This might seem sort of reasonable, if the full definition of basic_ifstream<char> was emitted in the debug-built binaries of libstdc++, but so far as I can tell, it is not (I only see another declaration there). But perhaps I've done something wrong in that investigation. If I can come up with a standalone repro, I'll post those details.
A little more detail on the fstream issue. I've simplified it down to:
    struct a { };
    template<typename T>
    struct b : virtual a {
      void func() { }
    };
    extern template class b<int>;
    int main() {
      b<int> x;
      x.func();
    }
In GCC's DWARF: 'b' is emitted as a declaration but it contains the DW_TAG_subprograms of 'b's ctor and dtor (if those are explicit, rather than implicit, it does not include them). If the "extern template class b<int>" is preceded by "template class b<int>" then the debug info for 'func' is emitted (I'm not sure what/where/why the problem exists for this in libstdc++'s debug symbol builds - I may've looked at the wrong libs or in the wrong way, or linked to the wrong ones, etc...)
Referring to comment 67, looking at gcc 4.6 behaviour, gcc does ship all the debug info for an extern template where it's instantiated, not where it's used. So this program:
    #include <fstream>
    int main() {
      std::ifstream f;
      return f.bad();
    }
builds a binary that does not have debug info for ifstream, but its debug info can be found in the libstdc++ .so instead.
(In reply to comment #64)
> First semi-random example of differences between Clang & GCC:
>
> int func(void (*)());
> int func(int);
>
> struct foo {
>   enum { ID = 42 };
>   static void bar();
> };
>
> int i = func(foo::bar); // neither emit 'foo'
> // int j = func(foo::ID); // both emit 'foo'
>
> struct bar {
>   bar() {
>     func(foo::bar); // neither emit 'foo'
>     // func(foo::ID); // Clang emits 'foo', GCC does not
>   }
> } b;
>
> This affects all the "ChromeUtilityHostMsg" objects that use a registration
> system not unlike foo+bar here. See
> "ChromeUtilityHostMsg_ParseJSON_Succeeded" for example. GCC doesn't produce
> any debug info for that type, even though the ctor called from the global
> initializer for
> g_LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded references
> the type (both the ID enum, and the Log builder function)
>
> (side note: why do you have the extra
> LoggerRegisterHelperChromeUtilityHostMsg_ParseJSON_Succeeded class? Why not
> have a generic registration type with ctor parameters: LoggerRegisterHelper
> g_LoggerRegisterHelper(ChromeUtilityHostMsg_ParseJSON_Succeeded::ID,
> ChromeUtilityHostMsg_ParseJSON_Succeeded::Log); ? that would reduce Clang's
> debug info (you'd drop one out of every pair of these types), though it'd
> increase GCC's because it would now actually start emitting the debug info
> for ChromeUtilityHostMsg_ParseJSON_Succeeded)
Worse than this - even if you switch out the registration system for something more like what I described, something else seems to be getting in the way:
    struct base { virtual void func() { } };
    struct foo: base { enum { ID }; };
    struct reg { reg(int); };
    reg r(foo::ID);
GCC emits a declaration for 'foo', not a definition (so GCC's debug info doesn't mention 'base' at all). The absence of a virtual function in 'base' causes this not to happen & the full definition of 'foo' to be emitted, including the DW_AT_inheritance referring to 'base', and the complete 'base' definition.
Given the absence of any key function here, I'm not sure what GCC's deal is. These are polymorphic classes that may be seen in no other TU than this one where their code must be emitted (due to the presence of virtual functions). I don't know how GCC could rely on these types to be emitted in any other TU - and if not, this seems like another GCC bug. Perhaps someone from GCC can explain how/why this is correct & what the logic is here & we can implement it, but for now I'm suspicious. I'd like to find a way around this bug or optimization so I can compare Clang v GCC debug info sans this issue & see how much it contributes to the problems with this TU (my theory is that it's the major issue/difference in this TU - the dependencies brought in via the base class & parameter types of these registration objects are substantial & there are many of them) but I'm not entirely sure how to do that. I'll keep experimenting.
So - a theory to explain Comments 64, 67-70: GCC is emitting type information only when the vtable (& other virtual 'stuff' - such as virtual base handling) is emitted for any type that has a vtable. That's why this looks weird for the explicit template instantiation case - it's not specifically targeting that case, it just gets it for free because the type then ends up with a key function (at least one out of line virtual function) & thus the vtable isn't emitted at the call site. This is a fairly sound idea (one we've considered implementing before) though there are some trivial ways it can fail (whether or not such code is likely to occur in the wild is open for debate):

struct foo {
  virtual ~foo();
  static void func();
};
int main() {
  foo::func();
}

In this case, the virtual functions of 'foo' may never be defined - since no instance of 'foo' is ever constructed, this is valid. Yet the referenced static member will still be uncallable from the debugger because GCC emits no mention of the 'foo' type at all.
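As a small illustration of the key function distinction this theory rests on, consider the sketch below. The type names and `sum_ids` are made up for the example. It only shows which classes have a key function under the Itanium C++ ABI; it does not itself demonstrate where either compiler places the debug info.

```cpp
#include <cassert>

// No key function: every virtual member is defined inline, so each TU that
// needs the vtable has to emit its own (mergeable) copy.
struct inline_only {
  virtual ~inline_only() {}
  virtual int id() const { return 1; }
};

// Has a key function: id() is the first virtual that is neither pure nor
// inline, so the vtable is emitted only in the TU that defines id().
struct with_key {
  virtual ~with_key() {}
  virtual int id() const;
};

// In a real project this definition would live in exactly one .cpp file.
int with_key::id() const { return 2; }

// Uses both types so that both vtables are needed in this TU.
int sum_ids() {
  inline_only a;
  with_key b;
  return a.id() + b.id();
}
```

Under the theory above, the full definition of `with_key` would only show up in the debug info of the TU containing `with_key::id`, while `inline_only` would be described wherever its vtable is emitted.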
Just while I've got this here, I've prototyped the "only emit debug info for class definitions along with the vtable for any type with a vtable" idea and here are some numbers.

With the original logging TU:
  Strings with baseline Clang: 30919
  Strings with improved Clang: 7413
  Strings with GCC 4.7: 6601

If we change the registration system so GCC doesn't win by avoiding the intermediate registration types (due to the ID reference being ignored by GCC):
  Strings with improved Clang: 6068
  Strings with GCC 4.7: 5560

So we get pretty close.
I'm going to consider this resolved by r188576, as it reduces the debug info for the example down to roughly GCC's size (numbers below). If you have other particularly bad examples, please file new bugs/data & I'll investigate.

GCC 4.7:
  3034 .debug_str      359705
  3028 .debug_info     349411
  3030 .debug_loc      146400
  3033 .debug_line      43328
  3032 .debug_ranges    24592
  3031 .debug_aranges   24224
  3029 .debug_abbrev     2854

Clang (old):
  2885 .debug_str     1807241
  2878 .debug_info    1121994
  2882 .debug_line      88887
  2879 .debug_abbrev     2445

Clang (r188576):
  2878 .debug_info     356859
  2885 .debug_str      331528
  2882 .debug_line      82402
  2879 .debug_abbrev     1258
I finally had the opportunity to check this. With current trunk chromium, gcc 4.6, and clang r192635, a (statically linked) debug build of chrome is 1.9GB with gcc and 1.3GB with clang. The incremental build time after touching a file in the ui/ directory is 3m55s with clang and 4m02s with gcc. So this looks much better now, thanks!
Awesome! thanks for all your (everyone on this bug) help with repros, reductions, etc. Sorry it was a bit of a while coming.
Woot.
As a follow-up (probably not related to this bug): I also measure incremental build times with a component build, and clang is ~10% slower than gcc 4.6 for a full build of chrome (24min vs 22min), and still a good deal slower in incremental builds (1.6s vs 1.2s). So generally I feel that more perf work is needed on linux.
OK, interesting. If you ever get any information for it let me know, otherwise I'll see what I can work up as I get to it.
As an update, a lot of things have been improved and we're seeing build and link times (including incremental) below gcc's at this point. Thanks again everyone!