Introduction

Gentoo Linux is a cool flavor of GNU/Linux that is known worldwide as a distribution that builds all packages from source code. The latest version of GCC, 4.9.0, was released a few days ago, and this post tests the maturity of Link-Time optimization (LTO) in the GNU Compiler Collection. My post was inspired by Nikos Chantziaras, who wrote a similar post about 2 years ago. If you are interested in how to set up LTO for a Gentoo installation, please follow this link. Apart from creating a list of packages that cannot be built with LTO, I spent some time investigating the problems that break compilation of these packages.

Gentoo System

My Gentoo is a QEMU KVM virtual machine running on my i7-4770 CPU with the following options:

qemu-kvm -cpu host,level=9 -smp cores=8 -vga std -hda vdisk.img -boot d -m 4096 -net nic,vlan=1 -net user,vlan=1 -redir tcp:2222::22

The system consists of 802 packages, which cover all problematic packages listed in the aforementioned blog as well as the essential libraries of both the KDE and GNOME desktop environments. For a complete list of installed packages and corresponding versions, please follow this link. For everyone interested in my USE flags, download my make.conf.

Installation

At the time of writing this post, I installed GCC from source code. I took the configure options from the stable Gentoo package and simply ran: ./configure [options] && make && make install. After that, tell your Gentoo system about the new compiler by creating a file (/etc/env.d/gcc/x86_64-pc-linux-gnu-4.9.0) in /etc/env.d/gcc. The last step is to switch the default compiler by running the gcc-config command. Nevertheless, the preferable way would be to use a package from a Gentoo overlay.

To make the toolchain really up to date, I prefer to install the latest binutils, another essential part of the LTO toolchain. It is important to note that there is still a need to either use the LTO wrappers for nm, ar and ranlib or let binutils load the LTO plug-in automatically. I really recommend Markus' patch for binutils, which prevents any problems related to correct loading of the LTO plug-in. The patch has just been applied, so use the latest binutils. I used the standard Gentoo package binutils-9999, installed in the following steps:

emerge =binutils-9999
ln -s /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/liblto_plugin.so.0.0.0 /usr/x86_64-pc-linux-gnu/binutils-bin/lib/bfd-plugins

The applied patch causes binutils to load the LTO plug-in automatically; the ln command creates a symlink to the plug-in in the default plug-in folder.

Problematic packages

There are 30 packages that suffer from diverse problems and cannot be built with -flto. The full list of packages:

app-admin/gam-server
app-crypt/mit-krb5
app-emulation/virtualbox
app-text/rarian
dev-lang/perl
dev-lang/ruby
dev-libs/elfutils
dev-python/notify-python
dev-python/numpy
dev-qt/qtscript
dev-qt/qtwebkit
dev-tex/luatex
dev-vcs/cvs
media-libs/alsa-lib
media-libs/x264
media-sound/pulseaudio
media-sound/wavpack
media-video/ffmpeg
media-video/libav
media-video/mplayer2
sys-apps/hwinfo
sys-apps/pciutils
sys-devel/llvm
www-client/chromium
www-client/firefox
x11-base/xorg-server
x11-drivers/xf86-video-intel
x11-libs/cairo
x11-libs/wxGTK
sys-libs/glibc
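
One way to keep such packages out of an LTO build on Gentoo is Portage's per-package environment mechanism, where each line of /etc/portage/package.env maps a package atom to a file in /etc/portage/env. The following is a minimal sketch that generates those lines. The file name no-lto.conf and its content are assumptions made for illustration, not the setup used for this post.

```python
# Hypothetical env file /etc/portage/env/no-lto.conf (assumed content):
#   CFLAGS="${CFLAGS} -fno-lto"
#   CXXFLAGS="${CXXFLAGS} -fno-lto"
#   LDFLAGS="${LDFLAGS} -fno-lto"


def package_env_lines(packages, env_file="no-lto.conf"):
    """Return package.env lines that attach env_file to each package atom."""
    return [f"{atom} {env_file}" for atom in sorted(set(packages))]


# Excerpt of the package list above.
NO_LTO_PACKAGES = ["dev-vcs/cvs", "dev-lang/ruby", "x11-libs/cairo"]

for line in package_env_lines(NO_LTO_PACKAGES):
    print(line)
```

The printed lines are meant to be appended to /etc/portage/package.env.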

Problem analysis

These packages have many common issues that block proper compilation:

Configuration scripts

  • dev-vcs/cvs - The configure script checks whether a function exists by using pointer equality. Link-Time optimization proves that these pointers are equal and optimizes these symbols out; for more details, please read the LTO FAQ.
  • dev-lang/ruby - A very similar problem; I created bug #9692, which was fixed.
  • app-text/rarian - Execution of nm_test_func fails. Starting with version 4.9.0, GCC does not generate fat object files (GCC 4.9 Release Series). Thus, no assembly output is present in object files. As a result, tools like nm cannot be used, e.g. for symbol extraction. For the complete magic related to the check, follow the pastebin snippet.
  • dev-python/notify-python - Likewise.
  • media-sound/wavpack - Likewise.
  • x11-libs/wxGTK - Similar problem while checking for thr_setconcurrency.
  • x11-libs/cairo - The configure script compiles a source file with a float constant that has a magic value. After that, 'noonsees' is searched for in the object file. If your machine is big endian, the check succeeds. Again, the existence of a slim object file breaks the test. The cairo package enables LTO internally, so even if you add the package to the no-lto group, it is built with Link-Time optimization.
  • media-libs/x264 - Likewise.

As you can see, the aforementioned problems could simply be fixed by automatically adding -fno-lto to the autoconf checks; Markus Trippelsdorf suggested this solution on the autoconf mailing list. Otherwise, every single problematic check must be fixed by hand. To make matters even worse, autoconf configure scripts are often copies residing in a version control system.
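
The idea behind the cairo check can be illustrated with a small sketch: a double whose big-endian byte image spells 'noonsees' shows up reversed on a little-endian target, and a check that greps the object file finds neither spelling if the constant was never emitted as final data. The byte strings standing in for object files below are made up for illustration, and the exact constant used by the real configure script is not reproduced here.

```python
import struct

BIG_MARKER = b"noonsees"
LITTLE_MARKER = BIG_MARKER[::-1]  # the same eight bytes in reverse order

# The double whose big-endian byte image spells the marker.
MAGIC = struct.unpack(">d", BIG_MARKER)[0]


def guess_byte_order(object_bytes):
    """Mimic the configure check: search the object file for the marker."""
    if BIG_MARKER in object_bytes:
        return "big"
    if LITTLE_MARKER in object_bytes:
        return "little"
    return "unknown"


# Stand-ins for fat objects: the constant sits in the data in target byte order.
fat_little = b"\x7fELF...data..." + struct.pack("<d", MAGIC)
fat_big = b"\x7fELF...data..." + struct.pack(">d", MAGIC)

# Stand-in for a slim LTO object: intermediate code only, no final data bytes.
slim = b"\x7fELF...intermediate-code..."

for name, blob in (("fat_little", fat_little), ("fat_big", fat_big), ("slim", slim)):
    print(name, guess_byte_order(blob))
```

With the slim stand-in, the search matches neither spelling, which corresponds to the failing test described above.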

Assembly usage

  • dev-tex/luatex - With a compiler, one can combine source code and assembly language (Assembler Instructions with C Expression Operands). Apart from emitting the assembly code, the compiler does not parse or understand assembler statements. If there's a constant symbol used in, e.g., top-level assembler, GCC can't connect these symbols together. Thus, during LTRANS partitioning, these symbols can go to different partitions and a linker error occurs. For more detailed information, follow the link.
  • media-video/libav - Likewise.
  • media-video/ffmpeg - Likewise.
  • media-video/mplayer2 - Likewise.
  • sys-devel/llvm - Likewise.

The following group of packages suffers from linker issues; I haven't had time for a more detailed investigation. I think some of these packages cannot be built because of a missing __attribute__ ((used)).

  • dev-lang/perl
  • sys-apps/pciutils
  • dev-libs/elfutils
  • dev-qt/qtscript
  • dev-qt/qtwebkit

Others

  • x11-drivers/xf86-video-intel - For a function marked with __attribute__ ((flatten)), every call inside the function is inlined (recursively). With LTO, the compiler can perform whole-program analysis, and the attribute can lead to an extreme number of inlined functions. Before the compiler was killed, it had inlined about 3 million functions; I reported bug #77580.
  • media-sound/pulseaudio - dll_open related problem.
  • x11-base/xorg-server - Array bounds check is hit, there is a bug #71127.
  • www-client/firefox - Firefox is almost ready for LTO, but there are audio/video codec libraries that incorrectly use assembly symbols. Jan Hubička wrote a very detailed post about Firefox.
  • www-client/chromium - Chromium relies on the gold linker, which is part of its source repository. Apart from that, there are also translation units that must be compiled without LTO.
  • app-crypt/mit-krb5 - The package contains a conflicting name for a variable, link (conflict with: /usr/include/unistd.h:812:12: error: variable ‘link’ redeclared as function). I have created a pull request for the krb5 GitHub repository. Update: It turned out that the problem is caused by GCC, which delays symbol renaming: PR61012.
  • app-admin/gam-server - A very similar issue: the static variable 'socket' is redeclared. I sent an email about the bug to the corresponding mailing list. Update: Likewise.
  • app-emulation/virtualbox - Not investigated yet.
  • media-libs/alsa-lib - Likewise.
  • sys-apps/hwinfo - Likewise.
  • dev-python/numpy - Likewise.
  • sys-libs/glibc - Likewise.

Package unrelated problems

  • dev-qt/qtcore-4.8.5-r1:4 + kde-base/kdelibs-4.11.5:4/4.11 - If I enable ld.gold to link these two packages, an infinite loop is observed in meinproc4. BFD does not suffer from the issue, which is observable even with -O0 and -fno-lto. I will create an issue after I understand the strange behavior.
  • There are a few packages, built with LTO, that have a disproportionately big first ELF section. It looks like the problem is not related to a specific linker; both BFD and gold behave the same.
$ readelf -S /usr/bin/gst-launch-1.0
There are 26 section headers, starting at offset 0x207740:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000400200  00200200
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             000000000040021c  0020021c
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .hash             HASH             0000000000400240  00200240
       0000000000000434  0000000000000004   A       4     0     8
  [ 4] .dynsym           DYNSYM           0000000000400678  00200678
       0000000000000cc0  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000401338  00201338
       0000000000000b2f  0000000000000000   A       0     0     1
  [ 6] .gnu.version      VERSYM           0000000000401e68  00201e68
...

$ ls -l /usr/bin/gst-launch-1.0
-rwxr-xr-x 1 root root 2129344 Apr 23 11:39 /usr/bin/gst-launch-1.0

Update: I found out that the binary is enlarged by the paxctl command (paxctl -qCm /usr/bin/gst-launch-1.0). I am not familiar with this tool, but LTO does not contribute to the expansion of the initial ELF section.
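
The numbers in the readelf and ls output above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a 64-bit little-endian ELF file with 64-byte section headers, and the helper section_table is a hypothetical name, not part of any tool mentioned in this post.

```python
import struct

# ELF64 little-endian file header (64 bytes): e_ident, e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def section_table(header_bytes):
    """Return (e_shoff, e_shentsize, e_shnum) from a 64-bit LE ELF header."""
    fields = ELF64_HEADER.unpack(header_bytes[:ELF64_HEADER.size])
    if fields[0][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return fields[6], fields[11], fields[12]


# Values copied from the readelf and ls output above.
SHOFF = 0x207740          # "starting at offset 0x207740"
SHNUM = 26                # "There are 26 section headers"
SHENTSIZE = 64            # assumed size of one ELF64 section header
INTERP_OFFSET = 0x200200  # file offset of .interp, the first non-NULL section
FILE_SIZE = 2129344       # from ls -l

# The section header table ends exactly at the end of the file.
assert SHOFF + SHNUM * SHENTSIZE == FILE_SIZE

# Share of the file located before the first non-NULL section.
share = 100 * INTERP_OFFSET / FILE_SIZE
print(f"{INTERP_OFFSET} of {FILE_SIZE} bytes ({share:.1f}%) precede .interp")
```

On a real binary, the same three header values could be obtained by passing the first 64 bytes of the file to section_table. Note that the ELF header and the program headers also live in the region before .interp, so not all of it is padding.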

Statistics

To present statistical data about the impact of LTO on size, I created a Python script that simply walks all ELF binaries located in $PATH and all shared libraries in the folders /lib64 and /usr/lib64. According to the data presented in the following tables, binaries shrink by 10.6% (41 MB) and shared libraries by 3.3% (10 MB). As you can see, Link-Time optimization achieves better results for executables, which (in general) have fewer globally visible symbols. For both categories, I've chosen the most interesting results (complete statistics can be found here).

ELF executables

Binary name | no-LTO size (bytes) | LTO size (bytes) | LTO size / no-LTO size
/usr/bin/jsc-1 2154080 61104 2.84%
/usr/bin/git-shell 692152 25552 3.69%
/usr/bin/sapWatch 623640 33088 5.31%
/usr/bin/MPEG2TransportStreamIndexer 605144 37632 6.22%
/usr/bin/testRelay 623384 41856 6.71%
/usr/bin/testReplicator 623576 44224 7.09%
/usr/bin/testMPEG2TransportStreamTrickPlay 606104 44432 7.33%
/usr/bin/testMPEG1or2Splitter 605016 47960 7.93%
/usr/bin/testH264VideoToTransportStream 604888 48000 7.94%
/usr/bin/testMPEG1or2ProgramToTransportStream 604824 52056 8.61%
/sbin/thin_dump 2215560 228432 10.31%
/usr/bin/testMPEG1or2VideoReceiver 623704 64704 10.37%
/usr/bin/testMP3Receiver 623704 64704 10.37%
/usr/bin/testMPEG2TransportStreamer 624408 68864 11.03%
/usr/bin/testMP3Streamer 624408 69056 11.06%
/sbin/thin_restore 2215544 253128 11.43%
/usr/bin/testMPEG1or2VideoStreamer 624408 77376 12.39%
/sbin/thin_repair 2215520 299232 13.51%
/usr/bin/testMPEG1or2AudioVideoStreamer 624472 97864 15.67%
/usr/bin/wpa_passphrase 36096 5816 16.11%
/usr/bin/testAMRAudioStreamer 624536 101808 16.30%
/usr/bin/l2ping 66720 10960 16.43%
/usr/bin/testWAVAudioStreamer 625176 105968 16.95%
/usr/bin/testDVVideoStreamer 624536 105968 16.97%
/sbin/thin_rmap 466400 85576 18.35%
/usr/bin/testH264VideoStreamer 624536 114800 18.38%
/usr/bin/testMPEG4VideoStreamer 624472 118320 18.95%
/usr/bin/vobStreamer 626136 134896 21.54%
/usr/bin/ciptool 111632 29128 26.09%
/usr/bin/rfcomm 75856 20200 26.63%
/sbin/thin_check 434232 118368 27.26%
/usr/bin/testRTSPClient 629400 172656 27.43%
/usr/bin/rctest 114208 31600 27.67%
/usr/bin/lyxclient 422744 122760 29.04%
/usr/bin/testMPEG4VideoToDarwin 624856 196080 31.38%
/usr/bin/l2test 83232 26264 31.56%
/usr/sbin/pppgetpass.vt 8280 8472 102.32%
/usr/bin/ilbmtoppm 34616 35496 102.54%
/usr/bin/kross 15624 16024 102.56%
/usr/bin/kconfig_compiler 118912 122608 103.11%
/usr/bin/kmailservice 8296 8568 103.28%
/usr/bin/ktelnetservice 15768 16424 104.16%
/usr/sbin/ab 46664 48768 104.51%
/usr/bin/swig 1431120 1500392 104.84%
/usr/bin/okular 66184 70280 106.19%
/usr/bin/meinproc4 37544 40608 108.16%
/usr/bin/gif2tiff 13544 15040 111.05%
/usr/bin/kcachegrind 890184 1023024 114.92%
/usr/bin/htop 124904 152296 121.93%
/usr/bin/node 6916664 8764904 126.72%
SUMMARY (of 1688 binaries) 387.0 MB 346.1 MB 89.43%

Shared libraries

Shared library name | no-LTO size (bytes) | LTO size (bytes) | LTO size / no-LTO size
/usr/lib64/libsandbox.so 72120 16064 22.27%
/usr/lib64/libboost_wave.so.1.52.0 1196640 632872 52.89%
/usr/lib64/libboost_program_options.so.1.52.0 496008 270224 54.48%
/usr/lib64/libgirepository-1.0.so.1.0.0 206424 120272 58.26%
/usr/lib64/libboost_wserialization.so.1.52.0 334168 201520 60.30%
/usr/lib64/libboost_thread.so.1.52.0 176000 108016 61.37%
/usr/lib64/libgnutls-extra.so.26.22.6 34136 21248 62.25%
/usr/lib64/libsbc.so.1.1.0 64016 40856 63.82%
/usr/lib64/libboost_unit_test_framework.so.1.52.0 702112 453160 64.54%
/usr/lib64/libboost_serialization.so.1.52.0 462904 299592 64.72%
/usr/lib64/libicule.so.51.2 524104 346992 66.21%
/usr/lib64/libsmbclient.so.0 5904400 3934144 66.63%
/usr/lib64/libboost_locale.so.1.52.0 981776 672160 68.46%
/usr/lib64/libwbclient.so.0 67888 47104 69.38%
/usr/lib64/libnetapi.so.0 6592792 4662200 70.72%
/usr/lib64/libboost_date_time.so.1.52.0 71552 50872 71.10%
/lib64/libblkid.so.1.1.0 206568 147336 71.33%
/usr/lib64/libboost_graph.so.1.52.0 363624 260904 71.75%
/usr/lib64/libinproctrace.so 38752 28328 73.10%
/usr/lib64/libiculx.so.51.2 62224 45872 73.72%
/usr/lib64/libattica.so.0.4.2 1020296 753632 73.86%
/lib64/libudev.so.1.4.0 79832 59184 74.14%
/usr/lib64/libGLU.so.1.3.1 526216 390768 74.26%
/usr/lib64/libboost_iostreams.so.1.52.0 106728 79656 74.63%
/lib64/libaio.so.1.0.1 3984 2992 75.10%
/usr/lib64/libpipeline.so.1.2.5 52368 39824 76.05%
/lib64/libmount.so.1.1.0 222824 169824 76.21%
/usr/lib64/libdbusmenu-qt.so.2.6.0 200184 155520 77.69%
/usr/lib64/libboost_signals.so.1.52.0 95000 73824 77.71%
/usr/lib64/libstreamanalyzer.so.0.7.8 527632 413168 78.31%
/usr/lib64/libboost_math_tr1l.so.1.52.0 250968 199000 79.29%
/usr/lib64/libmp4v2.so.2.0.0 975072 777440 79.73%
/usr/lib64/libboost_python-2.7.so.1.52.0 316264 253360 80.11%
/usr/lib64/libboost_math_tr1.so.1.52.0 250264 200488 80.11%
/usr/lib64/libknewstuff2.so.4.11.5 408864 434296 106.22%
/usr/lib64/libmdigest.so.1.0 37248 39624 106.38%
/usr/lib64/libkmediaplayer.so.4.11.5 38200 40784 106.76%
/usr/lib64/libknotifyconfig.so.4.11.5 73576 79240 107.70%
/usr/lib64/libkfile.so.4.11.5 707824 762848 107.77%
/usr/lib64/libkdeinit4_kio_http_cache_cleaner.so 48776 52952 108.56%
/usr/lib64/libthreadweaver.so.4.11.5 94056 102576 109.06%
/usr/lib64/libkutils.so.4.11.5 6920 7592 109.71%
/usr/lib64/libkhtml.so.5.11.5 8475712 9339944 110.20%
/usr/lib64/libkimproxy.so.4.11.5 70648 78448 111.04%
/usr/lib64/libkjs.so.4.11.5 880976 985072 111.82%
/usr/lib64/libkdeinit4_kded4.so 71072 79888 112.40%
/usr/lib64/libkdeclarative.so.5.11.5 61688 70272 113.92%
/usr/lib64/libkdeinit4_klauncher.so 87024 100000 114.91%
/usr/lib64/libsolid.so.4.11.5 1054680 1249792 118.50%
/usr/lib64/libkidletime.so.4.11.5 62584 75256 120.25%
SUMMARY (of 575 libraries) 299.5 MB 289.7 MB 96.72%

Final thoughts

The main intention of this post is to popularize Link-Time optimization in GCC. The presented results show that LTO reduces binary size (by 10.6% for executables and 3.3% for shared libraries in total) and that the majority of Gentoo packages can be built with LTO. Even though I did not benchmark LTO against a classical build, the performance speed-up can be seen, e.g., in my diploma thesis. Another very interesting source is the blog of Jan Hubička, who has been writing intensively about Link-Time optimization. I would really welcome any help, not just from the Gentoo community, to decrease the amount of open-source software that cannot be built with LTO. Feel free to add packages that I missed.