Skip to main content

An interesting bug - network-manager, glibc, dpkg-shlibdeps, systemd, and finally binutils

Not so long ago I went to effectively recompile NetworkManager and fix up minor bug in it. It built fine across all architectures, was considered to be installable etc. And I was expecting it to just migrate across. At the time, glibc was at 2.26 in artful-proposed and NetworkManager was built against it. However release pocket was at glibc 2.24. In Ubuntu we have a ProposedMigration process in place which ensures that newly built packages do not regress in the number of architectures built for; installable on; and do not regress themselves or any reverse dependencies at runtime.

Thus before my build of NetworkManager was considered for migration, it was tested in the release pocket against packages in the release pocket. Specifically, since package metadata only requires glibc 2.17 NetworkManager was tested against glibc currently in the release pocket, which should just work fine....
autopkgtest [21:47:38]: test nm: [-----------------------
test_auto_ip4 (__main__.ColdplugEthernet)
ethernet: auto-connection, IPv4 ... FAIL ----- NetworkManager.log -----
NetworkManager: /lib/x86_64-linux-gnu/ version `GLIBC_2.25' not found (required by NetworkManager)
At first I only saw failing tests, which I thought is transient failure. Thus they were retried a few times. Then I looked at the autopkgtest log and saw above error messages. Perplexed, I have started a lxd container with ubuntu artful, enabled proposed and installed just network-manager from artful-proposed and indeed a simple `NetworkManager --help` failed with above error from linker.

I am too young to know what dependency-hell means, since ever since I used Linux (Ubuntu 7.04) all glibc symbols were versioned, and dpkg-shlibdeps would generate correct minimum dependencies for a package. Alas in this case readelf confirmed that indeed /usr/sbin/NetworkManager requires 2.25 and dpkg depends is >= 2.17.

Further reading readelf output I checked that all of the glibc symbols used are 2.17 or lower, and only the "Version needs section '.gnu.version_r'" referenced GLIBC_2.25 symbol. Inspecting dpkg-shlibdeps code I noticed that it does not parse that section and only searches through the dynamic symbols used to establish the minimum required version.

Things started to smell fishy. On one hand, I trust dpkg-shlibdeps to generate the right dependencies. On the other hand I also trust linker to not tell lies either. Hence I opened a Debian BTS bug report about this issue.

At this point, I really wanted to figure out where the reference to 2.25 comes from. Clearly it was not from any private symbols as then the reference would be on 2.26. Checking glibc abi lists I found there were only a handful of symbols marked as 2.25
$ grep 2.25 ./sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
GLIBC_2.25 GLIBC_2.25 A
GLIBC_2.25 __explicit_bzero_chk F
GLIBC_2.25 explicit_bzero F
GLIBC_2.25 getentropy F
GLIBC_2.25 getrandom F
GLIBC_2.25 strfromd F
GLIBC_2.25 strfromf F
GLIBC_2.25 strfroml F
Blindly grepping for these in network-manager source tree I found following:
$ grep explicit_bzero -r src/ explicit_bzero],
src/systemd/src/basic/string-util.h:void explicit_bzero(void *p, size_t l);
src/systemd/src/basic/string-util.c:void explicit_bzero(void *p, size_t l) {
src/systemd/src/basic/string-util.c:        explicit_bzero(x, strlen(x));
First of all it seems like network-manager includes a partial embedded copy of systemd. Secondly that code is compiled into a temporary library and has autconf detection logic to use explicit_bzero. It also has an embedded implementation of explicit_bzero when it is not available in libc, however it does not have FORTIFY_SOURCES implementation of said function (__explicit_bzero_chk) as was later pointed out to me. And whilst this function is compiled into an intermediary noinst library, no functions that use explicit_bzero are used in the end by NetworkManger binary. To proof this, I've dropped all code that uses explicit_bzero, rebuild the package against glibc 2.26, and voila it only had Version reference on glibc 2.17 as expected from the end-result usage of shared symbols.

At this point toolchain bug was a suspect. It seems like whilst explicit_bzero shared symbol got optimised out, the version reference on 2.25 persisted to the linked binaries. At this point in the archive a snapshot version of binutils was in use. And in fact forcefully downgrading bintuils resulted in correct compilation / versions table referencing only glibc 2.17.

Mathias then took over a tarball of object files and filed upstream bug report against bintuils: "[2.29 Regression] ld.bfd keeps a version reference in .gnu.version_r for symbols which are optimized out". The discussion in that bug report is a bit beyond me as to me binutils is black magic. All I understood there was "we moved sweep and pass to another place due to some bugs", doing that introduced this bug, thus do multiple sweep and passes to make sure we fix old bugs and don't regress this either. Or something like that. Comments / Better description of the bintuils fix are welcomed.

Binutils got fixed by upstream developers, cherry-picked into debian, and ubuntu, network-manager got rebuild and everything is wonderful now. However, it does look like unused / deadend code paths tripped up optimisations in the toolchain which managed to slip by distribution package dependency generation and needless require a higher up version of glibc. I guess the lesson here is do not embed/compile unused code. Also I'm not sure why network-manager uses networkd internals like this, and maybe systemd should expose more APIs or serialise more state into /run, as most other things query things over dbus, private socket, or by establishing watches on /run/systemd/netif. I'll look into that another day.

Thanks a lot to Guillem Jover, Matthias Klose, Alan Modra, H.J. Lu, and others for getting involved. I would not be able to raise, debug, or fix this issue all by myself.


Popular posts from this blog

How to disable TLS 1.0 and TLS 1.1 on Ubuntu

Example of website that only supports TLS v1.0, which is rejected by the client Overivew TLS v1.3 is the latest standard for secure communication over the internet. It is widely supported by desktops, servers and mobile phones. Recently Ubuntu 18.04 LTS received OpenSSL 1.1.1 update bringing the ability to potentially establish TLS v1.3 connections on the latest Ubuntu LTS release. Qualys SSL Labs Pulse report shows more than 15% adoption of TLS v1.3. It really is time to migrate from TLS v1.0 and TLS v1.1. As announced on the 15th of October 2018 Apple , Google , and Microsoft will disable TLS v1.0 and TLS v1.1 support by default and thus require TLS v1.2 to be supported by all clients and servers. Similarly, Ubuntu 20.04 LTS will also require TLS v1.2 as the minimum TLS version as well. To prepare for the move to TLS v1.2, it is a good idea to disable TLS v1.0 and TLS v1.1 on your local systems and start observing and reporting any websites, systems and applications that

Ubuntu 23.10 significantly reduces the installed kernel footprint

Photo by Pixabay Ubuntu systems typically have up to 3 kernels installed, before they are auto-removed by apt on classic installs. Historically the installation was optimized for metered download size only. However, kernel size growth and usage no longer warrant such optimizations. During the 23.10 Mantic Minatour cycle, I led a coordinated effort across multiple teams to implement lots of optimizations that together achieved unprecedented install footprint improvements. Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores 8GB of RAM) the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS: 2x less disk space used (1,417MB vs 2,940MB, including initrd) 3x less peak RAM usage for the initrd boot (68MB vs 204MB) 0.5x increase in download size (949MB vs 600MB) 2.5x faster initrd generation (4.5s vs 11.3s) approximately the same total time (103s vs 98s, hardware dependent) For minimal cloud images that do not in

Ubuntu Livepatch service now supports over 60 different kernels

Linux kernel getting a livepatch whilst running a marathon. Generated with AI. Livepatch service eliminates the need for unplanned maintenance windows for high and critical severity kernel vulnerabilities by patching the Linux kernel while the system runs. Originally the service launched in 2016 with just a single kernel flavour supported. Over the years, additional kernels were added: new LTS releases, ESM kernels, Public Cloud kernels, and most recently HWE kernels too. Recently livepatch support was expanded for FIPS compliant kernels, Public cloud FIPS compliant kernels, and as well IBM Z (mainframe) kernels. Bringing the total of kernel flavours support to over 60 distinct kernel flavours supported in parallel. The table of supported kernels in the documentation lists the supported kernel flavours ABIs, the duration of individual build's support window, supported architectures, and the Ubuntu release. This work was only possible thanks to the collaboration with the Ubuntu C