Not so long ago I went to effectively recompile NetworkManager and fix up minor bug in it. It built fine across all architectures, was considered to be installable etc. And I was expecting it to just migrate across. At the time, glibc was at 2.26 in artful-proposed and NetworkManager was built against it. However release pocket was at glibc 2.24. In Ubuntu we have a ProposedMigration process in place which ensures that newly built packages do not regress in the number of architectures built for; installable on; and do not regress themselves or any reverse dependencies at runtime.
Thus before my build of NetworkManager was considered for migration, it was tested in the release pocket against packages in the release pocket. Specifically, since package metadata only requires glibc 2.17 NetworkManager was tested against glibc currently in the release pocket, which should just work fine....
Thus before my build of NetworkManager was considered for migration, it was tested in the release pocket against packages in the release pocket. Specifically, since package metadata only requires glibc 2.17 NetworkManager was tested against glibc currently in the release pocket, which should just work fine....
autopkgtest [21:47:38]: test nm: [-----------------------
test_auto_ip4 (__main__.ColdplugEthernet)
ethernet: auto-connection, IPv4 ... FAIL ----- NetworkManager.log -----
NetworkManager: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.25' not found (required by NetworkManager)
At first I only saw failing tests, which I thought is transient failure. Thus they were retried a few times. Then I looked at the autopkgtest log and saw above error messages. Perplexed, I have started a lxd container with ubuntu artful, enabled proposed and installed just network-manager from artful-proposed and indeed a simple `NetworkManager --help` failed with above error from linker.
I am too young to know what dependency-hell means, since ever since I used Linux (Ubuntu 7.04) all glibc symbols were versioned, and dpkg-shlibdeps would generate correct minimum dependencies for a package. Alas in this case readelf confirmed that indeed /usr/sbin/NetworkManager requires 2.25 and dpkg depends is >= 2.17.
Further reading readelf output I checked that all of the glibc symbols used are 2.17 or lower, and only the "Version needs section '.gnu.version_r'" referenced GLIBC_2.25 symbol. Inspecting dpkg-shlibdeps code I noticed that it does not parse that section and only searches through the dynamic symbols used to establish the minimum required version.
Things started to smell fishy. On one hand, I trust dpkg-shlibdeps to generate the right dependencies. On the other hand I also trust linker to not tell lies either. Hence I opened a Debian BTS bug report about this issue.
At this point, I really wanted to figure out where the reference to 2.25 comes from. Clearly it was not from any private symbols as then the reference would be on 2.26. Checking glibc abi lists I found there were only a handful of symbols marked as 2.25
$ grep 2.25 ./sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
GLIBC_2.25 GLIBC_2.25 A
GLIBC_2.25 __explicit_bzero_chk F
GLIBC_2.25 explicit_bzero F
GLIBC_2.25 getentropy F
GLIBC_2.25 getrandom F
GLIBC_2.25 strfromd F
GLIBC_2.25 strfromf F
GLIBC_2.25 strfroml F
Blindly grepping for these in network-manager source tree I found following:
$ grep explicit_bzero -r configure.ac src/
configure.ac: explicit_bzero],
src/systemd/src/basic/string-util.h:void explicit_bzero(void *p, size_t l);
src/systemd/src/basic/string-util.c:void explicit_bzero(void *p, size_t l) {
src/systemd/src/basic/string-util.c: explicit_bzero(x, strlen(x));
First of all it seems like network-manager includes a partial embedded copy of systemd. Secondly that code is compiled into a temporary library and has autconf detection logic to use explicit_bzero. It also has an embedded implementation of explicit_bzero when it is not available in libc, however it does not have FORTIFY_SOURCES implementation of said function (__explicit_bzero_chk) as was later pointed out to me. And whilst this function is compiled into an intermediary noinst library, no functions that use explicit_bzero are used in the end by NetworkManger binary. To proof this, I've dropped all code that uses explicit_bzero, rebuild the package against glibc 2.26, and voila it only had Version reference on glibc 2.17 as expected from the end-result usage of shared symbols.
At this point toolchain bug was a suspect. It seems like whilst explicit_bzero shared symbol got optimised out, the version reference on 2.25 persisted to the linked binaries. At this point in the archive a snapshot version of binutils was in use. And in fact forcefully downgrading bintuils resulted in correct compilation / versions table referencing only glibc 2.17.
Mathias then took over a tarball of object files and filed upstream bug report against bintuils: "[2.29 Regression] ld.bfd keeps a version reference in .gnu.version_r for symbols which are optimized out". The discussion in that bug report is a bit beyond me as to me binutils is black magic. All I understood there was "we moved sweep and pass to another place due to some bugs", doing that introduced this bug, thus do multiple sweep and passes to make sure we fix old bugs and don't regress this either. Or something like that. Comments / Better description of the bintuils fix are welcomed.
Binutils got fixed by upstream developers, cherry-picked into debian, and ubuntu, network-manager got rebuild and everything is wonderful now. However, it does look like unused / deadend code paths tripped up optimisations in the toolchain which managed to slip by distribution package dependency generation and needless require a higher up version of glibc. I guess the lesson here is do not embed/compile unused code. Also I'm not sure why network-manager uses networkd internals like this, and maybe systemd should expose more APIs or serialise more state into /run, as most other things query things over dbus, private socket, or by establishing watches on /run/systemd/netif. I'll look into that another day.
Thanks a lot to Guillem Jover, Matthias Klose, Alan Modra, H.J. Lu, and others for getting involved. I would not be able to raise, debug, or fix this issue all by myself.
Comments
Post a Comment