Mark H Weaver <mhw@netris.org> skribis:
Toggle quote (7 lines)
> On my GuixSD system, DNS lookup failures (due to lack of network) are
> cached for at least 4 minutes, and possibly much longer (I've not had
> the patience to wait until it expires).
>
> Even after restarting nscd, the negative cache entry persists for about
> another 20 seconds.
To reproduce it, I ran ‘guix system vm’ on this:
Toggle snippet (24 lines)
(use-modules (gnu) (guix))
(use-package-modules base)
(operating-system
(host-name "komputilo")
(timezone "Europe/Berlin")
(locale "en_US.UTF-8")
(bootloader (grub-configuration (device "/dev/sdX")))
(file-systems (cons (file-system
(device "root")
(title 'label)
(mount-point "/")
(type "ext4"))
%base-file-systems))
(packages (cons* (canonical-package glibc) %base-packages))
(services (map (lambda (mservice)
(mlet %store-monad ((service mservice))
(if (memq 'nscd (service-provision service))
(nscd-service (nscd-configuration
(debug-level 10)))
mservice)))
%base-services)))
Then I had one console running “tail -f /var/log/nscd.log”, and another
one where I would run “ping www.gnu.org.”
Initially looking up www.gnu.org fails (because the network is down),
and nscd.log indeed shows:
Toggle snippet (6 lines)
Mon Jun 15 21:04:20 2015 - 276: handle_request: request received (Version = 2) from PID 296
Mon Jun 15 21:04:20 2015 - 276: GETAI (www.gnu.org)
Mon Jun 15 21:04:20 2015 - 276: Haven't found "www.gnu.org" in hosts cache!
Mon Jun 15 21:04:20 2015 - 276: add new entry "www.gnu.org" of type GETAI for hosts to cache (first)
For subsequent requests, it shows fewer details, meaning that the cached
failure is used:
Toggle snippet (5 lines)
Mon Jun 15 21:04:25 2015 - 276: handle_request: request received (Version = 2) from PID 297
Mon Jun 15 21:04:25 2015 - 276: GETFDHST
Mon Jun 15 21:04:25 2015 - 276: provide access to FD 6, for hosts
Twenty seconds after the original failure, we see:
Toggle snippet (4 lines)
Mon Jun 15 21:04:40 2015 - 276: pruning hosts cache; time 1434395080
Mon Jun 15 21:04:40 2015 - 276: considering GETAI entry "www.gnu.org", timeout 1434395080
So cache expiry seems to be working as expected.
Now, if we enable networking with “dhclient -v eth0” and see what
happens, we get again a series of lookup failures, even after the
www.gnu.org entry has been removed from cache; apparently new failures
get cached. Eventually, after a bunch of cache prunes (so a few times
20 seconds), lookup succeeds.
The situation is different if you use ‘getent hosts’ instead of ‘ping’–
note that the latter uses getaddrinfo whereas the former uses
gethostbyname{,v6}, which is cached differently. Here we get a
successful lookup within 20 seconds after networking has been put up.
So my guess is that the GETAI cache is racy or something.
I’ve run nscd with Helgrind, which reports a few things (see attached
file), but I suspect most of it are false positives.
Ludo’.