Shepherd: Growing number of user shepherds when relogging

  • Open
Details
5 participants
  • bokr
  • Julian Flake
  • Jake
  • Ludovic Courtès
  • Tomas Volf
Owner
unassigned
Submitted by
Jake
Severity
important
Merged with
67863
(address . bug-guix@gnu.org)(address . ludovic.courtes@inria.fr)
CAJqVjv_yNT19Svyd_xNVduNduuwZoWRrcGYRuQJ6=g4cmWDSaQ@mail.gmail.com
Hi

I think I'm experiencing a bug in Shepherd since version 1.0.
Whenever I log out and log back in again, my user shepherd from the
previous login session is still present, and a new user shepherd spawns for
the current login session.
So relogging N times results in N+1 user shepherds.

For example, I have relogged 5 times since I last rebooted:

$ herd status root
Status of root:
It is running since 00:30:02 (10 minutes ago).
Main PID: 23450
Command:
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
...

$ pgrep shepherd
1
9891
10777
16417
18510
21960
23450

$ ps aux | grep shepherd
root 1 0.0 0.9 222872 74456 ? Sl Dec15 0:08
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--config /gnu/store/p7al8wd1inwk8f5di2q4llcpd64mjn5q-shepherd.conf
jake 9891 0.0 0.2 75816 23624 ? Ss Dec15 0:04
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake 10777 0.0 0.3 76224 24752 ? Ss Dec16 0:03
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake 16417 0.0 0.3 75752 24004 ? Ss Dec16 0:02
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake 18510 0.0 0.2 75752 23760 ? Ss Dec16 0:01
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake 21960 0.0 0.2 114608 22124 ? Ss Dec16 0:00
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake 23450 0.0 0.2 114204 21328 ? Ss 00:30 0:00
/gnu/store/mfkz7fvlfpv3ppwbkv0imb19nrf95akf-guile-3.0.9/bin/guile
--no-auto-compile
/gnu/store/nl0w5c7pxxdczqiv4r9iq44al7nd5y5g-shepherd-1.0.0/bin/shepherd
--silent --config /gnu/store/w3l6dmap815mm3qzx77xdazky853adda-shepherd.conf
jake 23672 0.0 0.0 6636 2552 pts/1 S+ 00:32 0:00 grep
--color=auto shepherd
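
To track how this count grows across relogins, something like the following could be used. It is only a minimal Python sketch, not part of Shepherd or Guix: the helper name `count_shepherds_by_user` is made up for illustration, and it assumes one process per line in the form "USER PID ARGS...", such as `ps -e -o user= -o pid= -o args=` prints. It matches arguments ending in `/bin/shepherd`, so the `grep` line above is not counted.

```python
from collections import Counter


def count_shepherds_by_user(ps_output: str) -> Counter:
    """Count, per user, the processes whose command line runs .../bin/shepherd.

    Expects one process per line in the form "USER PID ARGS...".
    (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        user, args = fields[0], fields[2:]
        # Shepherd shows up as "guile --no-auto-compile .../bin/shepherd ...",
        # so look at every argument rather than only the first one.
        if any(arg.endswith("/bin/shepherd") for arg in args):
            counts[user] += 1
    return counts
```

Applied to the listing above, this would report one shepherd for root (PID 1) and six for jake: the current one plus five left over from earlier sessions.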

In addition, any daemons managed by the zombie shepherds also persist!

I'm experiencing this on both of my Guix System machines. One is running
GDM and XFCE. The other is running GDM and CWM.
Please let me know if I can provide more information.

Thanks
Jake
Ludovic Courtès wrote on 18 Dec 2024 23:35
(name . Jake)(address . jforst.mailman@gmail.com)(address . 74912@debbugs.gnu.org)
87r064ippt.fsf@gnu.org
Hello,

Jake <jforst.mailman@gmail.com> skribis:

> I think I'm experiencing a bug in Shepherd since version 1.0.
> Whenever I log out and log back in again, my user shepherd from the
> previous login session is still present, and a new user shepherd spawns for
> the current login session.
> So relogging N times results in N+1 user shepherds.

I have a user shepherd via Guix Home and I experience the same problem
(though because I rarely log out it’s not really annoying :-)).

I suspect the problem has to do with how Guix Home determines whether or
not it should launch shepherd, but I haven’t checked yet.

Thanks for reporting the issue,
Ludo’.
Tomas Volf wrote on 19 Dec 2024 01:29
(name . Ludovic Courtès)(address . ludo@gnu.org)
877c7w7bxi.fsf@wolfsden.cz
Ludovic Courtès <ludo@gnu.org> writes:

> Hello,
>
> Jake <jforst.mailman@gmail.com> skribis:
>
>> I think I'm experiencing a bug in Shepherd since version 1.0.
>> Whenever I log out and log back in again, my user shepherd from the
>> previous login session is still present, and a new user shepherd spawns for
>> the current login session.
>> So relogging N times results in N+1 user shepherds.
>
> I have a user shepherd via Guix Home and I experience the same problem
> (though because I rarely log out it’s not really annoying :-)).
>
> I suspect the problem has to do with how Guix Home determines whether or
> not it should launch shepherd, but I haven’t checked yet.

When you have another login session active when you log out and in
again, new shepherd is *not* spawned. I am guessing here but probably
last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
case), so on log in there is no /run/user/$UID/on-first-login-executed,
so it runs again and starts the shepherd.
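
The guess above can be modelled in a few lines. This is only a Python sketch of the suspected logic, not the actual Guix Home code: `run_on_first_login` and `simulate_relogins` are hypothetical names, and a temporary directory stands in for /run/user/$UID. The only detail taken from this thread is the marker file name, on-first-login-executed.

```python
import os
import shutil
import tempfile

MARKER = "on-first-login-executed"  # flag file name mentioned above


def run_on_first_login(runtime_dir: str, action) -> bool:
    """Run ACTION unless the marker file already exists in RUNTIME_DIR.

    Returns True if ACTION was run.  This models the guessed behavior only.
    """
    marker = os.path.join(runtime_dir, MARKER)
    if os.path.exists(marker):
        return False
    os.makedirs(runtime_dir, exist_ok=True)
    action()
    # Record that the first-login actions ran for this runtime directory.
    with open(marker, "w"):
        pass
    return True


def simulate_relogins() -> int:
    """Return how many times the 'start shepherd' action ran."""
    started = []
    base = tempfile.mkdtemp()
    runtime_dir = os.path.join(base, "run-user-1000")  # stand-in for /run/user/$UID
    try:
        def start():
            started.append("shepherd")

        run_on_first_login(runtime_dir, start)  # first login: shepherd started
        run_on_first_login(runtime_dir, start)  # second, concurrent session: skipped
        shutil.rmtree(runtime_dir)              # last logout: directory removed
        run_on_first_login(runtime_dir, start)  # next login: started again
    finally:
        shutil.rmtree(base, ignore_errors=True)
    return len(started)
```

In this model the marker disappears along with the directory, so the action runs a second time even though nothing stopped the first shepherd.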

But even if that would be solved, since the runtime directory was nuked,
there is no shepherd socket around anymore, so the (still running)
shepherd from previous login session cannot be contacted by herd.

Off the top of my head I can think of two possible solutions:

1. Stop the shepherd on log out. So as we have on-first-login, we would
have on-last-logout. I have no idea how to implement that. Maybe we
could use ~/.bash_logout? Or some PAM thing?

2. Shepherd could shut down gracefully when the control socket is deleted
from the file system. It is arguable how useful running shepherd is
without the socket anyway.

Any other ideas?

Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Ludovic Courtès wrote on 26 Dec 2024 11:50
(name . Tomas Volf)(address . ~@wolfsden.cz)
87o70yzpk7.fsf@gnu.org
Hi!

Tomas Volf <~@wolfsden.cz> skribis:

> When you have another login session active when you log out and in
> again, new shepherd is *not* spawned. I am guessing here but probably
> last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
> case), so on log in there is no /run/user/$UID/on-first-login-executed,
> so it runs again and starts the shepherd.
>
> But even if that would be solved, since the runtime directory was nuked,
> there is no shepherd socket around anymore, so the (still running)
> shepherd from previous login session cannot be contacted by herd.

Hmm, when is /run/user/UID deleted?

> Off the top of my head I can think of two possible solutions:
>
> 1. Stop the shepherd on log out. So as we have on-first-login, we would
> have on-last-logout. I have no idea how to implement that. Maybe we
> could use ~/.bash_logout? Or some PAM thing?

Or some elogind thing, rather?

But then, how do we make it work on other distros? Maybe on systemd
distros shepherd receives SIGTERM or something, in which case it
terminates properly.

> 2. Shepherd could shut down gracefully when the control socket is deleted
> from the file system. It is arguable how useful running shepherd is
> without the socket anyway.

I don’t think that’s workable: you’d need to poll/inotify for the
existence of that socket, but even if it exists on the file system, you
cannot tell whether it matches the socket you’re accepting on.

Ludo’.
bokr wrote
(name . Ludovic Courtès)(address . ludo@gnu.org)
Z22RflvtBpyOHG14@BRL14v1
On +2024-12-26 11:50:00 +0100, Ludovic Courtès wrote:
> Hi!
>
> Tomas Volf <~@wolfsden.cz> skribis:
>
> > When you have another login session active when you log out and in
> > again, new shepherd is *not* spawned. I am guessing here but probably
> > last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
> > case), so on log in there is no /run/user/$UID/on-first-login-executed,
> > so it runs again and starts the shepherd.
> >
> > But even if that would be solved, since the runtime directory was nuked,
> > there is no shepherd socket around anymore, so the (still running)
> > shepherd from previous login session cannot be contacted by herd.
>
> Hmm, when is /run/user/UID deleted?
>
> > Off the top of my head I can think of two possible solutions:
> >
> > 1. Stop the shepherd on log out. So as we have on-first-login, we would
> > have on-last-logout. I have no idea how to implement that. Maybe we
> > could use ~/.bash_logout? Or some PAM thing?
>
> Or some elogind thing, rather?
>
> But then, how do we make it work on other distros? Maybe on systemd
> distros shepherd receives SIGTERM or something, in which case it
> terminates properly.
>
> > 2. Shepherd could shut down gracefully when the control socket is deleted
> > from the file system. It is arguable how useful running shepherd is
> > without the socket anyway.
>
> I don’t think that’s workable: you’d need to poll/inotify for the
> existence of that socket, but even if it exists on the file system, you
> cannot tell whether it matches the socket you’re accepting on.
>
> Ludo’.
>
>
>

I wonder how many guix-daemon-process-relationship type problems would be simplified
if (radical vision) one let wayland's inner event-driven loop/protocol be the dispatcher
for guix processes instead of the current guix daemon switching between its collection of threads.
I.e., all the guix threads would be individual login or spawned user processes securely communicating
virtualizably (shared memory or networked rendezvous buffers etc) for offloading?
Tomas Volf wrote on 28 Dec 2024 00:19
(name . Ludovic Courtès)(address . ludo@gnu.org)
875xn44suw.fsf@wolfsden.cz
Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> Tomas Volf <~@wolfsden.cz> skribis:
>
>> When you have another login session active when you log out and in
>> again, new shepherd is *not* spawned. I am guessing here but probably
>> last log out causes XDG_RUNTIME_DIR to be removed (by elogind in my
>> case), so on log in there is no /run/user/$UID/on-first-login-executed,
>> so it runs again and starts the shepherd.
>>
>> But even if that would be solved, since the runtime directory was nuked,
>> there is no shepherd socket around anymore, so the (still running)
>> shepherd from previous login session cannot be contacted by herd.
>
> Hmm, when is /run/user/UID deleted?

I believe it is done by elogind (in my setup) when the last user session
(for the given UID) logs out. If I grepped right, it is done by the
user_finalize function in logind-user.c.

AFAIUT, it should be performed when the last session of the seat
terminates. So if you log in on only a single TTY, XDG_RUNTIME_DIR
will be removed on every logout.

>
>> Off the top of my head I can think of two possible solutions:
>>
>> 1. Stop the shepherd on log out. So as we have on-first-login, we would
>> have on-last-logout. I have no idea how to implement that. Maybe we
>> could use ~/.bash_logout? Or some PAM thing?
>
> Or some elogind thing, rather?

I looked through the manual page, but did not find anything. There is
KillUserProcesses, but that feels like a fairly big hammer, and something
that should *not* be enabled by default.

We could patch elogind to add a new RemoveRuntimeDirectory boolean flag to
allow keeping XDG_RUNTIME_DIR even after the last logout (I personally
would prefer that behavior anyway). I am not sure what our policy
regarding patches is here.

>
> But then, how do we make it work on other distros? Maybe on systemd
> distros shepherd receives SIGTERM or something, in which case it
> terminates properly.

No idea here. ~/.bash_logout?

>
>> 2. Shepherd could shut down gracefully when the control socket is deleted
>> from the file system. It is arguable how useful running shepherd is
>> without the socket anyway.
>
> I don’t think that’s workable: you’d need to poll/inotify for the
> existence of that socket, but even if it exists on the file system, you
> cannot tell whether it matches the socket you’re accepting on.

For files I would suggest checking whether both `stat:dev' and `stat:ino'
match in order to detect whether it is the same file. I am not sure
whether the same strategy can be used for Unix sockets.
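
As a rough illustration of that idea, here is a minimal Python sketch (Shepherd itself is written in Guile, so this only shows the technique; `bind_control_socket` and `socket_file_intact` are hypothetical names). It assumes Linux, where fstat() on the socket descriptor does not describe the file system entry, so the device and inode numbers are recorded with stat() on the path right after bind(). It would still have to be called from some polling loop or inotify handler, and since inode numbers can be reused after a deletion, a false match is possible in principle.

```python
import os
import socket


def bind_control_socket(path: str):
    """Bind a Unix socket at PATH and remember the (dev, ino) of its file.

    The identity is taken with stat() on the path right after bind(),
    because fstat() on the descriptor would describe the socket object
    rather than the file system entry.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    st = os.stat(path)
    return sock, (st.st_dev, st.st_ino)


def socket_file_intact(path: str, identity) -> bool:
    """Return True if PATH still names the file recorded in IDENTITY."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False  # the socket file was deleted
    # A different (dev, ino) pair means another file now sits at PATH.
    return (st.st_dev, st.st_ino) == identity
```

A daemon using this could treat a False result as the signal to shut down, since herd can no longer reach it through that path.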

Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Tomas Volf wrote on 28 Dec 2024 00:20
(address . bokr@bokr.com)
871pxs4srs.fsf@wolfsden.cz
I am not sure how this relates to this specific bug report, but

bokr@bokr.com writes:

> I wonder how many guix-daemon-process-relationship type problems would be simplified
> if (radical vision) one let wayland's inner event-driven loop/protocol
> be the dispatcher

not everyone uses wayland.

> for guix processes instead of the current guix daemon switching between its collection of threads.
> I.e., all the guix threads would be individual login or spawned user processes securely communicating
> virtualizably (shared memory or networked rendezvous buffers etc) for offloading?

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
Ludovic Courtès wrote on 7 Jan 23:58 +0100
control message for bug #74912
(address . control@debbugs.gnu.org)
87ikqqxmbz.fsf@gnu.org
severity 74912 important
quit
Julian Flake wrote on 11 Jan 22:54 +0100
merge 67863 74912
(name . GNU bug tracker automated control server)(address . control@debbugs.gnu.org)
87bjwdaue5.fsf@uni-koblenz.de
severity 67863 important
merge 67863 74912
thankyou
