(address . bug-guix@gnu.org)
Hello,
One problem we noticed in the analysis of the boot problem of bayfront
after the recent downtime¹ is that an interactive REPL would be opened
after an unbound variable was found in the shepherd config file:
Toggle snippet (18 lines)
[ 13.098907] shepherd[1]: Service root started.
[ 13.100711] shepherd[1]: Service root running with value #t.
[ 13.103824] shepherd[1]: Service root has been started.
[ 13.426102] shepherd[1]: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
[ 13.428099] shepherd[1]: Unbound variable: make-forkexec-constructor/container
[ 13.429912] shepherd[1]:
[ 13.431108] shepherd[1]: Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
[ 13.441983] shepherd[1]: GNU Guile 3.0.9
[ 13.442728] shepherd[1]: Copyright (C) 1995-2023 Free Software Foundation, Inc.
[ 13.443947] shepherd[1]:
[ 13.444427] shepherd[1]: Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
[ 13.445679] shepherd[1]: This program is free software, and you are welcome to redistribute it
[ 13.446919] shepherd[1]: under certain conditions; type `,show c' for
[ 13.447072] shepherd[1]: details.
[ 13.448737] shepherd[1]:
[ 13.449239] shepherd[1]: Enter `,help' for help.
This was unhelpful because we couldn’t interact with that REPL remotely
(no IPMI). Even when you can interact, it’s of limited use; in this
case, if you type “,q”, it tries to continue and fails:
Toggle snippet (43 lines)
Uncaught exception in task:
In fibers.scm:
172:8 7 (_)
In ice-9/exceptions.scm:
406:15 6 (_)
In ice-9/boot-9.scm:
1752:10 5 (with-exception-handler _ _ #:unwind? _ # _)
In shepherd/service.scm:
824:39 4 (_)
this is because we’re effectively adding #f in the middle of the list
passed to ‘register-services’ (see below).
This REPL-on-error “feature” comes from Guix System, not Shepherd, in
the config file generated from (gnu services shepherd):
;; Arrange to spawn a REPL if something goes wrong. This is better
;; than a kernel panic.
(call-with-error-handling
(lambda ()
(register-services
(parameterize ((current-warning-port
(%make-void-port "w")))
(map (lambda (file)
(save-module-excursion
(lambda ()
(set-current-module (make-user-module))
(load-compiled file))))
'#$(map scm->go files))))))
The rationale mentioned in the comment no longer holds: starting from
Shepherd 0.10.2, the config file is loaded in the background; if it’s
evaluation fails, shepherd keeps running (see
‘tests/config-failure.sh’, which tests this behavior).
I think we should change the above to log and gracefully handle failure
to load an individual service file.
Ludo’.
¹ https://lists.gnu.org/archive/html/info-guix/2024-05/msg00000.html