nss@3.99 is really hard to build

  • Open
Details
8 participants
  • Christina O'Donnell
  • Ian Eure
  • Joshua Branson
  • Ludovic Courtès
  • Christopher Baines
  • Maxim Cournoyer
  • Mark H Weaver
  • pelzflorian (Florian Pelz)
Owner
unassigned
Submitted by
Christopher Baines
Severity
normal
Merged with
Christopher Baines wrote on 30 Apr 11:16 +0200
(address . bug-guix@gnu.org)
87plu7xla9.fsf@cbaines.net
nss@3.99 is really hard to build. It's so hard, and so important, that
after two days data.guix.gnu.org is still trying to process [1]. I say
so important because you have to build nss@3.99 to compute the channel
instance derivations for Guix.

1: https://data.guix.gnu.org/revision/72308f262c910977e40c2c9f350dc563c0a8437a

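Looking at the next revision which has been processed [2], it's been
built on riscv64-linux, as the test suite is disabled there, and it has
also built on aarch64-linux, but there's no successful build for any
other architecture.

2: https://data.guix.gnu.org/revision/9f183c3627a006e8fd3bb9708448bc05a6204e6d/package/nss/3.99.0?locale=en_US.UTF-8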

I think there are two issues here: was this spotted before merging, and
what, if anything, can be done about this now? Where there's no
substitute available for nss@3.99, this will affect guix pull/guix
time-machine, e.g.

→ guix time-machine --commit=72308f262c910977e40c2c9f350dc563c0a8437a -- describe
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
nss-3.99.tar.xz 55.2MiB 13.7MiB/s 00:04 ???????????????????? 100.0%
building /gnu/store/8379qa0y6s7ssjr8gplm5fyw9r5pnxhn-nss-3.99.0.drv...

Christopher Baines wrote on 1 May 12:11 +0200
(address . 70663@debbugs.gnu.org)
871q6lx2ng.fsf@cbaines.net
Christopher Baines <mail@cbaines.net> writes:

> nss@3.99 is really hard to build, it's so hard and so important that
> data.guix.gnu.org is still after two days trying to process [1]. I say
> so important because you have to build nss@3.99 to compute the channel
> instance derivations for Guix.
>
> 1: https://data.guix.gnu.org/revision/72308f262c910977e40c2c9f350dc563c0a8437a
>
> Looking at the next revision which has been processed [2], it's been
> built on riscv64-linux as the testsuite is disabled, and it has also
> built on aarch64-linux, but there's no successful build for any other
> architecture.
>
> 2: https://data.guix.gnu.org/revision/9f183c3627a006e8fd3bb9708448bc05a6204e6d/package/nss/3.99.0?locale=en_US.UTF-8
>
> I think there's two issues here, was this spotted before merging, and
> what if anything can be done about this now. Where there's not a
> substitute available for nss@3.99, this will affect guix pull/guix
> time-machine, e.g.
>
> → guix time-machine --commit=72308f262c910977e40c2c9f350dc563c0a8437a -- describe
> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
> nss-3.99.tar.xz 55.2MiB 13.7MiB/s 00:04 ???????????????????? 100.0%
> building /gnu/store/8379qa0y6s7ssjr8gplm5fyw9r5pnxhn-nss-3.99.0.drv...

Looking at the build failures for x86_64-linux, it seems that there's
just one test failure. There's a threshold of less than 5 seconds, and
it takes 5 to 7 seconds to complete. This probably isn't helped by using
faketime.
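
To illustrate why a fixed wall-clock limit like this is fragile, here is a minimal sketch of such a check. It is not the code from the NSS test suite: the function name `check_elapsed`, the whole-second comparison, and the sample timings are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for a timing assertion in a test script.
# check_elapsed ELAPSED THRESHOLD: pass only if ELAPSED (whole seconds)
# is strictly below THRESHOLD.
check_elapsed() {
    if [ "$1" -lt "$2" ]; then
        echo "PASSED"
    else
        echo "FAILED"
    fi
}

# Illustrative timings: a fast machine, then the 5 to 7 seconds reported above.
for elapsed in 3 5 7; do
    echo "${elapsed}s: $(check_elapsed "$elapsed" 5)"
done
```

Only the 3-second run passes. A slower or busier build machine lands on the wrong side of the limit even though the code under test behaves correctly.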

Here's an upstream bug [3] where they raised the threshold a bit, but
this isn't enough for our use case.


I've sent a patch here which increases the threshold by a lot:


Maxim Cournoyer wrote on 1 May 18:54 +0200
Re: bug#70663: nss@3.99 is really hard to build
(name . Christopher Baines)(address . mail@cbaines.net)(address . 70663@debbugs.gnu.org)
87sez179sf.fsf@gmail.com
Hi Chris,

Christopher Baines <mail@cbaines.net> writes:

> nss@3.99 is really hard to build, it's so hard and so important that
> data.guix.gnu.org is still after two days trying to process [1]. I say
> so important because you have to build nss@3.99 to compute the channel
> instance derivations for Guix.

I agree that the nss test suite takes a ridiculous amount of time to run
(multiple hours on a fast machine IIRC). Are we missing a '--fast' test
flag or something to make it run in a more reasonable amount of time?

--
Thanks,
Maxim
Christopher Baines wrote on 1 May 19:14 +0200
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)(address . 70663@debbugs.gnu.org)
87edalv4hj.fsf@cbaines.net
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Chris,
>
> Christopher Baines <mail@cbaines.net> writes:
>
>> nss@3.99 is really hard to build, it's so hard and so important that
>> data.guix.gnu.org is still after two days trying to process [1]. I say
>> so important because you have to build nss@3.99 to compute the channel
>> instance derivations for Guix.
>
> I agree that the nss test suite takes a ridiculous amount of time to run
> (multiple hours on a fast machine IIRC). Are we missing a '--fast' test
> flag or something to make it run in a more reasonable amount of time?

I did read some of the all.sh script used for the tests, and there are
some environment variables you can set here:

https://github.com/nss-dev/nss/blob/master/tests/all.sh#L70-L82

It seems like there are 4 "cycles", maybe we could just run the standard
cycle or at least check how long they each take.

Christina O'Donnell wrote on 2 May 22:38 +0200
(address . 70663@debbugs.gnu.org)
cf5c17ef-964d-6957-9c40-4a50ef1ed019@mutix.org
Hi,

On 01/05/2024 18:14, Christopher Baines wrote:
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Chris,
>>
>> Christopher Baines <mail@cbaines.net> writes:
>>
>>> nss@3.99 is really hard to build, it's so hard and so important that
>>> data.guix.gnu.org is still after two days trying to process [1]. I say
>>> so important because you have to build nss@3.99 to compute the channel
>>> instance derivations for Guix.
>> I agree that the nss test suite takes a ridiculous amount of time to run
>> (multiple hours on a fast machine IIRC). Are we missing a '--fast' test
>> flag or something to make it run in a more reasonable amount of time?
> I did read some of the all.sh script used for the tests and there is
> some environment variables you can set here:
>
> https://github.com/nss-dev/nss/blob/master/tests/all.sh#L70-L82
>
> It seems like there are 4 "cycles", maybe we could just run the standard
> cycle or at least check how long they each take.

On my machine building natively on x86_64 I was getting approximately 63
mins for a full test and 20 mins for just the 'standard' 'cycle'. My
vote would be to just run 'standard' since that runs all of the tests once.

I can profile individual tests if needed to see if there's any that are
particularly worth culling, but just the above change is an easy win
without sacrificing too much test coverage.

Kind regards,

Christina
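
The "standard cycle only" idea discussed above can be sketched as a small shell wrapper. NSS_CYCLES is one of the variables read by tests/all.sh, and ./nss/tests/all.sh is the script path that appears in the Guix build log; the wrapper name `run_standard_cycle` is a hypothetical helper, and this is not the change that was eventually made to the Guix package.

```shell
#!/bin/sh
# Sketch: run a test command with only the "standard" cycle selected.
# NSS_CYCLES is read by tests/all.sh; run_standard_cycle is a
# hypothetical helper name used for illustration.
run_standard_cycle() {
    NSS_CYCLES="standard" "$@"
}

# Intended use, from the unpacked source tree (not run here):
#   run_standard_cycle ./nss/tests/all.sh
```

With the timings reported above (about 63 minutes for the full run and about 20 minutes for 'standard'), this would cut the test phase by roughly two thirds on that machine.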
Ludovic Courtès wrote on 5 May 12:01 +0200
control message for bug #70663
(address . control@debbugs.gnu.org)
877cg8shlm.fsf@gnu.org
merge 70663 70771
quit
Joshua Branson wrote on 9 May 19:01 +0200
Re: bug#70663: nss@3.99 is really hard to build
(name . Christina O'Donnell)(address . cdo@mutix.org)
87seyqzzq9.fsf@dismail.de
Perhaps we could disable the test suite for power9? At the moment guix
pull fails on power9, I believe due to this bug.

Just a thought.

Joshua
Christopher Baines wrote on 14 May 11:05 +0200
Re: nss@3.99 is really hard to build
87o798zrtz.fsf@cbaines.net
Christopher Baines <mail@cbaines.net> writes:

> nss@3.99 is really hard to build, it's so hard and so important that
> data.guix.gnu.org is still after two days trying to process [1]. I say
> so important because you have to build nss@3.99 to compute the channel
> instance derivations for Guix.
>
> 1: https://data.guix.gnu.org/revision/72308f262c910977e40c2c9f350dc563c0a8437a
>
> Looking at the next revision which has been processed [2], it's been
> built on riscv64-linux as the testsuite is disabled, and it has also
> built on aarch64-linux, but there's no successful build for any other
> architecture.
>
> 2: https://data.guix.gnu.org/revision/9f183c3627a006e8fd3bb9708448bc05a6204e6d/package/nss/3.99.0?locale=en_US.UTF-8
>
> I think there's two issues here, was this spotted before merging, and
> what if anything can be done about this now. Where there's not a
> substitute available for nss@3.99, this will affect guix pull/guix
> time-machine, e.g.
>
> → guix time-machine --commit=72308f262c910977e40c2c9f350dc563c0a8437a -- describe
> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
> nss-3.99.tar.xz 55.2MiB 13.7MiB/s 00:04 ???????????????????? 100.0%
> building /gnu/store/8379qa0y6s7ssjr8gplm5fyw9r5pnxhn-nss-3.99.0.drv...

So with the changes in #70693 merged, this issue should be fixed going
forward, but the revisions with the broken nss are going to be affected
forever and thus the impact is going to drag on for a while. For
example, data.guix.gnu.org is going to be struggling to process the
revisions with the broken nss for a long while to come.

Before closing this bug, it would be good to understand more about how
this happened and from that try to think if anything can be done to
prevent similar issues in the future?

At least from what I can see on the issues, the problem was introduced
with the update to 3.98.0 [3] and then continued with the update to 3.99
[4]. Given the changes in 70662 were sent to guix-patches and then
merged less than 24 hours later, I'd imagine that wasn't sufficient time
for data.qa.guix.gnu.org to fail attempting to build nss.

3: https://issues.guix.gnu.org/70662
4: https://issues.guix.gnu.org/70618

Had the changes waited for longer, then these failures should have been
spotted by QA, I would guess that the revision might have failed to be
processed, and if it was processed successfully, the nss failures should
have shown up, so maybe we should start requiring [5] that not only are
changes sent to guix-patches@gnu.org, but that QA processes them (to
some extent) before merging?

5: https://guix.gnu.org/manual/devel/en/html_node/Managing-Patches-and-Branches.html#

Thanks,

Chris

Christina O'Donnell wrote on 14 May 12:19 +0200
Re: Scheduling a new release?
339167fe-c899-1303-166f-8040b49f0f59@mutix.org
Hi,

On 08/05/2024 14:01, Christopher Baines wrote:
> I think it would be nice to have a new release, and indeed release more
> often, I think the way to get there is for less things to be broken
> between releases, such that releasing takes less effort in terms of
> testing and fixing things.
>
> To give some specific issues, I've run up against the recent issues with
> nss [1][2] and I don't think we could release with the nss package as is
> currently.
>
> 1: https://issues.guix.gnu.org/70662
> 2: https://issues.guix.gnu.org/70663

I can fix these by disabling tests, but I would prefer if someone with
more experience packaging for guix could make a decision on it.
Otherwise, I don't have any problem reducing the number of tests and
disabling all tests on PowerPC at least.

I could also do some analysis if it was deemed necessary, inserting a
patch to measure the timings of each test/cycle. Additionally, I could
try packaging some of the versions between 0.88 and 0.98 to identify the
exact change that could be to blame. However, both of these seem
overkill, given the backlog of patches/issues we have left to get
through, and the manpower we currently have to work with.

Would any of that be helpful?

> ...

Kind regards,

Christina
pelzflorian (Florian Pelz) wrote on 14 May 12:36 +0200
Re: bug#70663: nss@3.99 is really hard to build
(name . Christopher Baines)(address . mail@cbaines.net)
87eda4vfx9.fsf@pelzflorian.de
Hello Christopher.

Christopher Baines <mail@cbaines.net> writes:
> Had the changes waited for longer, then these failures should have been
> spotted by QA, I would guess that the revision might have failed to be
> processed, and if it was processed successfully, the nss failures should
> have shown up, so maybe we should start requiring [5] that not only are
> changes sent to guix-patches@gnu.org, but that QA processes them (to
> some extent) before merging?
>
> 5: https://guix.gnu.org/manual/devel/en/html_node/Managing-Patches-and-Branches.html#

Yes, though note that the nss change did provide security fixes:

commit e584ff08b162c46ef587daca438e97d56bc20b32
Author: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Date: Wed Apr 24 11:22:30 2024 -0400

gnu: nss: Graft with version 3.98 [security fixes].

This fixes CVE-2023-5388, CVE-2023-6135 and CVE-2024-0743.

* gnu/packages/nss.scm (nss) [replacement]: New field.
(nss-3.98): Rename variable to...
(nss/fixed): ... this. Make it a hidden package.
* gnu/packages/librewolf.scm (librewolf) [inputs]: Replace nss-3.98 with
nss/fixed.

Change-Id: I8cc667c53a270dfe00738bf731923f1342036624

I suppose the requirement to wait for QA should apply to security fixes
as well?

Thank you for all your work.

Regards,
Florian
Maxim Cournoyer wrote on 14 May 14:58 +0200
Re: nss@3.99 is really hard to build
(name . Christopher Baines)(address . mail@cbaines.net)
87le4czh0z.fsf@gmail.com
Hi,

Christopher Baines <mail@cbaines.net> writes:

[...]

>> I think there's two issues here, was this spotted before merging, and
>> what if anything can be done about this now. Where there's not a
>> substitute available for nss@3.99, this will affect guix pull/guix
>> time-machine, e.g.
>>
>> → guix time-machine --commit=72308f262c910977e40c2c9f350dc563c0a8437a -- describe
>> Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
>> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
>> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
>> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
>> nss-3.99.tar.xz 55.2MiB 13.7MiB/s 00:04 ???????????????????? 100.0%
>> building /gnu/store/8379qa0y6s7ssjr8gplm5fyw9r5pnxhn-nss-3.99.0.drv...
>
> So with the changes in #70693 merged, this issue should be fixed going
> forward, but the revisions with the broken nss are going to be affected
> forever and thus the impact is going to drag on for a while. For
> example, data.guix.gnu.org is going to be struggling to process the
> revisions with the broken nss for a long while to come.
>
> Before closing this bug, it would be good to understand more about how
> this happened and from that try to think if anything can be done to
> prevent similar issues in the future?
>
> At least from what I can see on the issues, the problem was introduced
> with the update to 3.98.0 [3] and then continued with the update to 3.99
> [4]. Given the changes in 70662 were sent to guix-patches and then
> merged less than 24 hours later, I'd imagine that wasn't sufficient time
> for data.qa.guix.gnu.org to fail attempting to build nss.

I think in [3] you meant https://issues.guix.gnu.org/70569, not #70662.

Since this was security sensitive, I built it on x86_64, tested it there
to ensure that IceCat worked as expected, had others confirm it worked
for them on #guix, then pushed.

In the past, I've had more patience waiting for QA to build things, but
since this is not guaranteed (it sometimes never happened), it seems
reasonable to me to promptly push security fixes that were manually
built & tested and adjust for any breakage later, if there is any.

> 4: https://issues.guix.gnu.org/70618
>
> Had the changes waited for longer, then these failures should have been
> spotted by QA, I would guess that the revision might have failed to be
> processed, and if it was processed successfully, the nss failures should
> have shown up, so maybe we should start requiring [5] that not only are
> changes sent to guix-patches@gnu.org, but that QA processes them (to
> some extent) before merging?

I have some apprehensions about that; given the QA build farm is
somewhat under-resourced for builds, I fear security changes could be
gated for longer periods of time than is reasonable to wait. If we go
that route, I think we should dedicate more hardware first.

--
Thanks,
Maxim
Christopher Baines wrote on 14 May 15:33 +0200
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87cypozfeo.fsf@cbaines.net
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

>> Before closing this bug, it would be good to understand more about how
>> this happened and from that try to think if anything can be done to
>> prevent similar issues in the future?
>>
>> At least from what I can see on the issues, the problem was introduced
>> with the update to 3.98.0 [3] and then continued with the update to 3.99
>> [4]. Given the changes in 70662 were sent to guix-patches and then
>> merged less than 24 hours later, I'd imagine that wasn't sufficient time
>> for data.qa.guix.gnu.org to fail attempting to build nss.
>
> I think in [3] you meant https://issues.guix.gnu.org/70569, not #70662.

Ah, yep.

> Since this was security sensitive, I built it on x86_64, tested it there
> to ensure that IceCat worked as expected, had others confirmed it worked
> for them on #guix then pushed.
>
> In the past, I've had more patience waiting for QA to build things, but
> since this is not guaranteed (it sometimes never happened), it seems
> reasonable to me to promptly push security fixes that were manually
> built & tested and adjust for any breakage later, if there is any.

I think pushing security fixes quickly is good, but this does set a
precedent on architecture support (only x86_64-linux matters).

For some packages (including nss in this instance), not looking at
non-x86_64-linux support doesn't just affect users: the data service and
ci.guix.gnu.org were particularly affected. So for some packages it's
important to test across the "supported" systems just to ensure the
project's own tooling doesn't break.

>> 4: https://issues.guix.gnu.org/70618
>>
>> Had the changes waited for longer, then these failures should have been
>> spotted by QA, I would guess that the revision might have failed to be
>> processed, and if it was processed successfully, the nss failures should
>> have shown up, so maybe we should start requiring [5] that not only are
>> changes sent to guix-patches@gnu.org, but that QA processes them (to
>> some extent) before merging?
>
> I have some apprehensions about that; given the QA build farm is
> somewhat under-resourced for builds, I fear security changes could be
> gated for longer periods of time than is reasonable to wait. If we go
> that route, I think we should dedicate more hardware first.

I think that's reasonable. I have been putting time into the hardware,
but it's not been particularly easy going. The data service instances
are also still stuck on hardware I'm renting. In terms of QA
speed, there's the resources for data.qa.guix.gnu.org and there's the
hardware available for the bordeaux build farm.

There's also the potential to try and add more prioritisation. Currently
the data service prioritises branch revisions over patches, and newer
revisions more generally, but I guess it could prioritise security
related things. I'm not sure how to make that happen yet though, QA can
probably come up with the priorities, but I don't know how to convey
that to the data service.

Christopher Baines wrote on 14 May 15:37 +0200
Re: bug#70663: nss@3.99 is really hard to build
(name . pelzflorian (Florian Pelz))(address . pelzflorian@pelzflorian.de)
877cfwzf8g.fsf@cbaines.net
"pelzflorian (Florian Pelz)" <pelzflorian@pelzflorian.de> writes:

> Hello Christopher.
>
> Christopher Baines <mail@cbaines.net> writes:
>> Had the changes waited for longer, then these failures should have been
>> spotted by QA, I would guess that the revision might have failed to be
>> processed, and if it was processed successfully, the nss failures should
>> have shown up, so maybe we should start requiring [5] that not only are
>> changes sent to guix-patches@gnu.org, but that QA processes them (to
>> some extent) before merging?
>>
>> 5: https://guix.gnu.org/manual/devel/en/html_node/Managing-Patches-and-Branches.html#
>
> Yes, though note that the nss change did provide security fixes:
>
> commit e584ff08b162c46ef587daca438e97d56bc20b32
> Author: Maxim Cournoyer <maxim.cournoyer@gmail.com>
> Date: Wed Apr 24 11:22:30 2024 -0400
>
> gnu: nss: Graft with version 3.98 [security fixes].
>
> This fixes CVE-2023-5388, CVE-2023-6135 and CVE-2024-0743.
>
> * gnu/packages/nss.scm (nss) [replacement]: New field.
> (nss-3.98): Rename variable to...
> (nss/fixed): ... this. Make it a hidden package.
> * gnu/packages/librewolf.scm (librewolf) [inputs]: Replace nss-3.98 with
> nss/fixed.
>
> Change-Id: I8cc667c53a270dfe00738bf731923f1342036624
>
> I suppose the requirement to wait for QA should apply to security fixes
> as well?

Well, there's a risk in not testing things across multiple
machines/architectures at least. The value of getting a security fix
merged quickly is reduced if users on some architectures/systems can't
use it.

There's always going to be trade-offs, and that's fine, but the question
is more what can be done to try and improve things for the future.

Mark H Weaver wrote on 14 May 17:03 +0200
87h6f0a107.fsf@netris.org
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Christopher Baines <mail@cbaines.net> writes:
[...]
>> At least from what I can see on the issues, the problem was introduced
>> with the update to 3.98.0 [3] and then continued with the update to 3.99
>> [4]. Given the changes in 70662 were sent to guix-patches and then
>> merged less than 24 hours later, I'd imagine that wasn't sufficient time
>> for data.qa.guix.gnu.org to fail attempting to build nss.
>
> I think in [3] you meant https://issues.guix.gnu.org/70569, not #70662.
>
> Since this was security sensitive, I built it on x86_64, tested it there
> to ensure that IceCat worked as expected, had others confirmed it worked
> for them on #guix then pushed.
[...]
>> 4: https://issues.guix.gnu.org/70618

Note that the IceCat package in Guix currently uses the copy of NSS that
comes bundled with the IceCat source code, so testing IceCat probably
won't tell you much about whether the standalone NSS package in Guix
works properly.

Regards,
Mark
Maxim Cournoyer wrote on 16 May 04:44 +0200
(name . Mark H Weaver)(address . mhw@netris.org)
8734qixyq3.fsf@gmail.com
Hi Mark,

Mark H Weaver <mhw@netris.org> writes:

> Hi Maxim,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Christopher Baines <mail@cbaines.net> writes:
> [...]
>>> At least from what I can see on the issues, the problem was introduced
>>> with the update to 3.98.0 [3] and then continued with the update to 3.99
>>> [4]. Given the changes in 70662 were sent to guix-patches and then
>>> merged less than 24 hours later, I'd imagine that wasn't sufficient time
>>> for data.qa.guix.gnu.org to fail attempting to build nss.
>>
>> I think in [3] you meant https://issues.guix.gnu.org/70569, not #70662.
>>
>> Since this was security sensitive, I built it on x86_64, tested it there
>> to ensure that IceCat worked as expected, had others confirmed it worked
>> for them on #guix then pushed.
> [...]
>>> 3: https://issues.guix.gnu.org/70662
>>> 4: https://issues.guix.gnu.org/70618
>
> Note that the IceCat package in Guix currently uses the copy of NSS that
> comes bundled with the IceCat source code, so testing IceCat probably
> won't tell you much about whether the standalone NSS package in Guix
> works properly.

Thanks for the heads-up. It looks like there is now some low-hanging
fruit in terms of unbundling opportunities for IceCat/Icedove!

--
Thanks,
Maxim
Ian Eure wrote on 16 May 06:02 +0200
(name . Maxim Cournoyer)(address . maxim.cournoyer@gmail.com)
87r0e2v1t6.fsf@meson
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Mark,
>
> Mark H Weaver <mhw@netris.org> writes:
>
>> Hi Maxim,
>>
>> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>>
>>> Christopher Baines <mail@cbaines.net> writes:
>> [...]
>>>> At least from what I can see on the issues, the problem was
>>>> introduced
>>>> with the update to 3.98.0 [3] and then continued with the
>>>> update to 3.99
>>>> [4]. Given the changes in 70662 were sent to guix-patches and
>>>> then
>>>> merged less than 24 hours later, I'd imagine that wasn't
>>>> sufficient time
>>>> for data.qa.guix.gnu.org to fail attempting to build nss.
>>>
>>> I think in [3] you meant https://issues.guix.gnu.org/70569,
>>> not #70662.
>>>
>>> Since this was security sensitive, I built it on x86_64,
>>> tested it there
>>> to ensure that IceCat worked as expected, had others confirmed
>>> it worked
>>> for them on #guix then pushed.
>> [...]
>>>> 3: https://issues.guix.gnu.org/70662
>>>> 4: https://issues.guix.gnu.org/70618
>>
>> Note that the IceCat package in Guix currently uses the copy of
>> NSS that
>> comes bundled with the IceCat source code, so testing IceCat
>> probably
>> won't tell you much about whether the standalone NSS package in
>> Guix
>> works properly.
>
> Thanks for the heads-up. It looks like there are now some low
> hanging
> fruits in terms of unbundling opportunities for icecat/Icedove!
>

Definitely. The LibreWolf package is probably a good reference,
as I was able to unbundle all its library dependencies and use the
Guix-packaged versions instead.

— Ian
Maxim Cournoyer wrote 7 days ago
Re: bug#70662: Problems building nss@3.98.0
(name . Christina O'Donnell)(address . cdo@mutix.org)
878r0268ba.fsf_-_@gmail.com
Hi,

Christina O'Donnell <cdo@mutix.org> writes:

> Hi,
>
> On 08/05/2024 14:01, Christopher Baines wrote:
>> I think it would be nice to have a new release, and indeed release more
>> often, I think the way to get there is for less things to be broken
>> between releases, such that releasing takes less effort in terms of
>> testing and fixing things.
>>
>> To give some specific issues, I've run up against the recent issues with
>> nss [1][2] and I don't think we could release with the nss package as is
>> currently.
>>
>> 1: https://issues.guix.gnu.org/70662
>> 2: https://issues.guix.gnu.org/70663
>
> I can fix these by disabling tests, but I would prefer if someone with
> more experience packaging for guix could make a decision on
> it. Otherwise, I don't have any problem reducing the number of tests
> and disabling all tests on PowerPC at least.
>
> I could also do some analysis if it was deemed necessary, inserting a
> patch to measure the timings of each test/cycle. Additionally, I could
> try packaging some of the versions between 0.88 and 0.98 to identify
> the exact change that could be to blame. However, both of these seem
> overkill, given the backlog of patches/issues we have left to get
> through, and the manpower we currently have to work with.
>
> Would any of that be helpful?

I just encountered the following single test failure building on
powerpc64le:

time certutil -K -d /tmp/guix-build-nss-3.99.0.drv-0/nss-3.99/tests_results/security/localhost.1/bigdir -f ../tests.pw
------------- time ----------------------
real 10.32 user 10.25 sys 0.07
10 seconds
dbtests.sh: #27: certutil dump keys with explicit default trust flags - FAILED

with the summary:

SUMMARY:
========
NSS variables:
--------------
HOST=localhost
DOMSUF=localdomain
BUILD_OPT=
USE_X32=
USE_64=1
NSS_CYCLES=""
NSS_TESTS=""
NSS_SSL_TESTS="crl iopr policy normal_normal"
NSS_SSL_RUN="cov auth stapling signed_cert_timestamps scheme"
NSS_AIA_PATH=
NSS_AIA_HTTP=
NSS_AIA_OCSP=
IOPR_HOSTADDR_LIST=
PKITS_DATA=
NSS_DISABLE_HW_AES=
NSS_DISABLE_HW_SHA1=
NSS_DISABLE_HW_SHA2=
NSS_DISABLE_PCLMUL=
NSS_DISABLE_AVX=
NSS_DISABLE_ARM_NEON=
NSS_DISABLE_SSSE3=

Tests summary:
--------------
Passed: 79016
Failed: 1
Failed with core: 0
ASan failures: 0
Unknown status: 2
TinderboxPrint:Unknown: 2

error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "faketime" arguments: ("2024-01-23" "./nss/tests/all.sh") exit-status: 1 term-signal: #f stop-signal: #f>
phase `check' failed after 36124.0 seconds
command "faketime" "2024-01-23" "./nss/tests/all.sh" failed with status 1
builder for `/gnu/store/q3cqzzd4fg384lfmk91gd6higsyhs1nq-nss-3.99.0.drv' failed with exit code 1
@ build-failed /gnu/store/q3cqzzd4fg384lfmk91gd6higsyhs1nq-nss-3.99.0.drv - 1 builder for `/gnu/store/q3cqzzd4fg384lfmk91gd6higsyhs1nq-nss-3.99.0.drv' failed with exit code 1

So I'm not sure why, but the 'certutil dump keys with explicit default
trust flags' test fails on powerpc64le and should probably be disabled.
Also, 36124 s, ouch! #70950 should help with that.

--
Thanks,
Maxim
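
For reading logs like the one above, a small helper can pull out the two numbers that matter here: the count on the "Failed:" line of the tests summary, and the phase duration in a more readable unit. This is only a sketch. The function names `failed_count` and `fmt_duration` are made up for illustration, and the log format is assumed to match the excerpt.

```shell
#!/bin/sh
# failed_count: read a build log on stdin and print the number from the
# "Failed:" line of the NSS tests summary.
failed_count() {
    awk '$1 == "Failed:" { print $2; exit }'
}

# fmt_duration SECONDS: print a duration such as 36124.0 as hours,
# minutes and seconds (any fractional part is dropped).
fmt_duration() {
    s=${1%.*}
    printf '%dh %02dm %02ds\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

fmt_duration 36124.0    # prints: 10h 02m 04s
```

So the check phase on powerpc64le ran for just over ten hours before reporting a single failure.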