Hi Liliana,
Liliana Marie Prikler <liliana.prikler@gmail.com> writes:
> Hi,
>
> Am Mittwoch, dem 15.01.2025 um 15:48 +0000 schrieb 45mg:
>> The idea of authentication is that once you trust the channel
>> introduction, you can be sure that everything you pull after that is
>> authentic. The introduction only needs to be trusted once. If you're
>> bumping the introduction every time, then you need to obtain and
>> verify the introduction every time. You're going from 'Trust On First
>> Use' to 'Trust On Every Use'. Not ideal IMO.
> Let's recall that the entity you need to trust is still yourself in
> most of those cases.
If you host your repo unauthenticated on a server, you need to fully
trust the server, as well as the connection between you and the server.
Regarding the former, none of the most popular ways to host a git repo
(eg. GitHub, Codeberg, your own forge instance on a VPS) allow you to
know much about the underlying server, so you can't really assume it to
be secure. The latter is a ridiculously complicated topic that I'm not
qualified to go into. Authentication exists so that we need to trust all
these intermediaries at most once, if at all.
I realise it may seem silly to worry about your own little fork being
directly targeted in ways like this, but the main reason I chose Guix in
the first place is the focus on getting the fundamentals right -
reproducibility, bootstrappability, free software, etc. - even though
most projects don't put in as much effort towards them, and even though
a lot of users may not be directly affected by these things. I think
security is one such thing. As the 'Authenticate Your Git Checkouts'
blog post [9] pointed out, we wouldn't need `guix git authenticate` if
we were willing to delegate our security to a trusted third party, like
all the open source projects that sport those "fancy “✓ verified”
badges as found on GitLab and on GitHub" do. We shouldn't force anyone
hosting a fork to do so as well.
>> You could do it like this:
>> 0) Before creating your fork, authenticate every commit in the Guix
>> checkout (as described in the manual).
>> 1) Switch to your branch that tracks upstream.
>> 2) Pull from upstream.
>> 3) Run `guix git authenticate`, supplying Guix's channel introduction
>> as arguments.
>> 4) After this succeeds, create and switch to a branch from the current
>> tip of your upstream-tracking branch. Edit .guix-authorizations to
>> add your key, and create a signed commit.
>> 5) Merge this branch into your fork branch.
>> 6) Switch back to your fork branch.
>> 7) Delete the [guix "authentication"] section from .git/config.
>> 8) Run `guix git authenticate` with the introduction of your fork
>> branch, to authenticate the merge commit.
>>
>> That's a lot of manual steps for every pull from upstream! While I do
>> have to give you credit for this idea - at least we now have a
>> workaround for people who are determined enough - I'm guessing a lot
>> of people will probably just skip authentication if it's going to be
>> this annoying. Authenticating a fresh clone from scratch will be even
>> more annoying, especially if you have multiple fork branches (eg.
>> you're tracking someone else's fork).
> I think you're making this more complicated than it needs to be.
> checkout, authenticate, rebase*, merge* ought to have you covered.
>
> * you can authenticate after these if you're paranoid
The complexity is due to the requirements of not bumping the channel
introduction (to avoid the increased attack surface from having to keep
obtaining the updated one, as I discussed earlier), keeping fork history
intact (to avoid force pulls), keeping upstream history intact, and
being able to authenticate both upstream and fork commits. If you can
think of a simpler method that meets these requirements, I'd love to
hear it.
Also, I just realised that the procedure above won't even work. The
commit created in step 4 cannot be authenticated, as it's signed with
your key, which is not in its parent's .guix-authorizations.
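A toy Python model makes the failure concrete. It assumes the standard rule (a commit is authorized only if its signer appears in the .guix-authorizations of every parent); the `Commit` class, key names and commit names are hypothetical, and this is not Guix's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Commit:
    """Hypothetical toy commit: its parents, the key that signed it, and
    the keys listed in its own .guix-authorizations file."""
    name: str
    parents: list = field(default_factory=list)
    signer: Optional[str] = None
    authorizations: frozenset = frozenset()


def is_authorized(commit: Commit) -> bool:
    """Standard invariant: the signer must be authorized in every parent.
    (Introductory commits are special-cased in the real scheme; this toy
    function is only meant for commits that have parents.)"""
    if commit.signer is None:
        return False
    return all(commit.signer in p.authorizations for p in commit.parents)


# Tip of the upstream-tracking branch: only upstream's key is authorized.
upstream_tip = Commit("U", signer="upstream-key",
                      authorizations=frozenset({"upstream-key"}))

# Step 4: a commit that adds my key, signed with my key.
add_my_key = Commit("K", parents=[upstream_tip], signer="my-key",
                    authorizations=frozenset({"upstream-key", "my-key"}))

print(is_authorized(add_my_key))  # False: my-key is not in U's authorizations
```

The same commit signed with upstream-key would pass, which is exactly why a fork maintainer cannot bootstrap their own key this way.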
>> We could create a script to do all the steps for us, but if and when
>> it fails on whatever insane edge cases people are able to come up
>> with, they're going to need to understand all the steps involved
>> anyway. Abstraction is not a substitute for a clean underlying
>> design.
>>
>> Also I just want to point out that rebasing /will/ change the history.
>> The `guix pull` after every time you update your fork will need to be
>> a force-pull (--allow-downgrades [1]).
> No, it wouldn't. You would rebase those changes on top of what you
> already have on those respective branches.
It looks like at least one of us is misunderstanding rebasing. Could be
me, so I'm consulting the relevant chapter from the Pro Git book [11]
for a refresher.
Let's imagine that the first example given there represents our fork of
Guix, where the 'experiment' branch marks the beginning of our fork (and
its channel introduction) and the 'master' branch tracks upstream Guix.
After `git rebase master`, the commit that used to be C4 is gone, and
now C4' takes its place. It may contain the same changes, but it's a
different commit - so it (and every commit descended from it) has a
different hash. Hence the channel introduction has changed, along with
the entire history of the 'experiment' branch, and we need to force-pull.
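The underlying reason is that a Git commit hash covers the hash of its parent. The Python sketch below uses a deliberately simplified stand-in for Git's commit hash: the hypothetical `commit_id` hashes only the parent id and a label for the change, whereas real commit objects also cover the tree, author, committer and message. It replays the Pro Git example before and after the rebase, with C5 added as an extra hypothetical descendant.

```python
import hashlib
from typing import Optional


def commit_id(parent: Optional[str], change: str) -> str:
    """Simplified stand-in for a Git commit hash. Like Git, it covers the
    parent's hash, so the same change on a different parent gets a
    different id."""
    header = f"parent {parent}\n" if parent else ""
    return hashlib.sha1(f"{header}\n{change}".encode()).hexdigest()


def replay(base: str, changes: list) -> list:
    """Apply `changes` one after another on top of `base`; return the ids."""
    ids, parent = [], base
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


c2 = commit_id(None, "C2")          # common ancestor (a root, for brevity)
c3 = commit_id(c2, "C3")            # 'master' (upstream) moves on to C3
fork = replay(c2, ["C4", "C5"])     # 'experiment' before the rebase
rebased = replay(c3, ["C4", "C5"])  # after `git rebase master`

# Same changes, but every rebased commit has a new id, including the first
# one (the fork's introduction in this scenario).
print([a == b for a, b in zip(fork, rebased)])  # [False, False]
```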
>> > Of course, you can also keep your own fork unauthenticated, which
>> > might be preferable if you only do local work anyway, but that's
>> > beside the issue here.
>>
>> Yes, to be clear, I'm talking about the use-case where your fork is
>> hosted remotely, and you or someone else needs to pull changes from
>> it. For example, my prospective use case would be quickly
>> bootstrapping Guix on a new machine - I build my own installation
>> image, and I'd want it to pull from my fork. I can include my
>> introduction into my installer, just like the official one. But if
>> the introduction changes before I use my installer, then the first
>> pull can't be authenticated.
> I don't see why in your particular use case you can not use a channel
> on top of Guix rather than replicating Guix itself. Now there might be
> some weird edge case I'm overlooking where you cut deep into the
> dependency graph and that makes sense, but I sure hope that's a rare
> edge case in and of itself.
See Tomas's reply [10]. I'll continue this particular tangent in that
sub-thread.
>> The purpose of the additional introductions is to make it so that
>> signed commits from upstream Guix, or commits from other people's
>> forks, can still be authenticated. As I mentioned above, the current
>> design is not suited to this.
>>
>> To go a bit more into detail - we will accomplish authentication by
>> doing a postorder traversal of the commit tree, considering the
>> latest commit as the root node. We traverse its parents recursively
>> until we reach a commit whose parent is one of the channel
>> introductions (primary or additional). Then that commit and all its
>> children are authenticated from the introduction that we encountered.
>> In this way, every commit is authenticated from the introduction that
>> is its most recent ancestor.
> Yeah, I think this scheme will still end up in [4]. As pointed out in
> [8], "primary" is just a convention that we can't rely on. So let's
> just talk about the idea of widening one channel introduction to any
> number of channel introductions – we can always store a mapping of HEAD
> → first authenticated commit and then assert that this set is a subset
> of what we declare as introductions. (This mapping will also make
> authentication as efficient as it currently is, since we don't need to
> reauthenticate everything all the time.)
>
> Is this good enough? No: an attacker could easily add their own
> introduction and call it a day. In fact, this scheme is even worse
> than what was exploited in [4], because they never need commit access
> to the Guix repo to do so. Ahh, but wait! `guix pull` on the user's
> side uses their clean set of channels for authentication. Those only
> have upstream Guix… unless you actually pull your own fork or manage an
> attack as outlined below (in which case you do need commit access for
> some amount of time).
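The traversal in my quoted paragraph above (every commit is authenticated from the introduction that is its most recent ancestor) can be sketched in Python as follows. The dict-based graph and commit names are hypothetical stand-ins for a real repository, and this is not Guix's code. It returns a set per commit because the parents of a merge may lead back to different introductions, which is where the merge rules come in.

```python
def nearest_introductions(tip, parents, introductions):
    """Map every commit reachable from `tip` to the set of introductions
    that are its most recent ancestors.

    parents: dict mapping a commit to the list of its parent commits
    introductions: set of introductory commits declared by the user
    An explicit stack gives a post-order walk without recursion limits."""
    result = {}
    stack = [(tip, False)]
    while stack:
        commit, expanded = stack.pop()
        if commit in result:
            continue
        if commit in introductions:
            # An introduction is its own most recent introduction; the walk
            # does not look further back from here.
            result[commit] = frozenset({commit})
        elif not expanded:
            # First visit: come back once all parents are resolved.
            stack.append((commit, True))
            stack.extend((p, False) for p in parents.get(commit, [])
                         if p not in result)
        else:
            result[commit] = frozenset().union(
                *(result[p] for p in parents.get(commit, [])))
    return result


# Hypothetical history: upstream introduced at A, a fork introduced at F1,
# and a merge M of upstream (C) into the fork (F2).
parents = {"B": ["A"], "C": ["B"], "F1": ["C"], "F2": ["F1"], "M": ["F2", "C"]}
r = nearest_introductions("M", parents, {"A", "F1"})
print(sorted(r["M"]))  # ['A', 'F1']: the merge mixes two introductions
```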
Whew. Ok, before I can reply directly to this, I need to discuss a few
related things.
First of all, let's talk about [8]. It isn't part of this thread so I'll
quote the relevant part here:
> Problem here is that this (which parent is first) is just a convention
> that the attacker does not have to follow. Example:
>
> --8<---------------cut here---------------start------------->8---
> /tmp/xx $ git commit-tree -p HEAD -p HEAD~1 -m M HEAD^{tree}
> c040e61bc184b5971f68c4b794c3158350b5d5e9
> /tmp/xx $ g show c040e61bc184b5971f68c4b794c3158350b5d5e9
> commit c040e61bc184b5971f68c4b794c3158350b5d5e9
> Merge: 40ef875 17451b8
> Author: Tomas Volf <~@wolfsden.cz>
> Date: Tue Jan 14 23:12:17 2025 +0100
>
> M
>
> /tmp/xx $ git commit-tree -p HEAD~1 -p HEAD -m M HEAD^{tree}
> ec74e368519b667d8d280639db6642b28d37eb53
> /tmp/xx $ g show ec74e368519b667d8d280639db6642b28d37eb53
> commit ec74e368519b667d8d280639db6642b28d37eb53
> Merge: 17451b8 40ef875
> Author: Tomas Volf <~@wolfsden.cz>
> Date: Tue Jan 14 23:12:32 2025 +0100
>
> M
> --8<---------------cut here---------------end--------------->8---
>
> Notice that I have created two commits, and they have the same parents,
> just in swapped order.
Here, Tomas is presumably reacting to Condition 2b in my procedure for
authenticating merge commits, which I will quote here again:
> For commits that have multiple parents - ie. merge commits - we weaken
> the invariant as follows:
>
> 1. If all parents have the primary introduction as their most recent
> ancestor, then the invariant holds as usual.
>
> 2. If one or more parents has the primary introduction as its most
> recent ancestor (call these the 'primary parents'), and the rest have
> any of the additional introductions, then the merge commit is
> authenticated if and only if:
> a) it's signed by a key authorized in all of the primary parents, AND
> b) the /first parent/ [^] of the merge commit is a primary parent.
>
> 3. If all parents have the same additional introduction as their most
> recent ancestor, then the invariant holds as usual.
>
> 4. If none of the parents have the primary introduction as their most
> recent ancestor, nor do they have the same additional introduction,
> then the merge commit cannot be authenticated.
Now, it turns out that the parent order in a merge commit isn't actually
the relevant detail here. The parent order is a /UI detail/: it's a
convention that helps indicate in which direction a branch was merged
(and possibly other things), so that `git log` can show this to us. The
order is recorded in the commit object (which is why Tomas's two commits
have different hashes), but it doesn't change the shape of the commit
graph: the same parents are connected either way.
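Tomas's example can be reproduced in a few lines of Python that hash two minimal commit objects differing only in parent order, using Git's object format (SHA-1 over a `commit <size>\0` header followed by the body). The parent hashes, author line and message are hypothetical placeholders.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree


def git_commit_hash(tree: str, parents: list, message: str) -> str:
    """Hash a minimal commit object the way Git does. The author and
    committer lines are fixed placeholders."""
    ident = "A U Thor <author@example.com> 0 +0000"
    body = f"tree {tree}\n"
    body += "".join(f"parent {p}\n" for p in parents)
    body += f"author {ident}\ncommitter {ident}\n\n{message}\n"
    data = body.encode()
    return hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()


p1, p2 = "a" * 40, "b" * 40  # hypothetical parent hashes
m1 = git_commit_hash(EMPTY_TREE, [p1, p2], "M")
m2 = git_commit_hash(EMPTY_TREE, [p2, p1], "M")
graph = {m1: [p1, p2], m2: [p2, p1]}

print(m1 == m2)                        # False: the order is hashed
print(set(graph[m1]) == set(graph[m2]))  # True: same parents connected
```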
The relevant detail is what Tomas's observation should remind us of: a
Git commit graph doesn't include any information about 'merge order',
ie. 'which branch was merged into which'. In fact it doesn't include any
information about /branches/ - those are just refs that can be made to
point to whatever commit you want; they are not part of the commit
graph.
Once we realise this, we can see that trying to control which branch can
be merged into which doesn't make sense.
This led me to think of an attack that's possible with my design - if I
want to screw with anyone `guix pull`ing from my fork, I can merge
upstream into my fork branch, add a bunch of malicious commits, and then
make the upstream branch ref point to the latest such commit. Now anyone
pulling from my fork will receive the malicious commits as part of
upstream's history - since no commit hashes needed to change, the pull
is a regular fast-forward one, with no indication that anything is
amiss. Authentication will succeed since the malicious merge commit has
my fork branch as its (first) parent, and that parent has the primary
introduction as its most recent ancestor.
The takeaway here is that anyone authorized via the primary introduction
can fake new upstream commits.
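Nothing looks amiss because an update is a fast-forward exactly when the previously fetched tip is an ancestor of the new one (the same question `git merge-base --is-ancestor` answers). A minimal Python sketch over a hypothetical history, with commit names invented for illustration:

```python
def is_ancestor(ancestor, descendant, parents):
    """True if `ancestor` is reachable from `descendant` through parent
    links, i.e. moving a ref from `ancestor` to `descendant` is a
    fast-forward."""
    seen, stack = set(), [descendant]
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False


parents = {
    "U1": ["U0"],        # upstream as the victim last fetched it (tip U1)
    "F1": ["U0"],        # the fork branch
    "M": ["F1", "U1"],   # fork maintainer merges upstream into the fork
    "X": ["M"],          # malicious commit on top of the merge
}

# The upstream ref is moved from U1 to X; no existing hash changes.
print(is_ancestor("U1", "X", parents))  # True: a plain fast-forward
```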
So why bother with additional introductions at all, then? Because as far
as I can tell, they are still the only solution mentioned so far that
satisfies the requirements I mentioned earlier:
> not bumping the channel introduction (to avoid the increased attack
> surface from having to keep obtaining the updated one, as I discussed
> earlier), keeping fork history intact (to avoid force pulls), keeping
> upstream history intact, and being able to authenticate both upstream
> and fork commits
...and yes, you do have to trust the fork maintainer to not deliberately
mess those things up. But that's nothing new - even in the existing
design, we have to trust everyone who can make trusted commits not to
mess things up on purpose.
So what does all of this mean for the statement of my design? Well,
it means that we need to stop thinking in terms of 'which branch can be
merged into which?' and more in terms of 'which merge commits can be
authenticated?'. And the answer to that question, with my design, would
be:
1. Any merge commit signed with a key in the intersection of its
parents' .guix-authorizations. (Standard authorization invariant.)
2. Any merge commit that doesn't meet the above condition, but has a
parent whose most recent ancestor is the primary introduction, and is
signed by a key in the .guix-authorizations of that parent. (My
weakened authorization invariant.)
Finally, let me restate the conditions for authenticating merge commits,
taking this into account:
--8<---------------cut here---------------start------------->8---
For commits that have multiple parents - ie. merge commits - we weaken
the invariant as follows:

1. If all parents have the primary introduction as their most recent
ancestor, then the invariant holds as usual.

2. If one or more parents has the primary introduction as its most
recent ancestor (call these the 'primary parents'), and the rest have
any of the additional introductions, then the merge commit is
authenticated if and only if it's signed by a key authorized in all
of the primary parents.

3. If all parents have the same additional introduction as their most
recent ancestor, then the invariant holds as usual.

4. If none of the parents have the primary introduction as their most
recent ancestor, nor do they have the same additional introduction,
then the merge commit cannot be authenticated.
--8<---------------cut here---------------end--------------->8---
I merged 2a. into 2., and removed 2b.
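The four conditions can be sketched as a single Python predicate. It assumes each parent has already been mapped to the single introduction that is its most recent ancestor, and that every such introduction is either the primary one or one of the additional ones the user declared; `Parent` and all names are hypothetical. It never inspects parent order, in line with dropping 2b.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parent:
    intro: str                 # introduction that is its most recent ancestor
    authorizations: frozenset  # keys in this parent's .guix-authorizations


def merge_is_authentic(signer: str, parents: list, primary: str) -> bool:
    """Conditions 1-4 for a merge commit (assumes two or more parents)."""
    primary_parents = [p for p in parents if p.intro == primary]
    other_intros = {p.intro for p in parents if p.intro != primary}
    if not other_intros:
        # 1. All parents descend from the primary introduction.
        return all(signer in p.authorizations for p in parents)
    if primary_parents:
        # 2. Mixed: the signer must be authorized in every primary parent.
        return all(signer in p.authorizations for p in primary_parents)
    if len(other_intros) == 1:
        # 3. All parents share one additional introduction.
        return all(signer in p.authorizations for p in parents)
    # 4. No primary parent, and the additional introductions differ.
    return False


fork = Parent("fork-intro", frozenset({"my-key"}))
upstream = Parent("guix-intro", frozenset({"upstream-key"}))
print(merge_is_authentic("my-key", [upstream, fork], "fork-intro"))  # True
```

As discussed above, condition 2 is exactly what lets the fork maintainer's merge of upstream through; the predicate encodes the trust placed in the fork maintainer rather than removing it.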
Now let me try to respond to you:
> Yeah, I think this scheme will still end up in [4]. As pointed out in
> [8], "primary" is just a convention that we can't rely on.
Not really. As I discussed, [8] points out that /merge order/ is the
convention that we can't rely on. Introductions can be deliberately
specified as primary or additional, whether via command-line flags or
separate sections in .git/config.
> So let's just talk about the idea of widening one channel introduction
> to any number of channel introductions – we can always store a mapping
> of HEAD → first authenticated commit and then assert that this set is
> a subset of what we declare as introductions. (This mapping will also
> make authentication as efficient as it currently is, since we don't
> need to reauthenticate everything all the time.)
I'm not sure I follow. What do you mean by "mapping of HEAD → first
authenticated commit"? Does this perhaps mean 'all commits between the
latest one and the first authenticated commit'?
What does "assert that this set is a subset of what we declare as
introductions" mean?
> Is this good enough? No: an attacker could easily add their own
> introduction and call it a day. In fact, this scheme is even worse
> than what was exploited in [4], because they never need commit access
> to the Guix repo to do so. Ahh, but wait! `guix pull` on the user's
> side uses their clean set of channels for authentication. Those only
> have upstream Guix… unless you actually pull your own fork or manage an
> attack as outlined below (in which case you do need commit access for
> some amount of time).
I should point out - my design does not require us to distribute any
introductions besides Guix's existing one, nor will it provide any
mechanism to automatically 'install' someone else's introduction. An
introduction is a tuple of (introductory commit, key that signs it) that
you specify as arguments to `guix git authenticate`. An attacker would
have to convince the entire Guix community to specify their (the
attacker's) own introduction on the command line (or directly add it
into .git/config). And given that there is no reason to ever do so
unless you're using someone's fork... that's a hard sell.
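Put differently, the set of introductions is an input chosen by the user, not something read from the repository. A hypothetical Python sketch of that property (the `Introduction` class and the placeholder values are illustrative, not Guix's data structures):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Introduction:
    """The introductory commit plus the fingerprint of the OpenPGP key
    that is expected to have signed it."""
    commit: str
    signer: str


def accept_introduction(commit: str, actual_signer: str, trusted: set) -> bool:
    """Honour an introduction only if the user declared it themselves and
    the commit really is signed by the declared key. Nothing found inside
    the repository can add to `trusted`."""
    return Introduction(commit, actual_signer) in trusted


# What the user passed on the command line (placeholder values).
trusted = {Introduction("guix-intro-commit", "guix-key-fingerprint")}

print(accept_introduction("guix-intro-commit", "guix-key-fingerprint", trusted))
print(accept_introduction("attacker-commit", "attacker-key", trusted))
```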
Perhaps I should have mentioned this when you suggested the attack below
in the first place.
>> > I think this might still hide a serious flaw. With the way
>> > *upstream* authentication works. Let's flip the example in [6]
>> > around a little bit and construct the following:
>> >
>> > -A---B---C---D
>> >   \       \
>> >    \       \-E---F---?
>> >     \               /
>> >      \----G--H--I*-/
>> >
>> > Both A and I* are introductory commits on their various branches.
>> > In ?, any committer who has valid keys in both F and I* can merge
>> > a branch with unsigned commits, effectively voiding the invariant
>> > of BCEF, e.g. by undoing any changes that happened there. Of
>> > course, they can do so with signed commits as well, given that they
>> > have commit access to the main repository, but the point still
>> > holds that they may introduce unsigned commits to any fork where
>> > their key is valid in.
>>
>> So, my design enables an attacker who can make authorized signed
>> commits to also introduce changes made in unsigned commits. Hmm.
>>
>> I don't think this compromises our current security guarantees, though?
> I mean, the promise we do make is that all commits starting from a
> certain commit are signed. So IMHO, this effectively breaks that :)
Again, you need to deliberately use the attacker's introduction for this
to work. Unless you're pulling from their fork (in which case you
already trust them), there's no reason for them to ask you to do so.
>> If the attacker can already make trusted commits, then any attack
>> they can perform in the way you described can also be done directly
>> with signed commits onto F, as you pointed out. And the latter way
>> would be far simpler for them.
> Simpler, yes, but less stealthy. Most contributors don't concern
> themselves with the specifics of any particular branch, and you may
> even be able to dress up your evil branch as a good branch until the
> point where you finally merge it.
See above. We will never need to specify more than one introduction for
the main Guix repo, so this doesn't come up. We're not trying to enable
pull-request-style workflows within Guix; we're just trying to permit
authenticated forks.
>> Also, the branch they merged into would not contain any unsigned
>> commits; the commit '?' is still signed with a key authorized for
>> F's branch. So at most, we can say that the attacker can introduce
>> /changes made in/ unsigned commits, not 'introduce unsigned commits'.
> They can make an arbitrary number of unsigned commits before needing to
> sign off one commit that will be merged. If they follow the style of
> merging master into their branch and then their branch into master,
> said commit can even be empty, though that would no longer be stealthy.
> Now if they were to I don't know, bump 9000 Rust packages or something
> like that, they have a lot of space to exploit the as-of-yet in this
> manner unexploited, but still weak SHA-1 ha