fix __legitimize_mnt()/mntput() race [Linux 4.18]

This Linux kernel change, "fix __legitimize_mnt()/mntput() race", is included in the Linux 4.18 release. It was authored by Al Viro <viro [at] zeniv.linux.org.uk> on Thu Aug 9 17:51:32 2018 -0400. The commit for this change in the Linux stable tree is 119e1ef.

fix __legitimize_mnt()/mntput() race

__legitimize_mnt() has two problems - one is that in case of success
the check of mount_lock is not ordered wrt preceding increment of
refcount, making it possible to have successful __legitimize_mnt()
on one CPU just before the otherwise final mntput() on another,
with __legitimize_mnt() not seeing mntput() taking the lock and
mntput() not seeing the increment done by __legitimize_mnt().
Solved by a pair of barriers.
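
As an illustration only (this is not part of the commit), the barrier pairing can be modelled in userspace with C11 atomics. Everything named model_* below is hypothetical: a single atomic counter stands in for the kernel's mount refcount, a plain sequence number stands in for mount_lock, and atomic_thread_fence(memory_order_seq_cst) is assumed to be a reasonable stand-in for smp_mb(). The point of the sketch is the ordering: each side writes its own variable, issues a full barrier, then reads the other side's variable, so at least one side must observe the other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount; not a kernel type. */
struct model_mount {
    atomic_int count;      /* models the mount refcount */
    atomic_uint lock_seq;  /* models the mount_lock sequence number */
};

/*
 * Reader side, modelled on the success path of __legitimize_mnt():
 * bump the refcount, full barrier, then re-check the sequence.
 * Returns true if no writer has been seen since 'seq' was sampled.
 */
static bool model_legitimize(struct model_mount *m, unsigned seq)
{
    atomic_fetch_add_explicit(&m->count, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */
    return atomic_load_explicit(&m->lock_seq, memory_order_relaxed) == seq;
}

/*
 * Writer side, modelled on the locked part of mntput_no_expire():
 * mark the lock as taken, full barrier, then drop one reference.
 * Returns the refcount that remains; zero means this was the final put.
 */
static int model_mntput(struct model_mount *m)
{
    atomic_fetch_add_explicit(&m->lock_seq, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* stands in for smp_mb() */
    return atomic_fetch_sub_explicit(&m->count, 1, memory_order_relaxed) - 1;
}
```

With both fences in place, the outcome "model_legitimize() returns true and model_mntput() returns zero" cannot occur: either the reader sees the changed sequence, or the writer sees the extra reference. Without the fences, nothing orders each side's write against its later read, which is the window the commit closes.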

Another is that failure of __legitimize_mnt() on the second
read_seqretry() leaves us with reference that'll need to be
dropped by caller; however, if that races with final mntput()
we can end up with caller dropping rcu_read_lock() and doing
mntput() to release that reference - with the first mntput()
having freed the damn thing just as rcu_read_lock() had been
dropped.  Solution: in "do mntput() yourself" failure case
grab mount_lock, check if MNT_DOOMED has been set by racing
final mntput() that has missed our increment and if it has -
undo the increment and treat that as "failure, caller doesn't
need to drop anything" case.
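
The three outcomes of __legitimize_mnt() after this change can be sketched as a small single-threaded model (again an illustration, not kernel code). The struct, the MODEL_* flag values and the seq_changed parameter are all hypothetical; in the kernel the MNT_DOOMED check runs under lock_mount_hash(), which this sketch does not model.

```c
#include <stdbool.h>

/* Hypothetical flag values; the kernel's MNT_* constants differ. */
enum {
    MODEL_SYNC_UMOUNT = 1 << 0,
    MODEL_DOOMED      = 1 << 1,
};

/* Hypothetical stand-in for a mount; not a kernel type. */
struct model_mnt {
    int count;       /* models the refcount */
    unsigned flags;  /* models mnt_flags */
};

/*
 * Returns:
 *    0  success, the caller now holds a reference
 *    1  failure, the reference was already dropped here
 *   -1  failure, the caller still holds a reference and must mntput() it
 *
 * 'seq_changed' models read_seqretry() reporting a concurrent writer.
 */
static int model_legitimize_result(struct model_mnt *m, bool seq_changed)
{
    m->count += 1;
    if (!seq_changed)
        return 0;
    if (m->flags & MODEL_SYNC_UMOUNT) {
        m->count -= 1;
        return 1;
    }
    /* New in this change: a racing final mntput() has already doomed the mount. */
    if (m->flags & MODEL_DOOMED) {
        m->count -= 1;  /* undo our increment; the caller drops nothing */
        return 1;
    }
    return -1;          /* the caller will mntput() */
}
```

The doomed case is the one the second half of the fix adds: instead of handing the caller a reference to a mount that is about to be freed, the increment is undone and the result is reported as the "nothing to drop" failure.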

It's not easy to hit - the final mntput() has to come right
after the first read_seqretry() in __legitimize_mnt() *and*
manage to miss the increment done by __legitimize_mnt() before
the second read_seqretry() in there.  The things that are almost
impossible to hit on bare hardware are not impossible on SMP
KVM, though...

Reported-by: Oleg Nesterov <[email protected]>
Fixes: 48a066e72d97 ("RCU'd vfsmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>

This change adds 14 lines of Linux source code. The code changes to the Linux kernel are as follows.

 fs/namespace.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index d46a951..bd2f4c6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -659,12 +659,21 @@ int __legitimize_mnt(struct vfsmount *bastard, unsigned seq)
        return 0;
    mnt = real_mount(bastard);
    mnt_add_count(mnt, 1);
+   smp_mb();           // see mntput_no_expire()
    if (likely(!read_seqretry(&mount_lock, seq)))
        return 0;
    if (bastard->mnt_flags & MNT_SYNC_UMOUNT) {
        mnt_add_count(mnt, -1);
        return 1;
    }
+   lock_mount_hash();
+   if (unlikely(bastard->mnt_flags & MNT_DOOMED)) {
+       mnt_add_count(mnt, -1);
+       unlock_mount_hash();
+       return 1;
+   }
+   unlock_mount_hash();
+   /* caller will mntput() */
    return -1;
 }

@@ -1210,6 +1219,11 @@ static void mntput_no_expire(struct mount *mnt)
        return;
    }
    lock_mount_hash();
+   /*
+    * make sure that if __legitimize_mnt() has not seen us grab
+    * mount_lock, we'll see their refcount increment here.
+    */
+   smp_mb();
    mnt_add_count(mnt, -1);
    if (mnt_get_count(mnt)) {
        rcu_read_unlock();
