ext4: fix data corruption caused by unaligned direct AIO [Linux 4.9.166]

The Linux kernel change "ext4: fix data corruption caused by unaligned direct AIO" is included in the Linux 4.9.166 release. It was authored by Lukas Czerner <lczerner [at] redhat.com> on Thu Mar 14 23:20:25 2019 -0400. The commit for this change in the Linux stable tree is 8651fa1 (patch), which comes from upstream commit 372a03e. The same upstream change may have been applied to other maintained Linux releases, and you can find all Linux releases containing changes from upstream 372a03e.

ext4: fix data corruption caused by unaligned direct AIO

commit 372a03e01853f860560eade508794dd274e9b390 upstream.

Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.

However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.

This is a (very simplified) reproducer from Frank:

    // 41472 = (10 * 4096) + 512
    // 37376 = 41472 - 4096

    ftruncate(fd, 41472);
    io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
    io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);

    io_submit(io_ctx, 1, &iocbs[0]);
    io_submit(io_ctx, 1, &iocbs[1]);

    io_getevents(io_ctx, 2, 2, events, NULL);

Without this patch, the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed, overwriting the data
written by the first write. This is a data corruption.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
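
The offsets in the dump above follow from simple block arithmetic: 0x9200 is 37376, 0xa000 is 40960 and 0xa200 is 41472. As a minimal sketch, assuming a 4096-byte block size and that partial-block zeroing covers the head of the block up to the write offset (as described above), the overlap with the first write can be computed as follows. The names `struct range`, `zeroed_head` and `overlap_bytes` are hypothetical helpers for illustration, not kernel functions.

```c
/* Half-open byte range [start, end). Hypothetical helper type. */
struct range {
    long long start;
    long long end;
};

/*
 * Range from the start of the block containing pos up to pos itself.
 * Assumes blocksize is a power of two. For an unaligned write this is
 * the head of the block that partial-block zeroing would cover.
 */
static struct range zeroed_head(long long pos, long long blocksize)
{
    struct range r = { pos & ~(blocksize - 1), pos };
    return r;
}

/* Number of bytes shared by two half-open ranges (0 if disjoint). */
static long long overlap_bytes(struct range a, struct range b)
{
    long long lo = a.start > b.start ? a.start : b.start;
    long long hi = a.end < b.end ? a.end : b.end;
    return hi > lo ? hi - lo : 0;
}
```

With the reproducer's numbers, `zeroed_head(41472, 4096)` yields the range 40960 to 41472, and its overlap with the first write (37376 to 41472) is 512 bytes, which matches the zeroed region at 0xa000 in the dump.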

With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.

00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
*
00009200  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 30
*
0000a200  31 31 31 31 31 31 31 31  31 31 31 31 31 31 31 31
*
0000b200

Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This change adds or deletes 2 lines of Linux source code. The code changes to the Linux kernel are as follows.

 fs/ext4/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 08fca4a..fe76d09 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -79,7 +79,7 @@ static void ext4_unwritten_wait(struct inode *inode)
    struct super_block *sb = inode->i_sb;
    int blockmask = sb->s_blocksize - 1;

-   if (pos >= i_size_read(inode))
+   if (pos >= ALIGN(i_size_read(inode), sb->s_blocksize))
        return 0;

    if ((pos | iov_iter_alignment(from)) & blockmask)
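
To see why the one-line change matters, the two conditions can be compared in user space. This is only a sketch under stated assumptions: `ALIGN_UP` is a stand-in for the kernel's `ALIGN()` macro, the function names are hypothetical, and only the file offset is checked (the kernel code also folds in `iov_iter_alignment(from)`).

```c
#include <stdbool.h>

/* User-space stand-in for the kernel's ALIGN(); a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Check as it was before the patch: a write starting at or past i_size
 * is never treated as an unaligned AIO that needs serializing.
 */
static bool unaligned_aio_old(long long pos, long long i_size,
                              long long blocksize)
{
    if (pos >= i_size)
        return false;
    return (pos & (blocksize - 1)) != 0;
}

/*
 * Check as it is after the patch: only writes starting at or past the
 * end of the block that contains i_size skip the alignment test.
 */
static bool unaligned_aio_new(long long pos, long long i_size,
                              long long blocksize)
{
    if (pos >= ALIGN_UP(i_size, blocksize))
        return false;
    return (pos & (blocksize - 1)) != 0;
}
```

For the reproducer's second write (pos 41472, i_size 41472, block size 4096), the old check returns false because pos equals i_size, while the new check compares against 45056 and then reports the write as unaligned, since 41472 is 512 bytes into its block.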
