zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature [Linux 4.18]

This Linux kernel change "zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature" is included in the Linux 4.18 release. It was authored by Minchan Kim <minchan [at] kernel.org> on Fri Aug 10 17:23:10 2018 -0700. The commit for this change in the Linux stable tree is 4f7a7be (patch).

zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature

If zram supports the writeback feature, it is no longer a
BDI_CAP_SYNCHRONOUS_IO device because zram does asynchronous IO operations
for incompressible pages.

Do not pretend to be a synchronous IO device.  Doing so makes the system
very sluggish because upper layers wait for IO completion.

Furthermore, it causes a use-after-free problem because swap thinks the
operation is done when the IO function returns, so it can free the page
(e.g., lock_page_or_retry and goto out_release in do_swap_page), but in
fact the IO is asynchronous, so the driver could access a just-freed page
afterward.

This patch fixes the problem.
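
The hazard described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `model_*` name and `MODEL_CAP_SYNCHRONOUS_IO` are hypothetical stand-ins, and the model assumes for simplicity that every read on a device with a backing device completes asynchronously (in zram this applies only to pages that were written back). It shows when a caller that trusts the synchronous flag would release a page the device has not finished with.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's BDI_CAP_SYNCHRONOUS_IO bit. */
#define MODEL_CAP_SYNCHRONOUS_IO 0x1u

struct model_page {
    char data[16];
    bool io_done;               /* set once the read has really finished */
};

struct model_dev {
    unsigned int capabilities;  /* what the device advertises */
    bool has_backing_dev;       /* assumption: such reads finish later */
    struct model_page *pending; /* queued read, not yet completed */
};

/* Submit a read; returns immediately whether or not the data is ready. */
static void model_submit_read(struct model_dev *dev, struct model_page *page)
{
    assert(dev->pending == NULL);   /* the model allows one read in flight */
    page->io_done = false;
    if (dev->has_backing_dev) {
        dev->pending = page;        /* the device keeps a pointer to the page */
        return;
    }
    strcpy(page->data, "payload");
    page->io_done = true;
}

/* Stands in for the asynchronous completion that runs later. */
static void model_complete_io(struct model_dev *dev)
{
    if (!dev->pending)
        return;
    strcpy(dev->pending->data, "payload");  /* writes into the caller's page */
    dev->pending->io_done = true;
    dev->pending = NULL;
}

/* The caller's assumption: a synchronous device is done when submit returns. */
static bool model_caller_assumes_done(const struct model_dev *dev)
{
    return (dev->capabilities & MODEL_CAP_SYNCHRONOUS_IO) != 0;
}

/*
 * Returns true when the caller's assumption disagrees with reality, i.e.
 * the caller would release a page the device is still going to write to.
 */
static bool model_hazard_after_submit(struct model_dev *dev)
{
    struct model_page page = { .data = "", .io_done = false };
    bool hazard;

    model_submit_read(dev, &page);
    hazard = model_caller_assumes_done(dev) && !page.io_done;
    model_complete_io(dev);     /* finish the read while the page is still valid */
    return hazard;
}
```

In this model the hazard appears only when a device with a backing device still advertises the synchronous flag. Clearing the flag, or having no backing device, makes the caller's assumption match what the device actually does.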

  BUG: Bad page state in process qemu-system-x86  pfn:3dfab21
  page:ffffdfb137eac840 count:0 mapcount:0 mapping:0000000000000000 index:0x1
  flags: 0x17fffc000000008(uptodate)
  raw: 017fffc000000008 dead000000000100 dead000000000200 0000000000000000
  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
  bad because of flags: 0x8(uptodate)
  CPU: 4 PID: 1039 Comm: qemu-system-x86 Tainted: G    B 4.18.0-rc5+ #1
  Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
  Call Trace:
    dump_stack+0x5c/0x7b
    bad_page+0xba/0x120
    get_page_from_freelist+0x1016/0x1250
    __alloc_pages_nodemask+0xfa/0x250
    alloc_pages_vma+0x7c/0x1c0
    do_swap_page+0x347/0x920
    __handle_mm_fault+0x7b4/0x1110
    handle_mm_fault+0xfc/0x1f0
    __get_user_pages+0x12f/0x690
    get_user_pages_unlocked+0x148/0x1f0
    __gfn_to_pfn_memslot+0xff/0x3c0 [kvm]
    try_async_pf+0x87/0x230 [kvm]
    tdp_page_fault+0x132/0x290 [kvm]
    kvm_mmu_page_fault+0x74/0x570 [kvm]
    kvm_arch_vcpu_ioctl_run+0x9b3/0x1990 [kvm]
    kvm_vcpu_ioctl+0x388/0x5d0 [kvm]
    do_vfs_ioctl+0xa2/0x630
    ksys_ioctl+0x70/0x80
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x55/0x100
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: https://lore.kernel.org/lkml/[email protected]/
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix changelog, add comment]
 Link: https://lore.kernel.org/lkml/[email protected]/
 Link: http://lkml.kernel.org/r/[email protected]
 Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: coding-style fixes]
Signed-off-by: Minchan Kim <minchan [at] kernel.org>
Reported-by: Tino Lehnig <[email protected]>
Tested-by: Tino Lehnig <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: <[email protected]>    [4.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

There are 15 lines of Linux source code added/deleted in this change. The code changes to the Linux kernel are as follows.

 drivers/block/zram/zram_drv.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7436b2d..a390c6d 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -298,7 +298,8 @@ static void reset_bdev(struct zram *zram)
    zram->backing_dev = NULL;
    zram->old_block_size = 0;
    zram->bdev = NULL;
-
+   zram->disk->queue->backing_dev_info->capabilities |=
+               BDI_CAP_SYNCHRONOUS_IO;
    kvfree(zram->bitmap);
    zram->bitmap = NULL;
 }
@@ -400,6 +401,18 @@ static ssize_t backing_dev_store(struct device *dev,
    zram->backing_dev = backing_dev;
    zram->bitmap = bitmap;
    zram->nr_pages = nr_pages;
+   /*
+    * With writeback feature, zram does asynchronous IO so it's no longer
+    * synchronous device so let's remove synchronous io flag. Othewise,
+    * upper layer(e.g., swap) could wait IO completion rather than
+    * (submit and return), which will cause system sluggish.
+    * Furthermore, when the IO function returns(e.g., swap_readpage),
+    * upper layer expects IO was done so it could deallocate the page
+    * freely but in fact, IO is going on so finally could cause
+    * use-after-free when the IO is really done.
+    */
+   zram->disk->queue->backing_dev_info->capabilities &=
+           ~BDI_CAP_SYNCHRONOUS_IO;
    up_write(&zram->init_lock);

    pr_info("setup backing device %s\n", file_name);
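
The flag lifecycle in the diff above can be sketched in userspace as follows. This is a simplified model, not the driver code: `struct model_zram`, the `model_*` functions, and both `MODEL_CAP_*` constants are hypothetical names introduced for illustration. It mirrors the two hunks: attaching a backing device clears the synchronous bit, and resetting the backing device sets it again, leaving other capability bits untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for capability bits; only the first mirrors the patch. */
#define MODEL_CAP_SYNCHRONOUS_IO 0x1u
#define MODEL_CAP_OTHER          0x2u

struct model_zram {
    unsigned int capabilities;
    const char *backing_dev;    /* NULL when no backing device is attached */
};

/* Mirrors the backing_dev_store() hunk: IO may now complete asynchronously. */
static void model_backing_dev_store(struct model_zram *z, const char *name)
{
    assert(name != NULL);
    z->backing_dev = name;
    z->capabilities &= ~MODEL_CAP_SYNCHRONOUS_IO;
}

/* Mirrors the reset_bdev() hunk: without a backing device, IO is synchronous again. */
static void model_reset_bdev(struct model_zram *z)
{
    z->backing_dev = NULL;
    z->capabilities |= MODEL_CAP_SYNCHRONOUS_IO;
}

/* What an upper layer would query before deciding whether to wait for completion. */
static bool model_is_synchronous(const struct model_zram *z)
{
    return (z->capabilities & MODEL_CAP_SYNCHRONOUS_IO) != 0;
}
```

Using `&= ~FLAG` and `|= FLAG` rather than assigning the whole field keeps any unrelated capability bits intact, which is the same pattern the patch uses on `backing_dev_info->capabilities`.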
