This Linux kernel change "blk-mq: fix a hung issue when fsync" is included in the Linux 5.0 release. This change is authored by Jianchao Wang <jianchao.w.wang [at] oracle.com> on Wed Jan 30 17:01:56 2019 +0800. The commit for this change in Linux stable tree is 85bd6e6 (patch).
blk-mq: fix a hung issue when fsync Florian reported a io hung issue when fsync(). It should be triggered by following race condition. data + post flush a flush blk_flush_complete_seq case REQ_FSEQ_DATA blk_flush_queue_rq issued to driver blk_mq_dispatch_rq_list try to issue a flush req failed due to NON-NCQ command .queue_rq return BLK_STS_DEV_RESOURCE request completion req->end_io // doesn't check RESTART mq_flush_data_end_io case REQ_FSEQ_POSTFLUSH blk_kick_flush do nothing because previous flush has not been completed blk_mq_run_hw_queue insert rq to hctx->dispatch due to RESTART is still set, do nothing To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io with blk_mq_sched_restart to check and clear the RESTART flag. Fixes: bd166ef1 (blk-mq-sched: add framework for MQ capable IO schedulers) Reported-by: Florian Stecker <[email protected]> Tested-by: Florian Stecker <[email protected]> Signed-off-by: Jianchao Wang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
There are 2 lines of Linux source code added/deleted in this change. Code changes to Linux kernel are as follows.
block/blk-flush.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-flush.c b/block/blk-flush.c index a3fc7191..6e0f2d9 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -335,7 +335,7 @@ static void mq_flush_data_end_io(struct request *rq, blk_status_t error) blk_flush_complete_seq(rq, fq, REQ_FSEQ_DATA, error); spin_unlock_irqrestore(&fq->mq_flush_lock, flags); - blk_mq_run_hw_queue(hctx, true); + blk_mq_sched_restart(hctx); } /**