- Mar 16, 2023
-
-
Eric Biggers authored
Once all I/O using a blk_crypto_key has completed, filesystems can call blk_crypto_evict_key(). However, the block layer currently doesn't call blk_crypto_put_keyslot() until the request is being freed, which happens after upper layers have been told (via bio_endio()) the I/O has completed. This causes a race condition where blk_crypto_evict_key() can see 'slot_refs != 0' without there being an actual bug.

This makes __blk_crypto_evict_key() hit the 'WARN_ON_ONCE(atomic_read(&slot->slot_refs) != 0)' and return without doing anything, eventually causing a use-after-free in blk_crypto_reprogram_all_keys(). (This is a very rare bug and has only been seen when per-file keys are being used with fscrypt.)

There are two options to fix this: either release the keyslot before bio_endio() is called on the request's last bio, or make __blk_crypto_evict_key() ignore slot_refs. Let's go with the first solution, since it preserves the ability to report bugs (via WARN_ON_ONCE) where a key is evicted while still in-use.

Fixes: a892c8d5 ("block: Inline encryption support for blk-mq")
Cc: stable@vger.kernel.org
Reviewed-by: Nathan Huckleberry <nhuck@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230315183907.53675-2-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
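A minimal sketch of the ordering the fix establishes, assuming the rq->crypt_keyslot field and the blk_crypto_put_keyslot() helper named above; the wrapper function here is hypothetical:

  static void blk_complete_crypto_request(struct request *rq)  /* hypothetical wrapper */
  {
          /* Drop the keyslot reference before signalling completion ... */
          if (rq->crypt_keyslot) {
                  blk_crypto_put_keyslot(rq->crypt_keyslot);
                  rq->crypt_keyslot = NULL;
          }
          /* ... so a blk_crypto_evict_key() issued right after bio_endio()
           * sees slot_refs == 0 and does not trip the WARN_ON_ONCE(). */
          bio_endio(rq->bio);
  }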
-
- Mar 03, 2023
-
-
Uday Shankar authored
The block layer might merge together discard requests up until the max_discard_segments limit is hit, but blk_insert_cloned_request checks the segment count against max_segments regardless of the req op. This can result in errors like the following when discards are issued through a DM device and max_discard_segments exceeds max_segments for the queue of the chosen underlying device.

  blk_insert_cloned_request: over max segments limit. (256 > 129)

Fix this by looking at the req_op and enforcing the appropriate segment limit - max_discard_segments for REQ_OP_DISCARDs and max_segments for everything else.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230301000655.48112-1-ushankar@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
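A sketch of the op-aware limit selection, assuming the standard queue_max_segments()/queue_max_discard_segments() accessors; the exact helper used by the fix may differ:

  static unsigned short blk_rq_get_max_segments(struct request *rq)
  {
          if (req_op(rq) == REQ_OP_DISCARD)
                  return queue_max_discard_segments(rq->q);
          return queue_max_segments(rq->q);
  }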
-
- Feb 17, 2023
-
-
Jens Axboe authored
kernel test robot complains about a type mismatch:

  block/blk-merge.c:984:42: sparse: expected restricted blk_opf_t const [usertype] ff
  block/blk-merge.c:984:42: sparse: got unsigned int
  block/blk-merge.c:1010:42: sparse: sparse: incorrect type in initializer (different base types) @@ expected restricted blk_opf_t const [usertype] ff @@ got unsigned int @@
  block/blk-merge.c:1010:42: sparse: expected restricted blk_opf_t const [usertype] ff
  block/blk-merge.c:1010:42: sparse: got unsigned int

because bio_failfast() returns an unsigned int rather than the appropriate blk_opf_t type. Fix it up.

Fixes: 3ce6a115 ("block: sync mixed merged request's failfast with 1st bio's")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170743.GXypM9Rt-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
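A sketch of the fixed helper with the bitwise blk_opf_t return type that sparse expects; the body is reconstructed from the description of bio_failfast() in the entry below and may differ in detail:

  static inline blk_opf_t bio_failfast(const struct bio *bio)
  {
          if (bio->bi_opf & REQ_RAHEAD)
                  return 0;       /* readahead bios never count as failfast */
          return bio->bi_opf & REQ_FAILFAST_MASK;
  }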
-
- Feb 16, 2023
-
-
Ming Lei authored
We support mixed merge for requests/bios with different failfast settings. When a request fails, each time we only handle the portion with the same failfast setting; then bios with failfast can be failed immediately, and bios without failfast can be retried. The idea is pretty good, but the current implementation has several defects:

1) initially an RA bio doesn't set failfast, however the bio merge code doesn't consider this point, and just checks its failfast setting for deciding if mixed merge is required. Fix this issue by adding the helper bio_failfast().

2) when merging a bio to the request front, if this request is mixed merged, we have to sync the request's failfast setting with the 1st bio's failfast. Fix it by calling blk_update_mixed_merge().

3) when merging a bio to the request back, if this request is mixed merged, we have to mark the bio as failfast, because blk_update_request simply updates the request failfast with the 1st bio's failfast. Fix it by calling blk_update_mixed_merge().

This fixes one normal EXT4 READ IO failure issue: it was observed that the normal READ IO was merged with an RA IO, and the mixed merged request had a different failfast setting from its 1st bio's, so the normal READ IO didn't get retried.

Cc: Tejun Heo <tj@kernel.org>
Fixes: 80a761fd ("block: implement mixed merge of different failfast requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230209125527.667004-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 15, 2023
-
-
Christoph Hellwig authored
bio_split_rw can be used by file systems to split an incoming write bio into multiple bios fitting the hardware limit for use as ZONE_APPEND bios. Export it for initial use in btrfs.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
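For reference, the export boils down to one line next to the definition; the prototype shown is a sketch assuming the queue_limits-based signature described further down this log, and a GPL-only export as is typical in the block layer:

  struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
                           unsigned *nr_segs, struct bio_set *bs, unsigned max_bytes);
  EXPORT_SYMBOL_GPL(bio_split_rw);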
-
- Jan 04, 2023
-
-
Jens Axboe authored
If we split a bio marked with REQ_NOWAIT, then we can trigger spurious EAGAIN if constituent parts of that split bio end up failing request allocations. Parts will complete just fine, but just a single failure in one of the chained bios will yield an EAGAIN final result for the parent bio.

Return EAGAIN early if we end up needing to split such a bio, which allows for saner recovery handling.

Cc: stable@vger.kernel.org # 5.15+
Link: https://github.com/axboe/liburing/issues/766
Reported-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
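A sketch of the early return at the top of the split path; bio_wouldblock_error() ends the bio with BLK_STS_AGAIN, and the NULL return tells the caller the bio is gone (exact placement assumed):

  if (bio->bi_opf & REQ_NOWAIT) {
          bio_wouldblock_error(bio);      /* complete with -EAGAIN right away */
          return NULL;                    /* caller must not touch the bio anymore */
  }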
-
Jens Axboe authored
This can't happen right now, but in preparation for allowing bio_split_to_limits() to return NULL if it ended the bio, check for it in all the callers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
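The pattern applied in each caller looks roughly like this (a sketch; the surrounding code differs per call site):

  bio = bio_split_to_limits(bio);
  if (!bio)
          return;         /* the split path already ended the bio */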
-
- Oct 25, 2022
-
-
Bart Van Assche authored
This patch removes a conditional jump from get_max_segment_size(). The x86-64 assembler code for this function without this patch is as follows:

  206             return min_not_zero(mask - offset + 1,
     0x0000000000000118 <+72>:    not    %rax
     0x000000000000011b <+75>:    and    0x8(%r10),%rax
     0x000000000000011f <+79>:    add    $0x1,%rax
     0x0000000000000123 <+83>:    je     0x138 <bvec_split_segs+104>
     0x0000000000000125 <+85>:    cmp    %rdx,%rax
     0x0000000000000128 <+88>:    mov    %rdx,%r12
     0x000000000000012b <+91>:    cmovbe %rax,%r12
     0x000000000000012f <+95>:    test   %rdx,%rdx
     0x0000000000000132 <+98>:    mov    %eax,%edx
     0x0000000000000134 <+100>:   cmovne %r12d,%edx

With this patch applied:

  206             return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1;
     0x000000000000003f <+63>:    mov    0x28(%rdi),%ebp
     0x0000000000000042 <+66>:    not    %rax
     0x0000000000000045 <+69>:    and    0x8(%rdi),%rax
     0x0000000000000049 <+73>:    sub    $0x1,%rbp
     0x000000000000004d <+77>:    cmp    %rbp,%rax
     0x0000000000000050 <+80>:    cmova  %rbp,%rax
     0x0000000000000054 <+84>:    add    $0x1,%eax

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221025191755.1711437-4-bvanassche@acm.org
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
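At the C level the change is the return expression quoted in the listings above, shown side by side here as a sketch (the 'before' second argument is completed from context and may not match the tree exactly):

  /* before: min_not_zero() needs a branch to special-case a zero limit */
  return min_not_zero(mask - offset + 1,
                      (unsigned long)lim->max_segment_size);

  /* after: subtracting 1 first lets a branchless min() + cmov do the job */
  return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1;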
-
Bart Van Assche authored
Document which functions do not modify the queue limits.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221025191755.1711437-3-bvanassche@acm.org
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
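In practice this means the read-only helpers take the limits through a const pointer, e.g. (sketch; the function name is taken from elsewhere in this log, the exact prototype is assumed):

  unsigned get_max_io_size(struct bio *bio, const struct queue_limits *lim);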
-
- Aug 03, 2022
-
-
Christoph Hellwig authored
Allow using the splitting helpers on just a queue_limits instead of a full request_queue structure. This will eventually allow file systems or remapping drivers to split REQ_OP_ZONE_APPEND bios based on limits calculated as the minimum common capabilities over multiple devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Move this helper into the only file where it is used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Prepare for reusing blk_bio_segment_split for (file system controlled) splits of REQ_OP_ZONE_APPEND bios by letting the caller control the maximum size of the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Only non-passthrough requests are split by the block layer and use the ->bio_split bio_set. Move it from the request_queue to the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
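After the move, split paths reach the bio_set through the disk rather than the queue, roughly (a sketch, not the exact call site):

  split = bio_split(bio, max_sectors, GFP_NOIO,
                    &bio->bi_bdev->bd_disk->bio_split);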
-
- Aug 02, 2022
-
-
Christoph Hellwig authored
The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue argument to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Also give it and the helpers used to implement it more descriptive names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
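Sketch of the old versus new shape of the top-level helper (names as used elsewhere in this log; exact signatures assumed):

  /* before: double indirection, the bio pointer is updated in place */
  void blk_queue_split(struct bio **bio);

  /* after: the original or split bio is simply returned */
  struct bio *bio_split_to_limits(struct bio *bio);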
-
- Jul 14, 2022
-
-
Bart Van Assche authored
Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the function arguments and also a structure member that holds a request operation and flags from 'rw' into 'opf'. This patch does not change any functionality.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
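For context, blk_opf_t is a sparse __bitwise type, so mixing it with plain integers now produces warnings (definition as in blk_types.h):

  typedef __u32 __bitwise blk_opf_t;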
-
Bart Van Assche authored
Improve static type checking by changing the type of the value returned by req_op() and bio_op() from unsigned int into enum req_op. Insert 'default: break;' in switch statements on the enum req_op type so that the compiler does not warn about these switch statements.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-5-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
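A sketch of the typed accessor after the change, assuming the usual REQ_OP_MASK encoding of bi_opf:

  static inline enum req_op bio_op(const struct bio *bio)
  {
          return bio->bi_opf & REQ_OP_MASK;
  }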
-
Muchun Song authored
The commit 51361684 ("block: remove superfluous calls to blkcg_bio_issue_init") has removed blkcg_bio_issue_init from __bio_clone since submit_bio will override ->bi_issue. However, __blk_queue_split is called after blkcg_bio_issue_init (see blk_mq_submit_bio) in submit_bio. In this case, the ->bi_issue is 0. Fix it.

Fixes: 51361684 ("block: remove superfluous calls to blkcg_bio_issue_init")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Link: https://lore.kernel.org/r/20220713140226.68135-1-songmuchun@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
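A sketch of the fix in the split path (the bio_split bioset still lived on the request_queue at this point; surrounding code omitted):

  split = bio_split(bio, max_sectors, GFP_NOIO, &q->bio_split);
  blkcg_bio_issue_init(split);    /* otherwise ->bi_issue stays 0 on the split bio */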
-
- Jul 06, 2022
-
-
Christoph Hellwig authored
Drop the unused q argument, and invert the check to move the exception into a branch and the regular path as the normal return.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jun 27, 2022
-
-
Christoph Hellwig authored
Now that blk_max_size_offset has a single caller left, fold it into that caller and clean up the naming convention for the local variables there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220614090934.570632-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
get_max_io_size has a very odd choice of variable names and initialization patterns. Switch to more descriptive names and clearer initialization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220614090934.570632-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_rq_get_max_sectors always uses q->limits.chunk_sectors as the chunk_sectors argument, and already checks for max_sectors through the call to blk_queue_get_max_sectors. That means much of blk_max_size_offset is not needed and open coding it simplifies the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220614090934.570632-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Keith Busch authored
Individual bv_len's may not be a sector size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220610195830.3574005-7-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 15, 2022
-
-
Tejun Heo authored
blk-iocost and iolatency are cgroup aware rq-qos policies but they didn't disable merges across different cgroups. This obviously can lead to accounting and control errors but, more importantly, to priority inversions - e.g. an IO which belongs to a higher priority cgroup or IO class may end up getting throttled incorrectly because it gets merged with an IO issued from a low priority cgroup.

Fix it by adding blk_cgroup_mergeable() which is called from merge paths and rejects cross-cgroup and cross-issue_as_root merges.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: d7067512 ("block: introduce blk-iolatency io controller")
Cc: stable@vger.kernel.org # v4.19+
Cc: Josef Bacik <jbacik@fb.com>
Link: https://lore.kernel.org/r/Yi/eE/6zFNyWJ+qd@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
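A sketch of the new check, assuming the bi_blkg pointer and the bio_issue_as_root_blkg() helper available in blk-cgroup:

  static inline bool blk_cgroup_mergeable(struct request *rq, struct bio *bio)
  {
          /* reject merges across cgroups and across issue_as_root state */
          return rq->bio->bi_blkg == bio->bi_blkg &&
                  bio_issue_as_root_blkg(rq->bio) == bio_issue_as_root_blkg(bio);
  }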
-
- Mar 11, 2022
-
-
Jens Axboe authored
Song reports that a RAID rebuild workload runs much slower recently, and it is seeing a lot less merging than it did previously. The reason is that a previous commit reduced the amount of work we do for plug merging. RAID rebuild interleaves requests between disks, so a last-entry check in plug merging always misses a merge opportunity since we always find a different disk than what we are looking for.

Modify the logic such that it's still a one-hit cache, but ensure that we check enough to find the right target before giving up.

Fixes: d38a9c04 ("block: only check previous entry for plug merge attempt")
Reported-and-tested-by: Song Liu <song@kernel.org>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 07, 2022
-
-
Christoph Hellwig authored
With the NVMe support for this gone, there are no consumers of these hints left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 23, 2022
-
-
Christoph Hellwig authored
No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure.

[mkp: fold in and tweak sysfs reporting fix]

Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- Feb 17, 2022
-
-
Ming Lei authored
Commit 111be883 ("block-throttle: avoid double charge") marks a bio as BIO_THROTTLED unconditionally if __blk_throtl_bio() is called on this bio, so that this bio won't go through __blk_throtl_bio() any more. This is done to avoid a double charge in case of bio splitting. It is reasonable for the read/write throughput limit, but not reasonable for the IOPS limit, because the block layer provides io accounting against split bios.

Chunguang Xu has already observed this issue and fixed it in commit 4f1e9630 ("blk-throtl: optimize IOPS throttle for large IO scenarios"). However, that patch only covers bio splitting in __blk_queue_split(), and we have other kinds of bio splitting, such as bio_split() & submit_bio_noacct() and other ways.

This patch tries to fix the issue in one generic way by always charging the bio for the iops limit in blk_throtl_bio(). This way is reasonable: a re-submitted or fast-cloned bio is charged if it is submitted to the same disk/queue, and BIO_THROTTLED will be cleared if bio->bi_bdev is changed.

This new approach can get a much smoother/more stable iops limit compared with commit 4f1e9630 ("blk-throtl: optimize IOPS throttle for large IO scenarios") since that commit can't actually throttle current split bios. Also this way won't cause a new double bio iops charge in blk_throtl_dispatch_work_fn(), in which blk_throtl_bio() won't be called any more.

Reported-by: Ning Li <lining2020x@163.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Nov 29, 2021
-
-
Christoph Hellwig authored
There is a 1:1 relationship between request_queues and gendisks now, so no need for these extra checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed, shift it into the source files that need it instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Not needed, shift it into the source files that need it instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_mq_submit_bio has two different plug cases, one that uses full plugging and a limited plugging one. The limited plugging case is only used for a corner case that does not matter in real life:

- no ->commit_rqs (so not NVMe)
- no shared tags (so not SCSI)
- not rotational (so no old disk or floppy driver)
- must have multiple queues (so no eMMC)

Remove the limited merging case and all the related junk to simplify blk_mq_submit_bio and the functions called from it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123160443.1315598-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Nov 03, 2021
-
-
Ming Lei authored
It is obvious that io merge can't be done between two different queues, so only try to run io merge in case of the same queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211102133502.3619184-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Oct 20, 2021
-
-
Pavel Begunkov authored
Convert bdev->bd_disk->queue to bdev_get_queue(), which is faster. Apparently, there are a few such spots in block that got lost during rebases.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
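The conversion itself is mechanical; bdev_get_queue() reads a cached queue pointer instead of chasing two pointers through the gendisk:

  struct request_queue *q = bdev_get_queue(bio->bi_bdev);  /* was bio->bi_bdev->bd_disk->queue */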
-
- Oct 19, 2021
-
-
Jens Axboe authored
Use a singly linked list for the blk_plug. This saves 8 bytes in the blk_plug struct, and makes for faster list manipulations than doubly linked lists. As we don't use the doubly linked lists for anything, singly linked is just fine.

This yields a bump in default (merging enabled) performance from 7.0 to 7.1M IOPS, and ~7.5M IOPS with merging disabled.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Instead of returning the same queue request through a request pointer, use a boolean to accomplish the same.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Oct 18, 2021
-
-
Jens Axboe authored
The fast path is the case where no splitting is needed. Separate the handling into a check part we can inline, and an out-of-line handling path if we do need to split.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
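Roughly, the split entry point becomes an inline check with an out-of-line slow path; blk_may_need_split() below is a hypothetical name standing in for the real inlined check, and the parameter list is assumed:

  static inline void blk_queue_split(struct request_queue *q, struct bio **bio,
                                     unsigned int *nr_segs)
  {
          if (blk_may_need_split(q, *bio))        /* hypothetical: op/size/segment check */
                  __blk_queue_split(q, bio, nr_segs);     /* out-of-line splitting */
  }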
-
Christoph Hellwig authored
Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Currently we scan the entire plug list, which is potentially very expensive. In an IOPS bound workload, we can drive about 5.6M IOPS with merging enabled, and profiling shows that the plug merge check is the (by far) most expensive thing we're doing:

  Overhead  Command   Shared Object     Symbol
  + 20.89%  io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
  +  4.98%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
  +  4.78%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
  +  4.61%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio

Instead of browsing the whole list, just check the previously inserted entry. That is enough for a naive merge check and will catch most cases, and for devices that need full merging, the IO scheduler attached to such devices will do that anyway. The plug merge is meant to be an inexpensive check to avoid getting a request, but if we repeatedly scan the list for every single insert, it is very much not a cheap check.

With this patch, the workload instead runs at ~7.0M IOPS, providing a 25% improvement. Disabling merging entirely yields another 5% improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_get_first_bvec and bio_get_last_bvec are only used in blk-merge.c, so move them there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Jens Axboe authored
Even if no policies are defined, we spend ~2% of the total IO time checking. Move the fast path inline.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-