  Sep 26, 2023
    • badblocks: switch to the improved badblock handling code · aa511ff8
      Coly Li authored
      
      This patch removes the old code of badblocks_set(), badblocks_clear()
      and badblocks_check(), and makes them wrappers that call
      _badblocks_set(), _badblocks_clear() and _badblocks_check().
      
      With this change, bad block handling now switches to the improved
      algorithms in _badblocks_set(), _badblocks_clear() and
      _badblocks_check().
      
      This patch only contains the deletion of the old code; the new code
      for the improved algorithms was added in the previous patches.
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geliang Tang <geliang.tang@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Cc: Xiao Ni <xni@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Acked-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/20230811170513.2300-7-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • badblocks: improve badblocks_check() for multiple ranges handling · 3ea3354c
      Coly Li authored
      
      This patch rewrites badblocks_check() in a similar coding style to
      _badblocks_set() and _badblocks_clear(). The major difference is that
      bad block checking can now handle multiple ranges in the bad block
      table.
      
      If a checking range covers multiple bad block ranges in the bad block
      table, as in the following condition (C is the checking range; E1, E2
      and E3 are three bad block ranges in the bad block table),
        +------------------------------------+
        |                C                   |
        +------------------------------------+
          +----+      +----+      +----+
          | E1 |      | E2 |      | E3 |
          +----+      +----+      +----+
      The improved badblocks_check() algorithm divides the checking range C
      into multiple parts and handles them in 7 runs of a while-loop,
        +--+ +----+ +----+ +----+ +----+ +----+ +----+
        |C1| | C2 | | C3 | | C4 | | C5 | | C6 | | C7 |
        +--+ +----+ +----+ +----+ +----+ +----+ +----+
             +----+        +----+        +----+
             | E1 |        | E2 |        | E3 |
             +----+        +----+        +----+
      The start LBA and length of range E1 are then reported to the caller
      via first_bad and bad_sectors.
      
      The return value rule is consistent for multiple ranges. For example,
      if there are the following bad block ranges in the bad block table,
         Index No.     Start        Len         Ack
             0          400          20          1
             1          500          50          1
             2          650          20          0
      the return value, first_bad and bad_sectors from calling
      badblocks_check() with different checking ranges can be the following
      values,
          Checking Start, Len     Return Value   first_bad    bad_sectors
                     100, 100          0           N/A           N/A
                     100, 310          1           400           10
                     100, 440          1           400           10
                     100, 540          1           400           10
                     100, 600         -1           400           10
                     100, 800         -1           400           10
      
      In order to make code review easier, this patch names the improved bad
      block range checking routine _badblocks_check() and does not change
      the existing badblocks_check() code yet. A later patch will delete the
      old code of badblocks_check() and make it a wrapper that calls
      _badblocks_check(). This way the newly added code is not mixed up with
      the deleted old code, which makes code review clearer and easier.
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geliang Tang <geliang.tang@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Cc: Xiao Ni <xni@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Acked-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/20230811170513.2300-6-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • badblocks: improve badblocks_clear() for multiple ranges handling · db448eb6
      Coly Li authored
      
      With the fundamental ideas and helper routines from the badblocks_set()
      improvement, clearing bad blocks for multiple ranges is much simpler.
      
      With a similar idea to the badblocks_set() improvement, this patch
      simplifies bad block range clearing into 5 situations. No matter how
      complicated the clearing condition is, we just look at the head part
      of the clearing range against the corresponding already set bad block
      range in the bad block table. The remaining part is handled in the
      next run of the while-loop.
      
      Based on the existing helpers added for badblocks_set(), this patch
      adds two more helpers,
      - front_clear()
        Clear a bad block range from the bad block table which is front
        overlapped with the clearing range.
      - front_splitting_clear()
        Handle the condition that the clearing range hits the middle of an
        already set bad block range in the bad block table.
      
      As with badblocks_set(), the first part of the clearing range is
      handled with the corresponding bad block range, which is found by
      prev_badblocks(). In most cases a valid hint is provided to
      prev_badblocks() to avoid unnecessary bad block table iteration.
      
      This patch also adds detailed algorithm comments at the beginning of
      badblocks.c, explaining which five simplified situations are
      categorized and how all the bad block range clearing conditions are
      handled by these five situations.
      
      Again, in order to make code review easier and avoid mixing the code
      changes together, this patch does not modify badblocks_clear() and
      instead implements another routine called _badblocks_clear() for the
      improvement. A later patch will delete the current code of
      badblocks_clear() and make it a wrapper to _badblocks_clear(), so the
      code change is much clearer for review.
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geliang Tang <geliang.tang@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Cc: Xiao Ni <xni@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Acked-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/20230811170513.2300-5-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • badblocks: improve badblocks_set() for multiple ranges handling · 1726c774
      Coly Li authored
      
      Recently I received a bug report that current badblocks code does not
      properly handle multiple ranges. For example,
              badblocks_set(bb, 32, 1, true);
              badblocks_set(bb, 34, 1, true);
              badblocks_set(bb, 36, 1, true);
              badblocks_set(bb, 32, 12, true);
      Then indeed badblocks_show() reports,
              32 3
              36 1
      But the expected bad blocks table should be,
              32 12
      Obviously only the first 2 ranges are merged; badblocks_set() returns
      early and ignores the rest of the setting range.
      
      This behavior is improper: if the caller of badblocks_set() wants to
      set a range of blocks in the bad blocks table, all of the blocks in
      the range should be handled, even if a previous part encounters a
      failure.
      
      The desired way for badblocks_set() to set a bad blocks range is,
      - Set as many blocks from the setting range as possible in the bad
        blocks table.
      - Merge the bad block ranges and occupy as few slots as possible in
        the bad blocks table.
      - Be fast.
      
      Indeed the above proposal is complicated, especially with the following
      restrictions,
      - The setting bad blocks range can be acknowledged or not acknowledged.
      - The bad blocks table size is limited.
      - Memory allocation should be avoided.
      
      The basic idea of the patch is to categorize all possible bad block
      range setting combinations into a much smaller set of simplified,
      less special conditions. Inside badblocks_set() there is an implicit
      loop composed by jumping between the labels 're_insert' and
      'update_sectors'. No matter how large the setting bad blocks range
      is, each loop iteration handles just a minimal range from the head,
      with a pre-defined behavior from one of the categorized conditions.
      The logic is simple and the code flow is manageable.
      
      The different relative layouts between the setting range and the
      existing bad block ranges are checked and handled (merge, combine,
      overwrite, insert) by the helpers from the previous patch. This patch
      makes all the helpers work together with the above idea.
      
      This patch only has the algorithm improvement for badblocks_set().
      The following patches contain the improvements for badblocks_clear()
      and badblocks_check(). The algorithm in badblocks_set() is
      fundamental and typical; the improvements in the clear and check
      routines are based on the helpers and ideas from this patch.
      
      In order to make the change clearer for code review, this patch does
      not directly modify the existing badblocks_set(), and just adds a new
      routine named _badblocks_set(). A later patch will remove the current
      badblocks_set() code and make it a wrapper of _badblocks_set(). So
      the newly added change won't be mixed with the deleted code, and the
      code review can be easier.
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geliang Tang <geliang.tang@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Cc: Wols Lists <antlists@youngman.org.uk>
      Cc: Xiao Ni <xni@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Acked-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/20230811170513.2300-4-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • badblocks: add helper routines for badblock ranges handling · c3c6a86e
      Coly Li authored
      
      This patch adds several helper routines to improve bad block range
      handling. These helper routines will be used later in the improved
      versions of badblocks_set()/badblocks_clear()/badblocks_check().
      
      - Helpers prev_by_hint() and prev_badblocks() are used to find the
        bad block range in the bad table at which, or after which, the
        searching range starts.
      
      - The following helpers decide the relative layout between the
        manipulating range and an existing bad block range in the bad
        table.
        - can_merge_behind()
          Return 'true' if the manipulating range can backward merge with the
          bad block range.
        - can_merge_front()
          Return 'true' if the manipulating range can forward merge with the
          bad block range.
        - can_combine_front()
          Return 'true' if two adjacent bad block ranges before the
          manipulating range can be merged.
        - overlap_front()
          Return 'true' if the manipulating range exactly overlaps with the
          bad block range in front of its range.
        - overlap_behind()
          Return 'true' if the manipulating range exactly overlaps with the
          bad block range behind its range.
        - can_front_overwrite()
          Return 'true' if the manipulating range can forward overwrite the
          bad block range in front of its range.
      
      - The following helpers add the manipulating range into the bad block
        table. A different routine is called for each specific relative
        layout between the manipulating range and the other bad block
        ranges in the bad block table.
        - behind_merge()
          Merge the manipulating range with the bad block range behind its
          range, and return the merged length in sectors.
        - front_merge()
          Merge the manipulating range with the bad block range in front of
          its range, and return the merged length in sectors.
        - front_combine()
          Combine the two adjacent bad block ranges before the manipulating
          range into a larger one.
        - front_overwrite()
          Overwrite part or all of the bad block range which is in front of
          the manipulating range. The overwrite may split an existing bad
          block range and generate more bad block ranges in the bad block
          table.
        - insert_at()
          Insert the manipulating range at a specific location in the bad
          block table.
      
      All the above helpers are used in later patches to improve the bad block
      ranges handling for badblocks_set()/badblocks_clear()/badblocks_check().
      
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Geliang Tang <geliang.tang@suse.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Cc: Xiao Ni <xni@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Acked-by: Geliang Tang <geliang.tang@suse.com>
      Link: https://lore.kernel.org/r/20230811170513.2300-3-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  Aug 18, 2023
    • blk-crypto: dynamically allocate fallback profile · c984ff14
      Sweet Tea Dorminy authored
      
      blk_crypto_profile_init() calls lockdep_register_key(), which warns and
      does not register if the provided memory is a static object.
      blk-crypto-fallback currently has a static blk_crypto_profile and calls
      blk_crypto_profile_init() thereupon, resulting in the warning and
      failure to register.
      
      Fortunately it is simple enough to use a dynamically allocated profile
      and make lockdep function correctly.
      
      Fixes: 2fb48d88 ("blk-crypto: use dynamic lock class for blk_crypto_profile::lock")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20230817141615.15387-1-sweettea-kernel@dorminy.me
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-cgroup: hold queue_lock when removing blkg->q_node · c164c7bc
      Ming Lei authored
      
      When a blkg is removed from q->blkg_list in blkg_free_workfn(),
      queue_lock has to be held; otherwise, all kinds of bugs (list
      corruption, hard lockup, ...) can be triggered from
      blkg_destroy_all().
      
      Fixes: f1c006f1 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
      Cc: Yu Kuai <yukuai3@huawei.com>
      Cc: xiaoli feng <xifeng@redhat.com>
      Cc: Chunyu Hu <chuhu@redhat.com>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20230817141751.1128970-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init · ec14a87e
      Tejun Heo authored
      
      blk-iocost sometimes causes the following crash:
      
        BUG: kernel NULL pointer dereference, address: 00000000000000e0
        ...
        RIP: 0010:_raw_spin_lock+0x17/0x30
        Code: be 01 02 00 00 e8 79 38 39 ff 31 d2 89 d0 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 65 ff 05 48 d0 34 7e b9 01 00 00 00 31 c0 <f0> 0f b1 0f 75 02 5d c3 89 c6 e8 ea 04 00 00 5d c3 0f 1f 84 00 00
        RSP: 0018:ffffc900023b3d40 EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000001
        RDX: ffffc900023b3d20 RSI: ffffc900023b3cf0 RDI: 00000000000000e0
        RBP: ffffc900023b3d40 R08: ffffc900023b3c10 R09: 0000000000000003
        R10: 0000000000000064 R11: 000000000000000a R12: ffff888102337000
        R13: fffffffffffffff2 R14: ffff88810af408c8 R15: ffff8881070c3600
        FS:  00007faaaf364fc0(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 00000001097b1000 CR4: 0000000000350ea0
        Call Trace:
         <TASK>
         ioc_weight_write+0x13d/0x410
         cgroup_file_write+0x7a/0x130
         kernfs_fop_write_iter+0xf5/0x170
         vfs_write+0x298/0x370
         ksys_write+0x5f/0xb0
         __x64_sys_write+0x1b/0x20
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This happens because iocg->ioc is NULL. The field is initialized by
      ioc_pd_init() and never cleared. The NULL deref is caused by
      blkcg_activate_policy() installing blkg_policy_data before initializing it.
      
      blkcg_activate_policy() was doing the following:
      
      1. Allocate pd's for all existing blkg's and install them in blkg->pd[].
      2. Initialize all pd's.
      3. Online all pd's.
      
      blkcg_activate_policy() only grabs the queue_lock and may release and
      re-acquire the lock as allocation may need to sleep. ioc_weight_write()
      grabs blkcg->lock and iterates all its blkg's. The two can race and if
      ioc_weight_write() runs during #1 or between #1 and #2, it can encounter a
      pd which is not initialized yet, leading to crash.
      
      The crash can be reproduced with the following script:
      
        #!/bin/bash
      
        echo +io > /sys/fs/cgroup/cgroup.subtree_control
        systemd-run --unit touch-sda --scope dd if=/dev/sda of=/dev/null bs=1M count=1 iflag=direct
        echo 100 > /sys/fs/cgroup/system.slice/io.weight
        bash -c "echo '8:0 enable=1' > /sys/fs/cgroup/io.cost.qos" &
        sleep .2
        echo 100 > /sys/fs/cgroup/system.slice/io.weight
      
      with the following patch applied:
      
      > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
      > index fc49be622e05..38d671d5e10c 100644
      > --- a/block/blk-cgroup.c
      > +++ b/block/blk-cgroup.c
      > @@ -1553,6 +1553,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
      > 		pd->online = false;
      > 	}
      >
      > +       if (system_state == SYSTEM_RUNNING) {
      > +               spin_unlock_irq(&q->queue_lock);
      > +               ssleep(1);
      > +               spin_lock_irq(&q->queue_lock);
      > +       }
      > +
      > 	/* all allocated, init in the same order */
      > 	if (pol->pd_init_fn)
      > 		list_for_each_entry_reverse(blkg, &q->blkg_list, q_node)
      
      I don't see a reason why all pd's should be allocated, initialized
      and onlined together. The only ordering requirement is that parent
      blkgs be initialized and onlined before children, which is guaranteed
      by the walking order. Let's fix the bug by allocating, initializing
      and onlining the pd for each blkg while holding blkcg->lock over the
      initialization and onlining. This ensures that an installed blkg is
      always fully initialized and onlined, removing the race window.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Breno Leitao <leitao@debian.org>
      Fixes: 9d179b86 ("blkcg: Fix multiple bugs in blkcg_activate_policy()")
      Link: https://lore.kernel.org/r/ZN0p5_W-Q9mAHBVY@slm.duckdns.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>