  Jul 27, 2023
    •
      md: refactor idle/frozen_sync_thread() to fix deadlock · 130443d6
      Yu Kuai authored
      Our tests found the following deadlock in raid10:
      
      1) Issue a normal write, and such write failed:
      
        raid10_end_write_request
         set_bit(R10BIO_WriteError, &r10_bio->state)
         one_write_done
          reschedule_retry
      
        // later from md thread
        raid10d
         handle_write_completed
          list_add(&r10_bio->retry_list, &conf->bio_end_io_list)
      
        // later from md thread
        raid10d
         if (!test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags))
          list_move(conf->bio_end_io_list.prev, &tmp)
          r10_bio = list_first_entry(&tmp, struct r10bio, retry_list)
          raid_end_bio_io(r10_bio)
      
      Dependency chain 1: normal io is waiting for updating superblock
      
      2) Trigger a recovery:
      
        raid10_sync_request
         raise_barrier
      
      Dependency chain 2: sync thread is waiting for normal io
      
      3) echo idle/frozen to sync_action:
      
        action_store
         mddev_lock
          md_unregister_thread
           kthread_stop
      
      Dependency chain 3: drop 'reconfig_mutex' is waiting for sync thread
      
      4) md thread can't update superblock:
      
        raid10d
         md_check_recovery
          if (mddev_trylock(mddev))
           md_update_sb
      
      Dependency chain 4: update superblock is waiting for 'reconfig_mutex'
      
      Hence a cyclic dependency exists; to fix the problem, we must break
      one of the chains. Dependencies 1 and 2 can't be broken because they
      are foundational to the design. Dependency 4 could perhaps be broken
      if it were guaranteed that no io can be in flight, but that would
      require a new and seemingly complex mechanism. Dependency 3 is a good
      choice: idle/frozen only needs the sync thread to finish, which can be
      done asynchronously (this is already implemented), and
      'reconfig_mutex' is then no longer needed.
      
      This patch switches 'idle' and 'frozen' to wait for the sync thread to
      finish asynchronously, and it also adds a sequence counter that
      records how many times the sync thread has finished, so that 'idle'
      won't keep waiting on a newly started sync thread.
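      The sequence-counter idea can be modelled in userspace. The sketch
      below is only an illustration with assumed names (`sync_seq`,
      `sample_seq()`, `idle_sync_thread()`), not the kernel code: 'idle'
      samples the counter and waits only until it advances past that
      sample, so a sync thread started later can never make it wait again.

      ```c
      #include <assert.h>
      #include <pthread.h>

      /*
       * Minimal userspace model of the sequence counter described above.
       * All names here are illustrative assumptions, not the kernel
       * implementation.
       */
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  seq_cond = PTHREAD_COND_INITIALIZER;
      static unsigned int sync_seq;  /* times a sync thread has finished */

      /* Sample the counter before deciding to wait. */
      unsigned int sample_seq(void)
      {
              pthread_mutex_lock(&lock);
              unsigned int v = sync_seq;
              pthread_mutex_unlock(&lock);
              return v;
      }

      /* Called when a sync thread finishes. */
      void sync_thread_done(void)
      {
              pthread_mutex_lock(&lock);
              sync_seq++;
              pthread_cond_broadcast(&seq_cond);
              pthread_mutex_unlock(&lock);
      }

      /*
       * 'echo idle' path: wait only until the counter moves past the
       * value sampled when we started; a sync thread started *after*
       * that point bumps the counter further and cannot block us.
       */
      void idle_sync_thread(unsigned int seen)
      {
              pthread_mutex_lock(&lock);
              while (sync_seq == seen)
                      pthread_cond_wait(&seq_cond, &lock);
              pthread_mutex_unlock(&lock);
      }
      ```

      Because the wait condition is "counter advanced past my sample"
      rather than "no sync thread running", the waiter needs no lock held
      across the wait, which is what lets dependency chain 3 be broken.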
      
      Note that raid456 has a similar deadlock ([1]), and it has been
      verified [2] that this patch fixes that deadlock as well.
      
      [1] https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t
      [2] https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
      
      
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
    •
      md: add a mutex to synchronize idle and frozen in action_store() · 6f56f0c4
      Yu Kuai authored
      
      Currently, for idle and frozen, action_store() holds 'reconfig_mutex'
      and calls md_reap_sync_thread() to stop the sync thread; however,
      this causes a deadlock (explained in the next patch). To fix the
      problem, the following patch releases 'reconfig_mutex' and waits on
      'resync_wait', as md_set_readonly() and do_md_stop() do.
      
      Since action_store() sets/clears 'MD_RECOVERY_FROZEN' unconditionally,
      unexpected problems can arise: for example, 'frozen' has just set
      'MD_RECOVERY_FROZEN' and is still in progress, while 'idle' clears
      'MD_RECOVERY_FROZEN' and a new sync thread is started, which might
      starve the in-progress 'frozen'. A mutex is added to synchronize
      'idle' and 'frozen' in action_store().
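      The locking pattern can be modelled in userspace as below. The names
      (`sync_mutex`, `recovery_frozen`, `action_frozen()`, `action_idle()`)
      are illustrative assumptions; only the serialization mirrors the
      patch.

      ```c
      #include <assert.h>
      #include <pthread.h>
      #include <stdbool.h>

      /* Hypothetical model of the added mutex, not the kernel code. */
      static pthread_mutex_t sync_mutex = PTHREAD_MUTEX_INITIALIZER;
      static bool recovery_frozen;

      /* 'echo frozen': set the flag and stop the sync thread, atomically
       * with respect to 'idle'. */
      void action_frozen(void)
      {
              pthread_mutex_lock(&sync_mutex);
              recovery_frozen = true;
              /* ... stop the sync thread ... */
              pthread_mutex_unlock(&sync_mutex);
      }

      /* 'echo idle': cannot clear the flag while 'frozen' is mid-flight. */
      void action_idle(void)
      {
              pthread_mutex_lock(&sync_mutex);
              recovery_frozen = false;
              /* ... wait for the sync thread to finish ... */
              pthread_mutex_unlock(&sync_mutex);
      }
      ```

      With the mutex, 'idle' can only observe 'frozen' either fully done or
      not started, never half-way through.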
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-4-yukuai1@huaweicloud.com
    •
      md: refactor action_store() for 'idle' and 'frozen' · 64e5e09a
      Yu Kuai authored
      
      Prepare to handle 'idle' and 'frozen' differently to fix a deadlock.
      There are no functional changes, except that MD_RECOVERY_RUNNING is
      checked again after 'reconfig_mutex' is held.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-3-yukuai1@huaweicloud.com
    •
      Revert "md: unlock mddev before reap sync_thread in action_store" · a865b96c
      Yu Kuai authored
      
      This reverts commit 9dfbdafd.
      
      That commit introduced a defect: the sync_thread can still be running
      while MD_RECOVERY_RUNNING is cleared, which causes unexpected
      problems, for example:
      
      list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
      Call trace:
       __list_add_valid+0xfc/0x140
       insert_work+0x78/0x1a0
       __queue_work+0x500/0xcf4
       queue_work_on+0xe8/0x12c
       md_check_recovery+0xa34/0xf30
       raid10d+0xb8/0x900 [raid10]
       md_thread+0x16c/0x2cc
       kthread+0x1a4/0x1ec
       ret_from_fork+0x10/0x18
      
      This happens because the work item is requeued while it is still on
      the workqueue:
      
      t1:			t2:
      action_store
       mddev_lock
        if (mddev->sync_thread)
         mddev_unlock
         md_unregister_thread
         // first sync_thread is done
      			md_check_recovery
      			 mddev_trylock
      			 /*
      			  * once MD_RECOVERY_DONE is set, new sync_thread
      			  * can start.
      			  */
      			 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
      			 INIT_WORK(&mddev->del_work, md_start_sync)
      			 queue_work(md_misc_wq, &mddev->del_work)
      			  test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
      			  // set pending bit
      			  insert_work
      			   list_add_tail
      			 mddev_unlock
         mddev_lock_nointr
         md_reap_sync_thread
         // MD_RECOVERY_RUNNING is cleared
       mddev_unlock
      
      t3:
      
      // before queued work started from t2
      md_check_recovery
       // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
       INIT_WORK(&mddev->del_work, md_start_sync)
        work->data = 0
        // work pending bit is cleared
       queue_work(md_misc_wq, &mddev->del_work)
        insert_work
         list_add_tail
         // list is corrupted
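      The final list_add_tail() on a node that is already linked is what
      corrupts the list. This can be shown with a minimal userspace copy of
      the kernel's circular doubly linked list (reimplemented here for
      illustration; not the kernel source):

      ```c
      #include <assert.h>
      #include <stddef.h>

      /*
       * Minimal model of the kernel's circular doubly linked list.
       * Queueing a work item a second time while it is still on the
       * workqueue list amounts to list_add_tail() on an already-linked
       * node.
       */
      struct list_head { struct list_head *next, *prev; };

      void init_list_head(struct list_head *h)
      {
              h->next = h;
              h->prev = h;
      }

      void list_add_tail(struct list_head *new, struct list_head *head)
      {
              new->prev = head->prev;
              new->next = head;
              head->prev->next = new;
              head->prev = new;
      }
      ```

      After the second insertion the node's next/prev point back at
      itself, so a traversal from head loops on the work item forever:
      exactly the kind of prev/next inconsistency that the
      __list_add_valid() check in the trace above reports.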
      
      The above commit is reverted to fix the problem; the deadlock it
      tried to fix will be addressed in the following patches.
      
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230529132037.2124527-2-yukuai1@huaweicloud.com