  1. Apr 21, 2023
    • iomap: Add DIO tracepoints · 3fd41721
      Ritesh Harjani (IBM) authored
      
      Add the trace_iomap_dio_rw_begin, trace_iomap_dio_rw_queued and
      trace_iomap_dio_complete tracepoints.
      trace_iomap_dio_rw_queued mostly just records that the request was
      queued and -EIOCBQUEUED was returned; trace_iomap_dio_rw_begin and
      trace_iomap_dio_complete carry all the details.
      
      <example output log>
            a.out-2073  [006]   134.225717: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x0 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags DIO_FORCE_WAIT aio 1
            a.out-2073  [006]   134.226234: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096
            a.out-2074  [006]   136.225975: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 1
            a.out-2074  [006]   136.226173: iomap_dio_rw_queued:  dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0
      ksoftirqd/3-31    [003]   136.226389: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096
            a.out-2075  [003]   141.674969: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags  aio 1
            a.out-2075  [003]   141.676085: iomap_dio_rw_queued:  dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0
      kworker/2:0-27    [002]   141.676432: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096
            a.out-2077  [006]   143.443746: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 1
            a.out-2077  [006]   143.443866: iomap_dio_rw_queued:  dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0
      ksoftirqd/5-41    [005]   143.444134: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096
            a.out-2078  [007]   146.716833: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 0
            a.out-2078  [007]   146.717639: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096
            a.out-2079  [006]   148.972605: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 0
            a.out-2079  [006]   148.973099: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096
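
      For reference, a minimal userspace test along these lines is enough to
      drive the synchronous (aio 0) begin/complete pairs shown above. This is
      an illustrative sketch, not the exact a.out used for the log:

          #define _GNU_SOURCE             /* for O_DIRECT */
          #include <fcntl.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>

          int main(int argc, char **argv)
          {
                  void *buf;
                  int fd;

                  if (argc < 2)
                          return 1;

                  /* O_DIRECT requires an aligned buffer; 4096 is sufficient here. */
                  if (posix_memalign(&buf, 4096, 4096))
                          return 1;
                  memset(buf, 0xab, 4096);

                  fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
                  if (fd < 0)
                          return 1;

                  /* One 4k direct write: fires iomap_dio_rw_begin/iomap_dio_complete. */
                  if (pwrite(fd, buf, 4096, 0) != 4096)
                          return 1;

                  close(fd);
                  free(buf);
                  return 0;
          }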
      
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      [djwong: line up strings all prettylike]
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • iomap: Remove IOMAP_DIO_NOSYNC unused dio flag · d3bff1fc
      Ritesh Harjani (IBM) authored
      
      IOMAP_DIO_NOSYNC was originally added for use by btrfs, but it turns
      out not to be useful for AIO dsync writes anyway. In the AIO dsync
      case we queue the request and return -EIOCBQUEUED, and since
      IOMAP_DIO_NOSYNC prevents iomap_dio_complete() from calling
      generic_write_sync(), we may lose the sync write.

      Hence kill this flag, as it is no longer used by any filesystem.
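
      For context, the dsync handling in question happens in the completion
      path; roughly (a simplified excerpt, not a verbatim quote of the code):

          /* in iomap_dio_complete(): the data sync is only issued here */
          if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
                  ret = generic_write_sync(iocb, ret);

      With IOMAP_DIO_NOSYNC set, IOMAP_DIO_NEED_SYNC is never raised, so this
      call is skipped even for dsync AIO writes that returned -EIOCBQUEUED,
      as described above.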
      
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  2. Nov 28, 2022
    • iomap: write iomap validity checks · d7b64041
      Dave Chinner authored

      A recent multithreaded write data corruption has been uncovered in
      the iomap write code. The core of the problem is that partial folio
      writes can be flushed to disk while a new racing write maps the
      folio and fills the rest of the page:
      
      writeback			new write
      
      allocate blocks
        blocks are unwritten
      submit IO
      .....
      				map blocks
      				iomap indicates UNWRITTEN range
      				loop {
      				  lock folio
      				  copyin data
      .....
      IO completes
        runs unwritten extent conv
          blocks are marked written
      				  <iomap now stale>
      				  get next folio
      				}
      
      Now add memory pressure such that memory reclaim evicts the
      partially written folio that has already been written to disk.
      
      When the new write finally gets to the last partial page of the new
      write, it does not find it in cache, so it instantiates a new page,
      sees the iomap is unwritten, and zeros the part of the page that
      it does not have data from. This overwrites the data on disk that
      was originally written.
      
      The full description of the corruption mechanism can be found here:
      
      https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/
      
      
      
      To solve this problem, we need to check whether the iomap is still
      valid after we lock each folio during the write. We have to do it
      after we lock the page so that we don't end up with state changes
      occurring while we wait for the folio to be locked.
      
      Hence we need a mechanism to be able to check that the cached iomap
      is still valid (similar to what we already do in buffered
      writeback), and we need a way for ->begin_write to back out and
      tell the high level iomap iterator that we need to remap the
      remaining write range.
      
      The iomap needs to grow some storage for the validity cookie that
      the filesystem provides to travel with the iomap. XFS, in
      particular, also needs to know more about what the iomap maps to
      (attribute extents rather than file data extents) so that the
      validity cookie can cover all the types of iomaps we might need
      to validate.
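
      As a rough sketch of the shape of that check (illustrative only:
      fs_iomap_valid() stands in for the per-filesystem callback, and the
      folio lookup/locking is elided):

          /*
           * Once the write path has locked a folio, ask the filesystem
           * whether the cached iomap and its validity cookie still match
           * the current extent state.
           */
          if (!fs_iomap_valid(iter->inode, &iter->iomap)) {
                  folio_unlock(folio);
                  folio_put(folio);
                  /* Tell the iterator the mapping is stale: back out, remap. */
                  iter->iomap.flags |= IOMAP_F_STALE;
                  return 0;
          }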
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    • iomap: buffered write failure should not truncate the page cache · f43dc4dc
      Dave Chinner authored
      
      iomap_file_buffered_write_punch_delalloc() currently invalidates the
      page cache over the unused range of the delalloc extent that was
      allocated. While the write allocated the delalloc extent, it does
      not own it exclusively as the write does not hold any locks that
      prevent either writeback or mmap page faults from changing the state
      of either the page cache or the extent state backing this range.
      
      Whilst xfs_bmap_punch_delalloc_range() already handles races in
      extent conversion - it will only punch out delalloc extents and it
      ignores any other type of extent - the page cache truncate does not
      discriminate between data written by this write or some other task.
      As a result, truncating the page cache can result in data corruption
      if the write races with mmap modifications to the file over the same
      range.
      
      generic/346 exercises this workload, and if we randomly fail writes
      (as will happen when iomap gets stale iomap detection later in the
      patchset), the truncation will randomly corrupt the file data because
      it removes data written by mmap() in the same page as the write()
      that failed.
      
      Hence we do not want to punch out the page cache over the range of
      the extent we failed to write to - what we actually need to do is
      detect the ranges that have dirty data in cache over them and *not
      punch them out*.
      
      To do this, we have to walk the page cache over the range of the
      delalloc extent we want to remove. This is made complex by the fact
      we have to handle partially up-to-date folios correctly and this can
      happen even when the FSB size == PAGE_SIZE because we now support
      multi-page folios in the page cache.
      
      Because we are only interested in discovering the edges of data
      ranges in the page cache (i.e. hole-data boundaries) we can make use
      of mapping_seek_hole_data() to find those transitions in the page
      cache. As we hold the invalidate_lock, we know that the boundaries
      are not going to change while we walk the range. This interface is
      also byte-based and is sub-page block aware, so we can find the data
      ranges in the cache based on byte offsets rather than page, folio or
      fs block sized chunks. This greatly simplifies the logic of finding
      dirty cached ranges in the page cache.
      
      Once we've identified a range that contains cached data, we can then
      iterate the range folio by folio. This allows us to determine if the
      data is dirty and hence perform the correct delalloc extent punching
      operations. The seek interface we use to iterate data ranges will
      give us sub-folio start/end granularity, so we may end up looking up
      the same folio multiple times as the seek interface iterates across
      each discontiguous data region in the folio.
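
      A sketch of that scan over the failed range [start, end) of mapping ==
      inode->i_mapping (simplified; punch_range() and scan_data_range() are
      illustrative placeholders for the filesystem punch callback and the
      per-folio dirty check):

          while (start < end) {
                  /* Find the next cached data; everything before it is a hole. */
                  loff_t data_start = mapping_seek_hole_data(mapping, start,
                                  end, SEEK_DATA);

                  if (data_start == -ENXIO || data_start >= end) {
                          /* No more cached data: the rest is safe to punch. */
                          punch_range(inode, start, end - start);
                          break;
                  }
                  if (data_start > start)
                          punch_range(inode, start, data_start - start);

                  /*
                   * Find where this data range ends, then walk it folio by
                   * folio to decide which sub-ranges are dirty and must not
                   * be punched.
                   */
                  start = mapping_seek_hole_data(mapping, data_start, end,
                                  SEEK_HOLE);
                  scan_data_range(inode, data_start, start);
          }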
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
  3. Nov 23, 2022
    • xfs,iomap: move delalloc punching to iomap · 9c7babf9
      Dave Chinner authored
      
      Because that's what Christoph wants for this error handling path
      only XFS uses.
      
      It requires a new iomap export for handling errors over delalloc
      ranges. This is basically the XFS code as it stands, but even though
      Christoph wants this as iomap functionality, we still have to call it
      from the filesystem-specific ->iomap_end callback, and call into the
      iomap code with yet another filesystem-specific callback to punch the
      delalloc extent within the defined ranges.
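
      Schematically, the filesystem side then looks something like the sketch
      below (the punch callback body and the exact helper prototype are
      assumptions here; see the export added by this patch for the real one):

          /* Filesystem-provided punch callback (illustrative). */
          static int fs_punch_delalloc(struct inode *inode, loff_t pos, loff_t len)
          {
                  /* e.g. XFS would use xfs_bmap_punch_delalloc_range() here */
                  return 0;
          }

          static int fs_buffered_write_iomap_end(struct inode *inode, loff_t pos,
                          loff_t length, ssize_t written, unsigned flags,
                          struct iomap *iomap)
          {
                  if (iomap->type != IOMAP_DELALLOC)
                          return 0;

                  /* Hand short-write/error cleanup to the new iomap helper. */
                  return iomap_file_buffered_write_punch_delalloc(inode, iomap,
                                  pos, length, written, fs_punch_delalloc);
          }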
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>