  1. Dec 11, 2023
  2. Oct 19, 2023
    • iomap: fix short copy in iomap_write_iter() · 3ac97479
      Jan Stancek authored
      
      Starting with commit 5d8edfb9 ("iomap: Copy larger chunks from
      userspace"), iomap_write_iter() can get into an endless loop. This can
      be reproduced with LTP writev07, which uses partially valid iovecs:
              struct iovec wr_iovec[] = {
                      { buffer, 64 },
                      { bad_addr, 64 },
                      { buffer + 64, 64 },
                      { buffer + 64 * 2, 64 },
              };
      
      Commit bc1bb416 ("generic_perform_write()/iomap_write_actor():
      saner logics for short copy") previously introduced logic that
      makes a short copy retry in the next iteration with the number of
      "bytes" it managed to copy:
      
                      if (unlikely(status == 0)) {
                              /*
                               * A short copy made iomap_write_end() reject the
                               * thing entirely.  Might be memory poisoning
                               * halfway through, might be a race with munmap,
                               * might be severe memory pressure.
                               */
                              if (copied)
                                      bytes = copied;
      
      However, since 5d8edfb9, "bytes" is no longer carried into the next
      iteration, because it is now always initialized at the beginning of
      the loop. And for iov_iter_count < PAGE_SIZE, "bytes" ends up with
      the same value as in the previous iteration, making the loop retry the
      same copy over and over, which leads to the writev07 testcase hanging.
      
      Make the next iteration retry with the number of bytes we managed to copy.
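
The retry behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `user_buf_model`, `copy_model`, `write_iter_model` and `MAX_ITERS` are hypothetical names, the user buffer is reduced to "a readable prefix followed by a faulting address", and a short copy is always rejected entirely. With `carry_bytes` set to 0 the size is recomputed every iteration and the loop never makes progress; with `carry_bytes` set to 1 the next attempt is limited to what was actually copied.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITERS 16 /* demo guard so the looping variant still returns */

/* Hypothetical user buffer: bytes [0, valid) are readable, the rest fault. */
struct user_buf_model {
	size_t pos;   /* current offset of the iterator */
	size_t len;   /* total bytes requested */
	size_t valid; /* length of the readable prefix */
};

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Copy up to `bytes`, stopping at the first unreadable byte. */
static size_t copy_model(const struct user_buf_model *u, size_t bytes)
{
	size_t readable = u->pos < u->valid ? u->valid - u->pos : 0;

	return min_sz(bytes, readable);
}

/*
 * Returns the number of bytes written, or -1 if MAX_ITERS was exceeded
 * (standing in for the endless loop). With carry_bytes == 0, "bytes" is
 * recomputed from scratch every iteration; with carry_bytes != 0, a short
 * copy limits the next attempt to what was actually copied.
 */
static long write_iter_model(struct user_buf_model u, size_t chunk,
			     int carry_bytes, int *iters)
{
	long written = 0;
	size_t carried = 0;

	*iters = 0;
	while (u.pos < u.len) {
		size_t bytes, copied;

		if (++*iters > MAX_ITERS)
			return -1;

		bytes = min_sz(chunk, u.len - u.pos);
		if (carry_bytes && carried)
			bytes = min_sz(bytes, carried);
		carried = 0;

		copied = copy_model(&u, bytes);
		assert(copied <= bytes);
		if (copied == 0)
			break; /* nothing readable at all: stop, like -EFAULT */

		if (copied < bytes) {
			/* Short copy: rejected entirely, iterator not advanced. */
			carried = copied;
			continue;
		}
		u.pos += copied;
		written += (long)copied;
	}
	return written;
}
```

For a 256-byte request whose first 64 bytes are readable (roughly the writev07 layout), the model without carrying hits the guard, while the carrying variant writes 64 bytes and then stops at the bad address.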
      
      Fixes: 5d8edfb9 ("iomap: Copy larger chunks from userspace")
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  3. Oct 18, 2023
    • iomap: use folio_end_read() · 7a4847e5
      Matthew Wilcox (Oracle) authored
      Combine the setting of the uptodate flag with the clearing of the locked
      flag.
      
      Link: https://lkml.kernel.org/r/20231004165317.1061855-7-willy@infradead.org
      
      
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • iomap: protect read_bytes_pending with the state_lock · f45b494e
      Matthew Wilcox (Oracle) authored
      Perform one atomic operation (acquiring the spinlock) instead of two
      (spinlock & atomic_sub) per read completion.
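
As a rough illustration of the idea, the sketch below keeps the pending-byte counter as a plain field that is only touched while the lock is held, so one lock acquisition covers both the uptodate bookkeeping and the counter update. It is a userspace model built on assumptions: `folio_state_model`, its fields and `finish_read_model` are hypothetical names, the uptodate tracking is reduced to a byte count, and a C11 `atomic_flag` stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-folio state; not the kernel's struct layout. */
struct folio_state_model {
	atomic_flag lock;          /* stands in for state_lock */
	size_t read_bytes_pending; /* plain counter, protected by lock */
	size_t bytes_uptodate;     /* simplified uptodate tracking */
	size_t folio_size;
};

static void state_lock(struct folio_state_model *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		; /* spin */
}

static void state_unlock(struct folio_state_model *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Account for one completed read of `len` bytes. Returns true when this
 * completion was the last one outstanding; *uptodate then reports whether
 * the whole folio has been filled.
 */
static bool finish_read_model(struct folio_state_model *s, size_t len,
			      bool ok, bool *uptodate)
{
	bool finished;

	state_lock(s);
	assert(s->read_bytes_pending >= len);
	if (ok)
		s->bytes_uptodate += len;
	s->read_bytes_pending -= len; /* no separate atomic_sub needed */
	finished = s->read_bytes_pending == 0;
	*uptodate = s->bytes_uptodate == s->folio_size;
	state_unlock(s);

	return finished;
}
```

Two 2048-byte completions on a 4096-byte folio would report "not finished" for the first call and "finished, uptodate" for the second.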
      
      Link: https://lkml.kernel.org/r/20231004165317.1061855-3-willy@infradead.org
      
      
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • iomap: hold state_lock over call to ifs_set_range_uptodate() · 279d5fc3
      Matthew Wilcox (Oracle) authored
      Patch series "Add folio_end_read", v2.
      
      The core of this patchset is the new folio_end_read() call which
      filesystems can use when finishing a page cache read instead of separate
      calls to mark the folio uptodate and unlock it.  As an illustration of its
      use, I converted ext4, iomap & mpage; more can be converted.
      
      I think that's useful by itself, but the interesting optimisation is that
      we can implement that with a single XOR instruction that sets the uptodate
      bit, clears the lock bit, tests the waiter bit and provides a write memory
      barrier.  That removes one memory barrier and one atomic instruction from
      each page read, which seems worth doing.  That's in patch 15.
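
The XOR trick can be sketched in portable C11 atomics. This is a minimal model, not the kernel implementation: the bit positions and the names `F_LOCKED`, `F_UPTODATE`, `F_WAITERS` and `end_read_model` are assumptions made for illustration. Because the lock bit is known to be set and the uptodate bit is known to be clear when a read finishes, XOR-ing with both bits flips them to their final values in one atomic operation, and the returned old value reveals whether anyone is waiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout, chosen only for this sketch. */
enum {
	F_LOCKED   = 1u << 0,
	F_UPTODATE = 1u << 1,
	F_WAITERS  = 1u << 2,
};

/*
 * Finish a read: clear the lock bit and, on success, set the uptodate bit
 * with a single atomic XOR. Release ordering publishes the data written
 * before the unlock. Returns true if waiters need to be woken.
 */
static bool end_read_model(atomic_uint *flags, bool success)
{
	unsigned int mask = F_LOCKED;
	unsigned int old;

	if (success)
		mask |= F_UPTODATE;

	old = atomic_fetch_xor_explicit(flags, mask, memory_order_release);

	/* XOR only works because the starting state is known. */
	assert(old & F_LOCKED);
	assert(!success || !(old & F_UPTODATE));

	return (old & F_WAITERS) != 0;
}
```

Starting from a locked folio with a waiter, one call leaves the flags as uptodate plus waiters and reports that a wakeup is needed.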
      
      The last two patches could be a separate series, but basically we can do
      the same thing with the writeback flag that we do with the unlock flag;
      clear it and test the waiters bit at the same time.
      
      
      This patch (of 17):
      
      This is really preparation for the next patch, but it lets us call
      folio_mark_uptodate() in just one place instead of two.
      
      Link: https://lkml.kernel.org/r/20231004165317.1061855-1-willy@infradead.org
      Link: https://lkml.kernel.org/r/20231004165317.1061855-2-willy@infradead.org
      
      
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. Sep 28, 2023
  5. Sep 19, 2023
  6. Sep 18, 2023
    • iomap: don't skip reading in !uptodate folios when unsharing a range · 35d30c9c
      Darrick J. Wong authored
      
      Prior to commit a01b8f22, we would always read in the contents of a
      !uptodate folio prior to writing userspace data into the folio,
      allocate a folio state object, etc.  Ritesh introduced an optimization
      that skips all of that if the write would cover the entire folio.
      
      Unfortunately, the optimization misses the unshare case, where we always
      have to read in the folio contents since there isn't a data buffer
      supplied by userspace.  This can result in stale kernel memory exposure
      if userspace issues a FALLOC_FL_UNSHARE_RANGE call on part of a shared
      file that isn't already cached.
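
The decision described above can be expressed as a small predicate. This is a sketch under assumptions: `can_skip_read_model` and its parameters are hypothetical names, and the real check sits inside the iomap write-begin path with more context than shown here. The point is only that "the write covers the whole folio" is not sufficient to skip the read when the operation is an unshare, because no userspace data will overwrite the folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if reading the folio from disk can be skipped before the
 * write. `unshare` marks an unshare operation, which supplies no user
 * data and therefore always needs the existing contents.
 */
static bool can_skip_read_model(uint64_t pos, uint64_t len,
				uint64_t folio_pos, uint64_t folio_size,
				bool unshare)
{
	assert(folio_size > 0);

	if (unshare)
		return false;

	/* A write that covers the entire folio overwrites every byte. */
	return pos <= folio_pos && pos + len >= folio_pos + folio_size;
}
```

A full-folio buffered write may skip the read, while the same range under unshare, or a partial write, may not.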
      
      This was caught by observing fstests regressions in the "unshare around"
      mechanism that is used for unaligned writes to a reflinked realtime
      volume when the realtime extent size is larger than 1FSB, though I think
      it applies to any shared file.
      
      Cc: ritesh.list@gmail.com, willy@infradead.org
      Fixes: a01b8f22 ("iomap: Allocate ifs in ->write_begin() early")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
  7. Aug 02, 2023
  8. Aug 01, 2023
  9. Jul 25, 2023
  10. Jul 24, 2023
  11. Jul 17, 2023
  12. Jun 29, 2023
  13. Jun 09, 2023
  14. Jun 01, 2023
  15. May 24, 2023
  16. Apr 21, 2023
    • iomap: Add DIO tracepoints · 3fd41721
      Ritesh Harjani (IBM) authored
      
      Add the trace_iomap_dio_rw_begin, trace_iomap_dio_rw_queued and
      trace_iomap_dio_complete tracepoints.
      trace_iomap_dio_rw_queued mostly serves to show that the request was
      queued and -EIOCBQUEUED was returned. It is mostly trace_iomap_dio_rw_begin
      & trace_iomap_dio_complete which have all the details.
      
      <example output log>
            a.out-2073  [006]   134.225717: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x0 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags DIO_FORCE_WAIT aio 1
            a.out-2073  [006]   134.226234: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096
            a.out-2074  [006]   136.225975: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 1
            a.out-2074  [006]   136.226173: iomap_dio_rw_queued:  dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0
      ksoftirqd/3-31    [003]   136.226389: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096
            a.out-2075  [003]   141.674969: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT|WRITE dio_flags  aio 1
            a.out-2075  [003]   141.676085: iomap_dio_rw_queued:  dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0
      kworker/2:0-27    [002]   141.676432: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT|WRITE aio 1 error 0 ret 4096
            a.out-2077  [006]   143.443746: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 1
            a.out-2077  [006]   143.443866: iomap_dio_rw_queued:  dev 7:7 ino 0xe size 0x1000 offset 0x1000 length 0x0
      ksoftirqd/5-41    [005]   143.444134: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 1 error 0 ret 4096
            a.out-2078  [007]   146.716833: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 0
            a.out-2078  [007]   146.717639: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096
            a.out-2079  [006]   148.972605: iomap_dio_rw_begin:   dev 7:7 ino 0xe size 0x1000 offset 0x0 length 0x1000 done_before 0x0 flags DIRECT dio_flags  aio 0
            a.out-2079  [006]   148.973099: iomap_dio_complete:   dev 7:7 ino 0xe size 0x1000 offset 0x1000 flags DIRECT aio 0 error 0 ret 4096
      
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      [djwong: line up strings all prettylike]
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • iomap: Remove IOMAP_DIO_NOSYNC unused dio flag · d3bff1fc
      Ritesh Harjani (IBM) authored
      
      IOMAP_DIO_NOSYNC was added earlier for use in btrfs. But it seems that for
      aio dsync writes it is not useful anyway. In the aio dsync case, we
      queue the request and return -EIOCBQUEUED. Since IOMAP_DIO_NOSYNC
      doesn't let iomap_dio_complete() call generic_write_sync(),
      we may lose the sync write.
      
      Hence kill this flag as it is not in use by any FS now.
      
      Tested-by: Disha Goel <disgoel@linux.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>