  1. Dec 12, 2022
    • fsdax: invalidate pages when CoW · f80e1668
      Shiyang Ruan authored
      CoW changes the share state of a dax page, but the page's share count
      isn't updated.  The next time the page is accessed, it should be treated
      as newly accessed, but the old association still exists.  So we need to
      clear the share state when CoW happens, in both dax_iomap_rw() and
      dax_zero_iter().
      
      Link: https://lkml.kernel.org/r/1669908538-55-3-git-send-email-ruansy.fnst@fujitsu.com
      
      
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fsdax: introduce page->share for fsdax in reflink mode · 16900426
      Shiyang Ruan authored
      Patch series "fsdax,xfs: fix warning messages", v2.
      
      Many test cases, such as generic/051, 075 and 127, failed in dax+reflink
      mode with a warning message in dmesg.  The warning looks like this:
      [  775.509337] ------------[ cut here ]------------
      [  775.509636] WARNING: CPU: 1 PID: 16815 at fs/dax.c:386 dax_insert_entry.cold+0x2e/0x69
      [  775.510151] Modules linked in: auth_rpcgss oid_registry nfsv4 algif_hash af_alg af_packet nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter ip_tables x_tables dax_pmem nd_pmem nd_btt sch_fq_codel configfs xfs libcrc32c fuse
      [  775.524288] CPU: 1 PID: 16815 Comm: fsx Kdump: loaded Tainted: G        W          6.1.0-rc4+ #164 eb34e4ee4200c7cbbb47de2b1892c5a3e027fd6d
      [  775.524904] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.0-3-3 04/01/2014
      [  775.525460] RIP: 0010:dax_insert_entry.cold+0x2e/0x69
      [  775.525797] Code: c7 c7 18 eb e0 81 48 89 4c 24 20 48 89 54 24 10 e8 73 6d ff ff 48 83 7d 18 00 48 8b 54 24 10 48 8b 4c 24 20 0f 84 e3 e9 b9 ff <0f> 0b e9 dc e9 b9 ff 48 c7 c6 a0 20 c3 81 48 c7 c7 f0 ea e0 81 48
      [  775.526708] RSP: 0000:ffffc90001d57b30 EFLAGS: 00010082
      [  775.527042] RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000042
      [  775.527396] RDX: ffffea000a0f6c80 RSI: ffffffff81dfab1b RDI: 00000000ffffffff
      [  775.527819] RBP: ffffea000a0f6c40 R08: 0000000000000000 R09: ffffffff820625e0
      [  775.528241] R10: ffffc90001d579d8 R11: ffffffff820d2628 R12: ffff88815fc98320
      [  775.528598] R13: ffffc90001d57c18 R14: 0000000000000000 R15: 0000000000000001
      [  775.528997] FS:  00007f39fc75d740(0000) GS:ffff88817bc80000(0000) knlGS:0000000000000000
      [  775.529474] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  775.529800] CR2: 00007f39fc772040 CR3: 0000000107eb6001 CR4: 00000000003706e0
      [  775.530214] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  775.530592] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  775.531002] Call Trace:
      [  775.531230]  <TASK>
      [  775.531444]  dax_fault_iter+0x267/0x6c0
      [  775.531719]  dax_iomap_pte_fault+0x198/0x3d0
      [  775.532002]  __xfs_filemap_fault+0x24a/0x2d0 [xfs aa8d25411432b306d9554da38096f4ebb86bdfe7]
      [  775.532603]  __do_fault+0x30/0x1e0
      [  775.532903]  do_fault+0x314/0x6c0
      [  775.533166]  __handle_mm_fault+0x646/0x1250
      [  775.533480]  handle_mm_fault+0xc1/0x230
      [  775.533810]  do_user_addr_fault+0x1ac/0x610
      [  775.534110]  exc_page_fault+0x63/0x140
      [  775.534389]  asm_exc_page_fault+0x22/0x30
      [  775.534678] RIP: 0033:0x7f39fc55820a
      [  775.534950] Code: 00 01 00 00 00 74 99 83 f9 c0 0f 87 7b fe ff ff c5 fe 6f 4e 20 48 29 fe 48 83 c7 3f 49 8d 0c 10 48 83 e7 c0 48 01 fe 48 29 f9 <f3> a4 c4 c1 7e 7f 00 c4 c1 7e 7f 48 20 c5 f8 77 c3 0f 1f 44 00 00
      [  775.535839] RSP: 002b:00007ffc66a08118 EFLAGS: 00010202
      [  775.536157] RAX: 00007f39fc772001 RBX: 0000000000042001 RCX: 00000000000063c1
      [  775.536537] RDX: 0000000000006400 RSI: 00007f39fac42050 RDI: 00007f39fc772040
      [  775.536919] RBP: 0000000000006400 R08: 00007f39fc772001 R09: 0000000000042000
      [  775.537304] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000001
      [  775.537694] R13: 00007f39fc772000 R14: 0000000000006401 R15: 0000000000000003
      [  775.538086]  </TASK>
      [  775.538333] ---[ end trace 0000000000000000 ]---
      
      This also affects dax+noreflink mode if we run the test after a
      dax+reflink test.  So, the most urgent thing is solving the warning
      messages.
      
      With these fixes, most warning messages in dax_associate_entry() are gone.
      However, generic/388 still fails randomly with the warning.  That test
      repeatedly shuts down the XFS filesystem while fsstress is running.  I
      think the reason is that dax pages in use cannot be invalidated in time
      when the fs is shut down, so the next time such a dax page is associated
      it still holds the mapping value set last time.  I'll keep working on it.
      
      The warning message in dax_writeback_one() can also be fixed because of
      the dax unshare.
      
      
      This patch (of 8):
      
      An fsdax page is used not only for CoW, but also for mapread.  To make
      this easier to understand, use 'share' to indicate that the dax page is
      shared by more than one extent, and add helper functions to use it.
      
      Also, the flag needs to be renamed to PAGE_MAPPING_DAX_SHARED.
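
The idea of a per-page share count can be sketched in userspace as a plain counter. This is only a toy model under stated assumptions: struct toy_dax_page and the toy_share_*() helpers are hypothetical names made up for illustration and are not the helpers this patch adds to fs/dax.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dax page; not the kernel's struct page. */
struct toy_dax_page {
	unsigned long share;	/* number of extents associated with the page */
};

/* Record one more association and return the new count. */
static unsigned long toy_share_get(struct toy_dax_page *page)
{
	return ++page->share;
}

/* Drop one association and return the new count (never below zero). */
static unsigned long toy_share_put(struct toy_dax_page *page)
{
	if (page->share == 0)
		return 0;
	return --page->share;
}

/* A page counts as shared when more than one extent refers to it. */
static bool toy_share_is_shared(const struct toy_dax_page *page)
{
	return page->share > 1;
}
```

In this model, forgetting to call the put helper when an association goes away leaves a stale count behind, which is the kind of leftover state the series description attributes the warnings to.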
      
      [ruansy.fnst@fujitsu.com: rename several functions]
        Link: https://lkml.kernel.org/r/1669972991-246-1-git-send-email-ruansy.fnst@fujitsu.com
      [ruansy.fnst@fujitsu.com: v2.2]
        Link: https://lkml.kernel.org/r/1670381359-53-1-git-send-email-ruansy.fnst@fujitsu.com
      Link: https://lkml.kernel.org/r/1669908538-55-1-git-send-email-ruansy.fnst@fujitsu.com
      Link: https://lkml.kernel.org/r/1669908538-55-2-git-send-email-ruansy.fnst@fujitsu.com
      
      
      Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fuse: convert fuse_try_move_page() to use folios · 063aaad7
      Vishal Moola (Oracle) authored
      Converts the function to try to move folios instead of pages. Also
      converts fuse_check_page() to fuse_get_folio() since this is its only
      caller. This change removes 15 calls to compound_head().
      
      Link: https://lkml.kernel.org/r/20221101175326.13265-3-vishal.moola@gmail.com
      
      
      Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • filemap: convert replace_page_cache_page() to replace_page_cache_folio() · 3720dd6d
      Vishal Moola (Oracle) authored
      Patch series "Removing the lru_cache_add() wrapper".
      
      This patchset replaces all calls of lru_cache_add() with the folio
      equivalent: folio_add_lru().  This allows us to get rid of the wrapper.
      The series passes xfstests and the userfaultfd selftests.
      
      
      This patch (of 5):
      
      Eliminates 7 calls to compound_head().
      
      Link: https://lkml.kernel.org/r/20221101175326.13265-1-vishal.moola@gmail.com
      Link: https://lkml.kernel.org/r/20221101175326.13265-2-vishal.moola@gmail.com
      
      
      Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. Nov 30, 2022
    • ext4: convert move_extent_per_page() to use folios · 6dd8fe86
      Vishal Moola (Oracle) authored
      Patch series "Removing the try_to_release_page() wrapper", v3.
      
      This patchset replaces the remaining calls of try_to_release_page() with
      the folio equivalent: filemap_release_folio().  This allows us to remove
      the wrapper.
      
      
      This patch (of 4):
      
      Convert move_extent_per_page() to use folios.  This change removes 5 calls
      to compound_head() and is in preparation for the removal of the
      try_to_release_page() wrapper.
      
      Link: https://lkml.kernel.org/r/20221118073055.55694-1-vishal.moola@gmail.com
      Link: https://lkml.kernel.org/r/20221118073055.55694-2-vishal.moola@gmail.com
      
      
      Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlbfs: inode: remove unnecessary (void*) conversions · dbaf7dc9
      Li zeming authored
      The ei pointer does not need an explicit (void *) cast.
      
      Link: https://lkml.kernel.org/r/20221107015659.3221-1-zeming@nfschina.com
      
      
      Signed-off-by: Li zeming <zeming@nfschina.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: anonymous shared memory naming · d09e8ca6
      Pasha Tatashin authored
      Since commit 9a10064f ("mm: add a field to store names for private
      anonymous memory"), a name can be set for private anonymous memory, but
      not for shared anonymous memory.  However, naming shared anonymous memory
      is just as useful for tracking purposes.
      
      Extend the functionality to be able to set names for shared anon.
      
      There are two ways to create anonymous shared memory, using memfd or
      directly via mmap():
      1. fd = memfd_create(...)
         mem = mmap(..., MAP_SHARED, fd, ...)
      2. mem = mmap(..., MAP_SHARED | MAP_ANONYMOUS, -1, ...)
      
      In both cases the anonymous shared memory is created the same way by
      mapping an unlinked file on tmpfs.
      
      The memfd way allows a name to be given to anonymous shared memory, but
      it is not useful when parts of the shared memory need distinct names.
      
      Example use case: The VMM maps VM memory as anonymous shared memory (not
      private because the VMM is sandboxed and drivers are running in their own
      processes).  However, the VM reports back to the VMM how parts of the
      memory are actually used by the guest, how each of the segments should be
      backed (e.g. 4K pages, 2M pages), and some other information about the
      segments.  The naming allows us to monitor the effective memory footprint
      for each of these segments from the host without looking inside the guest.
      
      Sample usage:
        /* Create shared anonymous segment */
        anon_shmem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        /* Name the segment: "MY-NAME" */
        rv = prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                   anon_shmem, SIZE, "MY-NAME");
      
      cat /proc/<pid>/maps (and smaps):
      7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 [anon_shmem:MY-NAME]
      
      If the segment is not named, the output is:
      7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 /dev/zero (deleted)
      
      Link: https://lkml.kernel.org/r/20221115020602.804224-1-pasha.tatashin@soleen.com
      
      
      Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Colin Cross <ccross@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: xu xin <cgel.zte@gmail.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() · f0a0ccda
      ZhangPeng authored
      Syzbot reported a null-ptr-deref bug:
      
       NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP
       frequency < 30 seconds
       general protection fault, probably for non-canonical address
       0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
       KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
       CPU: 1 PID: 3603 Comm: segctord Not tainted
       6.1.0-rc2-syzkaller-00105-gb229b6ca5abb #0
       Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google
       10/11/2022
       RIP: 0010:nilfs_palloc_commit_free_entry+0xe5/0x6b0
       fs/nilfs2/alloc.c:608
       Code: 00 00 00 00 fc ff df 80 3c 02 00 0f 85 cd 05 00 00 48 b8 00 00 00
       00 00 fc ff df 4c 8b 73 08 49 8d 7e 10 48 89 fa 48 c1 ea 03 <80> 3c 02
       00 0f 85 26 05 00 00 49 8b 46 10 be a6 00 00 00 48 c7 c7
       RSP: 0018:ffffc90003dff830 EFLAGS: 00010212
       RAX: dffffc0000000000 RBX: ffff88802594e218 RCX: 000000000000000d
       RDX: 0000000000000002 RSI: 0000000000002000 RDI: 0000000000000010
       RBP: ffff888071880222 R08: 0000000000000005 R09: 000000000000003f
       R10: 000000000000000d R11: 0000000000000000 R12: ffff888071880158
       R13: ffff88802594e220 R14: 0000000000000000 R15: 0000000000000004
       FS:  0000000000000000(0000) GS:ffff8880b9b00000(0000)
       knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fb1c08316a8 CR3: 0000000018560000 CR4: 0000000000350ee0
       Call Trace:
        <TASK>
        nilfs_dat_commit_free fs/nilfs2/dat.c:114 [inline]
        nilfs_dat_commit_end+0x464/0x5f0 fs/nilfs2/dat.c:193
        nilfs_dat_commit_update+0x26/0x40 fs/nilfs2/dat.c:236
        nilfs_btree_commit_update_v+0x87/0x4a0 fs/nilfs2/btree.c:1940
        nilfs_btree_commit_propagate_v fs/nilfs2/btree.c:2016 [inline]
        nilfs_btree_propagate_v fs/nilfs2/btree.c:2046 [inline]
        nilfs_btree_propagate+0xa00/0xd60 fs/nilfs2/btree.c:2088
        nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
        nilfs_collect_file_data+0x45/0xd0 fs/nilfs2/segment.c:568
        nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1018
        nilfs_segctor_scan_file+0x3f4/0x6f0 fs/nilfs2/segment.c:1067
        nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1197 [inline]
        nilfs_segctor_collect fs/nilfs2/segment.c:1503 [inline]
        nilfs_segctor_do_construct+0x12fc/0x6af0 fs/nilfs2/segment.c:2045
        nilfs_segctor_construct+0x8e3/0xb30 fs/nilfs2/segment.c:2379
        nilfs_segctor_thread_construct fs/nilfs2/segment.c:2487 [inline]
        nilfs_segctor_thread+0x3c3/0xf30 fs/nilfs2/segment.c:2570
        kthread+0x2e4/0x3a0 kernel/kthread.c:376
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
        </TASK>
       ...
      
      If DAT metadata file is corrupted on disk, there is a case where
      req->pr_desc_bh is NULL and blocknr is 0 at nilfs_dat_commit_end() during
      a b-tree operation that cascadingly updates ancestor nodes of the b-tree,
      because nilfs_dat_commit_alloc() for a lower level block can initialize
      the blocknr on the same DAT entry between nilfs_dat_prepare_end() and
      nilfs_dat_commit_end().
      
      If this happens, nilfs_dat_commit_end() calls nilfs_dat_commit_free()
      without valid buffer heads in req->pr_desc_bh and req->pr_bitmap_bh, and
      causes the NULL pointer dereference above in
      nilfs_palloc_commit_free_entry() function, which leads to a crash.
      
      Fix this by adding a NULL check on req->pr_desc_bh and req->pr_bitmap_bh
      before nilfs_palloc_commit_free_entry() in nilfs_dat_commit_free().
      
      This also calls nilfs_error() in that case to notify that there is a fatal
      flaw in the filesystem metadata and prevent further operations.
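
The shape of the fix described above, checking both buffer heads before using them and flagging a filesystem error otherwise, can be modelled in userspace as below. This is a toy sketch under stated assumptions: the toy_* types and function are hypothetical, the error flag merely stands in for nilfs_error(), and the -EIO return value is an illustrative choice rather than what the nilfs2 code does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the nilfs2 structures. */
struct toy_bh {
	int dirty;
};

struct toy_req {
	struct toy_bh *desc_bh;		/* may be NULL if metadata is corrupted */
	struct toy_bh *bitmap_bh;	/* may be NULL if metadata is corrupted */
};

struct toy_fs {
	int error_flagged;		/* stands in for calling nilfs_error() */
};

/*
 * Commit a "free entry" request. Both buffer heads are checked before
 * they are dereferenced; if either is missing, the filesystem is marked
 * as faulty and nothing is touched.
 */
static int toy_commit_free(struct toy_fs *fs, struct toy_req *req)
{
	if (!req->desc_bh || !req->bitmap_bh) {
		fs->error_flagged = 1;
		return -EIO;
	}
	req->desc_bh->dirty = 1;
	req->bitmap_bh->dirty = 1;
	return 0;
}
```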
      
      Link: https://lkml.kernel.org/r/00000000000097c20205ebaea3d6@google.com
      Link: https://lkml.kernel.org/r/20221114040441.1649940-1-zhangpeng362@huawei.com
      Link: https://lkml.kernel.org/r/20221119120542.17204-1-konishi.ryusuke@gmail.com
      
      
      Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+ebe05ee8e98f755f61d0@syzkaller.appspotmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Nov 23, 2022
  4. Nov 09, 2022
    • hugetlbfs: fix null-ptr-deref in hugetlbfs_parse_param() · 26215b7e
      Hawkins Jiawei authored
      Syzkaller reports a null-ptr-deref bug as follows:
      ======================================================
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      RIP: 0010:hugetlbfs_parse_param+0x1dd/0x8e0 fs/hugetlbfs/inode.c:1380
      [...]
      Call Trace:
       <TASK>
       vfs_parse_fs_param fs/fs_context.c:148 [inline]
       vfs_parse_fs_param+0x1f9/0x3c0 fs/fs_context.c:129
       vfs_parse_fs_string+0xdb/0x170 fs/fs_context.c:191
       generic_parse_monolithic+0x16f/0x1f0 fs/fs_context.c:231
       do_new_mount fs/namespace.c:3036 [inline]
       path_mount+0x12de/0x1e20 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
       [...]
       </TASK>
      ======================================================
      
      According to commit "vfs: parse: deal with zero length string value",
      the kernel sets param->string to a null pointer in vfs_parse_fs_string()
      if the fs string has zero length.

      The problem is that hugetlbfs_parse_param() dereferences param->string
      without checking whether it is a null pointer.  More specifically, if
      hugetlbfs_parse_param() parses an illegal mount parameter such as
      "size=,", the kernel constructs a struct fs_parameter with a null pointer
      in vfs_parse_fs_string() and then passes it to hugetlbfs_parse_param(),
      which triggers the above null-ptr-deref bug.

      This patch solves it by adding a sanity check on param->string
      in hugetlbfs_parse_param().
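
A userspace sketch of such a sanity check is shown below. It is an illustration only: struct toy_param is a hypothetical, much simplified stand-in for struct fs_parameter, and toy_parse_size() is not the hugetlbfs parser. The point is simply that a NULL value string is rejected before the first dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct fs_parameter. */
struct toy_param {
	const char *key;
	const char *string;	/* NULL when the value is empty, e.g. "size=" */
};

/* Parse a decimal size value; returns 0 on success or -EINVAL. */
static int toy_parse_size(const struct toy_param *param,
			  unsigned long long *out)
{
	char *end;

	/* Sanity check: an empty value arrives as a NULL string. */
	if (!param->string)
		return -EINVAL;

	/* Only now is it safe to look at the first character. */
	if (param->string[0] < '0' || param->string[0] > '9')
		return -EINVAL;

	*out = strtoull(param->string, &end, 10);
	return *end == '\0' ? 0 : -EINVAL;
}
```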
      
      Link: https://lkml.kernel.org/r/20221020231609.4810-1-yin31149@gmail.com
      
      
      Reported-by: <syzbot+a3e6acd85ded5c16a709@syzkaller.appspotmail.com>
      Tested-by: <syzbot+a3e6acd85ded5c16a709@syzkaller.appspotmail.com>
        Link: https://lore.kernel.org/all/0000000000005ad00405eb7148c6@google.com/
      
      
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hawkins Jiawei <yin31149@gmail.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: remove kern_addr_valid() completely · e025ab84
      Kefeng Wang authored
      Most architectures (except arm64/x86/sparc) simply return 1 for
      kern_addr_valid(), which is only used in read_kcore(), and it calls
      copy_from_kernel_nofault() which could check whether the address is a
      valid kernel address.  So as there is no need for kern_addr_valid(), let's
      remove it.
      
      Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
      
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Heiko Carstens <hca@linux.ibm.com>		[s390]
      Acked-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Helge Deller <deller@gmx.de>			[parisc]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Acked-by: Guo Ren <guoren@kernel.org>			[csky]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: <aou@eecs.berkeley.edu>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xuerui Wang <kernel@xen0n.name>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • memory: move hotplug memory notifier priority to same file for easy sorting · 1eeaa4fd
      Liu Shixin authored
      The priorities of hotplug memory callbacks are defined in different
      files, and some callers use bare numbers directly.  Collect them
      together in include/linux/memory.h for easy reading.  This allows us to
      sort their priorities more intuitively without additional comments.
      
      Link: https://lkml.kernel.org/r/20220923033347.3935160-9-liushixin2@huawei.com
      
      
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: zefan li <lizefan.x@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fs/proc/kcore.c: use hotplug_memory_notifier() directly · 5d89c224
      Liu Shixin authored
      Commit 76ae8474 ("Documentation: raise minimum supported version of
      GCC to 5.1") updated the minimum gcc version to 5.1.  So the problem
      mentioned in f02c6968 ("include/linux/memory.h: implement
      register_hotmemory_notifier()") no longer exists, and we can now switch
      to using hotplug_memory_notifier() directly rather than
      register_hotmemory_notifier().
      
      Link: https://lkml.kernel.org/r/20220923033347.3935160-3-liushixin2@huawei.com
      
      
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: zefan li <lizefan.x@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlbfs: convert hugetlb_delete_from_page_cache() to use folios · ece62684
      Sidhartha Kumar authored
      Remove the last caller of delete_from_page_cache() by converting the code
      to its folio equivalent.
      
      Link: https://lkml.kernel.org/r/20220922154207.1575343-5-sidhartha.kumar@oracle.com
      
      
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Colin Cross <ccross@google.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: add hugetlb_folio_subpool() helpers · 149562f7
      Sidhartha Kumar authored
      Allow hugetlbfs_migrate_folio to check and read subpool information by
      passing in a folio.
      
      Link: https://lkml.kernel.org/r/20220922154207.1575343-4-sidhartha.kumar@oracle.com
      
      
      Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Colin Cross <ccross@google.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. Nov 08, 2022
  6. Nov 06, 2022
    • ext4: fix fortify warning in fs/ext4/fast_commit.c:1551 · 0d043351
      Theodore Ts'o authored
      
      With the new fortify string system, rework the memcpy to avoid this
      warning:
      
      memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)
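
One common way to avoid this class of warning is to address the copy relative to the start of the whole object instead of through a pointer to a single field. The sketch below shows that idea in userspace; struct toy_inode is a hypothetical layout chosen so that the tail happens to be 60 bytes, and it is not the real struct ext4_inode or the exact change made by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record; not the real struct ext4_inode. */
struct toy_inode {
	uint32_t mode;
	uint32_t flags;
	uint32_t generation;	/* the copy starts at this field ... */
	uint8_t  tail[56];	/* ... and runs to the end of the record */
};

/*
 * Copy everything from 'generation' to the end of the record.
 * The destination is the record base plus an offset, so the copy is
 * bounded by the whole object rather than by the 4-byte field that
 * &dst->generation would name.
 */
static void copy_from_generation(struct toy_inode *dst,
				 const struct toy_inode *src)
{
	const size_t off = offsetof(struct toy_inode, generation);

	memcpy((uint8_t *)dst + off, (const uint8_t *)src + off,
	       sizeof(*dst) - off);
}
```

Fields before the offset are left untouched, which keeps the behaviour of the original partial copy while making the intended bounds explicit.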
      
      Cc: stable@kernel.org
      Fixes: 54d9469b ("fortify: Add run-time WARN for cross-field memcpy()")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix wrong return err in ext4_load_and_init_journal() · 9f2a1d9f
      Jason Yan authored
      
      The return value is wrong in ext4_load_and_init_journal().  The local
      variable 'err' needs to be initialized before "goto out".  The original
      code in __ext4_fill_super() is fine because it has two return values,
      'ret' and 'err', and 'ret' is initialized as -EINVAL.  After
      ext4_load_and_init_journal() was factored out, this code is broken.  So
      fix it by directly returning -EINVAL in the error handler path.
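
The pitfall can be illustrated with a small userspace model. Everything here is hypothetical (struct toy_sb, toy_load_and_init_journal() and its flags are invented for the example); it only shows why an error label that does "return err" is wrong when err still holds 0, and how returning the error code directly avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical superblock state; not the ext4 structures. */
struct toy_sb {
	bool journal_loaded;
};

static int toy_load_and_init_journal(struct toy_sb *sb, bool load_fails,
				     bool bad_option)
{
	int err;

	err = load_fails ? -EIO : 0;
	if (err)
		return err;
	sb->journal_loaded = true;

	/*
	 * err is 0 at this point, so "return err" after the label below
	 * would report success for a rejected option.
	 */
	if (bad_option)
		goto out;

	return 0;
out:
	sb->journal_loaded = false;	/* undo the load */
	return -EINVAL;			/* return the error code directly */
}
```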
      
      Cc: stable@kernel.org
      Fixes: 9c1dd22d ("ext4: factor out ext4_load_and_init_journal()")
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com
      
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix warning in 'ext4_da_release_space' · 1b8f787e
      Ye Bin authored
      
      Syzkaller reported the following issue:
      EXT4-fs (loop0): Free/Dirty block details
      EXT4-fs (loop0): free_blocks=0
      EXT4-fs (loop0): dirty_blocks=0
      EXT4-fs (loop0): Block reservation details
      EXT4-fs (loop0): i_reserved_data_blocks=0
      EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524
      Modules linked in:
      CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Workqueue: writeback wb_workfn (flush-7:0)
      RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528
      RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296
      RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00
      RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
      RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5
      R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000
      R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740
      FS:  0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461
       mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589
       ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852
       do_writepages+0x3c3/0x680 mm/page-writeback.c:2469
       __writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587
       writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870
       wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044
       wb_do_writeback fs/fs-writeback.c:2187 [inline]
       wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227
       process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
       worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
       kthread+0x266/0x300 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
       </TASK>
      
      The above issue may happen as follows:
      ext4_da_write_begin
        ext4_create_inline_data
          ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
          ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
      __ext4_ioctl
        ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag
      ext4_da_write_begin
        ext4_da_convert_inline_data_to_extent
          ext4_da_write_inline_data_begin
            ext4_da_map_blocks
              ext4_insert_delayed_block
      	  if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
      	    if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
      	      ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1
      	       allocated = true;
                ext4_es_insert_delayed_block(inode, lblk, allocated);
      ext4_writepages
        mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC
        mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1
          ext4_es_remove_extent
            ext4_da_release_space(inode, reserved);
              if (unlikely(to_free > ei->i_reserved_data_blocks))
      	  -> to_free == 1  but ei->i_reserved_data_blocks == 0
      	  -> then trigger warning as above
      
      To solve the above issue, forbid migration of an inode that has inline data.
      
      Cc: stable@kernel.org
      Reported-by: <syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com>
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      1b8f787e
    • ext4: fix BUG_ON() when directory entry has invalid rec_len · 17a0bc9b
      Luís Henriques authored
      The rec_len field in the directory entry has to be a multiple of 4.  A
      corrupted filesystem image can be used to hit a BUG() in
      ext4_rec_len_to_disk(), called from make_indexed_dir().
      
       ------------[ cut here ]------------
       kernel BUG at fs/ext4/ext4.h:2413!
       ...
       RIP: 0010:make_indexed_dir+0x53f/0x5f0
       ...
       Call Trace:
        <TASK>
        ? add_dirent_to_buf+0x1b2/0x200
        ext4_add_entry+0x36e/0x480
        ext4_add_nondir+0x2b/0xc0
        ext4_create+0x163/0x200
        path_openat+0x635/0xe90
        do_filp_open+0xb4/0x160
        ? __create_object.isra.0+0x1de/0x3b0
        ? _raw_spin_unlock+0x12/0x30
        do_sys_openat2+0x91/0x150
        __x64_sys_open+0x6c/0xa0
        do_syscall_64+0x3c/0x80
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      The fix simply adds a call to ext4_check_dir_entry() to validate the
      directory entry, returning -EFSCORRUPTED if the entry is invalid.
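
The kind of validation described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct demo_dirent`, `demo_check_rec_len()` and `DEMO_EFSCORRUPTED` are hypothetical names invented for this sketch, the entry layout is simplified, and the checks are not claimed to match what ext4_check_dir_entry() actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified directory entry header (not the real on-disk layout). */
struct demo_dirent {
	uint32_t inode;
	uint16_t rec_len;  /* total size of this record in bytes */
	uint8_t  name_len; /* length of the name that follows the header */
};

/* Stand-in error code chosen for this sketch only. */
#define DEMO_EFSCORRUPTED 117

/*
 * Return 0 if rec_len looks sane, -DEMO_EFSCORRUPTED otherwise.
 * buf_left is the number of bytes from this entry to the end of the block.
 */
static int demo_check_rec_len(const struct demo_dirent *de, size_t buf_left)
{
	assert(de != NULL);

	/* Assumed 8-byte header plus the name padded up to a multiple of 4. */
	size_t min_len = 8 + (((size_t)de->name_len + 3) & ~(size_t)3);

	if (de->rec_len % 4 != 0)   /* rec_len has to be a multiple of 4 */
		return -DEMO_EFSCORRUPTED;
	if (de->rec_len < min_len)  /* record must hold header and name */
		return -DEMO_EFSCORRUPTED;
	if (de->rec_len > buf_left) /* record must not run past the block */
		return -DEMO_EFSCORRUPTED;
	return 0;
}
```

The point of the sketch is that a corrupted rec_len is rejected with an error before any later code relies on it, instead of tripping an assertion further down the call chain.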
      
      CC: stable@kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540
      
      
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      17a0bc9b
  7. Nov 05, 2022
    • cifs: fix use-after-free on the link name · 542228db
      ChenXiaoSong authored
      
      xfstests generic/011 reported a use-after-free bug as follows:
      
        BUG: KASAN: use-after-free in __d_alloc+0x269/0x859
        Read of size 15 at addr ffff8880078933a0 by task dirstress/952
      
        CPU: 1 PID: 952 Comm: dirstress Not tainted 6.1.0-rc3+ #77
        Call Trace:
         __dump_stack+0x23/0x29
         dump_stack_lvl+0x51/0x73
         print_address_description+0x67/0x27f
         print_report+0x3e/0x5c
         kasan_report+0x7b/0xa8
         kasan_check_range+0x1b2/0x1c1
         memcpy+0x22/0x5d
         __d_alloc+0x269/0x859
         d_alloc+0x45/0x20c
         d_alloc_parallel+0xb2/0x8b2
         lookup_open+0x3b8/0x9f9
         open_last_lookups+0x63d/0xc26
         path_openat+0x11a/0x261
         do_filp_open+0xcc/0x168
         do_sys_openat2+0x13b/0x3f7
         do_sys_open+0x10f/0x146
         __se_sys_creat+0x27/0x2e
         __x64_sys_creat+0x55/0x6a
         do_syscall_64+0x40/0x96
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        Allocated by task 952:
         kasan_save_stack+0x1f/0x42
         kasan_set_track+0x21/0x2a
         kasan_save_alloc_info+0x17/0x1d
         __kasan_kmalloc+0x7e/0x87
         __kmalloc_node_track_caller+0x59/0x155
         kstrndup+0x60/0xe6
         parse_mf_symlink+0x215/0x30b
         check_mf_symlink+0x260/0x36a
         cifs_get_inode_info+0x14e1/0x1690
         cifs_revalidate_dentry_attr+0x70d/0x964
         cifs_revalidate_dentry+0x36/0x62
         cifs_d_revalidate+0x162/0x446
         lookup_open+0x36f/0x9f9
         open_last_lookups+0x63d/0xc26
         path_openat+0x11a/0x261
         do_filp_open+0xcc/0x168
         do_sys_openat2+0x13b/0x3f7
         do_sys_open+0x10f/0x146
         __se_sys_creat+0x27/0x2e
         __x64_sys_creat+0x55/0x6a
         do_syscall_64+0x40/0x96
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
        Freed by task 950:
         kasan_save_stack+0x1f/0x42
         kasan_set_track+0x21/0x2a
         kasan_save_free_info+0x1c/0x34
         ____kasan_slab_free+0x1c1/0x1d5
         __kasan_slab_free+0xe/0x13
         __kmem_cache_free+0x29a/0x387
         kfree+0xd3/0x10e
         cifs_fattr_to_inode+0xb6a/0xc8c
         cifs_get_inode_info+0x3cb/0x1690
         cifs_revalidate_dentry_attr+0x70d/0x964
         cifs_revalidate_dentry+0x36/0x62
         cifs_d_revalidate+0x162/0x446
         lookup_open+0x36f/0x9f9
         open_last_lookups+0x63d/0xc26
         path_openat+0x11a/0x261
         do_filp_open+0xcc/0x168
         do_sys_openat2+0x13b/0x3f7
         do_sys_open+0x10f/0x146
         __se_sys_creat+0x27/0x2e
         __x64_sys_creat+0x55/0x6a
         do_syscall_64+0x40/0x96
         entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      When a symlink is opened, the link name is taken from 'inode->i_link', but
      that pointer may be reset to a new value when the dentry is revalidated. If
      another process reads the link name during this race, a use-after-free
      (UAF) occurs on the link name.

      Fix this by implementing the 'get_link' interface to duplicate the link name.
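
The idea of handing out a private copy instead of the cached pointer can be sketched in user space as below. All names (`struct demo_inode`, `demo_get_link()`, `demo_revalidate()`) are hypothetical, and locking is deliberately left out of this single-threaded sketch; a real implementation would need to take the copy under whatever lock protects the cached pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an inode that caches a symlink target. */
struct demo_inode {
	char *link; /* owned by the inode; may be replaced on revalidation */
};

/* Return a private heap copy of the cached link name, or NULL. Caller frees it. */
static char *demo_get_link(const struct demo_inode *inode)
{
	assert(inode != NULL);
	if (inode->link == NULL)
		return NULL;

	size_t len = strlen(inode->link) + 1;
	char *copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, inode->link, len);
	return copy;
}

/* Replace the cached link name and free the old buffer, as a revalidation might. */
static int demo_revalidate(struct demo_inode *inode, const char *new_target)
{
	size_t len = strlen(new_target) + 1;
	char *fresh = malloc(len);

	if (fresh == NULL)
		return -1;
	memcpy(fresh, new_target, len);
	free(inode->link);
	inode->link = fresh;
	return 0;
}
```

A reader that obtained its name through demo_get_link() keeps a valid string even after demo_revalidate() has freed and replaced the cached one, which is the property the fix relies on.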
      
      Fixes: 76894f3e ("cifs: improve symlink handling for smb2+")
      Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      542228db
    • cifs: avoid unnecessary iteration of tcp sessions · 23d9b9b7
      Shyam Prasad N authored
      
      In a few places, we iterate over the tcp sessions unnecessarily,
      even when the server struct is provided.

      This change avoids that and uses the provided server struct.
      
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      23d9b9b7
    • cifs: always iterate smb sessions using primary channel · 8abcaeae
      Shyam Prasad N authored
      
      SMB sessions and tcons currently hang off the primary channel only;
      secondary channels have empty lists. Whenever there's a
      need to iterate sessions or tcons, we should use the list in the
      corresponding primary channel.
      
      Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      8abcaeae
  8. Nov 02, 2022
    • btrfs: fix inode reserve space leak due to nowait buffered write · eb81b682
      Filipe Manana authored
      
      During a nowait buffered write, if we fail to balance dirty pages we exit
      btrfs_buffered_write() without releasing the delalloc space reserved for
      an extent, resulting in leaking space from the inode's block reserve.
      
      So fix that by releasing the delalloc space for the extent when balancing
      dirty pages fails.
      
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Link: https://lore.kernel.org/all/202210111304.d369bc32-yujie.liu@intel.com
      Fixes: 965f47ae ("btrfs: make btrfs_buffered_write nowait compatible")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      eb81b682
    • btrfs: fix nowait buffered write returning -ENOSPC · a348c8d4
      Filipe Manana authored
      
      If we are doing a buffered write in NOWAIT context and we can't reserve
      metadata space due to -ENOSPC, then we should return -EAGAIN so that we
      retry the write in a context allowed to block and do metadata reservation
      with flushing, which might succeed this time due to the allowed flushing.
      
      Returning -ENOSPC while in NOWAIT context simply makes some writes fail
      with -ENOSPC when they would likely succeed after switching from NOWAIT
      context to blocking context. That is unexpected behaviour and even fio
      complains about it with a warning like this:
      
        fio: io_u error on file /mnt/sdi/task_0.0.0: No space left on device: write offset=1535705088, buflen=65536
        fio: pid=592630, err=28/file:io_u.c:1846, func=io_u error, error=No space left on device
      
      The fio job config is this:
      
         [global]
         bs=64K
         ioengine=io_uring
         iodepth=1
         size=2236962133
         nr_files=1
         filesize=2236962133
         direct=0
         runtime=10
         fallocate=posix
         io_size=2236962133
         group_reporting
         time_based
      
         [task_0]
         rw=randwrite
         directory=/mnt/sdi
         numjobs=4
      
      So fix this by returning -EAGAIN if we are in NOWAIT context and the
      metadata reservation failed with -ENOSPC.
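
The error translation described above can be sketched as a small helper. `demo_map_reserve_error()` is a hypothetical name used only for this illustration; it is not a btrfs function, and the real change may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: given the result of a metadata reservation (0 or a
 * negative errno) and whether the caller is in NOWAIT context, return the
 * value the write path should propagate.
 */
static int demo_map_reserve_error(int ret, bool nowait)
{
	assert(ret <= 0);

	/*
	 * In NOWAIT context the reservation could not flush, so -ENOSPC only
	 * means "could not reserve without blocking". Report -EAGAIN so the
	 * caller retries in a context that is allowed to block and flush.
	 */
	if (nowait && ret == -ENOSPC)
		return -EAGAIN;

	return ret;
}
```

Other errors, and -ENOSPC in blocking context, pass through unchanged in this sketch.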
      
      Fixes: 304e45ac ("btrfs: plumb NOWAIT through the write path")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a348c8d4
    • btrfs: remove pointless and double ulist frees in error paths of qgroup tests · d0ea17ae
      Filipe Manana authored
      
      Several places in the qgroup self tests follow the pattern of freeing the
      ulist pointer they passed to btrfs_find_all_roots() if the call to that
      function returned an error. That is pointless because that function always
      frees the ulist in case it returns an error.
      
      Also, in some places, such as test_multiple_refs(), after a call to
      btrfs_qgroup_account_extent() we also leave "old_roots" and "new_roots"
      pointing to ulists that were freed, because btrfs_qgroup_account_extent()
      has freed those ulists, and if after that the next call to
      btrfs_find_all_roots() fails, we call ulist_free() on the "old_roots"
      ulist again, resulting in a double free.
      
      So remove those calls to reduce the code size and avoid double ulist
      free in case of an error.
      
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d0ea17ae
    • btrfs: fix ulist leaks in error paths of qgroup self tests · d37de92b
      Filipe Manana authored
      
      In the test_no_shared_qgroup() and test_multiple_refs() qgroup self tests,
      if we fail to add the tree ref, remove the extent item or remove the
      extent ref, we are returning from the test function without freeing the
      "old_roots" ulist that was allocated by the previous calls to
      btrfs_find_all_roots(). Fix that by calling ulist_free() before returning.
      
      Fixes: 442244c9 ("btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d37de92b
    • btrfs: fix inode list leak during backref walking at find_parent_nodes() · 92876eec
      Filipe Manana authored
      
      During backref walking, at find_parent_nodes(), if we are dealing with a
      data extent and we get an error while resolving the indirect backrefs, at
      resolve_indirect_refs(), or in the while loop that iterates over the refs
      in the direct refs rbtree, we end up leaking the inode lists attached to
      the direct refs we have in the direct refs rbtree that were not yet added
      to the refs ulist passed as argument to find_parent_nodes(). Since they
      were not yet added to the refs ulist and prelim_release() does not free
      the lists, on error the caller can only free the lists attached to the
      refs that were added to the refs ulist; the inode lists of all the
      remaining refs are never freed, leaking their memory.
      
      Fix this by having prelim_release() always free any inode list attached
      to each ref found in the rbtree, and have find_parent_nodes() set the
      ref's inode list to NULL once it transfers ownership of the inode list
      to a ref added to the refs ulist passed to find_parent_nodes().
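
The ownership rule described above (the release path always frees what is still attached, and a transfer clears the source pointer) can be sketched in user space. Every type and function here is hypothetical and only mirrors the pattern; none of it is btrfs code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical singly linked inode list. */
struct demo_inode_list {
	struct demo_inode_list *next;
};

/* Hypothetical preliminary ref that may own an inode list. */
struct demo_ref {
	struct demo_inode_list *inode_list;
};

/* Hypothetical result entry that takes over the list on success. */
struct demo_result {
	struct demo_inode_list *inode_list;
};

/* Free every element of an inode list; a NULL list is a no-op. */
static void demo_free_inode_list(struct demo_inode_list *list)
{
	while (list != NULL) {
		struct demo_inode_list *next = list->next;

		free(list);
		list = next;
	}
}

/* Move ownership of the list from ref to res and clear the source pointer. */
static void demo_transfer(struct demo_ref *ref, struct demo_result *res)
{
	assert(res->inode_list == NULL);
	res->inode_list = ref->inode_list;
	ref->inode_list = NULL; /* ref no longer owns the list */
}

/* Release a ref: always frees whatever list is still attached to it. */
static void demo_release_ref(struct demo_ref *ref)
{
	demo_free_inode_list(ref->inode_list);
	ref->inode_list = NULL;
}
```

With this split, an error path can release every ref unconditionally: lists that were never transferred are freed there, and transferred lists are not freed twice because the ref's pointer is already NULL.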
      
      Fixes: 86d5f994 ("btrfs: convert prelimary reference tracking to use rbtrees")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      92876eec
    • btrfs: fix inode list leak during backref walking at resolve_indirect_refs() · 5614dc3a
      Filipe Manana authored
      
      During backref walking, at resolve_indirect_refs(), if we get an error
      we jump to the 'out' label and call ulist_free() on the 'parents' ulist,
      which frees all the elements in the ulist - however that does not free
      any inode lists that may be attached to elements, through the 'aux' field
      of a ulist node, so we end up leaking lists if we have any attached to
      the unodes.
      
      Fix this by calling free_leaf_list() instead of ulist_free() when we exit
      from resolve_indirect_refs(). The static function free_leaf_list() is
      moved up for this to be possible and it's slightly simplified by removing
      unnecessary code.
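
The difference between freeing only the list nodes and also freeing what hangs off each node can be sketched as below. `struct demo_unode`, `demo_list_add()` and `demo_free_list_and_aux()` are hypothetical names for this illustration and do not reproduce the ulist API or free_leaf_list().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node carrying an optional heap-allocated payload. */
struct demo_unode {
	struct demo_unode *next;
	void *aux; /* attached payload, or NULL */
};

/* Push a node onto the list; the list takes ownership of aux. Returns 0 or -1. */
static int demo_list_add(struct demo_unode **head, void *aux)
{
	struct demo_unode *node = malloc(sizeof(*node));

	assert(head != NULL);
	if (node == NULL)
		return -1;
	node->aux = aux;
	node->next = *head;
	*head = node;
	return 0;
}

/*
 * Free every node together with its attached payload. Freeing only the
 * nodes would leak each non-NULL aux. Returns the number of payloads freed.
 */
static size_t demo_free_list_and_aux(struct demo_unode *head)
{
	size_t freed = 0;

	while (head != NULL) {
		struct demo_unode *next = head->next;

		if (head->aux != NULL) {
			free(head->aux);
			freed++;
		}
		free(head);
		head = next;
	}
	return freed;
}
```

Using the payload-aware teardown on every exit path, including error paths, is what avoids the leak in this pattern.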
      
      Fixes: 3301958b ("Btrfs: add inodes before dropping the extent lock in find_all_leafs")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5614dc3a
  9. Nov 01, 2022
  10. Oct 31, 2022