Skip to content
Snippets Groups Projects
  1. Aug 11, 2017
  2. Aug 10, 2017
  3. Aug 09, 2017
  4. Aug 08, 2017
  5. Aug 06, 2017
  6. Aug 05, 2017
    • Andreas Dilger's avatar
      ext4: fix dir_nlink behaviour · c7414892
      Andreas Dilger authored
      The dir_nlink feature has been enabled by default for new ext4
      filesystems since e2fsprogs-1.41 in 2008, and was automatically
      enabled by the kernel for older ext4 filesystems since the
      dir_nlink feature was added with ext4 in kernel 2.6.28+ when
      the subdirectory count exceeded EXT4_LINK_MAX-1.
      
      Automatically adding the file system features such as dir_nlink is
      generally frowned upon, since it could cause the file system to not be
      mountable on older kernel, thus preventing the administrator from
      rolling back to an older kernel if necessary.
      
      In this case, the administrator might also want to disable the feature
      because glibc's fts_read() function does not correctly optimize
      directory traversal for directories that use st_nlinks field of 1 to
      indicate that the number of links in the directory are not tracked by
      the file system, and could fail to traverse the full directory
      hierarchy.  Fortunately, in the past ten years very few users have
      complained about incomplete file system traversal by glibc's
      fts_read().
      
      This commit also changes ext4_inc_count() to allow i_nlinks to reach
      the full EXT4_LINK_MAX links on the parent directory (including "."
      and "..") before changing i_links_count to be 1.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196405
      
      
      Signed-off-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      c7414892
    • Dan Carpenter's avatar
      ext4: silence array overflow warning · 381cebfe
      Dan Carpenter authored
      
      I get a static checker warning:
      
          fs/ext4/ext4.h:3091 ext4_set_de_type()
          error: buffer overflow 'ext4_type_by_mode' 15 <= 15
      
      It seems unlikely that we would hit this read overflow in real life, but
      it's also simple enough to make the array 16 bytes instead of 15.
      
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      381cebfe
    • Jan Kara's avatar
      ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize · fcf5ea10
      Jan Kara authored
      
      ext4_find_unwritten_pgoff() does not properly handle a situation when
      starting index is in the middle of a page and blocksize < pagesize. The
      following command shows the bug on filesystem with 1k blocksize:
      
        xfs_io -f -c "falloc 0 4k" \
                  -c "pwrite 1k 1k" \
                  -c "pwrite 3k 1k" \
                  -c "seek -a -r 0" foo
      
      In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
      SEEK_DATA) will return the correct result.
      
      Fix the problem by neglecting buffers in a page before starting offset.
      
      Reported-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      CC: stable@vger.kernel.org # 3.8+
      fcf5ea10
    • Daeho Jeong's avatar
      ext4: release discard bio after sending discard commands · e4510577
      Daeho Jeong authored
      
      We've changed the discard command handling into parallel manner.
      But, in this change, I forgot decreasing the usage count of the bio
      which was used to send discard request. I'm sorry about that.
      
      Fixes: a0154344 ("ext4: send parallel discards on commit completions")
      Signed-off-by: default avatarDaeho Jeong <daeho.jeong@samsung.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      e4510577
  7. Aug 04, 2017
  8. Aug 03, 2017
    • Ashish Samant's avatar
      fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio · 61c12b49
      Ashish Samant authored
      
      Commit 8fba54ae ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes
      the ITER_BVEC page deadlock for direct io in fuse by checking in
      fuse_direct_io(), whether the page is a bvec page or not, before locking
      it.  However, this check is missed when the "async_dio" mount option is
      enabled.  In this case, set_page_dirty_lock() is called from the req->end
      callback in request_end(), when the fuse thread is returning from userspace
      to respond to the read request.  This will cause the same deadlock because
      the bvec condition is not checked in this path.
      
      Here is the stack of the deadlocked thread, while returning from userspace:
      
      [13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds.
      [13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
      this message.
      [13706.658788] glusterfs       D ffffffff816c80f0     0  3006      1
      0x00000080
      [13706.658797]  ffff8800d6713a58 0000000000000086 ffff8800d9ad7000
      ffff8800d9ad5400
      [13706.658799]  ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0
      7fffffffffffffff
      [13706.658801]  0000000000000002 ffffffff816c80f0 ffff8800d6713a78
      ffffffff816c790e
      [13706.658803] Call Trace:
      [13706.658809]  [<ffffffff816c80f0>] ? bit_wait_io_timeout+0x80/0x80
      [13706.658811]  [<ffffffff816c790e>] schedule+0x3e/0x90
      [13706.658813]  [<ffffffff816ca7e5>] schedule_timeout+0x1b5/0x210
      [13706.658816]  [<ffffffff81073ffb>] ? gup_pud_range+0x1db/0x1f0
      [13706.658817]  [<ffffffff810668fe>] ? kvm_clock_read+0x1e/0x20
      [13706.658819]  [<ffffffff81066909>] ? kvm_clock_get_cycles+0x9/0x10
      [13706.658822]  [<ffffffff810f5792>] ? ktime_get+0x52/0xc0
      [13706.658824]  [<ffffffff816c6f04>] io_schedule_timeout+0xa4/0x110
      [13706.658826]  [<ffffffff816c8126>] bit_wait_io+0x36/0x50
      [13706.658828]  [<ffffffff816c7d06>] __wait_on_bit_lock+0x76/0xb0
      [13706.658831]  [<ffffffffa0545636>] ? lock_request+0x46/0x70 [fuse]
      [13706.658834]  [<ffffffff8118800a>] __lock_page+0xaa/0xb0
      [13706.658836]  [<ffffffff810c8500>] ? wake_atomic_t_function+0x40/0x40
      [13706.658838]  [<ffffffff81194d08>] set_page_dirty_lock+0x58/0x60
      [13706.658841]  [<ffffffffa054d968>] fuse_release_user_pages+0x58/0x70 [fuse]
      [13706.658844]  [<ffffffffa0551430>] ? fuse_aio_complete+0x190/0x190 [fuse]
      [13706.658847]  [<ffffffffa0551459>] fuse_aio_complete_req+0x29/0x90 [fuse]
      [13706.658849]  [<ffffffffa05471e9>] request_end+0xd9/0x190 [fuse]
      [13706.658852]  [<ffffffffa0549126>] fuse_dev_do_write+0x336/0x490 [fuse]
      [13706.658854]  [<ffffffffa054963e>] fuse_dev_write+0x6e/0xa0 [fuse]
      [13706.658857]  [<ffffffff812a9ef3>] ? security_file_permission+0x23/0x90
      [13706.658859]  [<ffffffff81205300>] do_iter_readv_writev+0x60/0x90
      [13706.658862]  [<ffffffffa05495d0>] ? fuse_dev_splice_write+0x350/0x350
      [fuse]
      [13706.658863]  [<ffffffff812062a1>] do_readv_writev+0x171/0x1f0
      [13706.658866]  [<ffffffff810b3d00>] ? try_to_wake_up+0x210/0x210
      [13706.658868]  [<ffffffff81206361>] vfs_writev+0x41/0x50
      [13706.658870]  [<ffffffff81206496>] SyS_writev+0x56/0xf0
      [13706.658872]  [<ffffffff810257a1>] ? syscall_trace_leave+0xf1/0x160
      [13706.658874]  [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71
      
      Fix this by making should_dirty a fuse_io_priv parameter that can be
      checked in fuse_aio_complete_req().
      
      Reported-by: default avatarTiger Yang <tiger.yang@oracle.com>
      Signed-off-by: default avatarAshish Samant <ashish.samant@oracle.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      61c12b49
    • Jan Kara's avatar
      ocfs2: don't clear SGID when inheriting ACLs · 19ec8e48
      Jan Kara authored
      When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
      set, DIR1 is expected to have SGID bit set (and owning group equal to
      the owning group of 'DIR0').  However when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
      'DIR1' to get cleared if user is not member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
      into ocfs2_iop_set_acl().  That way the function will not be called when
      inheriting ACLs which is what we want as it prevents SGID bit clearing
      and the mode has been properly set by posix_acl_create() anyway.  Also
      posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
      mode itself.
      
      Fixes: 07393101 ("posix_acl: Clear SGID bit when setting file permissions")
      Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
      
      
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      19ec8e48
    • Mike Rapoport's avatar
      userfaultfd: non-cooperative: flush event_wqh at release time · 5a18b64e
      Mike Rapoport authored
      There may still be threads waiting on event_wqh at the time the
      userfault file descriptor is closed.  Flush the events wait-queue to
      prevent waiting threads from hanging.
      
      Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
      
      
      Fixes: 9cd75c3c ("userfaultfd: non-cooperative: add ability to report
      non-PF events from uffd descriptor")
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5a18b64e
    • Mike Rapoport's avatar
      userfaultfd_zeropage: return -ENOSPC in case mm has gone · 9d95aa4b
      Mike Rapoport authored
      In the non-cooperative userfaultfd case, the process exit may race with
      outstanding mcopy_atomic called by the uffd monitor.  Returning -ENOSPC
      instead of -EINVAL when mm is already gone will allow uffd monitor to
      distinguish this case from other error conditions.
      
      Unfortunately I overlooked userfaultfd_zeropage when updating
      userfaultd_copy().
      
      Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
      
      
      Fixes: 96333187 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9d95aa4b
  9. Aug 02, 2017
  10. Aug 01, 2017
  11. Jul 31, 2017
  12. Jul 28, 2017
    • Benjamin Coddington's avatar
      NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter · b7dbcc0e
      Benjamin Coddington authored
      
      nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
      same region protected by the wait_queue's lock after checking for a
      notification from CB_NOTIFY_LOCK callback.  However, after releasing that
      lock, a wakeup for that task may race in before the call to
      freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
      freezable_schedule_timeout_interruptible() will set the state back to
      TASK_INTERRUPTIBLE before the task will sleep.  The result is that the task
      will sleep for the entire duration of the timeout.
      
      Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
      freezable_schedule_timout() instead.
      
      Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      b7dbcc0e
  13. Jul 27, 2017
    • NeilBrown's avatar
      NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43
      NeilBrown authored
      
      posix_fallocate() will allocate space in an NFS file by considering
      the last byte of every 4K block.  If it is before EOF, it will read
      the byte and if it is zero, a zero is written out.  If it is after EOF,
      the zero is unconditionally written.
      
      For the blocks beyond EOF, if NFS believes its cache is valid, it will
      expand these writes to write full pages, and then will merge the pages.
      This results if (typically) 1MB writes.  If NFS believes its cache is
      not valid (particularly if NFS_INO_INVALID_DATA or
      NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
      send the individual 1-byte writes. This results in (typically) 256 times
      as many RPC requests, and can be substantially slower.
      
      Currently nfs_revalidate_mapping() is only used when reading a file or
      mmapping a file, as these are times when the content needs to be
      up-to-date.  Writes don't generally need the cache to be up-to-date, but
      writes beyond EOF can benefit, particularly in the posix_fallocate()
      case.
      
      So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
      i.e. when there is a gap between the end of the file and the start of
      the write.  If the cache is thought to be out of date (as happens after
      taking a file lock), this will cause a GETATTR, and the two flags
      mentioned above will be cleared.  With this, posix_fallocate() on a
      newly locked file does not generate excessive tiny writes.
      
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      6ba80d43
    • NeilBrown's avatar
      NFS: invalidate file size when taking a lock. · 442ce049
      NeilBrown authored
      
      Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
      for writing"), NFS would revalidate, or invalidate, the file size when
      taking a lock.  Since that commit it only invalidates the file content.
      
      If the file size is changed on the server while wait for the lock, the
      client will have an incorrect understanding of the file size and could
      corrupt data.  This particularly happens when writing beyond the
      (supposed) end of file and can be easily be demonstrated with
      posix_fallocate().
      
      If an application opens an empty file, waits for a write lock, and then
      calls posix_fallocate(), glibc will determine that the underlying
      filesystem doesn't support fallocate (assuming version 4.1 or earlier)
      and will write out a '0' byte at the end of each 4K page in the region
      being fallocated that is after the end of the file.
      NFS will (usually) detect that these writes are beyond EOF and will
      expand them to cover the whole page, and then will merge the pages.
      Consequently, NFS will write out large blocks of zeroes beyond where it
      thought EOF was.  If EOF had moved, the pre-existing part of the file
      will be over-written.  Locking should have protected against this,
      but it doesn't.
      
      This patch restores the use of nfs_zap_caches() which invalidated the
      cached attributes.  When posix_fallocate() asks for the file size, the
      request will go to the server and get a correct answer.
      
      cc: stable@vger.kernel.org (v4.8+)
      Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      442ce049
  14. Jul 26, 2017
Loading