- Dec 04, 2021
Christoph Hellwig authored
Add a flag so that the file system can easily detect DAX operations based just on the iomap operation requested, instead of looking at inode state using IS_DAX. This will be needed to apply the to-be-added partition offset only for operations that actually use DAX, but not for things like fiemap that are based on the block device. In the long run it should also allow turning the bdev, dax_dev and inline_data fields into a union.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-25-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
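For illustration, the consumer side of such a flag looks roughly like the following, assuming the flag ended up named IOMAP_DAX as in mainline; the target->bt_dax_part_off field is the XFS-style partition offset this series prepares for and should be treated as an assumption here:

    /* In a filesystem's ->iomap_begin helper: apply the DAX partition
     * offset only when the operation actually goes through DAX. */
    if (flags & IOMAP_DAX)
            iomap->addr += target->bt_dax_part_off;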
-
Christoph Hellwig authored
Unshare the DAX and iomap buffered I/O page zeroing code. This code previously did an IS_DAX check deep inside the iomap code, which in fact was the only DAX check in the code. Instead, move these checks into the callers. Most callers already have DAX special casing anyway, and XFS will need it for reflink support as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-19-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
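Once the check lives in the callers, the dispatch is a one-liner per call site; a minimal sketch, assuming the dax_zero_range()/iomap_zero_range() helper names from later mainline:

    /* Zero a sub-range of a file, picking the DAX or buffered path. */
    static int example_zero_range(struct inode *inode, loff_t pos, loff_t len,
                                  bool *did_zero, const struct iomap_ops *ops)
    {
            if (IS_DAX(inode))
                    return dax_zero_range(inode, pos, len, did_zero, ops);
            return iomap_zero_range(inode, pos, len, did_zero, ops);
    }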
-
Christoph Hellwig authored
Factor out a helper for the "manual" zeroing of a DAX range to clean up dax_iomap_zero() considerably.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-18-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Christoph Hellwig authored
The file-relative offset must have the same alignment as the storage offset, so use that and get rid of the call to iomap_sector().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-17-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Christoph Hellwig authored
Replace the two steps of dax_iomap_sector() and bdev_dax_pgoff() with a single dax_iomap_pgoff() helper that avoids lots of cumbersome sector conversions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-15-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
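The combined helper boils down to a single conversion; a sketch of its mainline shape:

    static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
    {
            /* Map a file position straight to a dax_dev page offset,
             * with no intermediate 512-byte sector arithmetic. */
            return PHYS_PFN(iomap->addr + (pos & PAGE_MASK) - iomap->offset);
    }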
-
Christoph Hellwig authored
Just pass the vm_fault and iomap_iter structures, and figure out the rest locally. Note that this requires moving dax_iomap_sector() up in the file.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-14-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Christoph Hellwig authored
Despite its name, copy_user_page() expects kernel addresses, which is what we already have.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-13-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- Aug 17, 2021
Christoph Hellwig authored
Avoid the open-coded calls to ->iomap_begin and ->iomap_end and call iomap_iter() instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-
Shiyang Ruan authored
The core logic in the two dax page fault functions is similar, so move it into a common helper function. Also, to facilitate the addition of new features such as CoW, stop using switch-case to handle the different iomap types.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-
Shiyang Ruan authored
The dax page fault code is too long and a bit difficult to read, and it is hard to understand when we try to add new features. Some of the PTE/PMD code paths share similar logic, so factor out helper functions to simplify the code.
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[hch: minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-
Christoph Hellwig authored
Switch the dax_iomap_rw() implementation to use iomap_iter().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-
- Jul 08, 2021
Ira Weiny authored
dax_direct_access() takes a number of pages. PHYS_PFN(PAGE_SIZE) is a very roundabout way to specify '1', so change the nr_pages parameter to the explicit value of '1'.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20210525172428.3634316-3-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
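The change is mechanical at each call site; a sketch, with the dax_direct_access() signature as it stood in this era (before the later access-mode argument was added):

    /* Before: PHYS_PFN(PAGE_SIZE) is a roundabout way of writing 1. */
    rc = dax_direct_access(dax_dev, pgoff, PHYS_PFN(PAGE_SIZE), &kaddr, NULL);
    /* After: */
    rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);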
-
- Jun 29, 2021
Jan Kara authored
grab_mapping_entry() has a bug in its handling of the ENOMEM condition. Suppose we have a PMD entry at index i which we are downgrading to a PTE entry. grab_mapping_entry() will set pmd_downgrade to true, lock the entry, clear the entry in the xarray, and decrement mapping->nrpages. Then it will call:

    entry = dax_make_entry(pfn_to_pfn_t(0), flags);
    dax_lock_entry(xas, entry);

which inserts the new PTE entry into the xarray. However, this may fail to allocate the new node. We handle this by:

    if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
            goto retry;

however pmd_downgrade stays set to true even though 'entry' returned from get_unlocked_entry() will now be NULL, and we will go through the downgrade branch again. This is mostly harmless, except that mapping->nrpages is decremented again and we temporarily have an invalid entry stored in the xarray. Fix the problem by setting pmd_downgrade to false each time we look up the entry we work with, so that it matches the entry we found.
Link: https://lkml.kernel.org/r/20210622160015.18004-1-jack@suse.cz
Fixes: b15cd800 ("dax: Convert page fault handlers to XArray")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
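The shape of the fix, sketched against grab_mapping_entry() with the details elided; the key point is that pmd_downgrade is re-derived on every trip through the retry label:

    bool pmd_downgrade;
    void *entry;
retry:
    pmd_downgrade = false;      /* matches whatever this lookup finds */
    xas_lock_irq(xas);
    entry = get_unlocked_entry(xas, order);
    /* ... decide on downgrade, store the new entry ... */
    if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
            goto retry;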
-
- May 07, 2021
Vivek Goyal authored
I am seeing missed wakeups which ultimately lead to a deadlock when I am using virtiofs with DAX enabled and running "make -j". I had to mount virtiofs as rootfs and also reduce the dax window size to 256M to reproduce the problem consistently. So here is the problem: put_unlocked_entry() wakes up waiters only if the entry is not null as well as !dax_is_conflict(entry). But if I call multiple instances of invalidate_inode_pages2() in parallel, then I can run into a situation where there are waiters on this index but nobody will wake these waiters.

    invalidate_inode_pages2()
      invalidate_inode_pages2_range()
        invalidate_exceptional_entry2()
          dax_invalidate_mapping_entry_sync()
            __dax_invalidate_entry() {
                    xas_lock_irq(&xas);
                    entry = get_unlocked_entry(&xas, 0);
                    ...
                    dax_disassociate_entry(entry, mapping, trunc);
                    xas_store(&xas, NULL);
                    ...
                    put_unlocked_entry(&xas, entry);
                    xas_unlock_irq(&xas);
            }

Say a fault is in progress and it has locked the entry at offset "0x1c". Now say three instances of invalidate_inode_pages2() are in progress (A, B, C) and they all try to invalidate the entry at offset "0x1c". Given the dax entry is locked, all three instances A, B, C will wait in the wait queue. When the dax fault finishes, say A is woken up. It will store a NULL entry at index "0x1c" and wake up B. When B comes along it will find "entry=0" at page offset 0x1c and it will call put_unlocked_entry(&xas, 0). And this means put_unlocked_entry() will not wake up the next waiter, given the current code. And that means C continues to wait and is not woken up. This patch fixes the issue by waking up all waiters when a dax entry has been invalidated. This seems to fix the deadlock I am facing, and I can make forward progress.
Reported-by: Sergio Lopez <slp@redhat.com>
Fixes: ac401cc7 ("dax: New fault locking")
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20210428190314.1865312-4-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Vivek Goyal authored
As of now, put_unlocked_entry() always wakes up the next waiter. In later patches we want to wake up all waiters at one callsite, so add a parameter to the function. This patch does not introduce any change of behavior.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20210428190314.1865312-3-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Vivek Goyal authored
Dan mentioned that he is not very fond of passing around a boolean true/false to specify whether only the next waiter or all waiters should be woken up. He instead prefers that we introduce an enum and make it very explicit at the callsite itself, which makes the code easier to read. This patch should not introduce any change of behavior.
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20210428190314.1865312-2-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
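The resulting interface from this three-patch group, as it appears in mainline fs/dax.c (comments added here):

    enum dax_wake_mode {
            WAKE_ALL,
            WAKE_NEXT,
    };

    static void put_unlocked_entry(struct xa_state *xas, void *entry,
                                   enum dax_wake_mode mode)
    {
            if (entry && !dax_is_conflict(entry))
                    /* WAKE_ALL at invalidation sites, WAKE_NEXT elsewhere. */
                    dax_wake_entry(xas, entry, mode);
    }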
-
- May 05, 2021
Matthew Wilcox (Oracle) authored
Simplify mapping_needs_writeback() by accounting DAX entries as pages instead of exceptional entries.
Link: https://lkml.kernel.org/r/20201026151849.24232-4-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Patch series "Remove nrexceptional tracking", v2. We actually use nrexceptional for very little these days. It's a minor pain to keep in sync with nrpages, but the pain becomes much bigger with the THP patches because we don't know how many indices a shadow entry occupies. It's easier to just remove it than to keep it accurate. Also, we save 8 bytes per inode, which is nothing to sneeze at; on my laptop, it would improve shmem_inode_cache from 22 to 23 objects per 16kB, and inode_cache from 26 to 27 objects. Combined, that saves a megabyte of memory from a combined usage of 25MB for both caches. Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save any memory for ext4. This patch (of 4): Instead of checking the two counters (nrpages and nrexceptional), we can just check whether i_pages is empty.
Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
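The replacement check is a single xarray query; in mainline it reads:

    /* include/linux/pagemap.h */
    static inline bool mapping_empty(struct address_space *mapping)
    {
            return xa_empty(&mapping->i_pages);
    }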
-
- Feb 09, 2021
Paolo Bonzini authored
Currently, the follow_pfn() function is exported for modules but follow_pte() is not. However, follow_pfn() is very easy to misuse, because it does not provide protections (so most of its callers assume the page is writable!) and because it returns after having already unlocked the page table lock. Provide instead a simplified version of follow_pte() that does not have the pmdpp and range arguments. The older version survives as follow_invalidate_pte() for use by fs/dax.c.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
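A sketch of the two resulting prototypes (argument names assumed from the description):

    /* Simplified, export-friendly version: pte only, no pmd/range. */
    int follow_pte(struct mm_struct *mm, unsigned long address,
                   pte_t **ptepp, spinlock_t **ptlp);

    /* Full version kept for fs/dax.c: */
    int follow_invalidate_pte(struct mm_struct *mm, unsigned long address,
                              struct mmu_notifier_range *range,
                              pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);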
-
- Dec 16, 2020
Christoph Hellwig authored
Merge __follow_pte_pmd(), follow_pte_pmd() and follow_pte() into a single follow_pte() function, and just pass two additional NULL arguments for the two previous follow_pte() callers.
[sfr@canb.auug.org.au: merge fix for "s390/pci: remove races against pte updates"]
Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au
Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 21, 2020
Matthew Wilcox (Oracle) authored
Pass the full length to iomap_zero() and dax_iomap_zero(), and have them return how many bytes they actually handled. This is preparatory work for handling THP, although it looks like DAX could actually take advantage of it if there's a larger contiguous area.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- Sep 10, 2020
Vivek Goyal authored
The virtiofs device has a range of memory which is mapped into file inodes using dax. This memory is mapped in qemu on the host and maps different sections of a real file on the host. The size of this memory is limited (determined by the administrator), and depending on filesystem size we will soon reach a situation where all the memory is in use and we need to reclaim some. As part of the reclaim process, we will need to make sure that there are no active references to pages (taken by get_user_pages()) on the memory range we are trying to reclaim. I am planning to use dax_layout_busy_page() for this, but in its current form it is per-inode and scans through all the pages of the inode. We want to reclaim only a portion of memory (say a 2MB page), so we want to make sure that only that 2MB range of pages has no references (and we don't want to unmap all the pages of the inode). Hence, create a range version of this function named dax_layout_busy_page_range() which can be passed a range that needs to be unmapped.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm@lists.01.org
Cc: Jan Kara <jack@suse.cz>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: "Weiny, Ira" <ira.weiny@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
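The whole-inode scan then becomes the degenerate case of the range version; a sketch of the resulting pair (mainline shape):

    struct page *dax_layout_busy_page_range(struct address_space *mapping,
                                            loff_t start, loff_t end);

    struct page *dax_layout_busy_page(struct address_space *mapping)
    {
            /* The whole-file scan is just the maximal range. */
            return dax_layout_busy_page_range(mapping, 0, LLONG_MAX);
    }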
-
- Aug 23, 2020
Gustavo A. R. Silva authored
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough [1]. Also, remove unnecessary fall-through markings where that is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
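For illustration, the mechanical transformation (the switch body here is made up):

    switch (ret) {
    case 0:
            done = true;
            fallthrough;    /* replaces the old "fall through" comment */
    case -EAGAIN:
            retries++;
            break;
    default:
            return ret;
    }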
-
- Jul 31, 2020
Hao Li authored
The argument passed to xas_set_err() to indicate an error should be negative. Otherwise, xas_error() will return 0, and grab_mapping_entry() will return the found entry instead of 'SIGBUS' when the entry is not in fact valid. This would result in problems in subsequent code paths.
Link: https://lore.kernel.org/r/20200729034436.24267-1-lihao2018.fnst@cn.fujitsu.com
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: Hao Li <lihao2018.fnst@cn.fujitsu.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
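A minimal illustration of the pitfall (the errno value is illustrative):

    xas_set_err(&xas, EIO);     /* wrong: positive, xas_error() yields 0 */
    xas_set_err(&xas, -EIO);    /* right: negative errno is propagated */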
-
- Jul 28, 2020
Ira Weiny authored
Passing a size to copy_user_dax() implies it can copy variable sizes of data, when in fact it calls copy_user_page(), which copies exactly a page. We are safe because the only caller uses PAGE_SIZE anyway, so just remove the variable for clarity. While we are at it, rename copy_user_dax() to copy_cow_page_dax() to make it clear it is a singleton helper for this one case, not an implementation of what dax_iomap_actor() does.
Link: https://lore.kernel.org/r/20200717072056.73134-11-ira.weiny@intel.com
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
-
- Apr 03, 2020
Vivek Goyal authored
Add a helper, dax_iomap_zero(), to zero a range. This patch basically merges __dax_zero_page_range() and iomap_dax_zero().
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Vivek Goyal authored
Use the new dax native zero page method for zeroing a page if the I/O is page-aligned; otherwise fall back to direct_access() + memcpy(). This gets rid of one of the dependencies on the block device in the dax path.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20200228163456.1587-6-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- Feb 06, 2020
Jeff Moyer authored
fstests generic/471 reports a failure when run with MOUNT_OPTIONS="-o dax". The reason is that the initial pwrite to an empty file with the RWF_NOWAIT flag set does not return -EAGAIN. It turns out that dax_iomap_rw() doesn't pass that flag through to iomap_apply(). With this patch applied, generic/471 passes for me.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/x49r1z86e1d.fsf@segfault.boston.devel.redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
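The fix amounts to forwarding the iocb flag into the iomap flags inside dax_iomap_rw(); a sketch:

    unsigned int flags = 0;

    if (iocb->ki_flags & IOCB_NOWAIT)
            flags |= IOMAP_NOWAIT;  /* lets ->iomap_begin fail fast with -EAGAIN */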
-
- Jan 03, 2020
Vivek Goyal authored
As of now, dax_writeback_mapping_range() takes a "struct block_device" as a parameter, and dax_dev is looked up from the bdev name. This also involves taking a fresh reference on dax_dev and putting that reference at the end of the function. We are developing a new filesystem, virtio-fs, and using dax to access the host page cache directly, but there is no block device. IOW, we want to make use of dax but want to get rid of the assumption that there is always a block device associated with dax_dev. So pass in a "struct dax_device" as the parameter instead of the bdev. ext2/ext4/xfs are the current users and they already have a reference on the dax_device, so there is no need to take and drop a reference to the dax_device on each call of this function.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20200103183307.GB13350@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
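Sketch of the interface change:

    /* Before: dax_dev looked up (and referenced) from the bdev on each call. */
    int dax_writeback_mapping_range(struct address_space *mapping,
                                    struct block_device *bdev,
                                    struct writeback_control *wbc);

    /* After: callers pass the dax_device they already hold a reference on. */
    int dax_writeback_mapping_range(struct address_space *mapping,
                                    struct dax_device *dax_dev,
                                    struct writeback_control *wbc);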
-
- Oct 23, 2019
Dan Williams authored
Users reported a v5.3 performance regression and an inability to establish huge page mappings. A revised version of the ndctl "dax.sh" huge page unit test identifies commit 23c84eb7 ("dax: Fix missed wakeup with PMD faults") as the source. Update get_unlocked_entry() to check for NULL entries before checking the entry order, otherwise NULL is misinterpreted as a present pte conflict. The 'order' check needs to happen before the locked check, as an unlocked entry at the wrong order must fall back to a lookup at the correct order.
Reported-by: Jeff Smits <jeff.smits@intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 23c84eb7 ("dax: Fix missed wakeup with PMD faults")
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Link: https://lore.kernel.org/r/157167532455.3945484.11971474077040503994.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
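The corrected ordering in get_unlocked_entry(), sketched with the waitqueue details elided:

    for (;;) {
            entry = xas_find_conflict(xas);
            if (!entry || WARN_ON_ONCE(!xa_is_value(entry)))
                    return entry;           /* NULL checked first ... */
            if (dax_entry_order(entry) < order)
                    return XA_RETRY_ENTRY;  /* ... then the order ... */
            if (!dax_is_locked(entry))
                    return entry;           /* ... then the lock bit */
            /* otherwise wait on the entry's waitqueue and retry */
    }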
-
- Oct 21, 2019
Goldwyn Rodrigues authored
The srcmap is used to identify where the read is to be performed from. It is passed to ->iomap_begin, which can fill it in if we need to read data for partially written blocks from a different location than the write target. The srcmap is only supported for buffered writes so far.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
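In today's iomap_iter-based code, the "use iomap as srcmap if not set" rule from the [hch] note is captured by a small helper; a sketch, assuming the mainline name iomap_iter_srcmap():

    static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
    {
            /* An IOMAP_HOLE srcmap means ->iomap_begin left it unset. */
            if (i->srcmap.type != IOMAP_HOLE)
                    return &i->srcmap;
            return &i->iomap;
    }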
-
- Aug 05, 2019
Vivek Goyal authored
Vivek: "As of now dax_layout_busy_page() calls unmap_mapping_range() with last argument as 1, which says even unmap cow pages. I am wondering who needs to get rid of cow pages as well. I noticed one interesting side affect of this. I mount xfs with -o dax and mmaped a file with MAP_PRIVATE and wrote some data to a page which created cow page. Then I called fallocate() on that file to zero a page of file. fallocate() called dax_layout_busy_page() which unmapped cow pages as well and then I tried to read back the data I wrote and what I get is old data from persistent memory. I lost the data I had written. This read basically resulted in new fault and read back the data from persistent memory. This sounds wrong. Are there any users which need to unmap cow pages as well? If not, I am proposing changing it to not unmap cow pages. I noticed this while while writing virtio_fs code where when I tried to reclaim a memory range and that corrupted the executable and I was running from virtio-fs and program got segment violation." Dan: "In fact the unmap_mapping_range() in this path is only to synchronize against get_user_pages_fast() and force it to call back into the filesystem to re-establish the mapping. COW pages should be left untouched by dax_layout_busy_page()." Cc: <stable@vger.kernel.org> Fixes: 5fac7408 ("mm, fs, dax: handle layout changes to pinned dax mappings") Signed-off-by:
Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20190802192956.GA3032@redhat.com Signed-off-by:
Dan Williams <dan.j.williams@intel.com>
-
- Jul 29, 2019
Jan Kara authored
The condition checking whether put_unlocked_entry() needs to wake up the following waiter got broken by commit 23c84eb7 ("dax: Fix missed wakeup with PMD faults"). We need to wake the waiter whenever the passed entry is valid (i.e., non-NULL and not a special conflict entry). This could lead to processes never being woken up when waiting for an entry lock. Fix the condition.
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/20190729120228.GC17833@quack2.suse.cz
Fixes: 23c84eb7 ("dax: Fix missed wakeup with PMD faults")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- Jul 17, 2019
Darrick J. Wong authored
Move internal function declarations out of fs/internal.h into include/linux/iomap.h so that our transition is complete.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Matthew Wilcox (Oracle) authored
RocksDB can hang indefinitely when using a DAX file. This is due to a bug in the XArray conversion when handling a PMD fault and finding a PTE entry: we use the wrong index in the hash and end up waiting on the wrong waitqueue. There's actually no need to wait; if we find a PTE entry while looking for a PMD entry, we can return immediately, as we know we should fall back to a PTE fault (which may not conflict with the lock held). We reuse XA_RETRY_ENTRY to signal that a conflicting entry was found. This value can never be found in an XArray while holding its lock, so it does not create an ambiguity.
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4jsnuwLGyL_UEG4A@mail.gmail.com
Fixes: b15cd800 ("dax: Convert page fault handlers to XArray")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Robert Barror <robert.barror@intel.com>
Reported-by: Seema Pandit <seema.pandit@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- Jun 17, 2019
Nikolay Borisov authored
All callers of lockdep_assert_held_exclusive() use it to verify the correct locking state of either a semaphore (ldisc_sem in tty, mmap_sem for perf events, i_rwsem of inode for dax) or an rwlock by apparmor. Thus it makes sense to rename _exclusive to _write, since that is the semantics callers care about. Additionally, there is already lockdep_assert_held_read(), which this new naming is more consistent with. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190531100651.3969-1-nborisov@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- Jun 07, 2019
Jan Kara authored
When inserting an entry into the xarray, we store the mapping and index in the corresponding struct pages for memory error handling. When it happened that one process was mapping the file at PMD granularity while another process mapped it at PTE granularity, we could wrongly deassociate the PMD range and then reassociate the PTE range, leaving the rest of the struct pages in the PMD range without mapping information, which could later cause missed notifications about memory errors. Fix the problem by calling the association / deassociation code if and only if we are really going to update the xarray (deassociating and associating zero or empty entries is just a no-op, so there's no reason to complicate the code by trying to avoid the calls for these cases).
Cc: <stable@vger.kernel.org>
Fixes: d2c997c0 ("fs, dax: use page->mapping to warn if truncate...")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- Jun 05, 2019
Thomas Gleixner authored
Based on 1 normalized pattern(s): "this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details". Extracted by the scancode license scanner, the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- May 14, 2019
Aneesh Kumar K.V authored
MADV_DONTNEED is handled with mmap_sem taken in read mode. We call page_mkclean() without holding mmap_sem. MADV_DONTNEED implies that pages in the region are unmapped and subsequent access to the pages in that range is handled as a new page fault. This implies that if we don't have parallel access to the region when MADV_DONTNEED is run, we expect those ranges to be unallocated. W.r.t. page_mkclean(), we need to make sure that we don't break the MADV_DONTNEED semantics. MADV_DONTNEED checks for pmd_none without holding pmd_lock. This implies we skip the pmd if we temporarily mark the pmd none. Avoid doing that while marking the page clean. Keep the sequence the same for dax too, even though we don't support MADV_DONTNEED for dax mappings. The bug was noticed by code review and I didn't observe any failures w.r.t. test runs. This is similar to:

    commit 58ceeb6b ("thp: fix MADV_DONTNEED vs. MADV_FREE race")
    commit ced10803 ("thp: fix MADV_DONTNEED vs. numa balancing race")

both by Kirill A. Shutemov <kirill.shutemov@linux.intel.com>, Thu Apr 13 2017.
Link: http://lkml.kernel.org/r/20190321040610.14226-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-