  1. Nov 01, 2023
  2. Oct 20, 2023
  3. Oct 19, 2023
    • init/mount: print pretty name of root device when panics · 84d2b696
      Jianyong Wu authored
      
      When the wrong root device is given, the current log may not include the
      device's pretty name, which is useful for locating the root cause.
      
      For example, a VM has two block devices: /dev/vda, which has two
      partitions (/dev/vda1 and /dev/vda2), and /dev/vdb, which is blank.
      /dev/vda2 is the correct root device. With "root=/dev/vdb" set, we get
      this error log:
      
      [    0.635575] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,16)
      
      It is not straightforward to find the root cause because the root
      device name is missing, and it is hard to work back to the device from
      the device number alone, (254,16) in this example.
      
      Printing the pretty name as well gives a much clearer hint at the root
      cause, like:
      
      [    0.559887] Kernel panic - not syncing: VFS: Unable to mount root fs on "/dev/vdb" or unknown-block(254,16)
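
A minimal sketch of the message format described above, in Python for illustration only. The helper name `format_root_panic` and its parameters are hypothetical; the actual kernel change is written in C and is not reproduced here.

```python
def format_root_panic(major, minor, root_name=None):
    """Build the panic text, adding the user-supplied name when it is known."""
    # The numeric form is always printed, as in the original message.
    device = f"unknown-block({major},{minor})"
    if root_name:
        # New behaviour sketched here: show the "root=" name next to the number.
        return f'VFS: Unable to mount root fs on "{root_name}" or {device}'
    return f"VFS: Unable to mount root fs on {device}"


print(format_root_panic(254, 16))
print(format_root_panic(254, 16, "/dev/vdb"))
```

The first call reproduces the old message form and the second the new one, so the two can be compared side by side.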
      
      Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
      Message-Id: <20230907091025.3436878-1-jianyong.wu@arm.com>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  4. Oct 04, 2023
  5. Sep 22, 2023
  6. Sep 12, 2023
    • sched: Add task_struct->faults_disabled_mapping · 2b69987b
      Kent Overstreet authored
      
      There has been a long-standing page cache coherence bug with direct IO.
      This provides part of a mechanism to fix it, currently just used by
      bcachefs but potentially worth promoting to the VFS.
      
      Direct IO evicts the range of the pagecache being read or written to.
      
      For reads, we need dirty pages to be written to disk, so that the read
      doesn't return stale data. For writes, we need to evict that range of
      the pagecache so that it's not stale after the write completes.
      
      However, without a locking mechanism to prevent those pages from being
      re-added to the pagecache - by a buffered read or page fault - page
      cache inconsistency is still possible.
      
      This isn't necessarily just an issue for userspace when they're playing
      games; filesystems may hang arbitrary state off the pagecache, and so
      page cache inconsistency may cause real filesystem bugs, depending on
      the filesystem. This is less of an issue for iomap based filesystems,
      but e.g. buffer heads cache disk block mappings (!) and attach them
      to the pagecache, and bcachefs attaches disk reservations to pagecache
      pages.
      
      This issue has been hard to fix, because
       - we need to add a lock (henceforth called pagecache_add_lock), which
         would be held for the duration of the direct IO
       - page faults add pages to the page cache, thus need to take the same
         lock
       - dio -> gup -> page fault thus can deadlock
      
      And we cannot enforce a lock ordering with this lock, since userspace
      will be controlling the lock ordering (via the fd and buffer arguments
      to direct IOs), so we need a different method of deadlock avoidance.
      
      We need to tell the page fault handler that we're already holding a
      pagecache_add_lock, and since plumbing it through the entire gup() path
      would be highly impractical, this adds a field to task_struct.
      
      Then the full method is:
       - in the dio path, when we first take the pagecache_add_lock, note the
         mapping in the current task_struct
       - in the page fault handler, if faults_disabled_mapping is set, we
         check if it's the same mapping as the one we're taking a page fault
         for, and if so return an error.
      
         Then we check lock ordering: if there's a lock ordering violation and
         trylock fails, we'll have to cycle the locks and return an error that
         tells the DIO path to retry: faults_disabled_mapping is also used for
         signalling "locks were dropped, please retry".
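
The first half of the method above can be sketched as a toy model. This is Python for illustration only, not kernel code: the `Task` and `Mapping` classes, the helper names and the "ok"/"error" return strings are all assumptions, and the lock-ordering, trylock and retry signalling described in the second step are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


class Mapping:
    """Hypothetical stand-in for one file's page cache (an address_space)."""

    def __init__(self, name: str):
        self.name = name


@dataclass
class Task:
    # Stand-in for the new task_struct field; None means no DIO is in flight.
    faults_disabled_mapping: Optional[Mapping] = None


def dio_begin(task: Task, mapping: Mapping) -> None:
    # Taking pagecache_add_lock is elided; only the bookkeeping is modelled.
    task.faults_disabled_mapping = mapping


def dio_end(task: Task) -> None:
    task.faults_disabled_mapping = None


def handle_page_fault(task: Task, mapping: Mapping) -> str:
    """Refuse a fault that targets the mapping the task's DIO already locked."""
    if task.faults_disabled_mapping is mapping:
        # Faulting this mapping in would need the lock the task already holds.
        return "error"
    return "ok"


task = Task()
file_a, file_b = Mapping("a"), Mapping("b")
dio_begin(task, file_a)
print(handle_page_fault(task, file_a), handle_page_fault(task, file_b))
dio_end(task)
```

Recording the mapping on the task, rather than passing it as an argument, mirrors the reason given above: the information has to reach the fault handler without being plumbed through every intermediate call.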
      
      Also relevant to this patch: mapping->invalidate_lock.
      mapping->invalidate_lock provides most of the required semantics - it's
      used by truncate/fallocate to block pages being added to the pagecache.
      However, since it's a rwsem, direct IOs would need to take the write
      side in order to block page cache adds, and would then be exclusive with
      each other - we'll need a new type of lock to pair with this approach.
      
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Andreas Grünbacher <andreas.gruenbacher@gmail.com>
  7. Sep 11, 2023
    • arch: Remove Itanium (IA-64) architecture · cf8e8658
      Ard Biesheuvel authored
      The Itanium architecture is obsolete, and an informal survey [0] reveals
      that any residual use of Itanium hardware in production is mostly HP-UX
      or OpenVMS based. The use of Linux on Itanium appears to be limited to
      enthusiasts that occasionally boot a fresh Linux kernel to see whether
      things are still working as intended, and perhaps to churn out some
      distro packages that are rarely used in practice.
      
      None of the original companies behind Itanium still produce or support
      any hardware or software for the architecture, and it is listed as
      'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
      that contributed on behalf of those companies (nor anyone else, for that
      matter) have been willing to support or maintain the architecture
      upstream or even be responsible for applying the odd fix. The Intel
      firmware team removed all IA-64 support from the Tianocore/EDK2
      reference implementation of EFI in 2018. (Itanium is the original
      architecture for which EFI was developed, and the way Linux supports it
      deviates significantly from other architectures.) Some distros, such as
      Debian and Gentoo, still maintain [unofficial] ia64 ports, but many
      dropped support years ago.
      
      While the argument is being made [1] that there is a 'for the common
      good' angle to being able to build and run existing projects such as the
      Grid Community Toolkit [2] on Itanium for interoperability testing, the
      fact remains that none of those projects are known to be deployed on
      Linux/ia64, and very few people actually have access to such a system in
      the first place. Even if there were ways imaginable in which Linux/ia64
      could be put to good use today, what matters is whether anyone is
      actually doing that, and this does not appear to be the case.
      
      There are no emulators widely available, and so boot testing Itanium is
      generally infeasible for ordinary contributors. GCC still supports IA-64
      but its compile farm [3] no longer has any IA-64 machines. GLIBC would
      like to get rid of IA-64 [4] too because it would permit some overdue
      code cleanups. In summary, the benefits to the ecosystem of having IA-64
      be part of it are mostly theoretical, whereas the maintenance overhead
      of keeping it supported is real.
      
      So let's rip off the band aid, and remove the IA-64 arch code entirely.
      This follows the timeline proposed by the Debian/ia64 maintainer [5],
      which removes support in a controlled manner, leaving IA-64 in a known
      good state in the most recent LTS release. Other projects will follow
      once the kernel support is removed.
      
      [0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
      [1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/
      [2] https://gridcf.org/gct-docs/latest/index.html
      [3] https://cfarm.tetaneutral.net/machines/list/
      [4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/
      [5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/
      
      
      
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
  8. Aug 21, 2023
  9. Aug 18, 2023
    • kexec: consolidate kexec and crash options into kernel/Kconfig.kexec · 89cde455
      Eric DeVolder authored
      Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6.
      
      The Kconfig is refactored to consolidate KEXEC and CRASH options from
      various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.
      
      The Kconfig.kexec is now a submenu titled "Kexec and crash features"
      located under "General Setup".
      
      The following options are impacted:
      
       - KEXEC
       - KEXEC_FILE
       - KEXEC_SIG
       - KEXEC_SIG_FORCE
       - KEXEC_IMAGE_VERIFY_SIG
       - KEXEC_BZIMAGE_VERIFY_SIG
       - KEXEC_JUMP
       - CRASH_DUMP
      
      Over time, these options have been copied between Kconfig files and
      are very similar to one another, but with slight differences.
      
      The following architectures are impacted by the refactor (because of
      use of one or more KEXEC/CRASH options):
      
       - arm
       - arm64
       - ia64
       - loongarch
       - m68k
       - mips
       - parisc
       - powerpc
       - riscv
       - s390
       - sh
       - x86 
      
      More information:
      
      In the patch series "crash: Kernel handling of CPU and memory hot
      un/plug"
      
       https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devolder@oracle.com/
      
      the new kernel feature introduces the config option CRASH_HOTPLUG.
      
      In reviewing, Thomas Gleixner requested that the new config option
      not be placed in x86 Kconfig. Rather the option needs a generic/common
      home. To Thomas' point, the KEXEC and CRASH options have largely been
      duplicated in the various arch/<arch>/Kconfig files, with minor
      differences. This kind of proliferation is to be avoided/stopped.
      
       https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/
      
      To that end, I have refactored the arch Kconfigs so as to consolidate
      the various KEXEC and CRASH options. Generally speaking, this work has
      the following themes:
      
      - KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec
        - These items from arch/Kconfig:
            CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC
        - These items from arch/x86/Kconfig form the common options:
            KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE
            KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP
        - These items from arch/arm64/Kconfig form the common options:
            KEXEC_IMAGE_VERIFY_SIG
        - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec
      - The Kconfig.kexec is now a submenu titled "Kexec and crash features"
        and is now listed in "General Setup" submenu from init/Kconfig.
      - To control the common options, each has a new ARCH_SUPPORTS_<option>
        option. These gateway options determine whether the common options
        are valid for the architecture.
      - To account for the slight differences in the original architecture
        coding of the common options, each now has a corresponding
        ARCH_SELECTS_<option>, which is used to elicit the same side effects
        as the original arch/<arch>/Kconfig files for KEXEC and CRASH options.
      
      An example of 'make menuconfig' illustrating the submenu:
      
        > General setup > Kexec and crash features
        [*] Enable kexec system call
        [*] Enable kexec file based system call
        [*]   Verify kernel signature during kexec_file_load() syscall
        [ ]     Require a valid signature in kexec_file_load() syscall
        [ ]     Enable bzImage signature verification support
        [*] kexec jump
        [*] kernel crash dumps
        [*]   Update the crash elfcorehdr on system configuration changes
      
      In the process of consolidating the common options, I encountered
      slight differences in the coding of these options in several of the
      architectures. As a result, I settled on the following solution:
      
      - Each of the common options has a 'depends on ARCH_SUPPORTS_<option>'
        statement. For example, the KEXEC_FILE option has a 'depends on
        ARCH_SUPPORTS_KEXEC_FILE' statement.
      
        This approach is needed on all common options so as to prevent
        options from appearing for architectures which previously did
        not allow/enable them. For example, arm supports KEXEC but not
        KEXEC_FILE. The arch/arm/Kconfig does not provide
        ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options
        are not available to arm.
      
      - The boolean ARCH_SUPPORTS_<option> in effect allows the arch to
        determine when the feature is allowed.  Archs which don't have the
        feature simply do not provide the corresponding ARCH_SUPPORTS_<option>.
        For each arch, where there previously were KEXEC and/or CRASH
        options, these have been replaced with the corresponding boolean
        ARCH_SUPPORTS_<option>, and an appropriate def_bool statement.
      
        For example, if the arch supports KEXEC_FILE, then the
        ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits
        the KEXEC_FILE option to be available.
      
        If the arch has a 'depends on' statement in its original coding
        of the option, then that expression becomes part of the def_bool
        expression. For example, arm64 had:
      
        config KEXEC
          depends on PM_SLEEP_SMP
      
        and in this solution, this converts to:
      
        config ARCH_SUPPORTS_KEXEC
          def_bool PM_SLEEP_SMP
      
      
      - In order to account for the architecture differences in the
        coding for the common options, the ARCH_SELECTS_<option> in the
        arch/<arch>/Kconfig is used. This option has a 'depends on
        <option>' statement to couple it to the main option, and from
        there can insert the differences between the common option and the
        arch's original coding of that option.
      
        For example, a few archs enable CRYPTO and CRYPTO_SHA256 for
        KEXEC_FILE. These require an ARCH_SELECTS_KEXEC_FILE and
        'select CRYPTO' and 'select CRYPTO_SHA256' statements.
      
      Illustrating the option relationships:
      
      For each of the common KEXEC and CRASH options:
       ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option>
      
       <option>                   # in Kconfig.kexec
       ARCH_SUPPORTS_<option>     # in arch/<arch>/Kconfig, as needed
       ARCH_SELECTS_<option>      # in arch/<arch>/Kconfig, as needed
      
      
      For example, KEXEC:
       ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC
      
       KEXEC                      # in Kconfig.kexec
       ARCH_SUPPORTS_KEXEC        # in arch/<arch>/Kconfig, as needed
       ARCH_SELECTS_KEXEC         # in arch/<arch>/Kconfig, as needed
      
      
      To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
      enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
      select statements).
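
The relationship summarized above can be sketched as a small toy resolver. This is Python for illustration only and is not how Kconfig is implemented: the function name `resolve` and its dictionary arguments are hypothetical, and real Kconfig semantics (prompts, dependencies between options, conditional selects such as 'if IMA') are ignored.

```python
def resolve(arch_symbols, arch_selects, requested):
    """Return the set of symbols that end up enabled for one architecture.

    arch_symbols: {"ARCH_SUPPORTS_<option>": bool} as provided by the arch.
    arch_selects: {"<option>": [symbols selected by ARCH_SELECTS_<option>]}.
    requested:    options the user tries to enable.
    """
    enabled = set()
    for option in requested:
        # Gate: <option> is only available when the arch advertises support.
        if arch_symbols.get(f"ARCH_SUPPORTS_{option}", False):
            enabled.add(option)
            # Side effects: ARCH_SELECTS_<option> depends on <option> and
            # pulls in whatever the arch's original coding selected.
            enabled.update(arch_selects.get(option, ()))
    return enabled


# arm supports KEXEC but provides no ARCH_SUPPORTS_KEXEC_FILE.
arm = {"ARCH_SUPPORTS_KEXEC": True}
print(sorted(resolve(arm, {}, ["KEXEC", "KEXEC_FILE"])))

# powerpc's ARCH_SELECTS_KEXEC_FILE selects KEXEC_ELF.
powerpc = {"ARCH_SUPPORTS_KEXEC_FILE": True}
print(sorted(resolve(powerpc, {"KEXEC_FILE": ["KEXEC_ELF"]}, ["KEXEC_FILE"])))
```

The two calls correspond to the arm and powerpc cases mentioned in this message: the unsupported option silently stays off, and the supported one brings its arch-specific selects with it.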
      
      Examples:
      A few examples to show the new strategy in action:
      
      ===== x86 (minus the help section) =====
      Original:
       config KEXEC
          bool "kexec system call"
          select KEXEC_CORE
      
       config KEXEC_FILE
          bool "kexec file based system call"
          select KEXEC_CORE
          select HAVE_IMA_KEXEC if IMA
          depends on X86_64
          depends on CRYPTO=y
          depends on CRYPTO_SHA256=y
      
       config ARCH_HAS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
       config KEXEC_SIG
          bool "Verify kernel signature during kexec_file_load() syscall"
          depends on KEXEC_FILE
      
       config KEXEC_SIG_FORCE
          bool "Require a valid signature in kexec_file_load() syscall"
          depends on KEXEC_SIG
      
       config KEXEC_BZIMAGE_VERIFY_SIG
          bool "Enable bzImage signature verification support"
          depends on KEXEC_SIG
          depends on SIGNED_PE_FILE_VERIFICATION
          select SYSTEM_TRUSTED_KEYRING
      
       config CRASH_DUMP
          bool "kernel crash dumps"
          depends on X86_64 || (X86_32 && HIGHMEM)
      
       config KEXEC_JUMP
          bool "kexec jump"
          depends on KEXEC && HIBERNATION
      
      becomes...
      New:
      config ARCH_SUPPORTS_KEXEC
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool X86_64 && CRYPTO && CRYPTO_SHA256
      
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select HAVE_IMA_KEXEC if IMA
      
      config ARCH_SUPPORTS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
      config ARCH_SUPPORTS_KEXEC_SIG
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_SIG_FORCE
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_JUMP
          def_bool y
      
      config ARCH_SUPPORTS_CRASH_DUMP
          def_bool X86_64 || (X86_32 && HIGHMEM)
      
      
      ===== powerpc (minus the help section) =====
      Original:
       config KEXEC
          bool "kexec system call"
          depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
          select KEXEC_CORE
      
       config KEXEC_FILE
          bool "kexec file based system call"
          select KEXEC_CORE
          select HAVE_IMA_KEXEC if IMA
          select KEXEC_ELF
          depends on PPC64
          depends on CRYPTO=y
          depends on CRYPTO_SHA256=y
      
       config ARCH_HAS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
       config CRASH_DUMP
          bool "Build a dump capture kernel"
          depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
          select RELOCATABLE if PPC64 || 44x || PPC_85xx
      
      becomes...
      New:
      config ARCH_SUPPORTS_KEXEC
          def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
      
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
      
      config ARCH_SUPPORTS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select KEXEC_ELF
          select HAVE_IMA_KEXEC if IMA
      
      config ARCH_SUPPORTS_CRASH_DUMP
          def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
      
      config ARCH_SELECTS_CRASH_DUMP
          def_bool y
          depends on CRASH_DUMP
          select RELOCATABLE if PPC64 || 44x || PPC_85xx
      
      
      Testing Approach and Results
      
      There are 388 config files in the arch/<arch>/configs directories.
      For each of these config files, a .config is generated both before and
      after this Kconfig series, and the two are checked for equivalence. This
      approach allows a fairly rapid check of all architectures and a wide
      variety of configs with respect to KEXEC and CRASH, and avoids having to
      compile for every architecture, boot the resulting kernels and test at
      run time.
      
      For each config file, the olddefconfig, allnoconfig and allyesconfig
      targets are utilized. In testing, randconfig has revealed problems
      as well, but it is not used in the before and after equivalence check
      since one cannot generate the "same" .config for before and after,
      even with the same KCONFIG_SEED, because the option list is
      different.
      
      As such, the following script steps compare the before and after
      of 'make olddefconfig'. The new symbols introduced by this series
      are filtered out, but otherwise the config files are PASS only if
      they were equivalent, and FAIL otherwise.
      
      The script performs the test by doing the following:
      
       # Obtain the "golden" .config output for given config file
       # Reset test sandbox
       git checkout master
       git branch -D test_Kconfig
       git checkout -B test_Kconfig master
       make distclean
       # Write out updated config
       cp -f <config file> .config
       make ARCH=<arch> olddefconfig
       # Track each item in .config, LHSB is "golden"
       scoreboard .config 
      
       # Obtain the "changed" .config output for given config file
       # Reset test sandbox
       make distclean
       # Apply this Kconfig series
       git am <this Kconfig series>
       # Write out updated config
       cp -f <config file> .config
       make ARCH=<arch> olddefconfig
       # Track each item in .config, RHSB is "changed"
       scoreboard .config 
      
       # Determine test result
       # Filter-out new symbols introduced by this series
       # Filter-out symbol=n which are not in either scoreboard
       # Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL
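
The final comparison step can be sketched as follows. This is a hedged Python illustration, not the author's script: the `scoreboard` and `compare` function names are borrowed from the step names above, the implementation is invented, and treating a symbol that is absent on one side as equal to "n" on the other is one interpretation of the "symbol=n" filter.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def scoreboard(config_text):
    """Map each symbol in a .config to its value ("n" for 'is not set')."""
    board = {}
    for line in config_text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            board[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            board[match.group(1)] = "n"
    return board


def compare(golden, changed, new_symbols=()):
    """Return "PASS" when both scoreboards agree, ignoring new_symbols."""
    ignored = set(new_symbols)
    for symbol in (set(golden) | set(changed)) - ignored:
        # A symbol missing from one side counts as "n" on that side.
        if golden.get(symbol, "n") != changed.get(symbol, "n"):
            return "FAIL"
    return "PASS"


before = "CONFIG_KEXEC=y\n# CONFIG_CRASH_DUMP is not set\n"
after = "CONFIG_KEXEC=y\nCONFIG_ARCH_SUPPORTS_KEXEC=y\n"
print(compare(scoreboard(before), scoreboard(after),
              new_symbols=["CONFIG_ARCH_SUPPORTS_KEXEC"]))
```

In this example the only differences are a symbol introduced by the series and a symbol that is "n" on one side and absent on the other, so the configs count as equivalent.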
      
      The script was instrumental during the refactoring of Kconfig as it
      continually revealed problems. The end result is that the solution
      presented in this series passes all configs as checked by the script,
      with the following exceptions:
      
      - arch/ia64/configs/zx1_config with olddefconfig
        This config file has:
        # CONFIG_KEXEC is not set
        CONFIG_CRASH_DUMP=y
        and this refactor now couples KEXEC to CRASH_DUMP, so it is not
        possible to enable CRASH_DUMP without KEXEC.
      
      - arch/sh/configs/* with allyesconfig
        The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU
        (which clearly is not meant to be set). This symbol is not provided
        but with the allyesconfig it is set to yes which enables CRASH_DUMP.
        But KEXEC is coded as dependent upon MMU, and is set to no in
        arch/sh/mm/Kconfig, so KEXEC is not enabled.
        This refactor now couples KEXEC to CRASH_DUMP, so it is not
        possible to enable CRASH_DUMP without KEXEC.
      
      While the above exceptions are not equivalent to their original,
      the config file produced is valid (and in fact better with respect to
      CRASH_DUMP handling).
      
      
      This patch (of 14)
      
      The config options for kexec and crash features are consolidated
      into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
      is a new submenu "Kexec and crash features". All the kexec and
      crash options that were once in the arch-dependent submenu "Processor
      type and features" are now consolidated in the new submenu.
      
      The following options are impacted:
      
       - KEXEC
       - KEXEC_FILE
       - KEXEC_SIG
       - KEXEC_SIG_FORCE
       - KEXEC_BZIMAGE_VERIFY_SIG
       - KEXEC_JUMP
       - CRASH_DUMP
      
      The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.
      
      Architectures specify support of certain KEXEC and CRASH features with
      similarly named new ARCH_SUPPORTS_<option> config options.
      
      Architectures can utilize the new ARCH_SELECTS_<option> config
      options to specify additional components when <option> is enabled.
      
      To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
      enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
      select statements).
      
      Link: https://lkml.kernel.org/r/20230712161545.87870-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230712161545.87870-2-eric.devolder@oracle.com
      
      
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com> # for x86
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Juerg Haefliger <juerg.haefliger@canonical.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Marc Aurèle La France <tsi@tuyoix.net>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
      Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xin Li <xin3.li@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: remove arguments of show_mem() · 527ed4f7
      Kefeng Wang authored
      All callers of show_mem() pass 0 and NULL, so we can remove the two
      arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1) in
      show_mem().
      
      Link: https://lkml.kernel.org/r/20230630062253.189440-1-wangkefeng.wang@huawei.com
      
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  10. Aug 15, 2023
    • init: Add support for rootwait timeout parameter · 45071e1c
      Loic Poulain authored
      
      Add an optional timeout arg to 'rootwait' as the maximum time in
      seconds to wait for the root device to show up before attempting a
      forced mount of the root filesystem.
      
      Use case:
      In case of device mapper usage for the rootfs (e.g. root=/dev/dm-0),
      if the mapper is not able to create the virtual block for any reason
      (wrong arguments, bad dm-verity signature, etc), the `rootwait` param
      causes the kernel to wait forever. It may however be desirable to only
      wait for a given time and then panic (force mount) to cause device reset.
      This gives the bootloader a chance to detect the problem and to take some
      measures, such as marking the booted partition as bad (for A/B case) or
      entering a recovery mode.
      
      In the success case, mounting happens as soon as the root device is
      ready, unlike with the existing 'rootdelay' parameter, which performs an
      unconditional pause.
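
The difference between a bounded wait and an unconditional delay can be sketched in a few lines. This is Python for illustration only, not the kernel implementation: `wait_for_root`, `is_ready` and the polling interval are hypothetical names and values chosen for the example.

```python
import time


def wait_for_root(is_ready, timeout_s=None, poll_s=0.005):
    """Poll until is_ready() is true; give up after timeout_s seconds.

    timeout_s=None models plain 'rootwait' (wait forever). Returns True if
    the device showed up, False if the timeout expired first.
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    while not is_ready():
        if deadline is not None and time.monotonic() >= deadline:
            # The caller would now attempt the forced mount (and likely panic).
            return False
        time.sleep(poll_s)
    return True


# Device appears on the third poll: we return right away, well before timeout.
polls = iter([False, False, True])
print(wait_for_root(lambda: next(polls), timeout_s=1))

# Device never appears: we stop waiting once the timeout has passed.
print(wait_for_root(lambda: False, timeout_s=0.02))
```

A 'rootdelay'-style pause would instead be a single sleep for the whole duration, regardless of when the device becomes ready.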
      
      Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
      Message-Id: <20230813082349.513386-1-loic.poulain@linaro.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  11. Aug 08, 2023
    • workqueue: Initialize unbound CPU pods later in the boot · 2930155b
      Tejun Heo authored
      
      During boot, to initialize unbound CPU pods, wq_pod_init() was called from
      workqueue_init(). This is early enough for NUMA nodes to be set up but
      before SMP is brought up and CPU topology information is populated.
      
      Workqueue is in the process of improving CPU locality for unbound workqueues
      and will need access to topology information during pod init. This adds a
      new init function workqueue_init_topology() which is called after CPU
      topology information is available and replaces wq_pod_init().
      
      As unbound CPU pods are now initialized after workqueues are activated, we
      need to revisit the workqueues to apply the pod configuration. Workqueues
      which are created before workqueue_init_topology() are set up so that they
      always use the default worker pool. After pods are set up in
      workqueue_init_topology(), wq_update_pod() is called on all existing
      workqueues to update the pool associations accordingly.
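
The two-phase ordering described above can be sketched as a toy model. This is Python for illustration only: the classes, the per-CPU dictionary and the "default"/"podN" pool labels are all invented for the example and do not reflect the kernel's data structures.

```python
class Workqueue:
    def __init__(self, name, cpus):
        self.name = name
        # Until topology is known, every CPU is served by the default pool.
        self.pool_for_cpu = {cpu: "default" for cpu in cpus}


class WorkqueueSubsystem:
    def __init__(self, cpus):
        self.cpus = list(cpus)
        self.workqueues = []
        self.cpu_to_pod = None  # unknown until init_topology() runs

    def create(self, name):
        wq = Workqueue(name, self.cpus)
        self.workqueues.append(wq)
        if self.cpu_to_pod is not None:
            # Topology already available: apply the pod configuration now.
            self.update_pod(wq)
        return wq

    def update_pod(self, wq):
        for cpu in self.cpus:
            wq.pool_for_cpu[cpu] = f"pod{self.cpu_to_pod[cpu]}"

    def init_topology(self, cpu_to_pod):
        # Runs once CPU topology is populated; revisit every existing queue.
        self.cpu_to_pod = dict(cpu_to_pod)
        for wq in self.workqueues:
            self.update_pod(wq)


subsystem = WorkqueueSubsystem(cpus=[0, 1])
early = subsystem.create("early")
print(early.pool_for_cpu)
subsystem.init_topology({0: 0, 1: 1})
print(early.pool_for_cpu)
```

The queue created early starts on the default pool and is re-associated once topology is known, which is the "same end result, different sequence" property this change relies on.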
      
      Note that wq_update_pod_attrs_buf allocation is moved to
      workqueue_init_early(). This isn't necessary right now but enables further
      generalization of pod handling in the future.
      
      This patch changes the initialization sequence but the end result should be
      the same.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
  12. Aug 02, 2023
  13. Jun 25, 2023
  14. Jun 16, 2023
  15. Jun 10, 2023
    • init: add bdev fs printk if mount_block_root failed · 6aee6723
      Angus Chen authored
      Booting with the QEMU command line:
      "qemu-system-x86_64 -append root=/dev/vda rootfstype=ext4 ..."
      will panic if ext4 is not built in and a request to load the ext4 module
      fails.
      
      [    1.729006] VFS: Cannot open root device "vda" or unknown-block(253,0): error -19
      [    1.730603] Please append a correct "root=" boot option; here are the available partitions:
      [    1.732323] fd00          256000 vda
      [    1.732329]  driver: virtio_blk
      [    1.734194] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(253,0)
      [    1.734771] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-rc2+ #53
      [    1.735194] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
      [    1.735772] Call Trace:
      [    1.735950]  <TASK>
      [    1.736113]  dump_stack_lvl+0x32/0x50
      [    1.736367]  panic+0x108/0x310
      [    1.736570]  mount_block_root+0x161/0x310
      [    1.736849]  ? rdinit_setup+0x40/0x40
      [    1.737088]  prepare_namespace+0x10c/0x180
      [    1.737393]  kernel_init_freeable+0x354/0x450
      [    1.737707]  ? rest_init+0xd0/0xd0
      [    1.737945]  kernel_init+0x16/0x130
      [    1.738196]  ret_from_fork+0x1f/0x30
      
      As a hint, print all the bdev fstypes which are available.
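
A rough userspace analogue of that hint is sketched below. It is not the kernel code from this commit: it assumes the usual /proc/filesystems layout, where filesystems that need no block device are tagged "nodev" in the first column, and the helper names bdev_fs_name() and print_bdev_filesystems() are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch only (hypothetical helpers, not the kernel code).
 * /proc/filesystems has one filesystem per line: "nodev\tNAME" for
 * filesystems that need no block device, "\tNAME" for the others.
 * Copies NAME into out and returns 1 for a block-device-backed entry,
 * returns 0 for anything else.
 */
static int bdev_fs_name(const char *line, char *out, size_t outlen)
{
	const char *tab = strchr(line, '\t');
	size_t n;

	assert(out != NULL && outlen > 0);
	if (tab == NULL || tab != line)	/* text before the tab, e.g. "nodev" */
		return 0;
	n = strcspn(tab + 1, "\n");
	if (n == 0 || n >= outlen)
		return 0;
	memcpy(out, tab + 1, n);
	out[n] = '\0';
	return 1;
}

/* Print every block-device-backed filesystem; returns the count or -1. */
static int print_bdev_filesystems(FILE *dest)
{
	FILE *f = fopen("/proc/filesystems", "r");
	char line[128], name[64];
	int count = 0;

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		if (bdev_fs_name(line, name, sizeof(name))) {
			fprintf(dest, "%s ", name);
			count++;
		}
	}
	fclose(f);
	return count;
}
```

Calling print_bdev_filesystems(stdout) on a running system lists the filesystem types that could back a block root device. If ext4 is missing from that list, the failure in the log above is to be expected.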
      
      [akpm@linux-foundation.org: fix spelling in printk message]
      Link: https://lkml.kernel.org/r/20230518035321.1672-1-angus.chen@jaguarmicro.com
      
      
      Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6aee6723
    • Arnd Bergmann's avatar
      init: move cifs_root_data() prototype into linux/mount.h · 73648e6f
      Arnd Bergmann authored
      cifs_root_data() is defined in cifs and called from early init code, but
      lacks a global prototype:
      
      fs/cifs/cifsroot.c:83:12: error: no previous prototype for 'cifs_root_data'
      
      Move the declaration from do_mounts.c into an appropriate header.
      
      Link: https://lkml.kernel.org/r/20230517131102.934196-13-arnd@kernel.org
      
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      73648e6f
    • Arnd Bergmann's avatar
      init: consolidate prototypes in linux/init.h · ad1a4830
      Arnd Bergmann authored
      The init/main.c file contains some extern declarations for functions
      defined in architecture code, and it defines some other functions that are
      called from architecture code with a custom prototype.  Both of those
      result in warnings with 'make W=1':
      
      init/calibrate.c:261:37: error: no previous prototype for 'calibrate_delay_is_known' [-Werror=missing-prototypes]
      init/main.c:790:20: error: no previous prototype for 'mem_encrypt_init' [-Werror=missing-prototypes]
      init/main.c:792:20: error: no previous prototype for 'poking_init' [-Werror=missing-prototypes]
      arch/arm64/kernel/irq.c:122:13: error: no previous prototype for 'init_IRQ' [-Werror=missing-prototypes]
      arch/arm64/kernel/time.c:55:13: error: no previous prototype for 'time_init' [-Werror=missing-prototypes]
      arch/x86/kernel/process.c:935:13: error: no previous prototype for 'arch_post_acpi_subsys_init' [-Werror=missing-prototypes]
      init/calibrate.c:261:37: error: no previous prototype for 'calibrate_delay_is_known' [-Werror=missing-prototypes]
      kernel/fork.c:991:20: error: no previous prototype for 'arch_task_cache_init' [-Werror=missing-prototypes]
      
      Add prototypes for all of these in include/linux/init.h or another
      appropriate header, and remove the duplicate declarations from
      architecture specific code.
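
The pattern behind the fix can be sketched in a single file. This is only an illustration: demo_arch_init() and demo_start() are hypothetical names, and the prototype is shown inline where the real change places it in a shared header.

```c
#include <assert.h>

/*
 * Would normally live in a shared header (the commit uses
 * include/linux/init.h); shown inline so the sketch is one file.
 * demo_arch_init is a hypothetical name, not a kernel function.
 */
int demo_arch_init(int cpus);

/*
 * Because the prototype above is visible at the definition, building
 * with -Wmissing-prototypes does not warn here, and the compiler can
 * check that the declaration and the definition agree.
 */
int demo_arch_init(int cpus)
{
	return cpus > 0 ? 0 : -1;
}

/* Caller side: it relies on the same shared prototype. */
static void demo_start(void)
{
	int ret = demo_arch_init(1);

	assert(ret == 0);
	(void)ret;	/* keeps the variable used when NDEBUG is defined */
}
```

With a single shared declaration, a signature change that is applied on only one side becomes a compile error instead of a silent mismatch between the caller and the architecture code.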
      
      [sfr@canb.auug.org.au: declare time_init_early()]
        Link: https://lkml.kernel.org/r/20230519124311.5167221c@canb.auug.org.au
      Link: https://lkml.kernel.org/r/20230517131102.934196-12-arnd@kernel.org
      
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ad1a4830
  16. Jun 09, 2023
    • Nhat Pham's avatar
      cachestat: implement cachestat syscall · cf264e13
      Nhat Pham authored
      There is currently no good way to query the page cache state of large file
      sets and directory trees.  There is mincore(), but it scales poorly: the
      kernel writes out a lot of bitmap data that userspace has to aggregate,
      when the user really does not care about per-page information in that
      case.  The user also needs to mmap and unmap each file as it goes along,
      which can be quite slow as well.
      
      Some use cases where this information could come in handy:
        * Allowing a database to decide whether to perform an index scan or
          direct table queries based on the in-memory cache state of the
          index.
        * Visibility into the writeback algorithm, for diagnosing
          performance issues.
        * Workload-aware writeback pacing: estimating IO fulfilled by page
          cache (and IO to be done) within a range of a file, allowing for
          more frequent syncing when and where there is IO capacity, and
          batching when there is not.
        * Computing memory usage of large files/directory trees, analogous to
          the du tool for disk usage.
      
      More information about these use cases can be found in the following
      thread:
      
      https://lore.kernel.org/lkml/20230315170934.GA97793@cmpxchg.org/
      
      This patch implements a new syscall that queries cache state of a file and
      summarizes the number of cached pages, number of dirty pages, number of
      pages marked for writeback, number of (recently) evicted pages, etc.  in a
      given range.  Currently, the syscall is only wired in for x86
      architecture.
      
      NAME
          cachestat - query the page cache statistics of a file.
      
      SYNOPSIS
          #include <sys/mman.h>
      
          struct cachestat_range {
              __u64 off;
              __u64 len;
          };
      
          struct cachestat {
              __u64 nr_cache;
              __u64 nr_dirty;
              __u64 nr_writeback;
              __u64 nr_evicted;
              __u64 nr_recently_evicted;
          };
      
          int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
              struct cachestat *cstat, unsigned int flags);
      
      DESCRIPTION
          cachestat() queries the number of cached pages, number of dirty
          pages, number of pages marked for writeback, number of evicted
          pages, number of recently evicted pages, in the bytes range given by
          `off` and `len`.
      
          An evicted page is a page that was previously in the page cache but
          has been evicted since. A page is recently evicted if its last
          eviction was recent enough that its reentry to the cache would
          indicate that it is actively being used by the system, and that
          there is memory pressure on the system.
      
          These values are returned in a cachestat struct, whose address is
          given by the `cstat` argument.
      
          The `off` and `len` arguments must be non-negative integers. If
          `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
          0, we will query in the range from `off` to the end of the file.
      
          The `flags` argument is unused for now, but is included for future
          extensibility. Users should pass 0 (i.e. no flags specified).
      
          Currently, hugetlbfs is not supported.
      
          Because the status of a page can change after cachestat() checks it
          but before it returns to the application, the returned values may
          contain stale information.
      
      RETURN VALUE
          On success, cachestat returns 0. On error, -1 is returned, and errno
          is set to indicate the error.
      
      ERRORS
          EFAULT cstat or cstat_range points to an invalid address.
      
          EINVAL invalid flags.
      
          EBADF  invalid file descriptor.
      
          EOPNOTSUPP file descriptor is of a hugetlbfs file
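
A minimal sketch of calling the syscall directly from userspace follows, assuming no libc wrapper is available. The struct names cs_range and cs_result and the helper query_cachestat() are hypothetical local mirrors of the SYNOPSIS above, and the fallback syscall number 451 is an assumption for x86-64; prefer the value from your kernel headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Assumption: 451 is the number under which cachestat was wired up on
 * x86-64. Prefer the definition from your kernel headers if present.
 */
#ifndef __NR_cachestat
#define __NR_cachestat 451
#endif

/* Local mirrors of the structs in the SYNOPSIS (hypothetical names). */
struct cs_range {
	uint64_t off;
	uint64_t len;	/* 0 means "from off to the end of the file" */
};

struct cs_result {
	uint64_t nr_cache;
	uint64_t nr_dirty;
	uint64_t nr_writeback;
	uint64_t nr_evicted;
	uint64_t nr_recently_evicted;
};

/*
 * Query page cache statistics for the range described by off and len.
 * Returns 0 on success, or -1 with errno set (for example ENOSYS on a
 * kernel without the syscall, or EBADF for an invalid descriptor).
 */
static int query_cachestat(int fd, uint64_t off, uint64_t len,
			   struct cs_result *out)
{
	struct cs_range range = { .off = off, .len = len };

	assert(out != NULL);
	/* The flags argument must be 0: no flags are defined yet. */
	return (int)syscall(__NR_cachestat, (long)fd, &range, out, 0L);
}
```

A caller would open a file, pass its descriptor with off = 0 and len = 0 to cover the whole file, and then compare nr_cache against the file size in pages. As the description notes, the counts may already be stale by the time they are read.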
      
      [nphamcs@gmail.com: replace rounddown logic with the existing helper]
        Link: https://lkml.kernel.org/r/20230504022044.3675469-1-nphamcs@gmail.com
      Link: https://lkml.kernel.org/r/20230503013608.2431726-3-nphamcs@gmail.com
      
      
      Signed-off-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cf264e13
  17. Jun 05, 2023
  18. May 16, 2023
  19. Apr 23, 2023
    • Linus Torvalds's avatar
      gcc: disable '-Warray-bounds' for gcc-13 too · 0da6e5fd
      Linus Torvalds authored
      
      We started disabling '-Warray-bounds' for gcc-12 originally on s390,
      because it resulted in some warnings that weren't realistically fixable
      (commit 8b202ee2: "s390: disable -Warray-bounds").
      
      That s390-specific issue was then found to be less common elsewhere, but
      generic (see f0be87c4: "gcc-12: disable '-Warray-bounds' universally
      for now"), and then later the version check was expanded to
      gcc-11 (5a41237a: "gcc: disable -Warray-bounds for gcc-11 too").
      
      And it turns out that I was much too optimistic in thinking that it's
      all going to go away, and here we are with gcc-13 showing all the same
      issues.  So instead of expanding this one version at a time, let's just
      disable it for gcc-11+, and put an end limit to it only when we actually
      find a solution.
      
      Yes, I'm sure some of this is because the kernel just does odd things
      (like our "container_of()" use, but also knowingly playing games with
      things like linker tables and array layouts).
      
      And yes, some of the warnings are likely signs of real bugs, but when
      there are hundreds of false positives, that doesn't really help.
      
      Oh well.
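
For readers unfamiliar with the "container_of()" idiom mentioned above, here is a simplified sketch. The macro omits the kernel version's type checking, and struct job and job_from_node() are hypothetical. It only illustrates the pointer arithmetic involved and is not claimed to reproduce any particular gcc warning.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified form of the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_node {
	struct list_node *next;
};

/* Hypothetical structure that embeds a list node. */
struct job {
	int id;
	struct list_node node;
};

/*
 * Step back from the embedded member to the enclosing struct job.
 * The arithmetic deliberately moves outside the object that n points
 * to, which is hard for a compiler to reason about when all it sees
 * is a pointer to the member.
 */
static struct job *job_from_node(struct list_node *n)
{
	assert(n != NULL);
	return container_of(n, struct job, node);
}
```

Given a struct job j, job_from_node(&j.node) yields &j again. The result is only valid if the node really is embedded in a struct job, which the compiler cannot verify.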
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0da6e5fd
  20. Apr 16, 2023
  21. Apr 14, 2023