  1. May 14, 2024
    • Makefile: remove redundant tool coverage variables · 7f7f6f7a
      Masahiro Yamada authored
      
      Now Kbuild provides reasonable defaults for objtool, sanitizers, and
      profilers.
      
      Remove redundant variables.
      
      Note:
      
      This commit changes the coverage for some objects:
      
        - include arch/mips/vdso/vdso-image.o into UBSAN, GCOV, KCOV
        - include arch/sparc/vdso/vdso-image-*.o into UBSAN
        - include arch/sparc/vdso/vma.o into UBSAN
        - include arch/x86/entry/vdso/extable.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
        - include arch/x86/entry/vdso/vdso-image-*.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
        - include arch/x86/entry/vdso/vdso32-setup.o into KASAN, KCSAN, UBSAN, GCOV, KCOV
        - include arch/x86/entry/vdso/vma.o into GCOV, KCOV
        - include arch/x86/um/vdso/vma.o into KASAN, GCOV, KCOV
      
      I believe these are positive effects because all of them are kernel
      space objects.
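The removed variables were per-object Kbuild coverage overrides. A hypothetical fragment of the pattern this commit deletes (illustrative object name; the actual removed lines live in the arch Makefiles listed above):

```make
# arch/.../Makefile (illustrative): explicit instrumentation opt-outs
# that reasonable Kbuild defaults now make redundant.
UBSAN_SANITIZE_vdso-image.o  := n
GCOV_PROFILE_vdso-image.o    := n
KCOV_INSTRUMENT_vdso-image.o := n
```

Dropping such opt-out lines is what brings the listed objects into UBSAN/GCOV/KCOV coverage.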
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Roberto Sassu <roberto.sassu@huawei.com>
  2. May 09, 2024
  3. May 06, 2024
  4. Apr 30, 2024
  5. Apr 26, 2024
  6. Apr 24, 2024
  7. Apr 16, 2024
  8. Apr 12, 2024
  9. Apr 11, 2024
  10. Apr 09, 2024
  11. Apr 05, 2024
  12. Mar 31, 2024
  13. Mar 26, 2024
  14. Mar 25, 2024
    • sched/fair: Check if a task has a fitting CPU when updating misfit · 22d56074
      Qais Yousef authored
      
      If a misfit task is affined to a subset of the possible CPUs, we need to
      verify that one of these CPUs can fit it. Otherwise the load balancer
      will trigger continuously and needlessly, causing balance_interval to
      grow in response and eventually leading to a situation where real
      imbalances take a long time to address because of this impossible
      imbalance situation.
      
      This can happen in Android world where it's common for background tasks
      to be restricted to little cores.
      
      Similarly, if the task can't fit even on the biggest core, triggering
      misfit is pointless, as that is the best it can ever get on this system.
      
      To detect that, we use asym_cap_list to iterate through the capacities
      in the system and see whether the task can run at a higher capacity
      level based on its p->cpus_ptr. We do that when the affinity changes,
      when a fair task is forked, or when a task switches to the fair policy.
      We store max_allowed_capacity in task_struct to allow for a cheap
      comparison in the fast path.
      
      Improve the check_misfit_status() function by removing redundant
      checks: misfit_task_load will be 0 if the task can't move to a bigger
      CPU, and nohz_balancer_kick() already checks check_cpu_capacity()
      before calling check_misfit_status().
      
      Test:
      =====
      
      Add
      
      	trace_printk("balance_interval = %lu\n", interval)
      
      in get_sd_balance_interval().
      
      run
      	if [ "$MASK" != "0" ]; then
      		adb shell "taskset -a $MASK cat /dev/zero > /dev/null"
      	fi
      	sleep 10
      	// parse ftrace buffer counting the occurrence of each value
      
      Where MASK is either:
      
      	* 0: no busy task running
      	* 1: busy task is pinned to 1 cpu; handled today to not cause
      	  misfit
      	* f: busy task pinned to little cores, simulates busy background
      	  task, demonstrates the problem to be fixed
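The "parse ftrace buffer" step can be sketched as a small shell helper (an illustrative sketch; the trace file is commonly /sys/kernel/debug/tracing/trace or /sys/kernel/tracing/trace depending on the system):

```shell
# Count how often each balance_interval value appears in trace output,
# producing "count value" columns like the results sections below.
count_intervals() {
    grep -o 'balance_interval = [0-9]*' | sort | uniq -c | sort -rn
}

# Example usage:
#   count_intervals < /sys/kernel/debug/tracing/trace
```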
      
      Results:
      ========
      
      Note how the occurrence of balance_interval = 128 overshoots for MASK = f.
      
      BEFORE
      ------
      
      	MASK=0
      
      		   1 balance_interval = 175
      		 120 balance_interval = 128
      		 846 balance_interval = 64
      		  55 balance_interval = 63
      		 215 balance_interval = 32
      		   2 balance_interval = 31
      		   2 balance_interval = 16
      		   4 balance_interval = 8
      		1870 balance_interval = 4
      		  65 balance_interval = 2
      
      	MASK=1
      
      		  27 balance_interval = 175
      		  37 balance_interval = 127
      		 840 balance_interval = 64
      		 167 balance_interval = 63
      		 449 balance_interval = 32
      		  84 balance_interval = 31
      		 304 balance_interval = 16
      		1156 balance_interval = 8
      		2781 balance_interval = 4
      		 428 balance_interval = 2
      
      	MASK=f
      
      		   1 balance_interval = 175
      		1328 balance_interval = 128
      		  44 balance_interval = 64
      		 101 balance_interval = 63
      		  25 balance_interval = 32
      		   5 balance_interval = 31
      		  23 balance_interval = 16
      		  23 balance_interval = 8
      		4306 balance_interval = 4
      		 177 balance_interval = 2
      
      AFTER
      -----
      
      Note how the high values almost disappear for all MASK values. The
      system has background tasks that could trigger the problem even
      without simulating it, i.e. even with MASK=0.
      
      	MASK=0
      
      		 103 balance_interval = 63
      		  19 balance_interval = 31
      		 194 balance_interval = 8
      		4827 balance_interval = 4
      		 179 balance_interval = 2
      
      	MASK=1
      
      		 131 balance_interval = 63
      		   1 balance_interval = 31
      		  87 balance_interval = 8
      		3600 balance_interval = 4
      		   7 balance_interval = 2
      
      	MASK=f
      
      		   8 balance_interval = 127
      		 182 balance_interval = 63
      		   3 balance_interval = 31
      		   9 balance_interval = 16
      		 415 balance_interval = 8
      		3415 balance_interval = 4
      		  21 balance_interval = 2
      
      Signed-off-by: Qais Yousef <qyousef@layalina.io>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lore.kernel.org/r/20240324004552.999936-3-qyousef@layalina.io
  15. Mar 22, 2024
  16. Mar 05, 2024
  17. Mar 04, 2024
  18. Mar 01, 2024
    • pidfd: add pidfs · cb12fd8e
      Christian Brauner authored
      This moves pidfds from the anonymous inode infrastructure to a tiny
      pseudo filesystem. This has been on my todo for quite a while as it will
      unblock further work that we weren't able to do simply because of the
      very justified limitations of anonymous inodes. Moving pidfds to a tiny
      pseudo filesystem allows:
      
      * statx() on pidfds becomes useful for the first time.
      * pidfds can be compared simply via statx() and then comparing inode
        numbers.
      * pidfds have unique inode numbers for the system lifetime.
      * struct pid is now stashed in inode->i_private instead of
        file->private_data. This means it is now possible to introduce
        concepts that operate on a process once all file descriptors have been
        closed. A concrete example is kill-on-last-close.
      * file->private_data is freed up for per-file options for pidfds.
      * Each struct pid will refer to a different inode, but the same struct
        pid will refer to the same inode if it's opened multiple times. This
        is in contrast to now, where every struct pid refers to the same
        inode. Even if we were to move to anon_inode_create_getfile(), which
        creates new inodes, we'd still be associating the same struct pid
        with multiple different inodes.
      
      The tiny pseudo filesystem is not visible anywhere in userspace,
      exactly like e.g. pipefs and sockfs. There's no lookup, there are no
      complex inode operations, nothing. Dentries and inodes are always
      deleted when the last pidfd is closed.
      
      We allocate a new inode for each struct pid and we reuse that inode for
      all pidfds. We use iget_locked() to find that inode again based on the
      inode number which isn't recycled. We allocate a new dentry for each
      pidfd that uses the same inode. That is similar to anonymous inodes
      which reuse the same inode for thousands of dentries. For pidfds we're
      talking way less than that. There usually won't be a lot of concurrent
      openers of the same struct pid. They can probably often be counted on
      two hands. I know that systemd does use separate pidfd for the same
      struct pid for various complex process tracking issues. So I think with
      that things actually become way simpler. Especially because we don't
      have to care about lookup. Dentries and inodes continue to be always
      deleted.
      
      The code is entirely optional and fairly small. If it's not selected we
      fall back to anonymous inodes. Heavily inspired by nsfs, which uses a
      similar stashing mechanism, just for namespaces.
      
      Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org
      
      
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  19. Feb 27, 2024
  20. Feb 25, 2024
  21. Feb 24, 2024
    • crash: split crash dumping code out from kexec_core.c · 02aff848
      Baoquan He authored
      Currently, KEXEC_CORE selects CRASH_CORE automatically because crash
      code needs to be built in to avoid compile errors when building kexec
      code, even though the crash dumping functionality is not enabled. E.g.
      --------------------
      CONFIG_CRASH_CORE=y
      CONFIG_KEXEC_CORE=y
      CONFIG_KEXEC=y
      CONFIG_KEXEC_FILE=y
      ---------------------
      
      After splitting out the crashkernel reservation code and the vmcoreinfo
      exporting code, only crash-related code is left in kernel/crash_core.c.
      Now move the crash-related code from kexec_core.c to crash_core.c and
      only build it in when CONFIG_CRASH_DUMP=y.

      Also wrap the crash code inside a CONFIG_CRASH_DUMP ifdeffery scope, or
      replace inappropriate CONFIG_KEXEC_CORE ifdefs with CONFIG_CRASH_DUMP
      ifdefs in generic kernel files.

      With these changes, the crash_core code is abstracted from the kexec
      code and can be disabled entirely if only the kexec reboot feature is
      wanted.
      
      Link: https://lkml.kernel.org/r/20240124051254.67105-5-bhe@redhat.com
      
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Pingfan Liu <piliu@redhat.com>
      Cc: Klara Modin <klarasmodin@gmail.com>
      Cc: Michael Kelley <mhklinux@outlook.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Yang Li <yang.lee@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  22. Feb 22, 2024
    • init: remove obsolete arch_call_rest_init() wrapper · ac4db926
      Geert Uytterhoeven authored
      Since commit 3570ee04 ("s390/smp: keep the original lowcore for
      CPU 0"), there is no longer any architecture that needs to override
      arch_call_rest_init().
      
      Remove the weak wrapper around rest_init(), call rest_init() directly, and
      make rest_init() static.
      
      Link: https://lkml.kernel.org/r/aa10868bfb176eef4abb8bb4a710b85330792694.1706106183.git.geert@linux-m68k.org
      
      
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Josh Poimboeuf <jpoimboe@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • arm64, powerpc, riscv, s390, x86: ptdump: refactor CONFIG_DEBUG_WX · a5e8131a
      Christophe Leroy authored
      All architectures using the core ptdump functionality also implement
      CONFIG_DEBUG_WX, and they all do it more or less the same way, with a
      function called debug_checkwx() that is called by mark_rodata_ro() and
      is a substitute for ptdump_check_wx() when CONFIG_DEBUG_WX is set and a
      no-op otherwise.
      
      Refactor by centrally defining debug_checkwx() in linux/ptdump.h and call
      debug_checkwx() immediately after calling mark_rodata_ro() instead of
      calling it at the end of every mark_rodata_ro().
      
      On x86_32, mark_rodata_ro() first checks __supported_pte_mask has _PAGE_NX
      before calling debug_checkwx().  Now the check is inside the callee
      ptdump_walk_pgd_level_checkwx().
      
      On powerpc_64, mark_rodata_ro() bails out early before calling
      ptdump_check_wx() when the MMU doesn't have KERNEL_RO feature.  The check
      is now also done in ptdump_check_wx() as it is called outside
      mark_rodata_ro().
      
      Link: https://lkml.kernel.org/r/a59b102d7964261d31ead0316a9f18628e4e7a8e.1706610398.git.christophe.leroy@csgroup.eu
      
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Phong Tran <tranmanphong@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
  23. Feb 20, 2024
  24. Feb 16, 2024
    • workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK · fd0a68a2
      Tejun Heo authored
      
      2f34d733 ("workqueue: Fix queue_work_on() with BH workqueues") added
      irq_work usage to workqueue; however, it turns out irq_work is actually
      optional and the change breaks build on configuration which doesn't have
      CONFIG_IRQ_WORK enabled.
      
      Fix the build by making workqueue use irq_work only when CONFIG_SMP is
      set, and by enabling CONFIG_IRQ_WORK when CONFIG_SMP is set. It's
      reasonable to argue that it may be better to just always enable it.
      However, this still saves a small bit of memory for tiny UP configs and
      is also the least amount of change, so, for now, let's keep it
      conditional.
      
      Verified to do the right thing for x86_64 allnoconfig and defconfig,
      and aarch64 allnoconfig, allnoconfig + printk disabled (SMP but nothing
      selects IRQ_WORK), and a modified aarch64 Kconfig where !SMP and
      nothing selects IRQ_WORK.
      
      v2: `depends on SMP` leads to Kconfig warnings when CONFIG_IRQ_WORK is
          selected by something else when !CONFIG_SMP. Use `def_bool y if SMP`
          instead.
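The v2 fix can be sketched as a Kconfig fragment (illustrative rendering of the change described above; with `def_bool y if SMP`, another option may still `select IRQ_WORK` on !SMP configurations without triggering the warning that `depends on SMP` caused):

```
config IRQ_WORK
	def_bool y if SMP
```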
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Anders Roxell <anders.roxell@linaro.org>
      Fixes: 2f34d733 ("workqueue: Fix queue_work_on() with BH workqueues")
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
  25. Feb 15, 2024
  26. Feb 09, 2024
  27. Feb 08, 2024