  Jul 12, 2023
    • tracing/histograms: Add histograms to hist_vars if they have referenced variables · 6018b585
      Mohamed Khalfella authored
      Hist triggers can have referenced variables without having direct
      variable fields. This can be the case if referenced variables are added
      for trigger actions. In this case the newly added references will not
      have field variables. Not taking such referenced variables into
      consideration can result in a bug where it is possible to remove a
      hist trigger whose variables are still being referenced. The bug is
      easily reproducible like so:
      
      $ cd /sys/kernel/tracing
      $ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
      $ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
      $ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
      $ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
      
      [  100.263533] ==================================================================
      [  100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
      [  100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
      [  100.266320]
      [  100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
      [  100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
      [  100.268561] Call Trace:
      [  100.268902]  <TASK>
      [  100.269189]  dump_stack_lvl+0x4c/0x70
      [  100.269680]  print_report+0xc5/0x600
      [  100.270165]  ? resolve_var_refs+0xc7/0x180
      [  100.270697]  ? kasan_complete_mode_report_info+0x80/0x1f0
      [  100.271389]  ? resolve_var_refs+0xc7/0x180
      [  100.271913]  kasan_report+0xbd/0x100
      [  100.272380]  ? resolve_var_refs+0xc7/0x180
      [  100.272920]  __asan_load8+0x71/0xa0
      [  100.273377]  resolve_var_refs+0xc7/0x180
      [  100.273888]  event_hist_trigger+0x749/0x860
      [  100.274505]  ? kasan_save_stack+0x2a/0x50
      [  100.275024]  ? kasan_set_track+0x29/0x40
      [  100.275536]  ? __pfx_event_hist_trigger+0x10/0x10
      [  100.276138]  ? ksys_write+0xd1/0x170
      [  100.276607]  ? do_syscall_64+0x3c/0x90
      [  100.277099]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [  100.277771]  ? destroy_hist_data+0x446/0x470
      [  100.278324]  ? event_hist_trigger_parse+0xa6c/0x3860
      [  100.278962]  ? __pfx_event_hist_trigger_parse+0x10/0x10
      [  100.279627]  ? __kasan_check_write+0x18/0x20
      [  100.280177]  ? mutex_unlock+0x85/0xd0
      [  100.280660]  ? __pfx_mutex_unlock+0x10/0x10
      [  100.281200]  ? kfree+0x7b/0x120
      [  100.281619]  ? ____kasan_slab_free+0x15d/0x1d0
      [  100.282197]  ? event_trigger_write+0xac/0x100
      [  100.282764]  ? __kasan_slab_free+0x16/0x20
      [  100.283293]  ? __kmem_cache_free+0x153/0x2f0
      [  100.283844]  ? sched_mm_cid_remote_clear+0xb1/0x250
      [  100.284550]  ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
      [  100.285221]  ? event_trigger_write+0xbc/0x100
      [  100.285781]  ? __kasan_check_read+0x15/0x20
      [  100.286321]  ? __bitmap_weight+0x66/0xa0
      [  100.286833]  ? _find_next_bit+0x46/0xe0
      [  100.287334]  ? task_mm_cid_work+0x37f/0x450
      [  100.287872]  event_triggers_call+0x84/0x150
      [  100.288408]  trace_event_buffer_commit+0x339/0x430
      [  100.289073]  ? ring_buffer_event_data+0x3f/0x60
      [  100.292189]  trace_event_raw_event_sys_enter+0x8b/0xe0
      [  100.295434]  syscall_trace_enter.constprop.0+0x18f/0x1b0
      [  100.298653]  syscall_enter_from_user_mode+0x32/0x40
      [  100.301808]  do_syscall_64+0x1a/0x90
      [  100.304748]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [  100.307775] RIP: 0033:0x7f686c75c1cb
      [  100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
      [  100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
      [  100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
      [  100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
      [  100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
      [  100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
      [  100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
      [  100.338381]  </TASK>
      
      We hit the bug because, when the second hist trigger was created,
      has_hist_vars() returned false since that trigger had no variable
      fields of its own. As a result, save_hist_vars() was not called to add
      the trigger to trace_array->hist_vars. Later, when we attempted to
      remove the first histogram, find_any_var_ref() failed to detect that it
      was still in use because it did not find the second trigger in the
      hist_vars list.
      
      With this change we wait until trigger actions are created, so we can
      take into consideration whether the hist trigger has variable
      references. Also, we now check the return value of save_hist_vars() and
      fail trigger creation if save_hist_vars() fails.
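
      A minimal sketch of the shape of the fix in event_hist_trigger_parse(),
      assuming the field counting variable references is named n_var_refs and
      that the error label is out_unreg (both illustrative):

      	/* Deferred until after trigger actions are created, so that
      	 * references added for actions are counted too. */
      	if (has_hist_vars(hist_data) || hist_data->n_var_refs) {
      		ret = save_hist_vars(hist_data);
      		if (ret)
      			goto out_unreg;	/* fail trigger creation */
      	}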
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfella@purestorage.com
      Cc: stable@vger.kernel.org
      Fixes: 067fe038 ("tracing: Add variable reference handling to hist triggers")
      Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • kallsyms: strip LTO-only suffixes from promoted global functions · 8cc32a9b
      Yonghong Song authored
      Commit 6eb4bd92 ("kallsyms: strip LTO suffixes from static functions")
      stripped all function/variable suffixes starting with '.' regardless
      of whether those suffixes were generated in LTO mode or not. In fact,
      as far as I know, in LTO mode, when a static function/variable is
      promoted to the global scope, a '.llvm.<...>' suffix is added.
      
      The existing mechanism breaks live patching for an LTO kernel even if
      no <symbol>.llvm.<...> symbols are involved. For example, take the
      following kernel symbols:
        $ grep bpf_verifier_vlog /proc/kallsyms
        ffffffff81549f60 t bpf_verifier_vlog
        ffffffff8268b430 d bpf_verifier_vlog._entry
        ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
        ffffffff82e12a1f d bpf_verifier_vlog.__already_done
      'bpf_verifier_vlog' is a static function. '_entry', '_entry_ptr' and
      '__already_done' are static variables used inside 'bpf_verifier_vlog',
      so llvm promotes them to file-level statics with the prefix
      'bpf_verifier_vlog.'. Note that this function-level to file-level
      static promotion also happens without LTO.
      
      Given the symbol name 'bpf_verifier_vlog', with an LTO kernel the
      current mechanism returns 4 symbols to the live patching subsystem,
      which the live patching subsystem cannot handle. With a non-LTO
      kernel, only one symbol is returned.
      
      In [1] we had a lengthy discussion; the suggestion was to separate two
      cases:
        (1). new symbols with suffix which are generated regardless of whether
             LTO is enabled or not, and
        (2). new symbols with suffix generated only when LTO is enabled.
      
      The cleanup_symbol_name() should only remove suffixes for case (2).
      Case (1) should not be changed so it can work uniformly with or without LTO.
      
      This patch removes the LTO-only suffix '.llvm.<...>' so that live
      patching and tracing work the same way as for a non-LTO kernel.
      The cleanup_symbol_name() in scripts/kallsyms.c is changed to use the
      same filtering pattern, so both the kernel and the kallsyms tool have
      the same expectation on the order of symbols.
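
      A minimal sketch of the filtering described above, assuming
      cleanup_symbol_name() operates in place on a NUL-terminated name:

      	static bool cleanup_symbol_name(char *s)
      	{
      		char *res;

      		/* Strip only the LTO-generated '.llvm.<...>' suffix;
      		 * other '.'-suffixes (case (1) above) stay intact. */
      		res = strstr(s, ".llvm.");
      		if (!res)
      			return false;

      		*res = '\0';
      		return true;
      	}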
      
       [1] https://lore.kernel.org/live-patching/20230615170048.2382735-1-song@kernel.org/T/#u
      
      Fixes: 6eb4bd92 ("kallsyms: strip LTO suffixes from static functions")
      Reported-by: Song Liu <song@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Acked-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230628181926.4102448-1-yhs@fb.com
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • tracing: Stop FORTIFY_SOURCE complaining about stack trace caller · bec3c25c
      Steven Rostedt (Google) authored
      The stack_trace event is an event created by the tracing subsystem to
      store stack traces. It originally just contained a hard-coded array of 8
      words to hold the stack, and a "size" to record how many entries are in it.
      This is exported to user space as:
      
      name: kernel_stack
      ID: 4
      format:
      	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
      	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
      	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
      	field:int common_pid;	offset:4;	size:4;	signed:1;
      
      	field:int size;	offset:8;	size:4;	signed:1;
      	field:unsigned long caller[8];	offset:16;	size:64;	signed:0;
      
      print fmt: "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n\t=> %ps\n" "\t=> %ps\n\t=> %ps\n",
       (void *)REC->caller[0], (void *)REC->caller[1], (void *)REC->caller[2],
       (void *)REC->caller[3], (void *)REC->caller[4], (void *)REC->caller[5],
       (void *)REC->caller[6], (void *)REC->caller[7]
      
      Where the user space tracers could parse the stack. The library was
      updated for this specific event to only look at the size, and not the
      array. But some older users still look at the array (note, the older code
      still checks to make sure the array fits inside the event that it read.
      That is, if only 4 words were saved, the parser would not read the fifth
      word because it would see that it was outside of the event size).
      
      This event was changed a while ago to be more dynamic, and would save a
      full stack even if it was greater than 8 words. It does this by simply
      allocating more ring buffer to hold the extra words. Then it copies in the
      stack via:
      
      	memcpy(&entry->caller, fstack->calls, size);
      
      As the entry is a struct stack_entry, which is created by a macro that
      both defines the structure and exports it to user space, it still has
      the caller field of entry defined as: unsigned long caller[8].
      
      When the stack is greater than 8, the FORTIFY_SOURCE code notices that the
      amount being copied is greater than the source array and complains about
      it. It has no idea that the source is pointing to the ring buffer with the
      required allocation.
      
      To hide this from the FORTIFY_SOURCE logic, pointer arithmetic is used:
      
      	ptr = ring_buffer_event_data(event);
      	entry = ptr;
      	ptr += offsetof(typeof(*entry), caller);
      	memcpy(ptr, fstack->calls, size);
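
      For background, a compilable userspace-style illustration of why this
      works (the struct and function here are illustrative, not the kernel's):
      FORTIFY_SOURCE sizes the memcpy() destination with
      __builtin_object_size(), which resolves &entry->caller to 64 bytes, but
      it cannot size an offset into an opaque buffer.

      	#include <stddef.h>
      	#include <string.h>

      	struct stack_entry {
      		int size;
      		unsigned long caller[8];	/* 64 bytes, as far as the compiler knows */
      	};

      	void fill_stack(void *buf, const unsigned long *calls, size_t size)
      	{
      		struct stack_entry *entry = buf;
      		void *ptr = buf;

      		/* memcpy(&entry->caller, calls, size) would be flagged
      		 * whenever size may exceed sizeof(entry->caller); offsetting
      		 * a void pointer hides the destination's declared size. */
      		ptr += offsetof(struct stack_entry, caller);
      		memcpy(ptr, calls, size);
      		entry->size = (int)(size / sizeof(unsigned long));
      	}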
      
      Link: https://lore.kernel.org/all/20230612160748.4082850-1-svens@linux.ibm.com/
      Link: https://lore.kernel.org/linux-trace-kernel/20230712105235.5fc441aa@gandalf.local.home
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Sven Schnelle <svens@linux.ibm.com>
      Tested-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Fix possible warning on checking all pages used in ftrace_process_locs() · 26efd79c
      Zheng Yejian authored
      As the comment in ftrace_process_locs() says, there may be NULL
      pointers in the mcount_loc section:
       > Some architecture linkers will pad between
       > the different mcount_loc sections of different
       > object files to satisfy alignments.
       > Skip any NULL pointers.
      
      After commit 20e5227e ("ftrace: allow NULL pointers in mcount_loc"),
      NULL pointers are accounted for when allocating ftrace pages but are
      skipped before being added into the ftrace pages, which may result in
      some pages not being used. Then, after commit 706c81f8 ("ftrace: Remove
      extra helper functions"), a warning may occur at:
        WARN_ON(pg->next);
      
      To fix it, only warn in the case where no pointers were skipped but the
      pages were not used up, then free those unused pages after releasing
      ftrace_lock.
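
      A sketch of the intended logic; the skipped counter, pg_unuse variable,
      and ftrace_free_pages() helper are named here for illustration, and the
      surrounding loop is abbreviated:

      	unsigned long skipped = 0;

      	while (p < end) {
      		addr = ftrace_call_adjust(*p++);
      		/* Linker padding shows up as NULL pointers: skip them,
      		 * but remember that we did. */
      		if (!addr) {
      			skipped++;
      			continue;
      		}
      		/* ... record addr into the current ftrace page ... */
      	}

      	/* Unused pages are only a bug if nothing was skipped */
      	WARN_ON(!skipped && pg->next);
      	pg_unuse = pg->next;
      	pg->next = NULL;

      	mutex_unlock(&ftrace_lock);
      	if (pg_unuse)
      		ftrace_free_pages(pg_unuse);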
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengyejian1@huawei.com
      Cc: stable@vger.kernel.org
      Fixes: 706c81f8 ("ftrace: Remove extra helper functions")
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ring-buffer: Fix deadloop issue on reading trace_pipe · 7e42907f
      Zheng Yejian authored
      A soft lockup occurs when reading the file 'trace_pipe':
      
        watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [cat:4488]
        [...]
        RIP: 0010:ring_buffer_empty_cpu+0xed/0x170
        RSP: 0018:ffff88810dd6fc48 EFLAGS: 00000246
        RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff93d1aaeb
        RDX: ffff88810a280040 RSI: 0000000000000008 RDI: ffff88811164b218
        RBP: ffff88811164b218 R08: 0000000000000000 R09: ffff88815156600f
        R10: ffffed102a2acc01 R11: 0000000000000001 R12: 0000000051651901
        R13: 0000000000000000 R14: ffff888115e49500 R15: 0000000000000000
        [...]
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f8d853c2000 CR3: 000000010dcd8000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         __find_next_entry+0x1a8/0x4b0
         ? peek_next_entry+0x250/0x250
         ? down_write+0xa5/0x120
         ? down_write_killable+0x130/0x130
         trace_find_next_entry_inc+0x3b/0x1d0
         tracing_read_pipe+0x423/0xae0
         ? tracing_splice_read_pipe+0xcb0/0xcb0
         vfs_read+0x16b/0x490
         ksys_read+0x105/0x210
         ? __ia32_sys_pwrite64+0x200/0x200
         ? switch_fpu_return+0x108/0x220
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      From the vmcore I found that this is because, in tracing_read_pipe(),
      ring_buffer_empty_cpu() finds that some buffer is not empty, but then
      nothing can be read from it since "rb_num_of_entries() == 0" is always
      true. The procedure then loops infinitely because the user buffer is
      never filled; see the following code path:
      
        tracing_read_pipe() {
          ... ...
          waitagain:
            tracing_wait_pipe() // 1. find non-empty buffer here
            trace_find_next_entry_inc()  // 2. loop here try to find an entry
              __find_next_entry()
                ring_buffer_empty_cpu();  // 3. find non-empty buffer
                peek_next_entry()  // 4. but peek always return NULL
                  ring_buffer_peek()
                    rb_buffer_peek()
                      rb_get_reader_page()
                        // 5. because rb_num_of_entries() == 0 always true here
                        //    then return NULL
          // 6. user buffer has not been filled so goto 'waitagain'
          //    and eventually leads to a deadloop in kernel!!!
        }
      
      After some analysis, I found that when resetting the ringbuffer, the
      'entries' of its pages are not all cleared (see rb_reset_cpu()). Then,
      when shrinking the ringbuffer, if some removed pages contain stale
      'entries' data, those entries are added into 'cpu_buffer->overrun' (see
      rb_remove_pages()), which causes a wrong 'overrun' count and eventually
      causes the deadloop issue.
      
      To fix it, clear the 'entries' of every page in rb_reset_cpu().
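
      A sketch of the fix's shape: factor the per-page clearing into a helper
      and call it for every page from rb_reset_cpu(). The helper name and
      fields follow the description above but are abbreviated here:

      	static void rb_clear_buffer_page(struct buffer_page *page)
      	{
      		local_set(&page->write, 0);
      		local_set(&page->entries, 0);
      		rb_init_page(page->page);
      		page->read = 0;
      	}

      	static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
      	{
      		struct buffer_page *page;

      		/* Clear *every* page, not just the head and reader pages,
      		 * so stale 'entries' can never be folded into 'overrun'
      		 * when pages are later removed. */
      		list_for_each_entry(page, cpu_buffer->pages, list)
      			rb_clear_buffer_page(page);
      		/* ... reset reader page, counters, and stamps ... */
      	}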
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230708225144.3785600-1-zhengyejian1@huawei.com
      Cc: stable@vger.kernel.org
      Fixes: a5fb8331 ("ring-buffer: Fix uninitialized read_stamp")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: arm64: Avoid missing-prototype warnings · 7d8b31b7
      Arnd Bergmann authored
      These are all tracing W=1 warnings in arm64 allmodconfig about missing
      prototypes:
      
      kernel/trace/trace_kprobe_selftest.c:7:5: error: no previous prototype for 'kprobe_trace_selftest_target' [-Werror=missing-prototypes]
      kernel/trace/ftrace.c:329:5: error: no previous prototype for '__register_ftrace_function' [-Werror=missing-prototypes]
      kernel/trace/ftrace.c:372:5: error: no previous prototype for '__unregister_ftrace_function' [-Werror=missing-prototypes]
      kernel/trace/ftrace.c:4130:15: error: no previous prototype for 'arch_ftrace_match_adjust' [-Werror=missing-prototypes]
      kernel/trace/fgraph.c:243:15: error: no previous prototype for 'ftrace_return_to_handler' [-Werror=missing-prototypes]
      kernel/trace/fgraph.c:358:6: error: no previous prototype for 'ftrace_graph_sleep_time_control' [-Werror=missing-prototypes]
      arch/arm64/kernel/ftrace.c:460:6: error: no previous prototype for 'prepare_ftrace_return' [-Werror=missing-prototypes]
      arch/arm64/kernel/ptrace.c:2172:5: error: no previous prototype for 'syscall_trace_enter' [-Werror=missing-prototypes]
      arch/arm64/kernel/ptrace.c:2195:6: error: no previous prototype for 'syscall_trace_exit' [-Werror=missing-prototypes]
      
      Move the declarations to an appropriate header where they can be seen
      by the caller and callee, and make sure the headers are included where
      needed.
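
      For illustration, the usual shape of such a fix: declare the functions
      in a header that both the definition and its callers include (the
      header placement below is hypothetical):

      	/* in include/linux/ftrace.h (illustrative placement): */
      	int __register_ftrace_function(struct ftrace_ops *ops);
      	int __unregister_ftrace_function(struct ftrace_ops *ops);

      	/* kernel/trace/ftrace.c includes this header, so the definitions
      	 * now have visible prototypes and -Wmissing-prototypes is happy. */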
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230517125215.930689-1-arnd@kernel.org
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Florent Revest <revest@chromium.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      [ Fixed ftrace_return_to_handler() to handle CONFIG_HAVE_FUNCTION_GRAPH_RETVAL case ]
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • bpf: cpumap: Fix memory leak in cpu_map_update_elem · 43690164
      Pu Lehui authored
      Syzkaller reported a memory leak as follows:
      
      BUG: memory leak
      unreferenced object 0xff110001198ef748 (size 192):
        comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
        hex dump (first 32 bytes):
          00 00 00 00 4a 19 00 00 80 ad e3 e4 fe ff c0 00  ....J...........
          00 b2 d3 0c 01 00 11 ff 28 f5 8e 19 01 00 11 ff  ........(.......
        backtrace:
          [<ffffffffadd28087>] __cpu_map_entry_alloc+0xf7/0xb00
          [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
          [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
          [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
          [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
          [<ffffffffb029cc80>] do_syscall_64+0x30/0x40
          [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      BUG: memory leak
      unreferenced object 0xff110001198ef528 (size 192):
        comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffffadd281f0>] __cpu_map_entry_alloc+0x260/0xb00
          [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
          [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
          [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
          [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
          [<ffffffffb029cc80>] do_syscall_64+0x30/0x40
          [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      BUG: memory leak
      unreferenced object 0xff1100010fd93d68 (size 8):
        comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
        hex dump (first 8 bytes):
          00 00 00 00 00 00 00 00                          ........
        backtrace:
          [<ffffffffade5db3e>] kvmalloc_node+0x11e/0x170
          [<ffffffffadd28280>] __cpu_map_entry_alloc+0x2f0/0xb00
          [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
          [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
          [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
          [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
          [<ffffffffb029cc80>] do_syscall_64+0x30/0x40
          [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      In the cpu_map_update_elem flow, when kthread_stop is called before
      the threadfn of rcpu->kthread has started running, the
      KTHREAD_SHOULD_STOP bit of the kthread has already been set by
      kthread_stop. The threadfn of rcpu->kthread will therefore never be
      executed and rcpu->refcnt will never reach 0, so the allocated rcpu,
      rcpu->queue and rcpu->queue->queue can never be released.
      
      Calling kthread_stop before the kthread's threadfn has executed makes
      it return -EINTR, so we can complete the release of the memory
      resources in this case.
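
      A sketch of the shape of the fix in the kthread-stop path; the
      put_cpu_map_entry() helper and kthread_stop_wq field are assumptions
      based on the description above:

      	static void cpu_map_kthread_stop(struct work_struct *work)
      	{
      		struct bpf_cpu_map_entry *rcpu;
      		int err;

      		rcpu = container_of(work, struct bpf_cpu_map_entry,
      				    kthread_stop_wq);
      		err = kthread_stop(rcpu->kthread);
      		if (err == -EINTR) {
      			/* The threadfn never ran, so it can never drop the
      			 * refcount; release rcpu and its queues here. */
      			put_cpu_map_entry(rcpu);
      		}
      	}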
      
      Fixes: 6710e112 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
      Signed-off-by: Pu Lehui <pulehui@huawei.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Acked-by: Hou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230711115848.2701559-1-pulehui@huaweicloud.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  Jul 06, 2023
    • bpf: Fix max stack depth check for async callbacks · 5415ccd5
      Kumar Kartikeya Dwivedi authored
      The check_max_stack_depth pass happens after the verifier's symbolic
      execution, and attempts to walk the call graph of the BPF program,
      ensuring that the stack usage stays within bounds for all possible call
      chains. There are two cases to consider: bpf_pseudo_func and
      bpf_pseudo_call. In the former case, the callback pointer is loaded into
      a register, and it is assumed that it is passed to some helper later
      which calls it (however, there is no way to be sure), but the check
      remains conservative and accounts for the stack usage anyway. For this
      particular
      case, asynchronous callbacks are skipped as they execute asynchronously
      when their corresponding event fires.
      
      The case of bpf_pseudo_call is simpler and we know that the call is
      definitely made, hence the stack depth of the subprog is accounted for.
      
      However, the current check still skips an asynchronous callback even if
      a bpf_pseudo_call was made for it. This is erroneous, as it will miss
      accounting for the stack usage of the asynchronous callback, which can
      be used to breach the maximum stack depth limit.
      
      Fix this by only skipping asynchronous callbacks when the instruction is
      not a pseudo call to the subprog.
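
      A sketch of the corrected skip condition in the check_max_stack_depth
      walk, assuming a bpf_pseudo_call() predicate on the current call
      instruction:

      	if (subprog[sidx].is_async_cb) {
      		/* Async callbacks don't contribute to the program's stack
      		 * depth unless they are the target of a direct
      		 * bpf_pseudo_call; only then must they be accounted. */
      		if (!bpf_pseudo_call(insn + i))
      			continue;
      	}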
      
      Fixes: 7ddc80a4 ("bpf: Teach stack depth check about async callbacks.")
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20230705144730.235802-2-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  Jul 04, 2023
    • module: fix init_module_from_file() error handling · f1962207
      Linus Torvalds authored
      Vegard Nossum pointed out two different problems with the error handling
      in init_module_from_file():
      
       (a) the idempotent loading code didn't clean up properly in some error
           cases, leaving the on-stack 'struct idempotent' element still in
           the hash table
      
       (b) failure to read the module file would nonsensically update the
           'invalid_kread_bytes' stat counter with the error value
      
      The first error is quite nasty, in that it can then cause subsequent
      idempotent loads of that same file to access stale stack contents of the
      previous failure.  The case may not happen in any normal situation
      (explaining all the "Tested-by"s on the original change), and requires
      admin privileges, but syzkaller triggers random bad behavior as a
      result:
      
          BUG: soft lockup in sys_finit_module
          BUG: unable to handle kernel paging request in init_module_from_file
          general protection fault in init_module_from_file
          INFO: task hung in init_module_from_file
          KASAN: out-of-bounds Read in init_module_from_file
          KASAN: slab-out-of-bounds Read in init_module_from_file
          ...
      
      The second error is fairly benign and just leads to nonsensical stats
      (and has been around since the debug stats were added).
      
      Vegard also provided a patch for the idempotent loading issue, but I'd
      rather re-organize the code and make it more legible using another level
      of helper functions than add the usual "goto out" error handling.
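
      A sketch of the helper-level reorganization described above;
      idempotent() and idempotent_complete() come from the commit named in
      Fixes:, while the wrapper and wait-helper names here are illustrative:

      	static int idempotent_init_module(struct file *f, const char __user *uargs,
      					  int flags)
      	{
      		struct idempotent idem;

      		if (!f || !(f->f_mode & FMODE_READ))
      			return -EBADF;

      		/* Did we win the race to load this file? Then do the work,
      		 * and make sure the hash table entry is always removed. */
      		if (!idempotent(&idem, file_inode(f)))
      			return idempotent_complete(&idem,
      				init_module_from_file(f, uargs, flags));

      		/* Somebody else is loading the same file: wait for them. */
      		return idempotent_wait_for_completion(&idem);
      	}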
      
      Link: https://lore.kernel.org/lkml/20230704100852.23452-1-vegard.nossum@oracle.com/
      Fixes: 9b9879fc ("modules: catch concurrent module loads, treat them as idempotent")
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
      Reported-by: <syzbot+9c2bdc9d24e4a7abe741@syzkaller.appspotmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  Jun 30, 2023
    • pid: Replace struct pid 1-element array with flex-array · b69f0aeb
      Kees Cook authored
      For pid namespaces, struct pid uses a dynamically sized array member,
      "numbers".  This was implemented using the ancient 1-element fake
      flexible array, which has been deprecated for decades.
      
      Replace it with a C99 flexible array, refactor the array size
      calculations to use struct_size(), and address elements via indexes.
      Note that the static initializer (which defines a single element) works
      as-is, and requires no special handling.
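
      A minimal sketch of the pattern, with struct pid reduced to the
      relevant members; the size calculation uses the struct_size() helper
      named above:

      	struct pid {
      		refcount_t count;
      		unsigned int level;
      		/* ... */
      		struct upid numbers[];	/* was: struct upid numbers[1]; */
      	};

      	/* Size an allocation holding 'level + 1' trailing elements;
      	 * struct_size() only inspects the pointer's type. */
      	struct pid *pid;
      	size_t pid_size = struct_size(pid, numbers, level + 1);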
      
      Without this, CONFIG_UBSAN_BOUNDS (and potentially
      CONFIG_FORTIFY_SOURCE) will trigger bounds checks:
      
        https://lore.kernel.org/lkml/20230517-bushaltestelle-super-e223978c1ba6@brauner
      
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Daniel Verkamp <dverkamp@chromium.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Reported-by: <syzbot+ac3b41786a2d0565b6d5@syzkaller.appspotmail.com>
      [brauner: dropped unrelated changes and remove 0 with NULL cast]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kdb: Handle LF in the command parser · 1ed05558
      Douglas Anderson authored
      The main kdb command parser only handles CR (ASCII 13 AKA '\r') today,
      but not LF (ASCII 10 AKA '\n'). That means that the kdb command parser
      can handle terminals that send just CR or that send CR+LF but can't
      handle terminals that send just LF.
      
      The fact that kdb didn't handle LF in the command parser tripped up a
      tool I tried to use with it. Specifically, I was trying to send a
      command to my device to resume it from kdb using a ChromeOS tool like:
        dut-control cpu_uart_cmd:"g"
      That tool only terminates lines with LF, not CR+LF.
      
      Arguably the ChromeOS tool should be fixed. After all, officially kdb
      seems to be designed such that CR+LF is the official line ending
      transmitted over the wire and that internally a line ending is just
      '\n' (LF). Some evidence:
      * uart_poll_put_char(), which is used by kdb, notices a '\n' and
        converts it to '\r\n'.
      * kdb functions specifically use '\r' to get a carriage return without
        a newline. You can see this in the pager where kdb will write a '\r'
        and then write over the pager prompt.
      
      However, all that being said, there's no real harm in accepting LF as a
      command terminator in the kdb parser, and doing so seems like it would
      improve compatibility. After this, I'd expect that things would work
      OK-ish with a remote terminal that used any of CR, CR+LF, or LF as a
      line ending. Someone using CR as a line ending might get some ugliness
      where kdb wasn't able to overwrite the last line, but basic commands
      would work. Someone using just LF as a line ending would probably also
      work OK.
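
      A sketch of the parser change, assuming the command reader dispatches
      on raw key codes:

      	switch (key) {
      	case '\n':	/* LF: now also terminates a command */
      	case '\r':	/* CR (enter), as before */
      		kdb_printf("\n");
      		/* ... NUL-terminate the buffer and return the command ... */
      		break;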
      
      A few other notes:
      - It can be noted that "bash" running on an "agetty" handles LF as a
        line termination with no complaints.
      - Historically, kdb's "pager" actually handled either CR or LF fine. A
        very quick inspection would make one think that kdb's pager actually
        could have paged down two lines instead of one for anyone using
        CR+LF, but this is generally avoided because of kdb_input_flush().
      - Conceivably one could argue that some of this special case logic
        belongs in uart_poll_get_char() since uart_poll_put_char() handles
        the '\n' => '\r\n' conversion. I would argue that perhaps we should
        eventually do the opposite and move the '\n' => '\r\n' out of
        uart_poll_put_char(). Having that conversion at such a low level
        could interfere if we ever want to transfer binary data. In
        addition, if we truly made uart_poll_get_char() the inverse of
        uart_poll_put_char() it would convert back to '\n' and (ironically)
        kdb's parser currently only looks for '\r' to find the end of a
        command.
      
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Link: https://lore.kernel.org/r/20230628125612.1.I5cc6c3d916195f5bcfdf5b75d823f2037707f5dc@changeid
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
    • irqdomain: Use return value of strreplace() · 67a4e1a3
      Andy Shevchenko authored
      Since strreplace() returns the pointer to the string itself, use it
      directly.
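
      For illustration, the kind of simplification this enables; the debugfs
      call below is a hypothetical usage site:

      	/* strreplace() returns the string it was given, so the result
      	 * can be passed along inline instead of via a separate call: */
      	root = debugfs_create_dir(strreplace(name, '/', ':'), parent);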
      
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20230628150251.17832-1-andriy.shevchenko@linux.intel.com