  1. Nov 10, 2023
    • net: set SOCK_RCU_FREE before inserting socket into hashtable · 871019b2
      Stanislav Fomichev authored
      
      We've started to see the following kernel traces:
      
       WARNING: CPU: 83 PID: 0 at net/core/filter.c:6641 sk_lookup+0x1bd/0x1d0
      
       Call Trace:
        <IRQ>
        __bpf_skc_lookup+0x10d/0x120
        bpf_sk_lookup+0x48/0xd0
        bpf_sk_lookup_tcp+0x19/0x20
        bpf_prog_<redacted>+0x37c/0x16a3
        cls_bpf_classify+0x205/0x2e0
        tcf_classify+0x92/0x160
        __netif_receive_skb_core+0xe52/0xf10
        __netif_receive_skb_list_core+0x96/0x2b0
        napi_complete_done+0x7b5/0xb70
        <redacted>_poll+0x94/0xb0
        net_rx_action+0x163/0x1d70
        __do_softirq+0xdc/0x32e
        asm_call_irq_on_stack+0x12/0x20
        </IRQ>
        do_softirq_own_stack+0x36/0x50
        do_softirq+0x44/0x70
      
      __inet_hash can race with lockless (rcu) readers on the other cpus:
      
        __inet_hash
          __sk_nulls_add_node_rcu
          <- (bpf triggers here)
          sock_set_flag(SOCK_RCU_FREE)
      
      Let's move the SOCK_RCU_FREE part up a bit, before the socket is
      inserted into the hashtables. Note that the race is really harmless;
      the bpf callers handle this situation (where the listener socket
      doesn't have SOCK_RCU_FREE set) correctly, so the only
      annoyance is a WARN_ONCE.
      
      More details from Eric regarding SOCK_RCU_FREE timeline:
      
      Commit 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under
      synflood") added SOCK_RCU_FREE. At that time, the precise location of
      sock_set_flag(sk, SOCK_RCU_FREE) did not matter, because the thread calling
      __inet_hash() owns a reference on sk. SOCK_RCU_FREE was only tested
      at dismantle time.
      
      Commit 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      started checking SOCK_RCU_FREE _after_ the lookup to infer whether
      the refcount has been taken care of.
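
      The ordering described above can be sketched in user space with C11
      atomics. This is only an illustration: demo_rcu_sock, demo_slot and the
      demo_* functions are hypothetical stand-ins, not the kernel's struct sock
      or __inet_hash(). The point is that the flag is written before the
      socket becomes visible to lockless readers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct demo_rcu_sock {
    bool rcu_free;    /* models the SOCK_RCU_FREE flag */
};

/* Hypothetical one-slot "hashtable" that lockless readers inspect. */
static _Atomic(struct demo_rcu_sock *) demo_slot;

/* Writer: set the flag first, then publish the socket. */
static void demo_hash(struct demo_rcu_sock *sk)
{
    assert(sk != NULL);
    sk->rcu_free = true;
    /* Release ordering: a reader that sees sk also sees the flag. */
    atomic_store_explicit(&demo_slot, sk, memory_order_release);
}

/* Reader: true if a socket was found and its flag was already set. */
static bool demo_lookup_sees_flag(void)
{
    struct demo_rcu_sock *sk =
        atomic_load_explicit(&demo_slot, memory_order_acquire);

    return sk != NULL && sk->rcu_free;
}
```

      With this ordering a reader either finds no socket at all or finds one
      whose flag is already set, so the intermediate state that produced the
      warning cannot be observed.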
      
      Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      871019b2
  2. Nov 09, 2023
    • net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP · f1a3b283
      Eric Dumazet authored
      
      syzbot was able to trigger the following report by providing
      a too-small TCA_FQ_WEIGHTS attribute [1]

      The fix is to use NLA_POLICY_EXACT_LEN() to ensure user space
      provides correctly sized attributes.
      
      Apply the same fix to TCA_FQ_PRIOMAP.
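
      The idea behind an exact-length policy can be sketched in plain C. The
      names below (demo_attr, demo_load_weights, DEMO_NUM_WEIGHTS) are
      hypothetical and do not reflect the netlink API; the sketch only shows
      rejecting a payload whose size differs from what the reader will consume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_WEIGHTS 3    /* hypothetical: one weight per band */

/* Hypothetical attribute: a length plus an untrusted payload. */
struct demo_attr {
    size_t len;
    const void *data;
};

/*
 * Copy the weights only if the payload has exactly the expected size.
 * Returns 0 on success, -1 if the attribute has the wrong length.
 */
static int demo_load_weights(const struct demo_attr *attr,
                             int32_t out[DEMO_NUM_WEIGHTS])
{
    assert(attr != NULL && out != NULL);
    if (attr->len != DEMO_NUM_WEIGHTS * sizeof(int32_t))
        return -1;
    memcpy(out, attr->data, attr->len);
    return 0;
}
```

      A short attribute is refused before any byte is read, so the reader
      never touches memory the sender did not fill in.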
      
      [1]
      BUG: KMSAN: uninit-value in fq_load_weights net/sched/sch_fq.c:960 [inline]
      BUG: KMSAN: uninit-value in fq_change+0x1348/0x2fe0 net/sched/sch_fq.c:1071
      fq_load_weights net/sched/sch_fq.c:960 [inline]
      fq_change+0x1348/0x2fe0 net/sched/sch_fq.c:1071
      fq_init+0x68e/0x780 net/sched/sch_fq.c:1159
      qdisc_create+0x12f3/0x1be0 net/sched/sch_api.c:1326
      tc_modify_qdisc+0x11ef/0x2c20
      rtnetlink_rcv_msg+0x16a6/0x1840 net/core/rtnetlink.c:6558
      netlink_rcv_skb+0x371/0x650 net/netlink/af_netlink.c:2545
      rtnetlink_rcv+0x34/0x40 net/core/rtnetlink.c:6576
      netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
      netlink_unicast+0xf47/0x1250 net/netlink/af_netlink.c:1368
      netlink_sendmsg+0x1238/0x13d0 net/netlink/af_netlink.c:1910
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg net/socket.c:745 [inline]
      ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2588
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2642
      __sys_sendmsg net/socket.c:2671 [inline]
      __do_sys_sendmsg net/socket.c:2680 [inline]
      __se_sys_sendmsg net/socket.c:2678 [inline]
      __x64_sys_sendmsg+0x307/0x490 net/socket.c:2678
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
      slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
      slab_alloc_node mm/slub.c:3478 [inline]
      kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523
      kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
      __alloc_skb+0x318/0x740 net/core/skbuff.c:651
      alloc_skb include/linux/skbuff.h:1286 [inline]
      netlink_alloc_large_skb net/netlink/af_netlink.c:1214 [inline]
      netlink_sendmsg+0xb34/0x13d0 net/netlink/af_netlink.c:1885
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg net/socket.c:745 [inline]
      ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2588
      ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2642
      __sys_sendmsg net/socket.c:2671 [inline]
      __do_sys_sendmsg net/socket.c:2680 [inline]
      __se_sys_sendmsg net/socket.c:2678 [inline]
      __x64_sys_sendmsg+0x307/0x490 net/socket.c:2678
      do_syscall_x64 arch/x86/entry/common.c:51 [inline]
      do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      CPU: 1 PID: 5001 Comm: syz-executor300 Not tainted 6.6.0-syzkaller-12401-g8f6f76a6a29f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
      
      Fixes: 29f834aa ("net_sched: sch_fq: add 3 bands and WRR scheduling")
      Fixes: 49e7265f ("net_sched: sch_fq: add TCA_FQ_WEIGHTS attribute")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20231107160440.1992526-1-edumazet@google.com

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f1a3b283
    • net: kcm: fill in MODULE_DESCRIPTION() · 31356547
      Jakub Kicinski authored
      W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
      
      Link: https://lore.kernel.org/r/20231108020305.537293-1-kuba@kernel.org
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      31356547
    • net/sched: act_ct: Always fill offloading tuple iifidx · 9bc64bd0
      Vlad Buslov authored
      
      Referenced commit doesn't always set iifidx when offloading the flow to
      hardware. Fix the following cases:
      
      - nf_conn_act_ct_ext_fill() is called before the extension is created with
      nf_conn_act_ct_ext_add() in tcf_ct_act(). This can cause rule offload with
      an unspecified iifidx when the connection is offloaded after only a single
      original-direction packet has been processed by the tc data path. Always fill
      the new nf_conn_act_ct_ext instance after creating it in
      nf_conn_act_ct_ext_add().

      - Offloading of unidirectional UDP NEW connections is now supported, but the ct
      flow iifidx field is not updated when the connection is promoted to
      bidirectional, which can result in the reply-direction iifidx being zero when
      refreshing the connection. Fill in the extension and update the flow iifidx
      before calling flow_offload_refresh().
      
      Fixes: 9795ded7 ("net/sched: act_ct: Fill offloading tuple iifidx")
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Fixes: 6a9bad00 ("net/sched: act_ct: offload UDP NEW connections")
      Link: https://lore.kernel.org/r/20231103151410.764271-1-vladbu@nvidia.com

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9bc64bd0
  3. Nov 08, 2023
    • netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses · 80abbe8a
      Florian Westphal authored
      
      The ipv6 redirect target was derived from the ipv4 one, i.e. it's
      identical to a 'dnat' with the first (primary) address assigned to the
      network interface.  The code has been moved around to make it usable
      from nf_tables too, but it's still the same as it was back when this
      was added in 2012.

      IPv6, however, has different types of addresses; if the 'wrong' address
      comes first, the redirection does not work.

      In Daniel's case, the addresses are:
        inet6 ::ffff:192 ...
        inet6 2a01: ...
      
      ... so the function attempts to redirect to the mapped address.
      
      Add more checks before the address is deemed correct:
      1. If the packet's daddr is scoped, search for a scoped address too
      2. Skip tentative addresses
      3. Skip mapped addresses
      
      Use the first address that appears to match our needs.
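
      The three checks listed above can be sketched in user-space C. This is
      an illustration under assumptions, not the kernel code: demo_ifaddr and
      both demo_* functions are hypothetical, and "scoped" is approximated
      here as link-local via the libc macro IN6_IS_ADDR_LINKLOCAL.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate entry; the kernel keeps richer per-address state. */
struct demo_ifaddr {
    struct in6_addr addr;
    bool tentative;
};

/* Helper for building candidates from text (hypothetical). */
static struct demo_ifaddr demo_ifaddr_from_text(const char *text, bool tentative)
{
    struct demo_ifaddr ifa = { .tentative = tentative };
    int ok = inet_pton(AF_INET6, text, &ifa.addr);

    assert(ok == 1);
    (void)ok;
    return ifa;
}

/* Returns the index of the first usable address, or -1 if none fits. */
static int demo_pick_redirect_addr(const struct demo_ifaddr *cand, size_t n,
                                   const struct in6_addr *daddr)
{
    assert(cand != NULL && daddr != NULL);
    bool want_scoped = IN6_IS_ADDR_LINKLOCAL(daddr);

    for (size_t i = 0; i < n; i++) {
        if (cand[i].tentative)
            continue;    /* skip tentative addresses */
        if (IN6_IS_ADDR_V4MAPPED(&cand[i].addr))
            continue;    /* skip mapped addresses */
        if (want_scoped && !IN6_IS_ADDR_LINKLOCAL(&cand[i].addr))
            continue;    /* scoped daddr needs a scoped address */
        return (int)i;
    }
    return -1;
}
```

      With candidates ordered as a mapped address followed by a global one,
      the mapped entry is skipped and the global address is chosen, which is
      the behaviour the reported setup needed.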
      
      Reported-by: Daniel Huhardeaux <tech@tootai.net>
      Closes: https://lore.kernel.org/netfilter/71be06b8-6aa0-4cf9-9e0b-e2839b01b22f@tootai.net/

      Fixes: 115e23ac ("netfilter: ip6tables: add REDIRECT target")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      80abbe8a
    • netfilter: xt_recent: fix (increase) ipv6 literal buffer length · 7b308feb
      Maciej Żenczykowski authored
      
      in6_pton() supports 'low-32-bit dot-decimal representation'
      (this is useful with DNS64/NAT64 networks for example):
      
        # echo +aaaa:bbbb:cccc:dddd:eeee:ffff:1.2.3.4 > /proc/self/net/xt_recent/DEFAULT
        # cat /proc/self/net/xt_recent/DEFAULT
        src=aaaa:bbbb:cccc:dddd:eeee:ffff:0102:0304 ttl: 0 last_seen: 9733848829 oldest_pkt: 1 9733848829
      
      but the provided buffer is too short:
      
        # echo +aaaa:bbbb:cccc:dddd:eeee:ffff:255.255.255.255 > /proc/self/net/xt_recent/DEFAULT
        -bash: echo: write error: Invalid argument
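
      The length problem can be checked in user space. The sketch below
      assumes a buffer sized for the longest pure-hex literal (39 characters,
      ignoring the leading '+' and the terminator); DEMO_HEX_LITERAL_MAX and
      demo_needs_longer_buffer are hypothetical names, and libc's inet_pton()
      stands in for the kernel's in6_pton().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Longest pure-hex literal: eight groups of four digits plus seven colons. */
#define DEMO_HEX_LITERAL_MAX \
    (sizeof("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff") - 1)

/*
 * True if text is a valid IPv6 literal that would not fit in a buffer
 * sized only for the longest pure-hex form.
 */
static bool demo_needs_longer_buffer(const char *text)
{
    struct in6_addr addr;

    assert(text != NULL);
    if (inet_pton(AF_INET6, text, &addr) != 1)
        return false;
    return strlen(text) > DEMO_HEX_LITERAL_MAX;
}
```

      The literal ending in 1.2.3.4 is 37 characters and fits, while the one
      ending in 255.255.255.255 is 45 characters and does not, matching the
      two shell examples above.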
      
      Fixes: 079aa88f ("netfilter: xt_recent: IPv6 support")
      Signed-off-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      7b308feb
    • ipvs: add missing module descriptions · 17cd01e4
      Florian Westphal authored
      
      W=1 builds warn on missing MODULE_DESCRIPTION, add them.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      17cd01e4
    • netfilter: nf_tables: remove catchall element in GC sync path · 93995bf4
      Pablo Neira Ayuso authored
      
      The expired catchall element is neither deactivated nor removed in the
      GC sync path. This path holds the mutex, so just call
      nft_setelem_data_deactivate() and nft_setelem_catchall_remove() before
      queueing the GC work.
      
      Fixes: 4a9e12ea ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
      Reported-by: lonial con <kongln9170@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      93995bf4
    • netfilter: add missing module descriptions · 94090b23
      Florian Westphal authored
      
      W=1 builds warn on missing MODULE_DESCRIPTION, add them.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      94090b23
    • virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt() · 34c4effa
      Shigeru Yoshida authored
      
      KMSAN reported the following uninit-value access issue:
      
      =====================================================
      BUG: KMSAN: uninit-value in virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421
       virtio_transport_recv_pkt+0x1dfb/0x26a0 net/vmw_vsock/virtio_transport_common.c:1421
       vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120
       process_one_work kernel/workqueue.c:2630 [inline]
       process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703
       worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784
       kthread+0x3cc/0x520 kernel/kthread.c:388
       ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      Uninit was stored to memory at:
       virtio_transport_space_update net/vmw_vsock/virtio_transport_common.c:1274 [inline]
       virtio_transport_recv_pkt+0x1ee8/0x26a0 net/vmw_vsock/virtio_transport_common.c:1415
       vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120
       process_one_work kernel/workqueue.c:2630 [inline]
       process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703
       worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784
       kthread+0x3cc/0x520 kernel/kthread.c:388
       ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      Uninit was created at:
       slab_post_alloc_hook+0x105/0xad0 mm/slab.h:767
       slab_alloc_node mm/slub.c:3478 [inline]
       kmem_cache_alloc_node+0x5a2/0xaf0 mm/slub.c:3523
       kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:559
       __alloc_skb+0x2fd/0x770 net/core/skbuff.c:650
       alloc_skb include/linux/skbuff.h:1286 [inline]
       virtio_vsock_alloc_skb include/linux/virtio_vsock.h:66 [inline]
       virtio_transport_alloc_skb+0x90/0x11e0 net/vmw_vsock/virtio_transport_common.c:58
       virtio_transport_reset_no_sock net/vmw_vsock/virtio_transport_common.c:957 [inline]
       virtio_transport_recv_pkt+0x1279/0x26a0 net/vmw_vsock/virtio_transport_common.c:1387
       vsock_loopback_work+0x3bb/0x5a0 net/vmw_vsock/vsock_loopback.c:120
       process_one_work kernel/workqueue.c:2630 [inline]
       process_scheduled_works+0xff6/0x1e60 kernel/workqueue.c:2703
       worker_thread+0xeca/0x14d0 kernel/workqueue.c:2784
       kthread+0x3cc/0x520 kernel/kthread.c:388
       ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      CPU: 1 PID: 10664 Comm: kworker/1:5 Not tainted 6.6.0-rc3-00146-g9f3ebbef746f #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
      Workqueue: vsock-loopback vsock_loopback_work
      =====================================================
      
      The following simple reproducer can cause the issue described above:
      
      #include <sys/socket.h>
      #include <linux/vm_sockets.h>

      int main(void)
      {
        int sock;
        struct sockaddr_vm addr = {
          .svm_family = AF_VSOCK,
          .svm_cid = VMADDR_CID_ANY,
          .svm_port = 1234,
        };

        sock = socket(AF_VSOCK, SOCK_STREAM, 0);
        connect(sock, (struct sockaddr *)&addr, sizeof(addr));
        return 0;
      }
      
      This issue occurs because the `buf_alloc` and `fwd_cnt` fields of the
      `struct virtio_vsock_hdr` are not initialized when a new skb is allocated
      in `virtio_transport_init_hdr()`. This patch resolves the issue by
      initializing these fields during allocation.
      
      Fixes: 71dc9ec9 ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
      Reported-and-tested-by: <syzbot+0c8ce1da0ac31abbadcd@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=0c8ce1da0ac31abbadcd

      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20231104150531.257952-1-syoshida@redhat.com

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      34c4effa
  4. Nov 07, 2023
  5. Nov 06, 2023
    • net/smc: put sk reference if close work was canceled · aa96fbd6
      D. Wythe authored
      
      Note that we always hold a reference to sock when attempting
      to submit close_work. Therefore, if we have successfully
      canceled close_work from pending, we MUST release that reference
      to avoid potential leaks.
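
      The reference rule above can be modelled with a plain counter. The
      sketch is hypothetical throughout (demo_smc_sock and the demo_*
      functions are not SMC code); it only shows that a successful cancel
      must be paired with dropping the reference the queued work was holding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket with a plain reference counter. */
struct demo_smc_sock {
    int refcnt;
    bool close_work_pending;
};

/* Queue close work; the queued work owns one reference. */
static void demo_queue_close_work(struct demo_smc_sock *sk)
{
    assert(sk != NULL);
    if (!sk->close_work_pending) {
        sk->refcnt++;
        sk->close_work_pending = true;
    }
}

/* Returns true if pending work was canceled before it could run. */
static bool demo_cancel_close_work(struct demo_smc_sock *sk)
{
    bool was_pending = sk->close_work_pending;

    sk->close_work_pending = false;
    return was_pending;
}

/* Cancel and, if the work never ran, drop the reference it held. */
static void demo_cancel_and_put(struct demo_smc_sock *sk)
{
    assert(sk != NULL);
    if (demo_cancel_close_work(sk))
        sk->refcnt--;
}
```

      Calling the cancel path a second time leaves the counter unchanged,
      because nothing was pending and therefore no reference was owed.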
      
      Fixes: 42bfba9e ("net/smc: immediate termination for SMCD link groups")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa96fbd6
    • net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc · c5bf605b
      D. Wythe authored
      
      This patch re-fixes the issues mentioned by commit 22a825c5
      ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").

      Blocking the message send does solve the issues, but it also
      prevents the peer from receiving the final message. Besides, logically,
      whether sndbuf_desc is NULL or not has no impact on the processing
      of cdc message sending.

      Hence, this patch allows the cdc message to be sent, but checks
      sndbuf_desc with care in smc_cdc_tx_handler().
      
      Fixes: 22a825c5 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5bf605b
    • net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT · 5211c972
      D. Wythe authored
      
      Consider the following scenario:
      
      				smc_cdc_rx_handler
      __smc_release
      				sock_set_flag
      smc_close_active()
      sock_set_flag
      
      __set_bit(DEAD)			__set_bit(DONE)
      
      Because __set_bit is not atomic, the DEAD or DONE flag might be lost.
      If the DEAD flag is lost, the state SMC_CLOSED will never be reached
      in smc_close_passive_work:
      
      if (sock_flag(sk, SOCK_DEAD) &&
      	smc_close_sent_any_close(conn)) {
      	sk->sk_state = SMC_CLOSED;
      } else {
      	/* just shutdown, but not yet closed locally */
      	sk->sk_state = SMC_APPFINCLOSEWAIT;
      }
      
      Replacing sock_set_flag or __set_bit with set_bit fixes this problem,
      since set_bit is atomic.
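
      The difference between the two styles of bit update can be sketched
      with C11 atomics. The names here (demo_set_bit_plain,
      demo_set_bit_atomic, DEMO_FLAG_*) are hypothetical user-space stand-ins
      and not the kernel's bitops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { DEMO_FLAG_DEAD = 0, DEMO_FLAG_DONE = 1 };    /* hypothetical bit numbers */

/*
 * Non-atomic variant: a separate load, OR and store. If two CPUs run this
 * on the same word at the same time, one store can overwrite the other
 * and a bit is lost.
 */
static void demo_set_bit_plain(unsigned long *flags, int nr)
{
    *flags = *flags | (1UL << nr);
}

/* Atomic variant: a single indivisible read-modify-write. */
static void demo_set_bit_atomic(atomic_ulong *flags, int nr)
{
    assert(nr >= 0 && nr < 8 * (int)sizeof(unsigned long));
    atomic_fetch_or(flags, 1UL << nr);
}

static bool demo_test_bit(atomic_ulong *flags, int nr)
{
    return (atomic_load(flags) >> nr) & 1UL;
}
```

      Run from a single thread both variants give the same result; the
      atomic one is the only one that stays correct when the two setters
      race as in the scenario above.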
      
      Fixes: b38d7324 ("smc: socket closing and linkgroup cleanup")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5211c972
    • tcp: Fix SYN option room calculation for TCP-AO. · 0a8e987d
      Kuniyuki Iwashima authored
      
      When building a SYN packet in tcp_syn_options(), MSS, TS, WS, and
      SACKPERM are used without checking the remaining bytes in the
      options area.
      
      To keep that logic as is, we limit the TCP-AO MAC length in
      tcp_ao_parse_crypto().  Currently, the limit is calculated as below.
      
        MAX_TCP_OPTION_SPACE - TCPOLEN_TSTAMP_ALIGNED
                             - TCPOLEN_WSCALE_ALIGNED
                             - TCPOLEN_SACKPERM_ALIGNED
      
      This looks confusing as (1) we pack SACKPERM into the leading
      2 bytes of the aligned 12 bytes of TS and (2) TCPOLEN_MSS_ALIGNED
      is not used.  Fortunately, the calculated limit is not wrong as
      TCPOLEN_SACKPERM_ALIGNED and TCPOLEN_MSS_ALIGNED have the same value.
      
      However, we should use the proper constant in the formula.
      
        MAX_TCP_OPTION_SPACE - TCPOLEN_MSS_ALIGNED
                             - TCPOLEN_TSTAMP_ALIGNED
                             - TCPOLEN_WSCALE_ALIGNED
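
      The arithmetic behind both formulas can be checked directly. The
      constant values below are the ones defined in the kernel's
      include/net/tcp.h as far as I am aware; the demo_* function names are
      hypothetical and exist only for this sketch.

```c
#include <assert.h>

/* Values as in include/net/tcp.h (stated here as an assumption). */
#define MAX_TCP_OPTION_SPACE      40
#define TCPOLEN_MSS_ALIGNED        4
#define TCPOLEN_TSTAMP_ALIGNED    12
#define TCPOLEN_WSCALE_ALIGNED     4
#define TCPOLEN_SACKPERM_ALIGNED   4

/* The old formula only works because these two constants are equal. */
static_assert(TCPOLEN_SACKPERM_ALIGNED == TCPOLEN_MSS_ALIGNED,
              "old and new formulas would diverge");

/* Old formula: subtracts SACKPERM and omits MSS. */
static int demo_ao_limit_old(void)
{
    return MAX_TCP_OPTION_SPACE - TCPOLEN_TSTAMP_ALIGNED
                                - TCPOLEN_WSCALE_ALIGNED
                                - TCPOLEN_SACKPERM_ALIGNED;
}

/* New formula: subtracts the options that actually consume space. */
static int demo_ao_limit_new(void)
{
    return MAX_TCP_OPTION_SPACE - TCPOLEN_MSS_ALIGNED
                                - TCPOLEN_TSTAMP_ALIGNED
                                - TCPOLEN_WSCALE_ALIGNED;
}
```

      Under these values both functions return 20 bytes, so the change
      corrects the reasoning without changing the resulting limit.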
      
      Fixes: 4954f17d ("net/tcp: Introduce TCP_AO setsockopt()s")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a8e987d
    • net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config · 40cb2fdf
      Jamal Hadi Salim authored
      
      Getting the following splat [1] with CONFIG_DEBUG_NET=y and this
      reproducer [2]. Problem seems to be that classifiers clear 'struct
      tcf_result::drop_reason', thereby triggering the warning in
      __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0).
      
      Fixed by disambiguating a legit error from a verdict with a bogus drop_reason.
      
      [1]
      WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
      Modules linked in:
      CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
      RIP: 0010:kfree_skb_reason+0x38/0x130
      [...]
      Call Trace:
       <IRQ>
       __netif_receive_skb_core.constprop.0+0x837/0xdb0
       __netif_receive_skb_one_core+0x3c/0x70
       process_backlog+0x95/0x130
       __napi_poll+0x25/0x1b0
       net_rx_action+0x29b/0x310
       __do_softirq+0xc0/0x29b
       do_softirq+0x43/0x60
       </IRQ>
      
      [2]
      
      ip link add name veth0 type veth peer name veth1
      ip link set dev veth0 up
      ip link set dev veth1 up
      tc qdisc add dev veth1 clsact
      tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
      mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
      
      Ido reported:
      
        [...] getting the following splat [1] with CONFIG_DEBUG_NET=y and this
        reproducer [2]. Problem seems to be that classifiers clear 'struct
        tcf_result::drop_reason', thereby triggering the warning in
        __kfree_skb_reason() due to reason being 'SKB_NOT_DROPPED_YET' (0). [...]
      
        [1]
        WARNING: CPU: 0 PID: 181 at net/core/skbuff.c:1082 kfree_skb_reason+0x38/0x130
        Modules linked in:
        CPU: 0 PID: 181 Comm: mausezahn Not tainted 6.6.0-rc6-custom-ge43e6d9582e0 #682
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014
        RIP: 0010:kfree_skb_reason+0x38/0x130
        [...]
        Call Trace:
         <IRQ>
         __netif_receive_skb_core.constprop.0+0x837/0xdb0
         __netif_receive_skb_one_core+0x3c/0x70
         process_backlog+0x95/0x130
         __napi_poll+0x25/0x1b0
         net_rx_action+0x29b/0x310
         __do_softirq+0xc0/0x29b
         do_softirq+0x43/0x60
         </IRQ>
      
        [2]
        #!/bin/bash
      
        ip link add name veth0 type veth peer name veth1
        ip link set dev veth0 up
        ip link set dev veth1 up
        tc qdisc add dev veth1 clsact
        tc filter add dev veth1 ingress pref 1 proto all flower dst_mac 00:11:22:33:44:55 action drop
        mausezahn veth0 -a own -b 00:11:22:33:44:55 -q -c 1
      
      What happens is that inside most classifiers the tcf_result is copied over
      from a filter template e.g. *res = f->res which then implicitly overrides
      the prior SKB_DROP_REASON_TC_{INGRESS,EGRESS} default drop code which was
      set via sch_handle_{ingress,egress}() for kfree_skb_reason().
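
      The clobbering effect of a whole-struct copy can be shown in a few
      lines. Everything here is hypothetical (demo_result, the DEMO_*
      values, both demo_classify_* functions), and the second function is
      merely one illustrative way to keep the default; it is not the
      approach this patch takes in cls_api.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified classifier result. */
struct demo_result {
    unsigned long cls;
    unsigned int classid;
    int drop_reason;
};

/* Hypothetical values; 0 plays the role of SKB_NOT_DROPPED_YET. */
enum { DEMO_NOT_DROPPED_YET = 0, DEMO_REASON_TC_INGRESS = 1 };

/* Whole-struct copy: every field of *res, drop_reason included, is overwritten. */
static void demo_classify_naive(struct demo_result *res,
                                const struct demo_result *tmpl)
{
    *res = *tmpl;
}

/* Illustrative alternative that keeps the caller's default reason. */
static void demo_classify_keep_reason(struct demo_result *res,
                                      const struct demo_result *tmpl)
{
    int reason;

    assert(res != NULL && tmpl != NULL);
    reason = res->drop_reason;
    *res = *tmpl;
    res->drop_reason = reason;
}
```

      After the naive copy the reason reads back as 0, which is the value
      that trips the warning when the packet is later freed.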
      
      Commit text above copied verbatim from Daniel. The general idea of the patch
      is not very different from what Ido originally posted, but it is instead
      done in the cls_api code path.
      
      Fixes: 54a59aed ("net, sched: Make tc-related drop reason more flexible")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/netdev/ZTjY959R+AFXf3Xy@shredder

      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40cb2fdf
  6. Nov 03, 2023
  7. Nov 02, 2023
  8. Nov 01, 2023
    • SUNRPC: Fix RPC client cleaned up the freed pipefs dentries · bfca5fb4
      felix authored
      
      RPC client pipefs dentries cleanup is done in a separate rpc_remove_pipedir()
      workqueue, which takes care of pipefs superblock locking.
      In some special scenarios, when the kernel frees the pipefs sb of the
      current client and immediately allocates a new pipefs sb, the
      rpc_remove_pipedir function would misjudge the existence of the pipefs
      sb, which is not the one it used to hold. As a result,
      rpc_remove_pipedir would clean up the already freed pipefs dentries.

      To fix this issue, rpc_remove_pipedir should check whether the
      current pipefs sb is consistent with the original pipefs sb.

      This error can be caught by KASAN:
      =========================================================
      [  250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200
      [  250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503
      [  250.500549] Workqueue: events rpc_free_client_work
      [  250.501001] Call Trace:
      [  250.502880]  kasan_report+0xb6/0xf0
      [  250.503209]  ? dget_parent+0x195/0x200
      [  250.503561]  dget_parent+0x195/0x200
      [  250.503897]  ? __pfx_rpc_clntdir_depopulate+0x10/0x10
      [  250.504384]  rpc_rmdir_depopulate+0x1b/0x90
      [  250.504781]  rpc_remove_client_dir+0xf5/0x150
      [  250.505195]  rpc_free_client_work+0xe4/0x230
      [  250.505598]  process_one_work+0x8ee/0x13b0
      ...
      [   22.039056] Allocated by task 244:
      [   22.039390]  kasan_save_stack+0x22/0x50
      [   22.039758]  kasan_set_track+0x25/0x30
      [   22.040109]  __kasan_slab_alloc+0x59/0x70
      [   22.040487]  kmem_cache_alloc_lru+0xf0/0x240
      [   22.040889]  __d_alloc+0x31/0x8e0
      [   22.041207]  d_alloc+0x44/0x1f0
      [   22.041514]  __rpc_lookup_create_exclusive+0x11c/0x140
      [   22.041987]  rpc_mkdir_populate.constprop.0+0x5f/0x110
      [   22.042459]  rpc_create_client_dir+0x34/0x150
      [   22.042874]  rpc_setup_pipedir_sb+0x102/0x1c0
      [   22.043284]  rpc_client_register+0x136/0x4e0
      [   22.043689]  rpc_new_client+0x911/0x1020
      [   22.044057]  rpc_create_xprt+0xcb/0x370
      [   22.044417]  rpc_create+0x36b/0x6c0
      ...
      [   22.049524] Freed by task 0:
      [   22.049803]  kasan_save_stack+0x22/0x50
      [   22.050165]  kasan_set_track+0x25/0x30
      [   22.050520]  kasan_save_free_info+0x2b/0x50
      [   22.050921]  __kasan_slab_free+0x10e/0x1a0
      [   22.051306]  kmem_cache_free+0xa5/0x390
      [   22.051667]  rcu_core+0x62c/0x1930
      [   22.051995]  __do_softirq+0x165/0x52a
      [   22.052347]
      [   22.052503] Last potentially related work creation:
      [   22.052952]  kasan_save_stack+0x22/0x50
      [   22.053313]  __kasan_record_aux_stack+0x8e/0xa0
      [   22.053739]  __call_rcu_common.constprop.0+0x6b/0x8b0
      [   22.054209]  dentry_free+0xb2/0x140
      [   22.054540]  __dentry_kill+0x3be/0x540
      [   22.054900]  shrink_dentry_list+0x199/0x510
      [   22.055293]  shrink_dcache_parent+0x190/0x240
      [   22.055703]  do_one_tree+0x11/0x40
      [   22.056028]  shrink_dcache_for_umount+0x61/0x140
      [   22.056461]  generic_shutdown_super+0x70/0x590
      [   22.056879]  kill_anon_super+0x3a/0x60
      [   22.057234]  rpc_kill_sb+0x121/0x200
      
      Fixes: 0157d021 ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines")
      Signed-off-by: felix <fuzhen5@huawei.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      bfca5fb4
    • SUNRPC: Add an IS_ERR() check back to where it was · 4f3ed837
      Dan Carpenter authored
      
      This IS_ERR() check was deleted during a cleanup because, at the time,
      the rpcb_call_async() function could not return an error pointer.  That
      changed in commit 25cf32ad ("SUNRPC: Handle allocation failure in
      rpc_new_task()") and now it can return an error pointer.  Put the check
      back.
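
      For readers unfamiliar with error pointers, the convention can be
      modelled in user space. The demo_* helpers below are hypothetical
      re-creations for illustration, not the kernel's ERR_PTR()/IS_ERR()
      macros, and demo_call_async is an invented stand-in for a call that
      may fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095    /* mirrors the kernel's MAX_ERRNO */

/* Encode a negative errno value in a pointer. */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if p lies in the top DEMO_MAX_ERRNO addresses, i.e. encodes an error. */
static bool demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* Decode the errno value again. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* Hypothetical async call that can fail with an error pointer. */
static void *demo_call_async(bool fail, int *task)
{
    assert(task != NULL);
    return fail ? demo_err_ptr(-ENOMEM) : (void *)task;
}
```

      A caller that skips the demo_is_err() test would treat the encoded
      -ENOMEM as a valid object and dereference it, which is why the check
      has to return once the callee can fail.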
      
      A related revert was done in commit 13bd9014 ("Revert "SUNRPC:
      Remove unreachable error condition"").
      
      Fixes: 037e910b ("SUNRPC: Remove unreachable error condition in rpcb_getport_async()")
      Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      4f3ed837
  9. Oct 31, 2023
  10. Oct 28, 2023
  11. Oct 27, 2023
    • net: bpf: Use sockopt_lock_sock() in ip_sock_set_tos() · 06497763
      Yonghong Song authored
      
      With latest sync from net-next tree, bpf-next has a bpf selftest failure:
        [root@arch-fb-vm1 bpf]# ./test_progs -t setget_sockopt
        ...
        [   76.194349] ============================================
        [   76.194682] WARNING: possible recursive locking detected
        [   76.195039] 6.6.0-rc7-g37884503df08-dirty #67 Tainted: G        W  OE
        [   76.195518] --------------------------------------------
        [   76.195852] new_name/154 is trying to acquire lock:
        [   76.196159] ffff8c3e06ad8d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: ip_sock_set_tos+0x19/0x30
        [   76.196669]
        [   76.196669] but task is already holding lock:
        [   76.197028] ffff8c3e06ad8d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_listen+0x21/0x70
        [   76.197517]
        [   76.197517] other info that might help us debug this:
        [   76.197919]  Possible unsafe locking scenario:
        [   76.197919]
        [   76.198287]        CPU0
        [   76.198444]        ----
        [   76.198600]   lock(sk_lock-AF_INET);
        [   76.198831]   lock(sk_lock-AF_INET);
        [   76.199062]
        [   76.199062]  *** DEADLOCK ***
        [   76.199062]
        [   76.199420]  May be due to missing lock nesting notation
        [   76.199420]
        [   76.199879] 2 locks held by new_name/154:
        [   76.200131]  #0: ffff8c3e06ad8d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_listen+0x21/0x70
        [   76.200644]  #1: ffffffff90f96a40 (rcu_read_lock){....}-{1:2}, at: __cgroup_bpf_run_filter_sock_ops+0x55/0x290
        [   76.201268]
        [   76.201268] stack backtrace:
        [   76.201538] CPU: 4 PID: 154 Comm: new_name Tainted: G        W  OE      6.6.0-rc7-g37884503df08-dirty #67
        [   76.202134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [   76.202699] Call Trace:
        [   76.202858]  <TASK>
        [   76.203002]  dump_stack_lvl+0x4b/0x80
        [   76.203239]  __lock_acquire+0x740/0x1ec0
        [   76.203503]  lock_acquire+0xc1/0x2a0
        [   76.203766]  ? ip_sock_set_tos+0x19/0x30
        [   76.204050]  ? sk_stream_write_space+0x12a/0x230
        [   76.204389]  ? lock_release+0xbe/0x260
        [   76.204661]  lock_sock_nested+0x32/0x80
        [   76.204942]  ? ip_sock_set_tos+0x19/0x30
        [   76.205208]  ip_sock_set_tos+0x19/0x30
        [   76.205452]  do_ip_setsockopt+0x4b3/0x1580
        [   76.205719]  __bpf_setsockopt+0x62/0xa0
        [   76.205963]  bpf_sock_ops_setsockopt+0x11/0x20
        [   76.206247]  bpf_prog_630217292049c96e_bpf_test_sockopt_int+0xbc/0x123
        [   76.206660]  bpf_prog_493685a3bae00bbd_bpf_test_ip_sockopt+0x49/0x4b
        [   76.207055]  bpf_prog_b0bcd27f269aeea0_skops_sockopt+0x44c/0xec7
        [   76.207437]  __cgroup_bpf_run_filter_sock_ops+0xda/0x290
        [   76.207829]  __inet_listen_sk+0x108/0x1b0
        [   76.208122]  inet_listen+0x48/0x70
        [   76.208373]  __sys_listen+0x74/0xb0
        [   76.208630]  __x64_sys_listen+0x16/0x20
        [   76.208911]  do_syscall_64+0x3f/0x90
        [   76.209174]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
        ...
      
      Both ip_sock_set_tos() and inet_listen() call lock_sock(sk), which
      causes a deadlock.
      
      To fix the issue, use sockopt_lock_sock() in ip_sock_set_tos()
      instead. sockopt_lock_sock() will avoid lock_sock() if it is in bpf
      context.
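
      The behaviour described above can be modelled with a non-recursive
      lock and a context flag. All names are hypothetical (demo_sock_lock,
      demo_in_bpf_ctx, the demo_* functions); the sketch assumes that "bpf
      context" means the caller already holds the socket lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock: taking it twice would deadlock. */
struct demo_sock_lock {
    bool held;
};

/* Hypothetical marker for "running inside a BPF program". */
static bool demo_in_bpf_ctx;

static void demo_lock_sock(struct demo_sock_lock *l)
{
    assert(l != NULL);
    assert(!l->held);    /* a real lock would block forever here */
    l->held = true;
}

static void demo_release_sock(struct demo_sock_lock *l)
{
    assert(l != NULL && l->held);
    l->held = false;
}

/* Skip locking when the BPF caller already owns the lock. */
static void demo_sockopt_lock_sock(struct demo_sock_lock *l)
{
    if (demo_in_bpf_ctx)
        return;
    demo_lock_sock(l);
}

static void demo_sockopt_release_sock(struct demo_sock_lock *l)
{
    if (demo_in_bpf_ctx)
        return;
    demo_release_sock(l);
}

/* Models a setter such as ip_sock_set_tos() using the sockopt variants. */
static void demo_set_tos(struct demo_sock_lock *l, int *tos, int val)
{
    demo_sockopt_lock_sock(l);
    *tos = val;
    demo_sockopt_release_sock(l);
}
```

      When the setter is reached from the listen path with the flag set, the
      lock is neither retaken nor released, so the outer holder still owns
      it afterwards.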
      
      Fixes: 878d951c ("inet: lock the socket in ip_sock_set_tos()")
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20231027182424.1444845-1-yonghong.song@linux.dev

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      06497763
    • af_unix: Remove module remnants. · 3a04927f
      Kuniyuki Iwashima authored
      
      Since commit 97154bcf ("af_unix: Kconfig: make CONFIG_UNIX bool"),
      af_unix.c is no longer built as module.
      
      Let's remove the unnecessary #if condition, exitcall, and module macros.
      
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20231026212305.45545-1-kuniyu@amazon.com

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3a04927f