Skip to content
Snippets Groups Projects
  1. Sep 14, 2023
    • Kuniyuki Iwashima's avatar
      kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg(). · a22730b1
      Kuniyuki Iwashima authored
      
      syzkaller found a memory leak in kcm_sendmsg(), and commit c821a88b
      ("kcm: Fix memory leak in error path of kcm_sendmsg()") suppressed it by
      updating kcm_tx_msg(head)->last_skb if partial data is copied so that the
      following sendmsg() will resume from the skb.
      
      However, we cannot know how many bytes were copied when we get the error.
      Thus, we could mess up the MSG_MORE queue.
      
      When kcm_sendmsg() fails for SOCK_DGRAM, we should purge the queue as we
      do so for UDP by udp_flush_pending_frames().
      
      Even without this change, when the error occurred, the following sendmsg()
      resumed from a wrong skb and the queue was messed up.  However, we have
      yet to get such a report, and only syzkaller stumbled on it.  So, this
      can be changed safely.
      
      Note this does not change SOCK_SEQPACKET behaviour.
      
      Fixes: c821a88b ("kcm: Fix memory leak in error path of kcm_sendmsg()")
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230912022753.33327-1-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      a22730b1
  2. Sep 13, 2023
  3. Sep 12, 2023
    • Liu Jian's avatar
      net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict() · cfaa80c9
      Liu Jian authored
      
      I got the below warning when do fuzzing test:
      BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
      Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9
      
      CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G           OE
      Hardware name: linux,dummy-virt (DT)
      Workqueue: pencrypt_parallel padata_parallel_worker
      Call trace:
       dump_backtrace+0x0/0x420
       show_stack+0x34/0x44
       dump_stack+0x1d0/0x248
       __kasan_report+0x138/0x140
       kasan_report+0x44/0x6c
       __asan_load4+0x94/0xd0
       scatterwalk_copychunks+0x320/0x470
       skcipher_next_slow+0x14c/0x290
       skcipher_walk_next+0x2fc/0x480
       skcipher_walk_first+0x9c/0x110
       skcipher_walk_aead_common+0x380/0x440
       skcipher_walk_aead_encrypt+0x54/0x70
       ccm_encrypt+0x13c/0x4d0
       crypto_aead_encrypt+0x7c/0xfc
       pcrypt_aead_enc+0x28/0x84
       padata_parallel_worker+0xd0/0x2dc
       process_one_work+0x49c/0xbdc
       worker_thread+0x124/0x880
       kthread+0x210/0x260
       ret_from_fork+0x10/0x18
      
      This is because the value of rec_seq of tls_crypto_info configured by the
      user program is too large, for example, 0xffffffffffffff. In addition, TLS
      is asynchronously accelerated. When tls_do_encryption() returns
      -EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
      skmsg is released before the asynchronous encryption process ends. As a
      result, the UAF problem occurs during the asynchronous processing of the
      encryption module.
      
      If the operation is asynchronous and the encryption module returns
      EINPROGRESS, do not free the record information.
      
      Fixes: 635d9398 ("net/tls: free record only on encryption error")
      Signed-off-by: default avatarLiu Jian <liujian56@huawei.com>
      Reviewed-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cfaa80c9
  4. Sep 11, 2023
    • Shigeru Yoshida's avatar
      kcm: Fix memory leak in error path of kcm_sendmsg() · c821a88b
      Shigeru Yoshida authored
      
      syzbot reported a memory leak like below:
      
      BUG: memory leak
      unreferenced object 0xffff88810b088c00 (size 240):
        comm "syz-executor186", pid 5012, jiffies 4294943306 (age 13.680s)
        hex dump (first 32 bytes):
          00 89 08 0b 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff83e5d5ff>] __alloc_skb+0x1ef/0x230 net/core/skbuff.c:634
          [<ffffffff84606e59>] alloc_skb include/linux/skbuff.h:1289 [inline]
          [<ffffffff84606e59>] kcm_sendmsg+0x269/0x1050 net/kcm/kcmsock.c:815
          [<ffffffff83e479c6>] sock_sendmsg_nosec net/socket.c:725 [inline]
          [<ffffffff83e479c6>] sock_sendmsg+0x56/0xb0 net/socket.c:748
          [<ffffffff83e47f55>] ____sys_sendmsg+0x365/0x470 net/socket.c:2494
          [<ffffffff83e4c389>] ___sys_sendmsg+0xc9/0x130 net/socket.c:2548
          [<ffffffff83e4c536>] __sys_sendmsg+0xa6/0x120 net/socket.c:2577
          [<ffffffff84ad7bb8>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff84ad7bb8>] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84c0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      In kcm_sendmsg(), kcm_tx_msg(head)->last_skb is used as a cursor to append
      newly allocated skbs to 'head'. If some bytes are copied, an error occurred,
      and jumped to out_error label, 'last_skb' is left unmodified. A later
      kcm_sendmsg() will use an obsoleted 'last_skb' reference, corrupting the
      'head' frag_list and causing the leak.
      
      This patch fixes this issue by properly updating the last allocated skb in
      'last_skb'.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-and-tested-by: default avatar <syzbot+6f98de741f7dbbfc4ccb@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=6f98de741f7dbbfc4ccb
      
      
      Signed-off-by: default avatarShigeru Yoshida <syoshida@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c821a88b
    • Ziyang Xuan's avatar
      hsr: Fix uninit-value access in fill_frame_info() · 484b4833
      Ziyang Xuan authored
      
      Syzbot reports the following uninit-value access problem.
      
      =====================================================
      BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:601 [inline]
      BUG: KMSAN: uninit-value in hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
       fill_frame_info net/hsr/hsr_forward.c:601 [inline]
       hsr_forward_skb+0x9bd/0x30f0 net/hsr/hsr_forward.c:616
       hsr_dev_xmit+0x192/0x330 net/hsr/hsr_device.c:223
       __netdev_start_xmit include/linux/netdevice.h:4889 [inline]
       netdev_start_xmit include/linux/netdevice.h:4903 [inline]
       xmit_one net/core/dev.c:3544 [inline]
       dev_hard_start_xmit+0x247/0xa10 net/core/dev.c:3560
       __dev_queue_xmit+0x34d0/0x52a0 net/core/dev.c:4340
       dev_queue_xmit include/linux/netdevice.h:3082 [inline]
       packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
       packet_snd net/packet/af_packet.c:3087 [inline]
       packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       sock_sendmsg net/socket.c:753 [inline]
       __sys_sendto+0x781/0xa30 net/socket.c:2176
       __do_sys_sendto net/socket.c:2188 [inline]
       __se_sys_sendto net/socket.c:2184 [inline]
       __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
       do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      
      Uninit was created at:
       slab_post_alloc_hook+0x12f/0xb70 mm/slab.h:767
       slab_alloc_node mm/slub.c:3478 [inline]
       kmem_cache_alloc_node+0x577/0xa80 mm/slub.c:3523
       kmalloc_reserve+0x148/0x470 net/core/skbuff.c:559
       __alloc_skb+0x318/0x740 net/core/skbuff.c:644
       alloc_skb include/linux/skbuff.h:1286 [inline]
       alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6299
       sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2794
       packet_alloc_skb net/packet/af_packet.c:2936 [inline]
       packet_snd net/packet/af_packet.c:3030 [inline]
       packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       sock_sendmsg net/socket.c:753 [inline]
       __sys_sendto+0x781/0xa30 net/socket.c:2176
       __do_sys_sendto net/socket.c:2188 [inline]
       __se_sys_sendto net/socket.c:2184 [inline]
       __ia32_sys_sendto+0x11f/0x1c0 net/socket.c:2184
       do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
       __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178
       do_fast_syscall_32+0x37/0x80 arch/x86/entry/common.c:203
       do_SYSENTER_32+0x1f/0x30 arch/x86/entry/common.c:246
       entry_SYSENTER_compat_after_hwframe+0x70/0x82
      
      It is because VLAN not yet supported in hsr driver. Return error
      when protocol is ETH_P_8021Q in fill_frame_info() now to fix it.
      
      Fixes: 451d8123 ("net: prp: add packet handling support")
      Reported-by: default avatar <syzbot+bf7e6250c7ce248f3ec9@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=bf7e6250c7ce248f3ec9
      
      
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      484b4833
  5. Sep 10, 2023
    • Guangguan Wang's avatar
      net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add · f5146e3e
      Guangguan Wang authored
      
      While doing smcr_port_add, there maybe linkgroup add into or delete
      from smc_lgr_list.list at the same time, which may result kernel crash.
      So, use smc_lgr_list.lock to protect smc_lgr_list.list iterate in
      smcr_port_add.
      
      The crash calltrace show below:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP NOPTI
      CPU: 0 PID: 559726 Comm: kworker/0:92 Kdump: loaded Tainted: G
      Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
      Workqueue: events smc_ib_port_event_work [smc]
      RIP: 0010:smcr_port_add+0xa6/0xf0 [smc]
      RSP: 0000:ffffa5a2c8f67de0 EFLAGS: 00010297
      RAX: 0000000000000001 RBX: ffff9935e0650000 RCX: 0000000000000000
      RDX: 0000000000000010 RSI: ffff9935e0654290 RDI: ffff9935c8560000
      RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9934c0401918
      R10: 0000000000000000 R11: ffffffffb4a5c278 R12: ffff99364029aae4
      R13: ffff99364029aa00 R14: 00000000ffffffed R15: ffff99364029ab08
      FS:  0000000000000000(0000) GS:ffff994380600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000f06a10003 CR4: 0000000002770ef0
      PKRU: 55555554
      Call Trace:
       smc_ib_port_event_work+0x18f/0x380 [smc]
       process_one_work+0x19b/0x340
       worker_thread+0x30/0x370
       ? process_one_work+0x340/0x340
       kthread+0x114/0x130
       ? __kthread_cancel_work+0x50/0x50
       ret_from_fork+0x1f/0x30
      
      Fixes: 1f90a05d ("net/smc: add smcr_port_add() and smcr_link_up() processing")
      Signed-off-by: default avatarGuangguan Wang <guangguan.wang@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5146e3e
    • Guangguan Wang's avatar
      net/smc: bugfix for smcr v2 server connect success statistic · 6912e724
      Guangguan Wang authored
      
      In the macro SMC_STAT_SERV_SUCC_INC, the smcd_version is used
      to determin whether to increase the v1 statistic or the v2
      statistic. It is correct for SMCD. But for SMCR, smcr_version
      should be used.
      
      Signed-off-by: default avatarGuangguan Wang <guangguan.wang@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6912e724
  6. Sep 08, 2023
    • Liu Jian's avatar
      net: ipv4: fix one memleak in __inet_del_ifa() · ac28b1ec
      Liu Jian authored
      
      I got the below warning when do fuzzing test:
      unregister_netdevice: waiting for bond0 to become free. Usage count = 2
      
      It can be repoduced via:
      
      ip link add bond0 type bond
      sysctl -w net.ipv4.conf.bond0.promote_secondaries=1
      ip addr add 4.117.174.103/0 scope 0x40 dev bond0
      ip addr add 192.168.100.111/255.255.255.254 scope 0 dev bond0
      ip addr add 0.0.0.4/0 scope 0x40 secondary dev bond0
      ip addr del 4.117.174.103/0 scope 0x40 dev bond0
      ip link delete bond0 type bond
      
      In this reproduction test case, an incorrect 'last_prim' is found in
      __inet_del_ifa(), as a result, the secondary address(0.0.0.4/0 scope 0x40)
      is lost. The memory of the secondary address is leaked and the reference of
      in_device and net_device is leaked.
      
      Fix this problem:
      Look for 'last_prim' starting at location of the deleted IP and inserting
      the promoted IP into the location of 'last_prim'.
      
      Fixes: 0ff60a45 ("[IPV4]: Fix secondary IP addresses after promotion")
      Signed-off-by: default avatarLiu Jian <liujian56@huawei.com>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ac28b1ec
  7. Sep 06, 2023
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: Unbreak audit log reset · 9b5ba5c9
      Pablo Neira Ayuso authored
      
      Deliver audit log from __nf_tables_dump_rules(), table dereference at
      the end of the table list loop might point to the list head, leading to
      this crash.
      
      [ 4137.407349] BUG: unable to handle page fault for address: 00000000001f3c50
      [ 4137.407357] #PF: supervisor read access in kernel mode
      [ 4137.407359] #PF: error_code(0x0000) - not-present page
      [ 4137.407360] PGD 0 P4D 0
      [ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI
      [ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277
      [ 4137.407369] RIP: 0010:string+0x49/0xd0
      [ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81
      [ 4137.407377] RSP: 0018:ffff8881179737f0 EFLAGS: 00010286
      [ 4137.407379] RAX: 00000000001f2c50 RBX: ffff888117973848 RCX: ffff0a00ffffff04
      [ 4137.407380] RDX: 00000000001f3c50 RSI: 0000000000000000 RDI: 0000000000000000
      [ 4137.407381] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
      [ 4137.407383] R10: ffffffffffffffff R11: ffff88813584d200 R12: 0000000000000000
      [ 4137.407384] R13: ffffffffa15cf709 R14: 0000000000000000 R15: ffffffffa15cf709
      [ 4137.407385] FS:  00007fcfc18bb580(0000) GS:ffff88840e700000(0000) knlGS:0000000000000000
      [ 4137.407387] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4137.407388] CR2: 00000000001f3c50 CR3: 00000001055b2001 CR4: 00000000001706e0
      [ 4137.407390] Call Trace:
      [ 4137.407392]  <TASK>
      [ 4137.407393]  ? __die+0x1b/0x60
      [ 4137.407397]  ? page_fault_oops+0x6b/0xa0
      [ 4137.407399]  ? exc_page_fault+0x60/0x120
      [ 4137.407403]  ? asm_exc_page_fault+0x22/0x30
      [ 4137.407408]  ? string+0x49/0xd0
      [ 4137.407410]  vsnprintf+0x257/0x4f0
      [ 4137.407414]  kvasprintf+0x3e/0xb0
      [ 4137.407417]  kasprintf+0x3e/0x50
      [ 4137.407419]  nf_tables_dump_rules+0x1c0/0x360 [nf_tables]
      [ 4137.407439]  ? __alloc_skb+0xc3/0x170
      [ 4137.407442]  netlink_dump+0x170/0x330
      [ 4137.407447]  __netlink_dump_start+0x227/0x300
      [ 4137.407449]  nf_tables_getrule+0x205/0x390 [nf_tables]
      
      Deliver audit log only once at the end of the rule dump+reset for
      consistency with the set dump+reset.
      
      Ensure audit reset access to table under rcu read side lock. The table
      list iteration holds rcu read lock side, but recent audit code
      dereferences table object out of the rcu read lock side.
      
      Fixes: ea078ae9 ("netfilter: nf_tables: Audit log rule reset")
      Fixes: 7e9be112 ("netfilter: nf_tables: Audit log setelem reset")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: default avatarPhil Sutter <phil@nwl.cc>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      9b5ba5c9
    • Kyle Zeng's avatar
      netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c · 050d91c0
      Kyle Zeng authored
      
      The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
      lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
      which can lead to integer underflow. As a result, it leads to slab
      out-of-bound access.
      This patch adds back the IP_SET_HASH_WITH_NET0 macro to
      ip_set_hash_netportnet to address the issue.
      
      Fixes: 886503f3 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
      Suggested-by: default avatarJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarKyle Zeng <zengyhkyle@gmail.com>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      050d91c0
    • Pablo Neira Ayuso's avatar
      netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction · 2ee52ae9
      Pablo Neira Ayuso authored
      
      New elements in this transaction might expired before such transaction
      ends. Skip sync GC for such elements otherwise commit path might walk
      over an already released object. Once transaction is finished, async GC
      will collect such expired element.
      
      Fixes: f6c383b8 ("netfilter: nf_tables: adapt set backend to use GC transaction API")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      2ee52ae9
    • Wander Lairson Costa's avatar
      netfilter: nfnetlink_osf: avoid OOB read · f4f8a780
      Wander Lairson Costa authored
      
      The opt_num field is controlled by user mode and is not currently
      validated inside the kernel. An attacker can take advantage of this to
      trigger an OOB read and potentially leak information.
      
      BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
      Read of size 2 at addr ffff88804bc64272 by task poc/6431
      
      CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1
      Call Trace:
       nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
       nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281
       nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47
       expr_call_ops_eval net/netfilter/nf_tables_core.c:214
       nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264
       nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23
       [..]
      
      Also add validation to genre, subtype and version fields.
      
      Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match")
      Reported-by: default avatarLucas Leong <wmliang@infosec.exchange>
      Signed-off-by: default avatarWander Lairson Costa <wander@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      f4f8a780
    • Florian Westphal's avatar
      netfilter: nftables: exthdr: fix 4-byte stack OOB write · fd94d9da
      Florian Westphal authored
      
      If priv->len is a multiple of 4, then dst[len / 4] can write past
      the destination array which leads to stack corruption.
      
      This construct is necessary to clean the remainder of the register
      in case ->len is NOT a multiple of the register size, so make it
      conditional just like nft_payload.c does.
      
      The bug was added in 4.1 cycle and then copied/inherited when
      tcp/sctp and ip option support was added.
      
      Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
      ZDI-CAN-21951, ZDI-CAN-21961).
      
      Fixes: 49499c3e ("netfilter: nf_tables: switch registers to 32 bit addressing")
      Fixes: 935b7f64 ("netfilter: nft_exthdr: add TCP option matching")
      Fixes: 133dc203 ("netfilter: nft_exthdr: Support SCTP chunks")
      Fixes: dbb5281a ("netfilter: nf_tables: add support for matching IPv4 options")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      fd94d9da
    • Quan Tian's avatar
      net/ipv6: SKB symmetric hash should incorporate transport ports · a5e2151f
      Quan Tian authored
      
      __skb_get_hash_symmetric() was added to compute a symmetric hash over
      the protocol, addresses and transport ports, by commit eb70db87
      ("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
      flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
      IPv4 addresses, IPv6 addresses and ports. However, it should not specify
      the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further
      dissection when an IPv6 flow label is encountered, making transport
      ports not being incorporated in such case.
      
      As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
      3-tuple for IPv6 when flow label is present. It caused a few problems,
      e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash
      to perform load balancing as different L4 flows between two given IPv6
      addresses would always get the same symmetric hash, leading to uneven
      traffic distribution.
      
      Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
      symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
      
      Fixes: eb70db87 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.")
      Reported-by: default avatarLars Ekman <uablrek@gmail.com>
      Closes: https://github.com/antrea-io/antrea/issues/5457
      
      
      Signed-off-by: default avatarQuan Tian <qtian@vmware.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5e2151f
  8. Sep 05, 2023
    • Eric Dumazet's avatar
      igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU · c3b704d4
      Eric Dumazet authored
      This is a follow up of commit 915d975b ("net: deal with integer
      overflows in kmalloc_reserve()") based on David Laight feedback.
      
      Back in 2010, I failed to realize malicious users could set dev->mtu
      to arbitrary values. This mtu has been since limited to 0x7fffffff but
      regardless of how big dev->mtu is, it makes no sense for igmpv3_newpack()
      to allocate more than IP_MAX_MTU and risk various skb fields overflows.
      
      Fixes: 57e1ab6e ("igmp: refine skb allocations")
      Link: https://lore.kernel.org/netdev/d273628df80f45428e739274ab9ecb72@AcuMS.aculab.com/
      
      
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDavid Laight <David.Laight@ACULAB.COM>
      Cc: Kyle Zeng <zengyhkyle@gmail.com>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c3b704d4
    • Shigeru Yoshida's avatar
      kcm: Destroy mutex in kcm_exit_net() · 6ad40b36
      Shigeru Yoshida authored
      
      kcm_exit_net() should call mutex_destroy() on knet->mutex. This is especially
      needed if CONFIG_DEBUG_MUTEXES is enabled.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Signed-off-by: default avatarShigeru Yoshida <syoshida@redhat.com>
      Link: https://lore.kernel.org/r/20230902170708.1727999-1-syoshida@redhat.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      6ad40b36
    • valis's avatar
      net: sched: sch_qfq: Fix UAF in qfq_dequeue() · 8fc134fe
      valis authored
      
      When the plug qdisc is used as a class of the qfq qdisc it could trigger a
      UAF. This issue can be reproduced with following commands:
      
        tc qdisc add dev lo root handle 1: qfq
        tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
        tc qdisc add dev lo parent 1:1 handle 2: plug
        tc filter add dev lo parent 1: basic classid 1:1
        ping -c1 127.0.0.1
      
      and boom:
      
      [  285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
      [  285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
      [  285.355903]
      [  285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
      [  285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
      [  285.358376] Call Trace:
      [  285.358773]  <IRQ>
      [  285.359109]  dump_stack_lvl+0x44/0x60
      [  285.359708]  print_address_description.constprop.0+0x2c/0x3c0
      [  285.360611]  kasan_report+0x10c/0x120
      [  285.361195]  ? qfq_dequeue+0xa7/0x7f0
      [  285.361780]  qfq_dequeue+0xa7/0x7f0
      [  285.362342]  __qdisc_run+0xf1/0x970
      [  285.362903]  net_tx_action+0x28e/0x460
      [  285.363502]  __do_softirq+0x11b/0x3de
      [  285.364097]  do_softirq.part.0+0x72/0x90
      [  285.364721]  </IRQ>
      [  285.365072]  <TASK>
      [  285.365422]  __local_bh_enable_ip+0x77/0x90
      [  285.366079]  __dev_queue_xmit+0x95f/0x1550
      [  285.366732]  ? __pfx_csum_and_copy_from_iter+0x10/0x10
      [  285.367526]  ? __pfx___dev_queue_xmit+0x10/0x10
      [  285.368259]  ? __build_skb_around+0x129/0x190
      [  285.368960]  ? ip_generic_getfrag+0x12c/0x170
      [  285.369653]  ? __pfx_ip_generic_getfrag+0x10/0x10
      [  285.370390]  ? csum_partial+0x8/0x20
      [  285.370961]  ? raw_getfrag+0xe5/0x140
      [  285.371559]  ip_finish_output2+0x539/0xa40
      [  285.372222]  ? __pfx_ip_finish_output2+0x10/0x10
      [  285.372954]  ip_output+0x113/0x1e0
      [  285.373512]  ? __pfx_ip_output+0x10/0x10
      [  285.374130]  ? icmp_out_count+0x49/0x60
      [  285.374739]  ? __pfx_ip_finish_output+0x10/0x10
      [  285.375457]  ip_push_pending_frames+0xf3/0x100
      [  285.376173]  raw_sendmsg+0xef5/0x12d0
      [  285.376760]  ? do_syscall_64+0x40/0x90
      [  285.377359]  ? __static_call_text_end+0x136578/0x136578
      [  285.378173]  ? do_syscall_64+0x40/0x90
      [  285.378772]  ? kasan_enable_current+0x11/0x20
      [  285.379469]  ? __pfx_raw_sendmsg+0x10/0x10
      [  285.380137]  ? __sock_create+0x13e/0x270
      [  285.380673]  ? __sys_socket+0xf3/0x180
      [  285.381174]  ? __x64_sys_socket+0x3d/0x50
      [  285.381725]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [  285.382425]  ? __rcu_read_unlock+0x48/0x70
      [  285.382975]  ? ip4_datagram_release_cb+0xd8/0x380
      [  285.383608]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
      [  285.384295]  ? preempt_count_sub+0x14/0xc0
      [  285.384844]  ? __list_del_entry_valid+0x76/0x140
      [  285.385467]  ? _raw_spin_lock_bh+0x87/0xe0
      [  285.386014]  ? __pfx__raw_spin_lock_bh+0x10/0x10
      [  285.386645]  ? release_sock+0xa0/0xd0
      [  285.387148]  ? preempt_count_sub+0x14/0xc0
      [  285.387712]  ? freeze_secondary_cpus+0x348/0x3c0
      [  285.388341]  ? aa_sk_perm+0x177/0x390
      [  285.388856]  ? __pfx_aa_sk_perm+0x10/0x10
      [  285.389441]  ? check_stack_object+0x22/0x70
      [  285.390032]  ? inet_send_prepare+0x2f/0x120
      [  285.390603]  ? __pfx_inet_sendmsg+0x10/0x10
      [  285.391172]  sock_sendmsg+0xcc/0xe0
      [  285.391667]  __sys_sendto+0x190/0x230
      [  285.392168]  ? __pfx___sys_sendto+0x10/0x10
      [  285.392727]  ? kvm_clock_get_cycles+0x14/0x30
      [  285.393328]  ? set_normalized_timespec64+0x57/0x70
      [  285.393980]  ? _raw_spin_unlock_irq+0x1b/0x40
      [  285.394578]  ? __x64_sys_clock_gettime+0x11c/0x160
      [  285.395225]  ? __pfx___x64_sys_clock_gettime+0x10/0x10
      [  285.395908]  ? _copy_to_user+0x3e/0x60
      [  285.396432]  ? exit_to_user_mode_prepare+0x1a/0x120
      [  285.397086]  ? syscall_exit_to_user_mode+0x22/0x50
      [  285.397734]  ? do_syscall_64+0x71/0x90
      [  285.398258]  __x64_sys_sendto+0x74/0x90
      [  285.398786]  do_syscall_64+0x64/0x90
      [  285.399273]  ? exit_to_user_mode_prepare+0x1a/0x120
      [  285.399949]  ? syscall_exit_to_user_mode+0x22/0x50
      [  285.400605]  ? do_syscall_64+0x71/0x90
      [  285.401124]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [  285.401807] RIP: 0033:0x495726
      [  285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
      [  285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [  285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
      [  285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
      [  285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
      [  285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
      [  285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
      [  285.410403]  </TASK>
      [  285.410704]
      [  285.410929] Allocated by task 144:
      [  285.411402]  kasan_save_stack+0x1e/0x40
      [  285.411926]  kasan_set_track+0x21/0x30
      [  285.412442]  __kasan_slab_alloc+0x55/0x70
      [  285.412973]  kmem_cache_alloc_node+0x187/0x3d0
      [  285.413567]  __alloc_skb+0x1b4/0x230
      [  285.414060]  __ip_append_data+0x17f7/0x1b60
      [  285.414633]  ip_append_data+0x97/0xf0
      [  285.415144]  raw_sendmsg+0x5a8/0x12d0
      [  285.415640]  sock_sendmsg+0xcc/0xe0
      [  285.416117]  __sys_sendto+0x190/0x230
      [  285.416626]  __x64_sys_sendto+0x74/0x90
      [  285.417145]  do_syscall_64+0x64/0x90
      [  285.417624]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      [  285.418306]
      [  285.418531] Freed by task 144:
      [  285.418960]  kasan_save_stack+0x1e/0x40
      [  285.419469]  kasan_set_track+0x21/0x30
      [  285.419988]  kasan_save_free_info+0x27/0x40
      [  285.420556]  ____kasan_slab_free+0x109/0x1a0
      [  285.421146]  kmem_cache_free+0x1c2/0x450
      [  285.421680]  __netif_receive_skb_core+0x2ce/0x1870
      [  285.422333]  __netif_receive_skb_one_core+0x97/0x140
      [  285.423003]  process_backlog+0x100/0x2f0
      [  285.423537]  __napi_poll+0x5c/0x2d0
      [  285.424023]  net_rx_action+0x2be/0x560
      [  285.424510]  __do_softirq+0x11b/0x3de
      [  285.425034]
      [  285.425254] The buggy address belongs to the object at ffff8880bad31280
      [  285.425254]  which belongs to the cache skbuff_head_cache of size 224
      [  285.426993] The buggy address is located 40 bytes inside of
      [  285.426993]  freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
      [  285.428572]
      [  285.428798] The buggy address belongs to the physical page:
      [  285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
      [  285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
      [  285.431447] page_type: 0xffffffff()
      [  285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000
      [  285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      [  285.433562] page dumped because: kasan: bad access detected
      [  285.434144]
      [  285.434320] Memory state around the buggy address:
      [  285.434828]  ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  285.435580]  ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  285.436777]                                   ^
      [  285.437106]  ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  285.437616]  ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  285.438126] ==================================================================
      [  285.438662] Disabling lock debugging due to kernel taint
      
      Fix this by:
      1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
      function compatible with non-work-conserving qdiscs
      2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.
      
      Fixes: 462dbc91 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
      Reported-by: default avatarvalis <sec@valis.email>
      Signed-off-by: default avatarvalis <sec@valis.email>
      Signed-off-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      8fc134fe
  9. Sep 04, 2023
    • Kuniyuki Iwashima's avatar
      af_unix: Fix data race around sk->sk_err. · b1928129
      Kuniyuki Iwashima authored
      
      As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be
      read locklessly by unix_dgram_sendmsg().
      
      Let's use READ_ONCE() for sk_err as well.
      
      Note that the writer side is marked by commit cc04410a ("af_unix:
      annotate lockless accesses to sk->sk_err").
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1928129
    • Kuniyuki Iwashima's avatar
      af_unix: Fix data-races around sk->sk_shutdown. · afe8764f
      Kuniyuki Iwashima authored
      
      sk->sk_shutdown is changed under unix_state_lock(sk), but
      unix_dgram_sendmsg() calls two functions to read sk_shutdown locklessly.
      
        sock_alloc_send_pskb
        `- sock_wait_for_wmem
      
      Let's use READ_ONCE() there.
      
      Note that the writer side was marked by commit e1d09c2c ("af_unix:
      Fix data races around sk->sk_shutdown.").
      
      BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock
      
      write (marked) to 0xffff8880069af12c of 1 bytes by task 1 on cpu 1:
       unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
       unix_release+0x59/0x80 net/unix/af_unix.c:1053
       __sock_release+0x7d/0x170 net/socket.c:654
       sock_close+0x19/0x30 net/socket.c:1386
       __fput+0x2a3/0x680 fs/file_table.c:384
       ____fput+0x15/0x20 fs/file_table.c:412
       task_work_run+0x116/0x1a0 kernel/task_work.c:179
       resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
       __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
       syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
       do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      read to 0xffff8880069af12c of 1 bytes by task 28650 on cpu 0:
       sock_alloc_send_pskb+0xd2/0x620 net/core/sock.c:2767
       unix_dgram_sendmsg+0x2f8/0x14f0 net/unix/af_unix.c:1944
       unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
       unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
       sock_sendmsg_nosec net/socket.c:725 [inline]
       sock_sendmsg+0x148/0x160 net/socket.c:748
       ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
       ___sys_sendmsg+0xc6/0x140 net/socket.c:2548
       __sys_sendmsg+0x94/0x140 net/socket.c:2577
       __do_sys_sendmsg net/socket.c:2586 [inline]
       __se_sys_sendmsg net/socket.c:2584 [inline]
       __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      value changed: 0x00 -> 0x03
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 28650 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzkaller <syzkaller@googlegroups.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      afe8764f
    • Kuniyuki Iwashima's avatar
      af_unix: Fix data-race around unix_tot_inflight. · ade32bd8
      Kuniyuki Iwashima authored
      
      unix_tot_inflight is changed under spin_lock(unix_gc_lock), but
      unix_release_sock() reads it locklessly.
      
      Let's use READ_ONCE() for unix_tot_inflight.
      
      Note that the writer side was marked by commit 9d6d7f1c ("af_unix:
      annote lockless accesses to unix_tot_inflight & gc_in_progress")
      
      BUG: KCSAN: data-race in unix_inflight / unix_release_sock
      
      write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1:
       unix_inflight+0x130/0x180 net/unix/scm.c:64
       unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123
       unix_scm_to_skb net/unix/af_unix.c:1832 [inline]
       unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg+0x148/0x160 net/socket.c:747
       ____sys_sendmsg+0x4e4/0x610 net/socket.c:2493
       ___sys_sendmsg+0xc6/0x140 net/socket.c:2547
       __sys_sendmsg+0x94/0x140 net/socket.c:2576
       __do_sys_sendmsg net/socket.c:2585 [inline]
       __se_sys_sendmsg net/socket.c:2583 [inline]
       __x64_sys_sendmsg+0x45/0x50 net/socket.c:2583
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0:
       unix_release_sock+0x608/0x910 net/unix/af_unix.c:671
       unix_release+0x59/0x80 net/unix/af_unix.c:1058
       __sock_release+0x7d/0x170 net/socket.c:653
       sock_close+0x19/0x30 net/socket.c:1385
       __fput+0x179/0x5e0 fs/file_table.c:321
       ____fput+0x15/0x20 fs/file_table.c:349
       task_work_run+0x116/0x1a0 kernel/task_work.c:179
       resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
       __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
       syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
       do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      value changed: 0x00000000 -> 0x00000001
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 9305cfa4 ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic")
      Reported-by: default avatarsyzkaller <syzkaller@googlegroups.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ade32bd8
    • Kuniyuki Iwashima's avatar
      af_unix: Fix data-races around user->unix_inflight. · 0bc36c06
      Kuniyuki Iwashima authored
      
      user->unix_inflight is changed under spin_lock(unix_gc_lock),
      but too_many_unix_fds() reads it locklessly.
      
      Let's annotate the write/read accesses to user->unix_inflight.
      
      BUG: KCSAN: data-race in unix_attach_fds / unix_inflight
      
      write to 0xffffffff8546f2d0 of 8 bytes by task 44798 on cpu 1:
       unix_inflight+0x157/0x180 net/unix/scm.c:66
       unix_attach_fds+0x147/0x1e0 net/unix/scm.c:123
       unix_scm_to_skb net/unix/af_unix.c:1827 [inline]
       unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950
       unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
       unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
       sock_sendmsg_nosec net/socket.c:725 [inline]
       sock_sendmsg+0x148/0x160 net/socket.c:748
       ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
       ___sys_sendmsg+0xc6/0x140 net/socket.c:2548
       __sys_sendmsg+0x94/0x140 net/socket.c:2577
       __do_sys_sendmsg net/socket.c:2586 [inline]
       __se_sys_sendmsg net/socket.c:2584 [inline]
       __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      read to 0xffffffff8546f2d0 of 8 bytes by task 44814 on cpu 0:
       too_many_unix_fds net/unix/scm.c:101 [inline]
       unix_attach_fds+0x54/0x1e0 net/unix/scm.c:110
       unix_scm_to_skb net/unix/af_unix.c:1827 [inline]
       unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950
       unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline]
       unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292
       sock_sendmsg_nosec net/socket.c:725 [inline]
       sock_sendmsg+0x148/0x160 net/socket.c:748
       ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494
       ___sys_sendmsg+0xc6/0x140 net/socket.c:2548
       __sys_sendmsg+0x94/0x140 net/socket.c:2577
       __do_sys_sendmsg net/socket.c:2586 [inline]
       __se_sys_sendmsg net/socket.c:2584 [inline]
       __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x6e/0xd8
      
      value changed: 0x000000000000000c -> 0x000000000000000d
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 44814 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 712f4aad ("unix: properly account for FDs passed over unix sockets")
      Reported-by: default avatarsyzkaller <syzkaller@googlegroups.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Acked-by: default avatarWilly Tarreau <w@1wt.eu>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0bc36c06
    • John Fastabend's avatar
      bpf, sockmap: Fix skb refcnt race after locking changes · a454d84e
      John Fastabend authored
      
      There is a race where skb's from the sk_psock_backlog can be referenced
      after userspace side has already skb_consumed() the sk_buff and its refcnt
      dropped to zer0 causing use after free.
      
      The flow is the following:
      
        while ((skb = skb_peek(&psock->ingress_skb))
          sk_psock_handle_Skb(psock, skb, ..., ingress)
          if (!ingress) ...
          sk_psock_skb_ingress
             sk_psock_skb_ingress_enqueue(skb)
                msg->skb = skb
                sk_psock_queue_msg(psock, msg)
          skb_dequeue(&psock->ingress_skb)
      
      The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is
      what the application reads when recvmsg() is called. An application can
      read this anytime after the msg is placed on the queue. The recvmsg hook
      will also read msg->skb and then after user space reads the msg will call
      consume_skb(skb) on it effectively free'ing it.
      
      But, the race is in above where backlog queue still has a reference to
      the skb and calls skb_dequeue(). If the skb_dequeue happens after the
      user reads and free's the skb we have a use after free.
      
      The !ingress case does not suffer from this problem because it uses
      sendmsg_*(sk, msg) which does not pass the sk_buff further down the
      stack.
      
      The following splat was observed with 'test_progs -t sockmap_listen':
      
        [ 1022.710250][ T2556] general protection fault, ...
        [...]
        [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog
        [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80
        [ 1022.713653][ T2556] Code: ...
        [...]
        [ 1022.720699][ T2556] Call Trace:
        [ 1022.720984][ T2556]  <TASK>
        [ 1022.721254][ T2556]  ? die_addr+0x32/0x80^M
        [ 1022.721589][ T2556]  ? exc_general_protection+0x25a/0x4b0
        [ 1022.722026][ T2556]  ? asm_exc_general_protection+0x22/0x30
        [ 1022.722489][ T2556]  ? skb_dequeue+0x4c/0x80
        [ 1022.722854][ T2556]  sk_psock_backlog+0x27a/0x300
        [ 1022.723243][ T2556]  process_one_work+0x2a7/0x5b0
        [ 1022.723633][ T2556]  worker_thread+0x4f/0x3a0
        [ 1022.723998][ T2556]  ? __pfx_worker_thread+0x10/0x10
        [ 1022.724386][ T2556]  kthread+0xfd/0x130
        [ 1022.724709][ T2556]  ? __pfx_kthread+0x10/0x10
        [ 1022.725066][ T2556]  ret_from_fork+0x2d/0x50
        [ 1022.725409][ T2556]  ? __pfx_kthread+0x10/0x10
        [ 1022.725799][ T2556]  ret_from_fork_asm+0x1b/0x30
        [ 1022.726201][ T2556]  </TASK>
      
      To fix we add an skb_get() before passing the skb to be enqueued in the
      engress queue. This bumps the skb->users refcnt so that consume_skb()
      and kfree_skb will not immediately free the sk_buff. With this we can
      be sure the skb is still around when we do the dequeue. Then we just
      need to decrement the refcnt or free the skb in the backlog case which
      we do by calling kfree_skb() on the ingress case as well as the sendmsg
      case.
      
      Before locking change from fixes tag we had the sock locked so we
      couldn't race with user and there was no issue here.
      
      Fixes: 799aa7f9 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
      Reported-by: default avatarJiri Olsa <jolsa@kernel.org>
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarXu Kuohai <xukuohai@huawei.com>
      Tested-by: default avatarJiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
      a454d84e
    • Alex Henrie's avatar
      net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr · f31867d0
      Alex Henrie authored
      
      The existing code incorrectly casted a negative value (the result of a
      subtraction) to an unsigned value without checking. For example, if
      /proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred
      lifetime would jump to 4 billion seconds. On my machine and network the
      shortest lifetime that avoided underflow was 3 seconds.
      
      Fixes: 76506a98 ("IPv6: fix DESYNC_FACTOR")
      Signed-off-by: default avatarAlex Henrie <alexhenrie24@gmail.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f31867d0
    • Eric Dumazet's avatar
      net: deal with integer overflows in kmalloc_reserve() · 915d975b
      Eric Dumazet authored
      
      Blamed commit changed:
          ptr = kmalloc(size);
          if (ptr)
            size = ksize(ptr);
      
      to:
          size = kmalloc_size_roundup(size);
          ptr = kmalloc(size);
      
      This allowed various crash as reported by syzbot [1]
      and Kyle Zeng.
      
      Problem is that if @size is bigger than 0x80000001,
      kmalloc_size_roundup(size) returns 2^32.
      
      kmalloc_reserve() uses a 32bit variable (obj_size),
      so 2^32 is truncated to 0.
      
      kmalloc(0) returns ZERO_SIZE_PTR which is not handled by
      skb allocations.
      
      Following trace can be triggered if a netdev->mtu is set
      close to 0x7fffffff
      
      We might in the future limit netdev->mtu to more sensible
      limit (like KMALLOC_MAX_SIZE).
      
      This patch is based on a syzbot report, and also a report
      and tentative fix from Kyle Zeng.
      
      [1]
      BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline]
      BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
      Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554
      
      CPU: 1 PID: 22554 Comm: syz-executor.4 Not tainted 6.1.39-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      Call trace:
      dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:279
      show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:286
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x120/0x1a0 lib/dump_stack.c:106
      print_report+0xe4/0x4b4 mm/kasan/report.c:398
      kasan_report+0x150/0x1ac mm/kasan/report.c:495
      kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189
      memset+0x40/0x70 mm/kasan/shadow.c:44
      __build_skb_around net/core/skbuff.c:294 [inline]
      __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527
      alloc_skb include/linux/skbuff.h:1316 [inline]
      igmpv3_newpack+0x104/0x1088 net/ipv4/igmp.c:359
      add_grec+0x81c/0x1124 net/ipv4/igmp.c:534
      igmpv3_send_cr net/ipv4/igmp.c:667 [inline]
      igmp_ifc_timer_expire+0x1b0/0x1008 net/ipv4/igmp.c:810
      call_timer_fn+0x1c0/0x9f0 kernel/time/timer.c:1474
      expire_timers kernel/time/timer.c:1519 [inline]
      __run_timers+0x54c/0x710 kernel/time/timer.c:1790
      run_timer_softirq+0x28/0x4c kernel/time/timer.c:1803
      _stext+0x380/0xfbc
      ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79
      call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891
      do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84
      invoke_softirq kernel/softirq.c:437 [inline]
      __irq_exit_rcu+0x1c0/0x4cc kernel/softirq.c:683
      irq_exit_rcu+0x14/0x78 kernel/softirq.c:695
      el0_interrupt+0x7c/0x2e0 arch/arm64/kernel/entry-common.c:717
      __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:724
      el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:729
      el0t_64_irq+0x1a0/0x1a4 arch/arm64/kernel/entry.S:584
      
      Fixes: 12d6c1d3 ("skbuff: Proactively round up to kmalloc bucket size")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reported-by: default avatarKyle Zeng <zengyhkyle@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      915d975b
  10. Sep 01, 2023
    • Sriram Yagnaraman's avatar
      ipv6: ignore dst hint for multipath routes · 8423be89
      Sriram Yagnaraman authored
      
      Route hints when the nexthop is part of a multipath group causes packets
      in the same receive batch to be sent to the same nexthop irrespective of
      the multipath hash of the packet. So, do not extract route hint for
      packets whose destination is part of a multipath group.
      
      A new SKB flag IP6SKB_MULTIPATH is introduced for this purpose, set the
      flag when route is looked up in fib6_select_path() and use it in
      ip6_can_use_hint() to check for the existence of the flag.
      
      Fixes: 197dbf24 ("ipv6: introduce and uses route look hints for list input.")
      Signed-off-by: default avatarSriram Yagnaraman <sriram.yagnaraman@est.tech>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8423be89
    • Sriram Yagnaraman's avatar
      ipv4: ignore dst hint for multipath routes · 6ac66cb0
      Sriram Yagnaraman authored
      
      Route hints when the nexthop is part of a multipath group causes packets
      in the same receive batch to be sent to the same nexthop irrespective of
      the multipath hash of the packet. So, do not extract route hint for
      packets whose destination is part of a multipath group.
      
      A new SKB flag IPSKB_MULTIPATH is introduced for this purpose, set the
      flag when route is looked up in ip_mkroute_input() and use it in
      ip_extract_route_hint() to check for the existence of the flag.
      
      Fixes: 02b24941 ("ipv4: use dst hint for ipv4 list receive")
      Signed-off-by: default avatarSriram Yagnaraman <sriram.yagnaraman@est.tech>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6ac66cb0
    • Mohamed Khalfella's avatar
      skbuff: skb_segment, Call zero copy functions before using skbuff frags · 2ea35288
      Mohamed Khalfella authored
      
      Commit bf5c25d6 ("skbuff: in skb_segment, call zerocopy functions
      once per nskb") added the call to zero copy functions in skb_segment().
      The change introduced a bug in skb_segment() because skb_orphan_frags()
      may possibly change the number of fragments or allocate new fragments
      altogether leaving nrfrags and frag to point to the old values. This can
      cause a panic with stacktrace like the one below.
      
      [  193.894380] BUG: kernel NULL pointer dereference, address: 00000000000000bc
      [  193.895273] CPU: 13 PID: 18164 Comm: vh-net-17428 Kdump: loaded Tainted: G           O      5.15.123+ #26
      [  193.903919] RIP: 0010:skb_segment+0xb0e/0x12f0
      [  194.021892] Call Trace:
      [  194.027422]  <TASK>
      [  194.072861]  tcp_gso_segment+0x107/0x540
      [  194.082031]  inet_gso_segment+0x15c/0x3d0
      [  194.090783]  skb_mac_gso_segment+0x9f/0x110
      [  194.095016]  __skb_gso_segment+0xc1/0x190
      [  194.103131]  netem_enqueue+0x290/0xb10 [sch_netem]
      [  194.107071]  dev_qdisc_enqueue+0x16/0x70
      [  194.110884]  __dev_queue_xmit+0x63b/0xb30
      [  194.121670]  bond_start_xmit+0x159/0x380 [bonding]
      [  194.128506]  dev_hard_start_xmit+0xc3/0x1e0
      [  194.131787]  __dev_queue_xmit+0x8a0/0xb30
      [  194.138225]  macvlan_start_xmit+0x4f/0x100 [macvlan]
      [  194.141477]  dev_hard_start_xmit+0xc3/0x1e0
      [  194.144622]  sch_direct_xmit+0xe3/0x280
      [  194.147748]  __dev_queue_xmit+0x54a/0xb30
      [  194.154131]  tap_get_user+0x2a8/0x9c0 [tap]
      [  194.157358]  tap_sendmsg+0x52/0x8e0 [tap]
      [  194.167049]  handle_tx_zerocopy+0x14e/0x4c0 [vhost_net]
      [  194.173631]  handle_tx+0xcd/0xe0 [vhost_net]
      [  194.176959]  vhost_worker+0x76/0xb0 [vhost]
      [  194.183667]  kthread+0x118/0x140
      [  194.190358]  ret_from_fork+0x1f/0x30
      [  194.193670]  </TASK>
      
      In this case calling skb_orphan_frags() updated nr_frags leaving nrfrags
      local variable in skb_segment() stale. This resulted in the code hitting
      i >= nrfrags prematurely and trying to move to next frag_skb using
      list_skb pointer, which was NULL, and caused kernel panic. Move the call
      to zero copy functions before using frags and nr_frags.
      
      Fixes: bf5c25d6 ("skbuff: in skb_segment, call zerocopy functions once per nskb")
      Signed-off-by: default avatarMohamed Khalfella <mkhalfella@purestorage.com>
      Reported-by: default avatarAmit Goyal <agoyal@purestorage.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2ea35288
    • Eric Dumazet's avatar
      net: annotate data-races around sk->sk_bind_phc · 251cd405
      Eric Dumazet authored
      
      sk->sk_bind_phc is read locklessly. Add corresponding annotations.
      
      Fixes: d463126e ("net: sock: extend SO_TIMESTAMPING for PHC binding")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      251cd405
    • Eric Dumazet's avatar
      net: annotate data-races around sk->sk_tsflags · e3390b30
      Eric Dumazet authored
      
      sk->sk_tsflags can be read locklessly, add corresponding annotations.
      
      Fixes: b9f40e21 ("net-timestamp: move timestamp flags out of sk_flags")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e3390b30
    • Eric Dumazet's avatar
      mptcp: annotate data-races around msk->rmem_fwd_alloc · 9531e4a8
      Eric Dumazet authored
      
      msk->rmem_fwd_alloc can be read locklessly.
      
      Add mptcp_rmem_fwd_alloc_add(), similar to sk_forward_alloc_add(),
      and appropriate READ_ONCE()/WRITE_ONCE() annotations.
      
      Fixes: 6511882c ("mptcp: allocate fwd memory separately on the rx and tx path")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9531e4a8
    • Eric Dumazet's avatar
      net: annotate data-races around sk->sk_forward_alloc · 5e6300e7
      Eric Dumazet authored
      
      Every time sk->sk_forward_alloc is read locklessly,
      add a READ_ONCE().
      
      Add sk_forward_alloc_add() helper to centralize updates,
      to reduce number of WRITE_ONCE().
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5e6300e7
    • Eric Dumazet's avatar
      net: use sk_forward_alloc_get() in sk_get_meminfo() · 66d58f04
      Eric Dumazet authored
      
      inet_sk_diag_fill() has been changed to use sk_forward_alloc_get(),
      but sk_get_meminfo() was forgotten.
      
      Fixes: 292e6077 ("net: introduce sk_forward_alloc_get()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      66d58f04
    • Eric Dumazet's avatar
      net/handshake: fix null-ptr-deref in handshake_nl_done_doit() · 82ba0ff7
      Eric Dumazet authored
      
      We should not call trace_handshake_cmd_done_err() if socket lookup has failed.
      
      Also we should call trace_handshake_cmd_done_err() before releasing the file,
      otherwise dereferencing sock->sk can return garbage.
      
      This also reverts 7afc6d0a ("net/handshake: Fix uninitialized local variable")
      
      Unable to handle kernel paging request at virtual address dfff800000000003
      KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
      Mem abort info:
      ESR = 0x0000000096000005
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x05: level 1 translation fault
      Data abort info:
      ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
      [dfff800000000003] address between user and kernel address ranges
      Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 5986 Comm: syz-executor292 Not tainted 6.5.0-rc7-syzkaller-gfe4469582053 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193
      lr : handshake_nl_done_doit+0x180/0x9c8
      sp : ffff800096e37180
      x29: ffff800096e37200 x28: 1ffff00012dc6e34 x27: dfff800000000000
      x26: ffff800096e373d0 x25: 0000000000000000 x24: 00000000ffffffa8
      x23: ffff800096e373f0 x22: 1ffff00012dc6e38 x21: 0000000000000000
      x20: ffff800096e371c0 x19: 0000000000000018 x18: 0000000000000000
      x17: 0000000000000000 x16: ffff800080516cc4 x15: 0000000000000001
      x14: 1fffe0001b14aa3b x13: 0000000000000000 x12: 0000000000000000
      x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000003
      x8 : 0000000000000003 x7 : ffff800080afe47c x6 : 0000000000000000
      x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff800080a88078
      x2 : 0000000000000001 x1 : 00000000ffffffa8 x0 : 0000000000000000
      Call trace:
      handshake_nl_done_doit+0x198/0x9c8 net/handshake/netlink.c:193
      genl_family_rcv_msg_doit net/netlink/genetlink.c:970 [inline]
      genl_family_rcv_msg net/netlink/genetlink.c:1050 [inline]
      genl_rcv_msg+0x96c/0xc50 net/netlink/genetlink.c:1067
      netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2549
      genl_rcv+0x38/0x50 net/netlink/genetlink.c:1078
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x834/0xb18 net/netlink/af_netlink.c:1914
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x56c/0x840 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmsg+0x26c/0x33c net/socket.c:2577
      __do_sys_sendmsg net/socket.c:2586 [inline]
      __se_sys_sendmsg net/socket.c:2584 [inline]
      __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2584
      __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
      invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
      el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
      do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
      el0_svc+0x58/0x16c arch/arm64/kernel/entry-common.c:678
      el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696
      el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
      Code: 12800108 b90043e8 910062b3 d343fe68 (387b6908)
      
      Fixes: 3b3009ea ("net/handshake: Create a NETLINK service for handling handshake requests")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: default avatarMichal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      82ba0ff7
  11. Aug 31, 2023
    • Magnus Karlsson's avatar
      xsk: Fix xsk_diag use-after-free error during socket cleanup · 3e019d8a
      Magnus Karlsson authored
      
      Fix a use-after-free error that is possible if the xsk_diag interface
      is used after the socket has been unbound from the device. This can
      happen either due to the socket being closed or the device
      disappearing. In the early days of AF_XDP, the way we tested that a
      socket was not bound to a device was to simply check if the netdevice
      pointer in the xsk socket structure was NULL. Later, a better system
      was introduced by having an explicit state variable in the xsk socket
      struct. For example, the state of a socket that is on the way to being
      closed and has been unbound from the device is XSK_UNBOUND.
      
      The commit in the Fixes tag below deleted the old way of signalling
      that a socket is unbound, setting dev to NULL. This in the belief that
      all code using the old way had been exterminated. That was
      unfortunately not true as the xsk diagnostics code was still using the
      old way and thus does not work as intended when a socket is going
      down. Fix this by introducing a test against the state variable. If
      the socket is in the state XSK_UNBOUND, simply abort the diagnostic's
      netlink operation.
      
      Fixes: 18b1ab7a ("xsk: Fix race at socket teardown")
      Reported-by: default avatar <syzbot+822d1359297e2694f873@syzkaller.appspotmail.com>
      Signed-off-by: default avatarMagnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatar <syzbot+822d1359297e2694f873@syzkaller.appspotmail.com>
      Tested-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20230831100119.17408-1-magnus.karlsson@gmail.com
      3e019d8a
    • Eric Dumazet's avatar
      net: read sk->sk_family once in sk_mc_loop() · a3e0fdf7
      Eric Dumazet authored
      
      syzbot is playing with IPV6_ADDRFORM quite a lot these days,
      and managed to hit the WARN_ON_ONCE(1) in sk_mc_loop()
      
      We have many more similar issues to fix.
      
      WARNING: CPU: 1 PID: 1593 at net/core/sock.c:782 sk_mc_loop+0x165/0x260
      Modules linked in:
      CPU: 1 PID: 1593 Comm: kworker/1:3 Not tainted 6.1.40-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      Workqueue: events_power_efficient gc_worker
      RIP: 0010:sk_mc_loop+0x165/0x260 net/core/sock.c:782
      Code: 34 1b fd 49 81 c7 18 05 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 ff e8 25 36 6d fd 4d 8b 37 eb 13 e8 db 33 1b fd <0f> 0b b3 01 eb 34 e8 d0 33 1b fd 45 31 f6 49 83 c6 38 4c 89 f0 48
      RSP: 0018:ffffc90000388530 EFLAGS: 00010246
      RAX: ffffffff846d9b55 RBX: 0000000000000011 RCX: ffff88814f884980
      RDX: 0000000000000102 RSI: ffffffff87ae5160 RDI: 0000000000000011
      RBP: ffffc90000388550 R08: 0000000000000003 R09: ffffffff846d9a65
      R10: 0000000000000002 R11: ffff88814f884980 R12: dffffc0000000000
      R13: ffff88810dbee000 R14: 0000000000000010 R15: ffff888150084000
      FS: 0000000000000000(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000180 CR3: 000000014ee5b000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <IRQ>
      [<ffffffff8507734f>] ip6_finish_output2+0x33f/0x1ae0 net/ipv6/ip6_output.c:83
      [<ffffffff85062766>] __ip6_finish_output net/ipv6/ip6_output.c:200 [inline]
      [<ffffffff85062766>] ip6_finish_output+0x6c6/0xb10 net/ipv6/ip6_output.c:211
      [<ffffffff85061f8c>] NF_HOOK_COND include/linux/netfilter.h:298 [inline]
      [<ffffffff85061f8c>] ip6_output+0x2bc/0x3d0 net/ipv6/ip6_output.c:232
      [<ffffffff852071cf>] dst_output include/net/dst.h:444 [inline]
      [<ffffffff852071cf>] ip6_local_out+0x10f/0x140 net/ipv6/output_core.c:161
      [<ffffffff83618fb4>] ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:483 [inline]
      [<ffffffff83618fb4>] ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:529 [inline]
      [<ffffffff83618fb4>] ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
      [<ffffffff83618fb4>] ipvlan_queue_xmit+0x1174/0x1be0 drivers/net/ipvlan/ipvlan_core.c:677
      [<ffffffff8361ddd9>] ipvlan_start_xmit+0x49/0x100 drivers/net/ipvlan/ipvlan_main.c:229
      [<ffffffff84763fc0>] netdev_start_xmit include/linux/netdevice.h:4925 [inline]
      [<ffffffff84763fc0>] xmit_one net/core/dev.c:3644 [inline]
      [<ffffffff84763fc0>] dev_hard_start_xmit+0x320/0x980 net/core/dev.c:3660
      [<ffffffff8494c650>] sch_direct_xmit+0x2a0/0x9c0 net/sched/sch_generic.c:342
      [<ffffffff8494d883>] qdisc_restart net/sched/sch_generic.c:407 [inline]
      [<ffffffff8494d883>] __qdisc_run+0xb13/0x1e70 net/sched/sch_generic.c:415
      [<ffffffff8478c426>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125
      [<ffffffff84796eac>] net_tx_action+0x7ac/0x940 net/core/dev.c:5247
      [<ffffffff858002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:599
      [<ffffffff814c3fe8>] invoke_softirq kernel/softirq.c:430 [inline]
      [<ffffffff814c3fe8>] __irq_exit_rcu+0xc8/0x170 kernel/softirq.c:683
      [<ffffffff814c3f09>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:695
      
      Fixes: 7ad6848c ("ip: fix mc_loop checks for tunnels with multicast outer addresses")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230830101244.1146934-1-edumazet@google.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      a3e0fdf7
    • Eric Dumazet's avatar
      ipv4: annotate data-races around fi->fib_dead · fce92af1
      Eric Dumazet authored
      
      syzbot complained about a data-race in fib_table_lookup() [1]
      
      Add appropriate annotations to document it.
      
      [1]
      BUG: KCSAN: data-race in fib_release_info / fib_table_lookup
      
      write to 0xffff888150f31744 of 1 bytes by task 1189 on cpu 0:
      fib_release_info+0x3a0/0x460 net/ipv4/fib_semantics.c:281
      fib_table_delete+0x8d2/0x900 net/ipv4/fib_trie.c:1777
      fib_magic+0x1c1/0x1f0 net/ipv4/fib_frontend.c:1106
      fib_del_ifaddr+0x8cf/0xa60 net/ipv4/fib_frontend.c:1317
      fib_inetaddr_event+0x77/0x200 net/ipv4/fib_frontend.c:1448
      notifier_call_chain kernel/notifier.c:93 [inline]
      blocking_notifier_call_chain+0x90/0x200 kernel/notifier.c:388
      __inet_del_ifa+0x4df/0x800 net/ipv4/devinet.c:432
      inet_del_ifa net/ipv4/devinet.c:469 [inline]
      inetdev_destroy net/ipv4/devinet.c:322 [inline]
      inetdev_event+0x553/0xaf0 net/ipv4/devinet.c:1606
      notifier_call_chain kernel/notifier.c:93 [inline]
      raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461
      call_netdevice_notifiers_info net/core/dev.c:1962 [inline]
      call_netdevice_notifiers_mtu+0xd2/0x130 net/core/dev.c:2037
      dev_set_mtu_ext+0x30b/0x3e0 net/core/dev.c:8673
      do_setlink+0x5be/0x2430 net/core/rtnetlink.c:2837
      rtnl_setlink+0x255/0x300 net/core/rtnetlink.c:3177
      rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6445
      netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2549
      rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6463
      netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
      netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
      netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1914
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      sock_write_iter+0x1aa/0x230 net/socket.c:1129
      do_iter_write+0x4b4/0x7b0 fs/read_write.c:860
      vfs_writev+0x1a8/0x320 fs/read_write.c:933
      do_writev+0xf8/0x220 fs/read_write.c:976
      __do_sys_writev fs/read_write.c:1049 [inline]
      __se_sys_writev fs/read_write.c:1046 [inline]
      __x64_sys_writev+0x45/0x50 fs/read_write.c:1046
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff888150f31744 of 1 bytes by task 21839 on cpu 1:
      fib_table_lookup+0x2bf/0xd50 net/ipv4/fib_trie.c:1585
      fib_lookup include/net/ip_fib.h:383 [inline]
      ip_route_output_key_hash_rcu+0x38c/0x12c0 net/ipv4/route.c:2751
      ip_route_output_key_hash net/ipv4/route.c:2641 [inline]
      __ip_route_output_key include/net/route.h:134 [inline]
      ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2869
      send4+0x1e7/0x500 drivers/net/wireguard/socket.c:61
      wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175
      wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200
      wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
      wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51
      process_one_work+0x434/0x860 kernel/workqueue.c:2600
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2751
      kthread+0x1d7/0x210 kernel/kthread.c:389
      ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
      ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 21839 Comm: kworker/u4:18 Tainted: G W 6.5.0-syzkaller #0
      
      Fixes: dccd9ecc ("ipv4: Do not use dead fib_info entries.")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20230830095520.1046984-1-edumazet@google.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      fce92af1
    • Eric Dumazet's avatar
      sctp: annotate data-races around sk->sk_wmem_queued · dc9511dd
      Eric Dumazet authored
      
      sk->sk_wmem_queued can be read locklessly from sctp_poll()
      
      Use sk_wmem_queued_add() when the field is changed,
      and add READ_ONCE() annotations in sctp_writeable()
      and sctp_assocs_seq_show()
      
      syzbot reported:
      
      BUG: KCSAN: data-race in sctp_poll / sctp_wfree
      
      read-write to 0xffff888149d77810 of 4 bytes by interrupt on cpu 0:
      sctp_wfree+0x170/0x4a0 net/sctp/socket.c:9147
      skb_release_head_state+0xb7/0x1a0 net/core/skbuff.c:988
      skb_release_all net/core/skbuff.c:1000 [inline]
      __kfree_skb+0x16/0x140 net/core/skbuff.c:1016
      consume_skb+0x57/0x180 net/core/skbuff.c:1232
      sctp_chunk_destroy net/sctp/sm_make_chunk.c:1503 [inline]
      sctp_chunk_put+0xcd/0x130 net/sctp/sm_make_chunk.c:1530
      sctp_datamsg_put+0x29a/0x300 net/sctp/chunk.c:128
      sctp_chunk_free+0x34/0x50 net/sctp/sm_make_chunk.c:1515
      sctp_outq_sack+0xafa/0xd70 net/sctp/outqueue.c:1381
      sctp_cmd_process_sack net/sctp/sm_sideeffect.c:834 [inline]
      sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1366 [inline]
      sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
      sctp_do_sm+0x12c7/0x31b0 net/sctp/sm_sideeffect.c:1169
      sctp_assoc_bh_rcv+0x2b2/0x430 net/sctp/associola.c:1051
      sctp_inq_push+0x108/0x120 net/sctp/inqueue.c:80
      sctp_rcv+0x116e/0x1340 net/sctp/input.c:243
      sctp6_rcv+0x25/0x40 net/sctp/ipv6.c:1120
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      process_backlog+0x21f/0x380 net/core/dev.c:5894
      __napi_poll+0x60/0x3b0 net/core/dev.c:6460
      napi_poll net/core/dev.c:6527 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6660
      __do_softirq+0xc1/0x265 kernel/softirq.c:553
      run_ksoftirqd+0x17/0x20 kernel/softirq.c:921
      smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
      kthread+0x1d7/0x210 kernel/kthread.c:389
      ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
      ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      read to 0xffff888149d77810 of 4 bytes by task 17828 on cpu 1:
      sctp_writeable net/sctp/socket.c:9304 [inline]
      sctp_poll+0x265/0x410 net/sctp/socket.c:8671
      sock_poll+0x253/0x270 net/socket.c:1374
      vfs_poll include/linux/poll.h:88 [inline]
      do_pollfd fs/select.c:873 [inline]
      do_poll fs/select.c:921 [inline]
      do_sys_poll+0x636/0xc00 fs/select.c:1015
      __do_sys_ppoll fs/select.c:1121 [inline]
      __se_sys_ppoll+0x1af/0x1f0 fs/select.c:1101
      __x64_sys_ppoll+0x67/0x80 fs/select.c:1101
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x00019e80 -> 0x0000cc80
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 17828 Comm: syz-executor.1 Not tainted 6.5.0-rc7-syzkaller-00185-g28f20a19294d #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarXin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/20230830094519.950007-1-edumazet@google.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      dc9511dd
    • Eric Dumazet's avatar
      net/sched: fq_pie: avoid stalls in fq_pie_timer() · 8c21ab1b
      Eric Dumazet authored
      When setting a high number of flows (limit being 65536),
      fq_pie_timer() is currently using too much time as syzbot reported.
      
      Add logic to yield the cpu every 2048 flows (less than 150 usec
      on debug kernels).
      It should also help by not blocking qdisc fast paths for too long.
      Worst case (65536 flows) would need 31 jiffies for a complete scan.
      
      Relevant extract from syzbot report:
      
      rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2663 jiffies s: 873 root: 0x1/.
      rcu: blocking rcu_node structures (internal RCU debug):
      Sending NMI from CPU 1 to CPUs 0:
      NMI backtrace for cpu 0
      CPU: 0 PID: 5177 Comm: syz-executor273 Not tainted 6.5.0-syzkaller-00453-g727dbda16b83 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
      RIP: 0010:write_comp_data+0x21/0x90 kernel/kcov.c:236
      Code: 2e 0f 1f 84 00 00 00 00 00 65 8b 05 01 b2 7d 7e 49 89 f1 89 c6 49 89 d2 81 e6 00 01 00 00 49 89 f8 65 48 8b 14 25 80 b9 03 00 <a9> 00 01 ff 00 74 0e 85 f6 74 59 8b 82 04 16 00 00 85 c0 74 4f 8b
      RSP: 0018:ffffc90000007bb8 EFLAGS: 00000206
      RAX: 0000000000000101 RBX: ffffc9000dc0d140 RCX: ffffffff885893b0
      RDX: ffff88807c075940 RSI: 0000000000000100 RDI: 0000000000000001
      RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000dc0d178
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000555555d54380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6b442f6130 CR3: 000000006fe1c000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <NMI>
       </NMI>
       <IRQ>
       pie_calculate_probability+0x480/0x850 net/sched/sch_pie.c:415
       fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387
       call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
      
      Fixes: ec97ecf1 ("net: sched: add Flow Queue PIE packet scheduler")
      Link: https://lore.kernel.org/lkml/00000000000017ad3f06040bf394@google.com/
      
      
      Reported-by: default avatar <syzbot+e46fbd5289363464bc13@syzkaller.appspotmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarMichal Kubiak <michal.kubiak@intel.com>
      Reviewed-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20230829123541.3745013-1-edumazet@google.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      8c21ab1b
Loading