Skip to content
Snippets Groups Projects
  1. Apr 08, 2020
  2. Apr 06, 2020
    • Alexander Aring's avatar
      ipv6: rpl: fix loop iteration · a7f9a6f4
      Alexander Aring authored
      
      This patch fix the loop iteration by not walking over the last
      iteration. The cmpri compressing value exempt the last segment. As the
      code shows the last iteration will be overwritten by cmpre value
      handling which is for the last segment.
      
      I think this doesn't end in any bufferoverflows because we work on worst
      case temporary buffer sizes but it ends in not best compression settings
      in some cases.
      
      Fixes: 8610c7c6 ("net: ipv6: add support for rpl sr exthdr")
      Signed-off-by: default avatarAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7f9a6f4
  3. Apr 04, 2020
  4. Apr 03, 2020
    • Geliang Tang's avatar
      mptcp: add some missing pr_fmt defines · c85adced
      Geliang Tang authored
      
      Some of the mptcp logs didn't print out the format string:
      
      [  185.651493] DSS
      [  185.651494] data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
      [  185.651494] data_ack=13792750332298763796
      [  185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=0 skb=0000000063dc595d
      [  185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 status=0
      [  185.651495] MPTCP: msk ack_seq=9bbc894565aa2f9a subflow ack_seq=9bbc894565aa2f9a
      [  185.651496] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=1 skb=0000000012e809e1
      
      So this patch added these missing pr_fmt defines. Then we can get the same
      format string "MPTCP" in all mptcp logs like this:
      
      [  142.795829] MPTCP: DSS
      [  142.795829] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
      [  142.795829] MPTCP: data_ack=8089704603109242421
      [  142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=0 skb=00000000d5f230df
      [  142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 status=0
      [  142.795831] MPTCP: msk ack_seq=66790290f1199d9b subflow ack_seq=66790290f1199d9b
      [  142.795831] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=1 skb=00000000de5aca2e
      
      Signed-off-by: default avatarGeliang Tang <geliangtang@gmail.com>
      Reviewed-by: default avatarMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c85adced
    • Cong Wang's avatar
      net_sched: fix a missing refcnt in tcindex_init() · a8eab6d3
      Cong Wang authored
      
      The initial refcnt of struct tcindex_data should be 1,
      it is clear that I forgot to set it to 1 in tcindex_init().
      This leads to a dec-after-zero warning.
      
      Reported-by: default avatar <syzbot+8325e509a1bf83ec741d@syzkaller.appspotmail.com>
      Fixes: 304e0242 ("net_sched: add a temporary refcnt for struct tcindex_data")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8eab6d3
    • Hangbin Liu's avatar
      neigh: support smaller retrans_time settting · 19e16d22
      Hangbin Liu authored
      
      Currently, we limited the retrans_time to be greater than HZ/2. i.e.
      setting retrans_time less than 500ms will not work. This makes the user
      unable to achieve a more accurate control for bonding arp fast failover.
      
      Update the sanity check to HZ/100, which is 10ms, to let users have more
      ability on the retrans_time control.
      
      v3: sync the behavior with IPv6 and update all the timer handler
      v2: use HZ instead of hard code number
      
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19e16d22
    • Tonghao Zhang's avatar
      net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry · 64948427
      Tonghao Zhang authored
      
      The struct sw_flow is protected by RCU, when traversing them,
      use hlist_for_each_entry_rcu.
      
      Signed-off-by: default avatarTonghao Zhang <xiangxia.m.yue@gmail.com>
      Tested-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Reviewed-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      64948427
    • Vincent Bernat's avatar
      net: core: enable SO_BINDTODEVICE for non-root users · c427bfec
      Vincent Bernat authored
      
      Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
      non-root user to bind a socket to an interface if it is not already
      bound. This is useful to allow an application to bind itself to a
      specific VRF for outgoing or incoming connections. Currently, an
      application wanting to manage connections through several VRF need to
      be privileged.
      
      Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for
      Wine (76e21053 and c4062dfc) specifically for use by
      non-root processes. However, they are restricted to sendmsg() and not
      usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to
      get the same privilege. As for TCP servers, outside the VRF use case,
      SO_BINDTODEVICE would only further restrict connections a server could
      accept.
      
      When an application is restricted to a VRF (with `ip vrf exec`), the
      socket is bound to an interface at creation and therefore, a
      non-privileged call to SO_BINDTODEVICE to escape the VRF fails.
      
      When an application bound a socket to SO_BINDTODEVICE and transmit it
      to a non-privileged process through a Unix socket, a tentative to
      change the bound device also fails.
      
      Before:
      
          >>> import socket
          >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          PermissionError: [Errno 1] Operation not permitted
      
      After:
      
          >>> import socket
          >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          PermissionError: [Errno 1] Operation not permitted
      
      Signed-off-by: default avatarVincent Bernat <vincent@bernat.ch>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c427bfec
  5. Apr 02, 2020
  6. Apr 01, 2020
    • Jarod Wilson's avatar
      ipv6: don't auto-add link-local address to lag ports · 744fdc82
      Jarod Wilson authored
      
      Bonding slave and team port devices should not have link-local addresses
      automatically added to them, as it can interfere with openvswitch being
      able to properly add tc ingress.
      
      Basic reproducer, courtesy of Marcelo:
      
      $ ip link add name bond0 type bond
      $ ip link set dev ens2f0np0 master bond0
      $ ip link set dev ens2f1np2 master bond0
      $ ip link set dev bond0 up
      $ ip a s
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
      group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
      11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
      noqueue state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link
             valid_lft forever preferred_lft forever
      
      (above trimmed to relevant entries, obviously)
      
      $ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0
      net.ipv6.conf.ens2f0np0.addr_gen_mode = 0
      $ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0
      net.ipv6.conf.ens2f1np2.addr_gen_mode = 0
      
      $ ip a l ens2f0np0
      2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
      mq master bond0 state UP group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      $ ip a l ens2f1np2
      5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
      mq master bond0 state DOWN group default qlen 1000
          link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
          inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
             valid_lft forever preferred_lft forever
      
      Looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is
      this a slave interface?" check added by commit c2edacf8, and
      results in an address getting added, while w/the proposed patch added,
      no address gets added. This simply adds the same gating check to another
      code path, and thus should prevent the same devices from erroneously
      obtaining an ipv6 link-local address.
      
      Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
      Reported-by: default avatarMoshe Levi <moshele@mellanox.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Marcelo Ricardo Leitner <mleitner@redhat.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: default avatarJarod Wilson <jarod@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      744fdc82
    • Cong Wang's avatar
      net_sched: add a temporary refcnt for struct tcindex_data · 304e0242
      Cong Wang authored
      
      Although we intentionally use an ordered workqueue for all tc
      filter works, the ordering is not guaranteed by RCU work,
      given that tcf_queue_work() is esstenially a call_rcu().
      
      This problem is demostrated by Thomas:
      
        CPU 0:
          tcf_queue_work()
            tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work);
      
        -> Migration to CPU 1
      
        CPU 1:
           tcf_queue_work(&p->rwork, tcindex_destroy_work);
      
      so the 2nd work could be queued before the 1st one, which leads
      to a free-after-free.
      
      Enforcing this order in RCU work is hard as it requires to change
      RCU code too. Fortunately we can workaround this problem in tcindex
      filter by taking a temporary refcnt, we only refcnt it right before
      we begin to destroy it. This simplifies the code a lot as a full
      refcnt requires much more changes in tcindex_set_parms().
      
      Reported-by: default avatar <syzbot+46f513c3033d592409d2@syzkaller.appspotmail.com>
      Fixes: 3d210534 ("net_sched: fix a race condition in tcindex_destroy()")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      304e0242
  7. Mar 31, 2020
    • Gustavo A. R. Silva's avatar
      net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline · 7f80ccfe
      Gustavo A. R. Silva authored
      
      In case memory resources for buf were allocated, release them before
      return.
      
      Addresses-Coverity-ID: 1492011 ("Resource leak")
      Fixes: a7a29f9c ("net: ipv6: add rpl sr tunnel")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7f80ccfe
    • Russell King's avatar
      net: dsa: fix oops while probing Marvell DSA switches · 765bda93
      Russell King authored
      
      Fix an oops in dsa_port_phylink_mac_change() caused by a combination
      of a20f9970 ("net: dsa: Don't instantiate phylink for CPU/DSA
      ports unless needed") and the net-dsa-improve-serdes-integration
      series of patches 65b7a2c8 ("Merge branch
      'net-dsa-improve-serdes-integration'").
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000124
      pgd = c0004000
      [00000124] *pgd=00000000
      Internal error: Oops: 805 [#1] SMP ARM
      Modules linked in: tag_edsa spi_nor mtd xhci_plat_hcd mv88e6xxx(+) xhci_hcd armada_thermal marvell_cesa dsa_core ehci_orion libdes phy_armada38x_comphy at24 mcp3021 sfp evbug spi_orion sff mdio_i2c
      CPU: 1 PID: 214 Comm: irq/55-mv88e6xx Not tainted 5.6.0+ #470
      Hardware name: Marvell Armada 380/385 (Device Tree)
      PC is at phylink_mac_change+0x10/0x88
      LR is at mv88e6352_serdes_irq_status+0x74/0x94 [mv88e6xxx]
      
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      765bda93
    • Bruno Meneguele's avatar
      net/bpfilter: remove superfluous testing message · 41c55ea6
      Bruno Meneguele authored
      
      A testing message was brought by 13d0f7b8 ("net/bpfilter: fix dprintf
      usage for /dev/kmsg") but should've been deleted before patch submission.
      Although it doesn't cause any harm to the code or functionality itself, it's
      totally unpleasant to have it displayed on every loop iteration with no real
      use case. Thus remove it unconditionally.
      
      Fixes: 13d0f7b8 ("net/bpfilter: fix dprintf usage for /dev/kmsg")
      Signed-off-by: default avatarBruno Meneguele <bmeneg@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41c55ea6
    • Ido Schimmel's avatar
      devlink: Allow setting of packet trap group parameters · c064875a
      Ido Schimmel authored
      
      The previous patch allowed device drivers to publish their default
      binding between packet trap policers and packet trap groups. However,
      some users might not be content with this binding and would like to
      change it.
      
      In case user space passed a packet trap policer identifier when setting
      a packet trap group, invoke the appropriate device driver callback and
      pass the new policer identifier.
      
      v2:
      * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
        devlink_trap_group_set() and bail if not present
      * Add extack error message in case trap group was partially modified
      
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c064875a
    • Ido Schimmel's avatar
      devlink: Add packet trap group parameters support · f9f54392
      Ido Schimmel authored
      
      Packet trap groups are used to aggregate logically related packet traps.
      Currently, these groups allow user space to batch operations such as
      setting the trap action of all member traps.
      
      In order to prevent the CPU from being overwhelmed by too many trapped
      packets, it is desirable to bind a packet trap policer to these groups.
      For example, to limit all the packets that encountered an exception
      during routing to 10Kpps.
      
      Allow device drivers to bind default packet trap policers to packet trap
      groups when the latter are registered with devlink.
      
      The next patch will enable user space to change this default binding.
      
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9f54392
    • Ido Schimmel's avatar
      devlink: Add packet trap policers support · 1e8c6619
      Ido Schimmel authored
      
      Devices capable of offloading the kernel's datapath and perform
      functions such as bridging and routing must also be able to send (trap)
      specific packets to the kernel (i.e., the CPU) for processing.
      
      For example, a device acting as a multicast-aware bridge must be able to
      trap IGMP membership reports to the kernel for processing by the bridge
      module.
      
      In most cases, the underlying device is capable of handling packet rates
      that are several orders of magnitude higher compared to those that can
      be handled by the CPU.
      
      Therefore, in order to prevent the underlying device from overwhelming
      the CPU, devices usually include packet trap policers that are able to
      police the trapped packets to rates that can be handled by the CPU.
      
      This patch allows capable device drivers to register their supported
      packet trap policers with devlink. User space can then tune the
      parameters of these policer (currently, rate and burst size) and read
      from the device the number of packets that were dropped by the policer,
      if supported.
      
      Subsequent patches in the series will allow device drivers to create
      default binding between these policers and packet trap groups and allow
      user space to change the binding.
      
      v2:
      * Add 'strict_start_type' in devlink policy
      * Have device drivers provide max/min rate/burst size for each policer.
        Use them to check validity of user provided parameters
      
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarJiri Pirko <jiri@mellanox.com>
      Reviewed-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1e8c6619
  8. Mar 30, 2020
Loading