]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
5 years agoMerge branch 'ionic-fixes'
David S. Miller [Tue, 12 May 2020 19:12:34 +0000 (12:12 -0700)]
Merge branch 'ionic-fixes'

Shannon Nelson says:

====================
ionic fixes

These are a couple more fixes after more fw-upgrade testing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoionic: call ionic_port_init after fw-upgrade
Shannon Nelson [Mon, 11 May 2020 21:04:45 +0000 (14:04 -0700)]
ionic: call ionic_port_init after fw-upgrade

Since the fw has been re-inited, we need to refresh the port
information dma address so we can see fresh port information.
Let's call ionic_port_init again, and tweak it to allow for
a call to simply refresh the existing dma address.

Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoionic: leave netdev mac alone after fw-upgrade
Shannon Nelson [Mon, 11 May 2020 21:04:44 +0000 (14:04 -0700)]
ionic: leave netdev mac alone after fw-upgrade

When running in a bond setup, or some other potential
configurations, the netdev mac may have been changed from
the default device mac.  Since the userland doesn't know
about the changes going on under the covers in a fw-upgrade
it doesn't know the re-push the mac filter.  The driver
needs to leave the netdev mac filter alone when rebuilding
after the fw-upgrade.

Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoptp: fix struct member comment for do_aux_work
Jacob Keller [Mon, 11 May 2020 21:02:15 +0000 (14:02 -0700)]
ptp: fix struct member comment for do_aux_work

The do_aux_work callback had documentation in the structure comment
which referred to it as "do_work".

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomptcp: Initialize map_seq upon subflow establishment
Christoph Paasch [Mon, 11 May 2020 16:24:42 +0000 (09:24 -0700)]
mptcp: Initialize map_seq upon subflow establishment

When the other MPTCP-peer uses 32-bit data-sequence numbers, we rely on
map_seq to indicate how to expand to a 64-bit data-sequence number in
expand_seq() when receiving data.

For new subflows, this field is not initialized, thus results in an
"invalid" mapping being discarded.

Fix this by initializing map_seq upon subflow establishment time.

Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-ipa-fix-cleanup-after-modem-crash'
David S. Miller [Mon, 11 May 2020 21:00:44 +0000 (14:00 -0700)]
Merge branch 'net-ipa-fix-cleanup-after-modem-crash'

Alex Elder says:

====================
net: ipa: fix cleanup after modem crash

The first patch in this series fixes a bug where the size of a data
transfer request was never set, meaning it was 0.  The consequence
of this was that such a transfer request would never complete if
attempted, and led to a hung task timeout.

This data transfer is required for cleaning up IPA hardware state
when recovering from a modem crash.  The code to implement this
cleanup is already present, but its use was commented out because
it hit the bug described above.  So the second patch in this series
enables the use of that "tag process" cleanup code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipa: use tag process on modem crash
Alex Elder [Thu, 7 May 2020 19:14:04 +0000 (14:14 -0500)]
net: ipa: use tag process on modem crash

One part of recovering from a modem crash is performing a "tag
sequence" of several IPA immediate commands, to clear the hardware
pipeline.  The sequence ends with a data transfer request on the
command endpoint (which is not otherwise done).  Unfortunately,
attempting to do the data transfer led to a hang, so that request
plus two other commands were commented out.

The previous commit fixes the bug that was causing that hang.  And
with that bug fixed we can properly issue the tag sequence when the
modem crashes, to return the hardware to a known state.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ipa: set DMA length in gsi_trans_cmd_add()
Alex Elder [Thu, 7 May 2020 19:14:03 +0000 (14:14 -0500)]
net: ipa: set DMA length in gsi_trans_cmd_add()

When a command gets added to a transaction for the AP->command
channel we set the DMA address of its scatterlist entry, but not
its DMA length.  Fix this bug.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohinic: fix a bug of ndo_stop
Luo bin [Sun, 10 May 2020 19:01:08 +0000 (19:01 +0000)]
hinic: fix a bug of ndo_stop

if some function in ndo_stop interface returns failure because of
hardware fault, must go on excuting rest steps rather than return
failure directly, otherwise will cause memory leak.And bump the
timeout for SET_FUNC_STATE to ensure that cmd won't return failure
when hw is busy. Otherwise hw may stomp host memory if we free
memory regardless of the return value of SET_FUNC_STATE.

Fixes: 51ba902a16e6 ("net-next/hinic: Initialize hw interface")
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonet: dsa: loop: Add module soft dependency
Florian Fainelli [Sat, 9 May 2020 23:45:44 +0000 (16:45 -0700)]
net: dsa: loop: Add module soft dependency

There is a soft dependency against dsa_loop_bdinfo.ko which sets up the
MDIO device registration, since there are no symbols referenced by
dsa_loop.ko, there is no automatic loading of dsa_loop_bdinfo.ko which
is needed.

Fixes: 98cd1552ea27 ("net: dsa: Mock-up driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonetprio_cgroup: Fix unlimited memory leak of v2 cgroups
Zefan Li [Sat, 9 May 2020 03:32:10 +0000 (11:32 +0800)]
netprio_cgroup: Fix unlimited memory leak of v2 cgroups

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called
cgroup_sk_alloc() thought it's in v2 mode, so it stores
the cgroup pointer in sk->sk_cgrp_data and increments
the cgroup refcnt, but then sock_update_netprioidx()
thought it's in v1 mode, so it stores netprioidx value
in sk->sk_cgrp_data, so the cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap
cgroup control file. The easiest fix is to also do the switch when
a task is attached to a new cgroup.

Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Tested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonet: freescale: select CONFIG_FIXED_PHY where needed
Arnd Bergmann [Sat, 9 May 2020 12:04:52 +0000 (14:04 +0200)]
net: freescale: select CONFIG_FIXED_PHY where needed

I ran into a randconfig build failure with CONFIG_FIXED_PHY=m
and CONFIG_GIANFAR=y:

x86_64-linux-ld: drivers/net/ethernet/freescale/gianfar.o:(.rodata+0x418): undefined reference to `fixed_phy_change_carrier'

It seems the same thing can happen with dpaa and ucc_geth, so change
all three to do an explicit 'select FIXED_PHY'.

The fixed-phy driver actually has an alternative stub function that
theoretically allows building network drivers when fixed-phy is
disabled, but I don't see how that would help here, as the drivers
presumably would not work then.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonet: ipv4: really enforce backoff for redirects
Paolo Abeni [Fri, 8 May 2020 17:28:34 +0000 (19:28 +0200)]
net: ipv4: really enforce backoff for redirects

In commit b406472b5ad7 ("net: ipv4: avoid mixed n_redirects and
rate_tokens usage") I missed the fact that a 0 'rate_tokens' will
bypass the backoff algorithm.

Since rate_tokens is cleared after a redirect silence, and never
incremented on redirects, if the host keeps receiving packets
requiring redirect it will reply ignoring the backoff.

Additionally, the 'rate_last' field will be updated with the
cadence of the ingress packet requiring redirect. If that rate is
high enough, that will prevent the host from generating any
other kind of ICMP messages

The check for a zero 'rate_tokens' value was likely a shortcut
to avoid the more complex backoff algorithm after a redirect
silence period. Address the issue checking for 'n_redirects'
instead, which is incremented on successful redirect, and
does not interfere with other ICMP replies.

Fixes: b406472b5ad7 ("net: ipv4: avoid mixed n_redirects and rate_tokens usage")
Reported-and-tested-by: Colin Walters <walters@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agoocteontx2-vf: Fix error return code in otx2vf_probe()
Wei Yongjun [Fri, 8 May 2020 02:15:19 +0000 (02:15 +0000)]
octeontx2-vf: Fix error return code in otx2vf_probe()

Fix to return negative error code -ENOMEM from the alloc failed error
handling case instead of 0, as done elsewhere in this function.

Fixes: 3184fb5ba96e ("octeontx2-vf: Virtual function driver support")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agoumh: fix memory leak on execve failure
Vincent Minet [Thu, 7 May 2020 22:14:22 +0000 (00:14 +0200)]
umh: fix memory leak on execve failure

If a UMH process created by fork_usermode_blob() fails to execute,
a pair of struct file allocated by umh_pipe_setup() will leak.

Under normal conditions, the caller (like bpfilter) needs to manage the
lifetime of the UMH and its two pipes. But when fork_usermode_blob()
fails, the caller doesn't really have a way to know what needs to be
done. It seems better to do the cleanup ourselves in this case.

Fixes: 449325b52b7a ("umh: introduce fork_usermode_blob() helper")
Signed-off-by: Vincent Minet <v.minet@criteo.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Jakub Kicinski [Sat, 9 May 2020 01:58:39 +0000 (18:58 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2020-05-09

The following pull-request contains BPF updates for your *net* tree.

We've added 4 non-merge commits during the last 9 day(s) which contain
a total of 4 files changed, 11 insertions(+), 6 deletions(-).

The main changes are:

1) Fix msg_pop_data() helper incorrectly setting an sge length in some
   cases as well as fixing bpf_tcp_ingress() wrongly accounting bytes
   in sg.size, from John Fastabend.

2) Fix to return an -EFAULT error when copy_to_user() of the value
   fails in map_lookup_and_delete_elem(), from Wei Yongjun.

3) Fix sk_psock refcnt leak in tcp_bpf_recvmsg(), from Xiyu Yang.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agodpaa2-eth: prevent array underflow in update_cls_rule()
Dan Carpenter [Fri, 8 May 2020 14:37:20 +0000 (17:37 +0300)]
dpaa2-eth: prevent array underflow in update_cls_rule()

The "location" is controlled by the user via the ethtool_set_rxnfc()
function.  This update_cls_rule() function checks for array overflows
but it doesn't check if the value is negative.  I have changed the type
to unsigned to prevent array underflows.

Fixes: afb90dbb5f78 ("dpaa2-eth: Add ethtool support for flow classification")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonfp: abm: fix error return code in nfp_abm_vnic_alloc()
Wei Yongjun [Fri, 8 May 2020 07:27:35 +0000 (07:27 +0000)]
nfp: abm: fix error return code in nfp_abm_vnic_alloc()

Fix to return negative error code -ENOMEM from the kzalloc() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 174ab544e3bc ("nfp: abm: add cls_u32 offload for simple band classification")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agor8169: re-establish support for RTL8401 chip version
Heiner Kallweit [Fri, 8 May 2020 06:24:14 +0000 (08:24 +0200)]
r8169: re-establish support for RTL8401 chip version

r8169 never had native support for the RTL8401, however it reportedly
worked with the fallback to RTL8101e [0]. Therefore let's add this
as an explicit assignment.

[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956868

Fixes: b4cc2dcc9c7c ("r8169: remove default chip versions")
Reported-by: Camaleón <noelamac@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonet: tcp: fix rx timestamp behavior for tcp_recvmsg
Kelly Littlepage [Fri, 8 May 2020 19:58:46 +0000 (19:58 +0000)]
net: tcp: fix rx timestamp behavior for tcp_recvmsg

The stated intent of the original commit is to is to "return the timestamp
corresponding to the highest sequence number data returned." The current
implementation returns the timestamp for the last byte of the last fully
read skb, which is not necessarily the last byte in the recv buffer. This
patch converts behavior to the original definition, and to the behavior of
the previous draft versions of commit 98aaa913b4ed ("tcp: Extend
SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg") which also match this
behavior.

Fixes: 98aaa913b4ed ("tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg")
Co-developed-by: Iris Liu <iris@onechronos.com>
Signed-off-by: Iris Liu <iris@onechronos.com>
Signed-off-by: Kelly Littlepage <kelly@onechronos.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 years agonet: fix a potential recursive NETDEV_FEAT_CHANGE
Cong Wang [Thu, 7 May 2020 19:19:03 +0000 (12:19 -0700)]
net: fix a potential recursive NETDEV_FEAT_CHANGE

syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event
between bonding master and slave. I managed to find a reproducer
for this:

  ip li set bond0 up
  ifenslave bond0 eth0
  brctl addbr br0
  ethtool -K eth0 lro off
  brctl addif br0 bond0
  ip li set br0 up

When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave,
it captures this and calls bond_compute_features() to fixup its
master's and other slaves' features. However, when syncing with
its lower devices by netdev_sync_lower_features() this event is
triggered again on slaves when the LRO feature fails to change,
so it goes back and forth recursively until the kernel stack is
exhausted.

Commit 17b85d29e82c intentionally lets __netdev_update_features()
return -1 for such a failure case, so we have to just rely on
the existing check inside netdev_sync_lower_features() and skip
NETDEV_FEAT_CHANGE event only for this specific failure case.

Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Reported-by: syzbot+e73ceacfd8560cc8a3ca@syzkaller.appspotmail.com
Reported-by: syzbot+c2fb6f9ddcea95ba49b5@syzkaller.appspotmail.com
Cc: Jarod Wilson <jarod@redhat.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jann Horn <jannh@google.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomptcp: set correct vfs info for subflows
Paolo Abeni [Thu, 7 May 2020 16:53:24 +0000 (18:53 +0200)]
mptcp: set correct vfs info for subflows

When a subflow is created via mptcp_subflow_create_socket(),
a new 'struct socket' is allocated, with a new i_ino value.

When inspecting TCP sockets via the procfs and or the diag
interface, the above ones are not related to the process owning
the MPTCP master socket, even if they are a logical part of it
('ss -p' shows an empty process field)

Additionally, subflows created by the path manager get
the uid/gid from the running workqueue.

Subflows are part of the owning MPTCP master socket, let's
adjust the vfs info to reflect this.

After this patch, 'ss' correctly displays subflows as belonging
to the msk socket creator.

Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: microchip: encx24j600: add missed kthread_stop
Chuhong Yuan [Thu, 7 May 2020 15:13:20 +0000 (23:13 +0800)]
net: microchip: encx24j600: add missed kthread_stop

This driver calls kthread_run() in probe, but forgets to call
kthread_stop() in probe failure and remove.
Add the missed kthread_stop() to fix it.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"
Maciej Żenczykowski [Tue, 5 May 2020 18:57:23 +0000 (11:57 -0700)]
Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"

This reverts commit 19bda36c4299ce3d7e5bce10bebe01764a655a6d:

| ipv6: add mtu lock check in __ip6_rt_update_pmtu
|
| Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
| It leaded to that mtu lock doesn't really work when receiving the pkt
| of ICMPV6_PKT_TOOBIG.
|
| This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
| did in __ip_rt_update_pmtu.

The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
There's already a comment to this effect elsewhere in the kernel:

  $ git grep -p -B1 -A3 'RTAX_MTU lock'
  net/ipv6/route.c=4813=

  static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg)
  ...
    /* In IPv6 pmtu discovery is not optional,
       so that RTAX_MTU lock cannot disable it.
       We still use this lock to block changes
       caused by addrconf/ndisc.
    */

This reverts to the pre-4.9 behaviour.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Fixes: 19bda36c4299 ("ipv6: add mtu lock check in __ip6_rt_update_pmtu")
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: bareudp: avoid uninitialized variable warning
Arnd Bergmann [Tue, 5 May 2020 17:22:14 +0000 (19:22 +0200)]
net: bareudp: avoid uninitialized variable warning

clang points out that building without IPv6 would lead to returning
an uninitialized variable if a packet with family!=AF_INET is
passed into bareudp_udp_encap_recv():

drivers/net/bareudp.c:139:6: error: variable 'err' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
        if (family == AF_INET)
            ^~~~~~~~~~~~~~~~~
drivers/net/bareudp.c:146:15: note: uninitialized use occurs here
        if (unlikely(err)) {
                     ^~~
include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
 # define unlikely(x)    __builtin_expect(!!(x), 0)
                                            ^
drivers/net/bareudp.c:139:2: note: remove the 'if' if its condition is always true
        if (family == AF_INET)
        ^~~~~~~~~~~~~~~~~~~~~~

This cannot happen in practice, so change the condition in a way that
gcc sees the IPv4 case as unconditionally true here.
For consistency, change all the similar constructs in this file the
same way, using "if(IS_ENABLED())" instead of #if IS_ENABLED()".

Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hisilicon: Make CONFIG_HNS invisible
Geert Uytterhoeven [Thu, 7 May 2020 11:45:11 +0000 (13:45 +0200)]
net: hisilicon: Make CONFIG_HNS invisible

The HNS config symbol enables the framework support for the Hisilicon
Network Subsystem.  It is already selected by all of its users, so there
is no reason to make it visible.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agousb: hso: correct debug message
Oliver Neukum [Thu, 7 May 2020 09:49:49 +0000 (11:49 +0200)]
usb: hso: correct debug message

If you do not find the OUT endpoint, you should say so,
rather than copy the error message for the IN endpoint.
Presumably a copy and paste error.

Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: remove spurious declaration of tcp_default_init_rwnd()
Maciej Żenczykowski [Thu, 7 May 2020 07:58:05 +0000 (00:58 -0700)]
net: remove spurious declaration of tcp_default_init_rwnd()

it doesn't actually exist...

Test: builds and 'git grep tcp_default_init_rwnd' comes up empty
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovirtio_net: fix lockdep warning on 32 bit
Michael S. Tsirkin [Thu, 7 May 2020 07:25:56 +0000 (03:25 -0400)]
virtio_net: fix lockdep warning on 32 bit

When we fill up a receive VQ, try_fill_recv currently tries to count
kicks using a 64 bit stats counter. Turns out, on a 32 bit kernel that
uses a seqcount. sequence counts are "lock" constructs where you need to
make sure that writers are serialized.

In turn, this means that we mustn't run two try_fill_recv concurrently.
Which of course we don't. We do run try_fill_recv sometimes from a
softirq napi context, and sometimes from a fully preemptible context,
but the later always runs with napi disabled.

However, when it comes to the seqcount, lockdep is trying to enforce the
rule that the same lock isn't accessed from preemptible and softirq
context - it doesn't know about napi being enabled/disabled. This causes
a false-positive warning:

WARNING: inconsistent lock state
...
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.

As a work around, shut down the warning by switching
to u64_stats_update_begin_irqsave - that works by disabling
interrupts on 32 bit only, is a NOP on 64 bit.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 7 May 2020 03:53:22 +0000 (20:53 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from David Miller:

 1) Fix reference count leaks in various parts of batman-adv, from Xiyu
    Yang.

 2) Update NAT checksum even when it is zero, from Guillaume Nault.

 3) sk_psock reference count leak in tls code, also from Xiyu Yang.

 4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in
    fq_codel, from Eric Dumazet.

 5) Fix panic in choke_reset(), also from Eric Dumazet.

 6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan.

 7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet.

 8) Fix crash in x25_disconnect(), from Yue Haibing.

 9) Don't pass pointer to local variable back to the caller in
    nf_osf_hdr_ctx_init(), from Arnd Bergmann.

10) Wireguard should use the ECN decap helper functions, from Toke
    Høiland-Jørgensen.

11) Fix command entry leak in mlx5 driver, from Moshe Shemesh.

12) Fix uninitialized variable access in mptcp's
    subflow_syn_recv_sock(), from Paolo Abeni.

13) Fix unnecessary out-of-order ingress frame ordering in macsec, from
    Scott Dial.

14) IPv6 needs to use a global serial number for dst validation just
    like ipv4, from David Ahern.

15) Fix up PTP_1588_CLOCK deps, from Clay McClure.

16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from
    Yoshiyuki Kurauchi.

17) Fix a regression in that dsa user port errors should not be fatal,
    from Florian Fainelli.

18) Fix iomap leak in enetc driver, from Dejin Zheng.

19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang.

20) Initialize protocol value earlier in neigh code paths when
    generating events, from Roman Mashak.

21) netdev_update_features() must be called with RTNL mutex in macsec
    driver, from Antoine Tenart.

22) Validate untrusted GSO packets even more strictly, from Willem de
    Bruijn.

23) Wireguard decrypt worker needs a cond_resched(), from Jason
    Donenfeld.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
  MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
  wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
  wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
  wireguard: send/receive: cond_resched() when processing worker ringbuffers
  wireguard: socket: remove errant restriction on looping to self
  wireguard: selftests: use normal kernel stack size on ppc64
  net: ethernet: ti: am65-cpsw-nuss: fix irqs type
  ionic: Use debugfs_create_bool() to export bool
  net: dsa: Do not leave DSA master with NULL netdev_ops
  net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
  net: stricter validation of untrusted gso packets
  seg6: fix SRH processing to comply with RFC8754
  net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
  net: dsa: ocelot: the MAC table on Felix is twice as large
  net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
  selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
  net: hsr: fix incorrect type usage for protocol variable
  net: macsec: fix rtnl locking issue
  net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
  ...

5 years agonet: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
Pablo Neira Ayuso [Wed, 6 May 2020 18:34:50 +0000 (20:34 +0200)]
net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE

This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
that the frontend does not need counters, this hw stats type request
never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
the driver to disable the stats, however, if the driver cannot disable
counters, it bails out.

TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
(this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.

Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
Lukas Bulwahn [Wed, 6 May 2020 20:29:06 +0000 (22:29 +0200)]
MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order

Commit 9b038086f06b ("docs: networking: convert DIM to RST") added a new
file entry to DYNAMIC INTERRUPT MODERATION to the end, and not following
alphabetical order.

So, ./scripts/checkpatch.pl -f MAINTAINERS complains:

  WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic
  order
  #5966: FILE: MAINTAINERS:5966:
  +F:      lib/dim/
  +F:      Documentation/networking/net_dim.rst

Reorder the file entries to keep MAINTAINERS nicely ordered.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'wireguard-fixes'
David S. Miller [Thu, 7 May 2020 03:03:48 +0000 (20:03 -0700)]
Merge branch 'wireguard-fixes'

Jason A. Donenfeld says:

====================
wireguard fixes for 5.7-rc5

With Ubuntu and Debian having backported this into their kernels, we're
finally seeing testing from places we hadn't seen prior, which is nice.
With that comes more fixes:

1) The CI for PPC64 was running with extremely small stacks for 64-bit,
   causing spurious crashes in surprising places.

2) There's was an old leftover routing loop restriction, which no longer
   makes sense given the queueing architecture, and was causing problems
   for people who really did want nested routing.

3) Not yielding our kthread on CONFIG_PREEMPT_VOLUNTARY systems caused
   RCU stalls and other issues, reported by Wang Jian, with the fix
   suggested by Sultan Alsawaf.

4) Clang spewed warnings in a selftest for CONFIG_IPV6=n, reported by
   Arnd Bergmann.

5) A complicated if statement was simplified to an assignment while also
   making the likely/unlikely hinting more correct and simple, and
   increasing readability, suggested by Sultan.

Patches (2) and (3) have Fixes: lines and are probably good candidates
for stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agowireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
Jason A. Donenfeld [Wed, 6 May 2020 21:33:06 +0000 (15:33 -0600)]
wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing

It's very unlikely that send will become true. It's nearly always false
between 0 and 120 seconds of a session, and in most cases becomes true
only between 120 and 121 seconds before becoming false again. So,
unlikely(send) is clearly the right option here.

What happened before was that we had this complex boolean expression
with multiple likely and unlikely clauses nested. Since this is
evaluated left-to-right anyway, the whole thing got converted to
unlikely. So, we can clean this up to better represent what's going on.

The generated code is the same.

Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agowireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
Jason A. Donenfeld [Wed, 6 May 2020 21:33:05 +0000 (15:33 -0600)]
wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning

Without setting these to NULL, clang complains in certain
configurations that have CONFIG_IPV6=n:

In file included from drivers/net/wireguard/ratelimiter.c:223:
drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized]
                ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
                                               ^~~~
drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning
        struct sk_buff *skb4, *skb6;
                                   ^
                                    = NULL
drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized]
                ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
                                                     ^~~~
drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning
        struct ipv6hdr *hdr6;
                            ^

We silence this warning by setting the variables to NULL as the warning
suggests.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agowireguard: send/receive: cond_resched() when processing worker ringbuffers
Jason A. Donenfeld [Wed, 6 May 2020 21:33:04 +0000 (15:33 -0600)]
wireguard: send/receive: cond_resched() when processing worker ringbuffers

Users with pathological hardware reported CPU stalls on CONFIG_
PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning
these workers would never terminate. That turned out not to be okay on
systems without forced preemption, which Sultan observed. This commit
adds a cond_resched() to the bottom of each loop iteration, so that
these workers don't hog the core. Note that we don't need this on the
napi poll worker, since that terminates after its budget is expended.

Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reported-by: Wang Jian <larkwang@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agowireguard: socket: remove errant restriction on looping to self
Jason A. Donenfeld [Wed, 6 May 2020 21:33:03 +0000 (15:33 -0600)]
wireguard: socket: remove errant restriction on looping to self

It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.

At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agowireguard: selftests: use normal kernel stack size on ppc64
Jason A. Donenfeld [Wed, 6 May 2020 21:33:02 +0000 (15:33 -0600)]
wireguard: selftests: use normal kernel stack size on ppc64

While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: ti: am65-cpsw-nuss: fix irqs type
Grygorii Strashko [Tue, 5 May 2020 16:31:26 +0000 (19:31 +0300)]
net: ethernet: ti: am65-cpsw-nuss: fix irqs type

The K3 INTA driver, which is source TX/RX IRQs for CPSW NUSS, defines IRQs
triggering type as EDGE by default, but triggering type for CPSW NUSS TX/RX
IRQs has to be LEVEL as the EDGE triggering type may cause unnecessary IRQs
triggering and NAPI scheduling for empty queues. It was discovered with
RT-kernel.

Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as
IRQF_TRIGGER_HIGH.

Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoionic: Use debugfs_create_bool() to export bool
Geert Uytterhoeven [Tue, 5 May 2020 13:28:09 +0000 (15:28 +0200)]
ionic: Use debugfs_create_bool() to export bool

Currently bool ionic_cq.done_color is exported using
debugfs_create_u8(), which requires a cast, preventing further compiler
checks.

Fix this by switching to debugfs_create_bool(), and dropping the cast.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Do not leave DSA master with NULL netdev_ops
Florian Fainelli [Mon, 4 May 2020 20:18:06 +0000 (13:18 -0700)]
net: dsa: Do not leave DSA master with NULL netdev_ops

When ndo_get_phys_port_name() for the CPU port was added we introduced
an early check for when the DSA master network device in
dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When
we perform the teardown operation in dsa_master_ndo_teardown() we would
not be checking that cpu_dp->orig_ndo_ops was successfully allocated and
non-NULL initialized.

With network device drivers such as virtio_net, this leads to a NPD as
soon as the DSA switch hanging off of it gets torn down because we are
now assigning the virtio_net device's netdev_ops a NULL pointer.

Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port")
Reported-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
Vladimir Oltean [Mon, 4 May 2020 19:58:56 +0000 (22:58 +0300)]
net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred

This was caused by a poor merge conflict resolution on my side. The
"act = &cls->rule->action.entries[0];" assignment was already present in
the code prior to the patch mentioned below.

Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stricter validation of untrusted gso packets
Willem de Bruijn [Mon, 4 May 2020 16:48:54 +0000 (12:48 -0400)]
net: stricter validation of untrusted gso packets

Syzkaller again found a path to a kernel crash through bad gso input:
a packet with transport header extending beyond skb_headlen(skb).

Tighten validation at kernel entry:

- Verify that the transport header lies within the linear section.

    To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
    tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.

- Match the gso_type against the ip_proto found by the flow dissector.

Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoseg6: fix SRH processing to comply with RFC8754
Ahmed Abdelsalam [Mon, 4 May 2020 14:42:11 +0000 (14:42 +0000)]
seg6: fix SRH processing to comply with RFC8754

The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined
in RFC8754.

RFC8754 (section 4.1) defines the SR source node behavior which encapsulates
packets into an outer IPv6 header and SRH. The SR source node encodes the
full list of Segments that defines the packet path in the SRH. Then, the
first segment from list of Segments is copied into the Destination address
of the outer IPv6 header and the packet is sent to the first hop in its path
towards the destination.

If the Segment list has only one segment, the SR source node can omit the SRH
as he only segment is added in the destination address.

RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not
require the entire SID list to be preserved in the SRH. A reduced SRH does
not contain the first segment of the related SR Policy (the first segment is
the one already in the DA of the IPv6 header), and the Last Entry field is
set to n-2, where n is the number of elements in the SR Policy.

RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to
validate the SRH (S09, S10, S11) which works for both reduced and
non-reduced behaviors.

This patch updates seg6_validate_srh() to validate the SRH as per RFC8754.

Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'FDB-fixes-for-Felix-and-Ocelot-switches'
David S. Miller [Thu, 7 May 2020 00:15:38 +0000 (17:15 -0700)]
Merge branch 'FDB-fixes-for-Felix-and-Ocelot-switches'

Vladimir Oltean says:

====================
FDB fixes for Felix and Ocelot switches

This series fixes the following problems:
- Dynamically learnt addresses never expiring (neither for Ocelot nor
  for Felix)
- Half of the FDB not visible in 'bridge fdb show' (for Felix only)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
Vladimir Oltean [Sun, 3 May 2020 22:20:27 +0000 (01:20 +0300)]
net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms

One may notice that automatically-learnt entries 'never' expire, even
though the bridge configures the address age period at 300 seconds.

Actually the value written to hardware corresponds to a time interval
1000 times higher than intended, i.e. 83 hours.

Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Faineli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: ocelot: the MAC table on Felix is twice as large
Vladimir Oltean [Sun, 3 May 2020 22:20:26 +0000 (01:20 +0300)]
net: dsa: ocelot: the MAC table on Felix is twice as large

When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC
addresses would appear, sometimes they wouldn't.

Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192
entries on VSC9959 (Felix), so the existing code from the Ocelot common
library only dumped half of Felix's MAC table. They are both organized
as a 4-way set-associative TCAM, so we just need a single variable
indicating the correct number of rows.

Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm...
Linus Torvalds [Wed, 6 May 2020 23:40:14 +0000 (16:40 -0700)]
Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform fix from Benson Leung:
 "Fix a resource allocation issue in cros_ec_sensorhub.c"

* tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors

5 years agonet: dsa: sja1105: the PTP_CLK extts input reacts on both edges
Vladimir Oltean [Wed, 6 May 2020 17:48:13 +0000 (20:48 +0300)]
net: dsa: sja1105: the PTP_CLK extts input reacts on both edges

It looks like the sja1105 external timestamping input is not as generic
as we thought. When fed a signal with 50% duty cycle, it will timestamp
both the rising and the falling edge. When fed a short pulse signal,
only the timestamp of the falling edge will be seen in the PTPSYNCTS
register, because that of the rising edge had been overwritten. So the
moral is: don't feed it short pulse inputs.

Luckily this is not a complete deal breaker, as we can still work with
1 Hz square waves. But the problem is that the extts polling period was
not dimensioned enough for this input signal. If we leave the period at
half a second, we risk losing timestamps due to jitter in the measuring
process. So we need to increase it to 4 times per second.

Also, the very least we can do to inform the user is to deny any other
flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both
set.

Fixes: 747e5eb31d59 ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: tcp_mmap: fix SO_RCVLOWAT setting
Eric Dumazet [Wed, 6 May 2020 16:21:15 +0000 (09:21 -0700)]
selftests: net: tcp_mmap: fix SO_RCVLOWAT setting

Since chunk_size is no longer an integer, we can not
use it directly as an argument of setsockopt().

This patch should fix tcp_mmap for Big Endian kernels.

Fixes: 597b01edafac ("selftests: net: avoid ptl lock contention in tcp_mmap")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hsr: fix incorrect type usage for protocol variable
Murali Karicheri [Wed, 6 May 2020 15:41:07 +0000 (11:41 -0400)]
net: hsr: fix incorrect type usage for protocol variable

Fix following sparse checker warning:-

net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types)
net/hsr/hsr_slave.c:38:18:    expected unsigned short [unsigned] [usertype] protocol
net/hsr/hsr_slave.c:38:18:    got restricted __be16 [usertype] h_proto
net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer
net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: macsec: fix rtnl locking issue
Antoine Tenart [Wed, 6 May 2020 13:58:30 +0000 (15:58 +0200)]
net: macsec: fix rtnl locking issue

netdev_update_features() must be called with the rtnl lock taken. Not
doing so triggers a warning, as ASSERT_RTNL() is used in
__netdev_update_features(), the first function called by
netdev_update_features(). Fix this.

Fixes: c850240b6c41 ("net: macsec: report real_dev features when HW offloading is enabled")
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
Dan Carpenter [Wed, 6 May 2020 10:16:56 +0000 (13:16 +0300)]
net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()

The "info->fs.location" is a u32 that comes from the user via the
ethtool_set_rxnfc() function.  We need to check for invalid values to
prevent a buffer overflow.

I copy and pasted this check from the mvpp2_ethtool_cls_rule_ins()
function.

Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()
Dan Carpenter [Wed, 6 May 2020 10:16:22 +0000 (13:16 +0300)]
net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()

The "rss_context" variable comes from the user via  ethtool_get_rxfh().
It can be any u32 value except zero.  Eventually it gets passed to
mvpp22_rss_ctx() and if it is over MVPP22_N_RSS_TABLES (8) then it
results in an array overflow.

Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: net: tcp_mmap: clear whole tcp_zerocopy_receive struct
Eric Dumazet [Wed, 6 May 2020 03:51:06 +0000 (20:51 -0700)]
selftests: net: tcp_mmap: clear whole tcp_zerocopy_receive struct

We added fields in tcp_zerocopy_receive structure,
so make sure to clear all fields to not pass garbage to the kernel.

We were lucky because recent additions added 'out' parameters,
still we need to clean our reference implementation, before folks
copy/paste it.

Fixes: c8856c051454 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.")
Fixes: 33946518d493 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 6 May 2020 17:20:00 +0000 (10:20 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a potential scheduling latency problem for the algorithms
  used by WireGuard"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arch/nhpoly1305 - process in explicit 4k chunks
  crypto: arch/lib - limit simd usage to 4k chunks

5 years agoMerge tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform...
Linus Torvalds [Tue, 5 May 2020 23:29:03 +0000 (16:29 -0700)]
Merge tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fixes from Andy Shevchenko:

 - Avoid loading asus-nb-wmi module on selected laptop models

 - Fix S0ix debug support for Jasper Lake PMC

 - Few fixes which have been reported by Hulk bot and others

* tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement
  platform/x86: intel_pmc_core: avoid unused-function warnings
  platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
  platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
  platform/x86/intel-uncore-freq: make uncore_root_kobj static
  platform/x86: wmi: Make two functions static
  platform/x86: surface3_power: Fix a NULL vs IS_ERR() check in probe

5 years agobpf, sockmap: bpf_tcp_ingress needs to subtract bytes from sg.size
John Fastabend [Mon, 4 May 2020 17:21:44 +0000 (10:21 -0700)]
bpf, sockmap: bpf_tcp_ingress needs to subtract bytes from sg.size

In bpf_tcp_ingress we used apply_bytes to subtract bytes from sg.size
which is used to track total bytes in a message. But this is not
correct because apply_bytes is itself modified in the main loop doing
the mem_charge.

Then at the end of this we have sg.size incorrectly set and out of
sync with actual sk values. Then we can get a splat if we try to
cork the data later and again try to redirect the msg to ingress. To
fix instead of trying to track msg.size do the easy thing and include
it as part of the sk_msg_xfer logic so that when the msg is moved the
sg.size is always correct.

To reproduce the below users will need ingress + cork and hit an
error path that will then try to 'free' the skmsg.

[  173.699981] BUG: KASAN: null-ptr-deref in sk_msg_free_elem+0xdd/0x120
[  173.699987] Read of size 8 at addr 0000000000000008 by task test_sockmap/5317

[  173.700000] CPU: 2 PID: 5317 Comm: test_sockmap Tainted: G          I       5.7.0-rc1+ #43
[  173.700005] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[  173.700009] Call Trace:
[  173.700021]  dump_stack+0x8e/0xcb
[  173.700029]  ? sk_msg_free_elem+0xdd/0x120
[  173.700034]  ? sk_msg_free_elem+0xdd/0x120
[  173.700042]  __kasan_report+0x102/0x15f
[  173.700052]  ? sk_msg_free_elem+0xdd/0x120
[  173.700060]  kasan_report+0x32/0x50
[  173.700070]  sk_msg_free_elem+0xdd/0x120
[  173.700080]  __sk_msg_free+0x87/0x150
[  173.700094]  tcp_bpf_send_verdict+0x179/0x4f0
[  173.700109]  tcp_bpf_sendpage+0x3ce/0x5d0

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/158861290407.14306.5327773422227552482.stgit@john-Precision-5820-Tower
5 years agobpf, sockmap: msg_pop_data can incorrecty set an sge length
John Fastabend [Mon, 4 May 2020 17:21:23 +0000 (10:21 -0700)]
bpf, sockmap: msg_pop_data can incorrecty set an sge length

When sk_msg_pop() is called where the pop operation is working on
the end of a sge element and there is no additional trailing data
and there _is_ data in front of pop, like the following case,

   |____________a_____________|__pop__|

We have out of order operations where we incorrectly set the pop
variable so that instead of zero'ing pop we incorrectly leave it
untouched, effectively. This can cause later logic to shift the
buffers around believing it should pop extra space. The result is
we have 'popped' more data then we expected potentially breaking
program logic.

It took us a while to hit this case because typically we pop headers
which seem to rarely be at the end of a scatterlist elements but
we can't rely on this.

Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/158861288359.14306.7654891716919968144.stgit@john-Precision-5820-Tower
5 years agoneigh: send protocol value in neighbor create notification
Roman Mashak [Sat, 2 May 2020 01:34:18 +0000 (21:34 -0400)]
neigh: send protocol value in neighbor create notification

When a new neighbor entry has been added, event is generated but it does not
include protocol, because its value is assigned after the event notification
routine has run, so move protocol assignment code earlier.

Fixes: df9b0e30d44c ("neighbor: Add protocol attribute")
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: broadcom: fix a mistake about ioremap resource
Dejin Zheng [Tue, 5 May 2020 02:03:29 +0000 (10:03 +0800)]
net: broadcom: fix a mistake about ioremap resource

Commit d7a5502b0bb8b ("net: broadcom: convert to
devm_platform_ioremap_resource_byname()") will broke this driver.
idm_base and nicpm_base were optional, after this change, they are
mandatory. it will probe fails with -22 when the dtb doesn't have them
defined. so revert part of this commit and make idm_base and nicpm_base
as optional.

Fixes: d7a5502b0bb8bde ("net: broadcom: convert to devm_platform_ioremap_resource_byname()")
Reported-by: Jonathan Richardson <jonathan.richardson@broadcom.com>
Cc: Scott Branden <scott.branden@broadcom.com>
Cc: Ray Jui <ray.jui@broadcom.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoplatform/x86: thinkpad_acpi: Remove always false 'value < 0' statement
Xiongfeng Wang [Wed, 29 Apr 2020 08:59:40 +0000 (16:59 +0800)]
platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement

Since 'value' is declared as unsigned long, the following statement is
always false.
value < 0

So let's remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
5 years agoplatform/x86: intel_pmc_core: avoid unused-function warnings
Arnd Bergmann [Tue, 28 Apr 2020 21:36:38 +0000 (23:36 +0200)]
platform/x86: intel_pmc_core: avoid unused-function warnings

When both CONFIG_DEBUG_FS and CONFIG_PM_SLEEP are disabled, the
functions that got moved out of the #ifdef section now cause
a warning:

drivers/platform/x86/intel_pmc_core.c:654:13: error: 'pmc_core_lpm_display' defined but not used [-Werror=unused-function]
  654 | static void pmc_core_lpm_display(struct pmc_dev *pmcdev, struct device *dev,
      |             ^~~~~~~~~~~~~~~~~~~~
drivers/platform/x86/intel_pmc_core.c:617:13: error: 'pmc_core_slps0_display' defined but not used [-Werror=unused-function]
  617 | static void pmc_core_slps0_display(struct pmc_dev *pmcdev, struct device *dev,
      |             ^~~~~~~~~~~~~~~~~~~~~~

Rather than add even more #ifdefs here, remove them entirely and
let the compiler work it out, it can actually get rid of all the
debugfs calls without problems as long as the struct member is
there.

The two PM functions just need a __maybe_unused annotations to avoid
another warning instead of the #ifdef.

Fixes: aae43c2bcdc1 ("platform/x86: intel_pmc_core: Relocate pmc_core_*_display() to outside of CONFIG_DEBUG_FS")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
5 years agoplatform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
Hans de Goede [Wed, 22 Apr 2020 22:05:59 +0000 (00:05 +0200)]
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA

asus-nb-wmi does not add any extra functionality on these Asus
Transformer books. They have detachable keyboards, so the hotkeys are
send through a HID device (and handled by the hid-asus driver) and also
the rfkill functionality is not used on these devices.

Besides not adding any extra functionality, initializing the WMI interface
on these devices actually has a negative side-effect. For some reason
the \_SB.ATKD.INIT() function which asus_wmi_platform_init() calls drives
GPO2 (INT33FC:02) pin 8, which is connected to the front facing webcam LED,
high and there is no (WMI or other) interface to drive this low again
causing the LED to be permanently on, even during suspend.

This commit adds a blacklist of DMI system_ids on which not to load the
asus-nb-wmi and adds these Transformer books to this list. This fixes
the webcam LED being permanently on under Linux.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
5 years agoplatform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
Archana Patni [Tue, 21 Apr 2020 08:40:19 +0000 (14:10 +0530)]
platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL

Jasper Lake uses Icelake PCH IPs and the S0ix debug interfaces are same as
Icelake. It uses SLP_S0_DBG register latch/read interface from Icelake
generation. It doesn't use Tiger Lake LPM debug registers. Change the
Jasper Lake S0ix debug interface to use the ICL reg map.

Fixes: 16292bed9c56 ("platform/x86: intel_pmc_core: Add Atom based Jasper Lake (JSL) platform support")
Signed-off-by: Archana Patni <archana.patni@intel.com>
Acked-by: David E. Box <david.e.box@intel.com>
Tested-by: Divagar Mohandass <divagar.mohandass@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
5 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Linus Torvalds [Tue, 5 May 2020 01:55:20 +0000 (18:55 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

 - Wacom driver functional and regression fixes from Jason Gerecke

 - race condition fix in usbhid, found by syzbot and fixed by Alan Stern

 - a few device-specific quirks and ID additions

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
  HID: mcp2221: add gpiolib dependency
  HID: i2c-hid: reset Synaptics SYNA2393 on resume
  HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT
  HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
  Revert "HID: wacom: generic: read the number of expected touches on a per collection basis"
  HID: alps: ALPS_1657 is too specific; use U1_UNICORN_LEGACY instead
  HID: alps: Add AUI1657 device ID
  HID: logitech: Add support for Logitech G11 extra keys
  HID: multitouch: add eGalaxTouch P80H84 support
  HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices

5 years agonfp: abm: fix a memory leak bug
Qiushi Wu [Sat, 2 May 2020 22:42:59 +0000 (17:42 -0500)]
nfp: abm: fix a memory leak bug

In function nfp_abm_vnic_set_mac, pointer nsp is allocated by nfp_nsp_open.
But when nfp_nsp_has_hwinfo_lookup fail, the pointer is not released,
which can lead to a memory leak bug. Fix this issue by adding
nfp_nsp_close(nsp) in the error path.

Fixes: f6e71efdf9fb1 ("nfp: abm: look up MAC addresses via management FW")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoatm: fix a memory leak of vcc->user_back
Cong Wang [Fri, 1 May 2020 18:11:09 +0000 (11:11 -0700)]
atm: fix a memory leak of vcc->user_back

In lec_arp_clear_vccs() only entry->vcc is freed, but vcc
could be installed on entry->recv_vcc too in lec_vcc_added().

This fixes the following memory leak:

unreferenced object 0xffff8880d9266b90 (size 16):
  comm "atm2", pid 425, jiffies 4294907980 (age 23.488s)
  hex dump (first 16 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5  ............kkk.
  backtrace:
    [<(____ptrval____)>] kmem_cache_alloc_trace+0x10e/0x151
    [<(____ptrval____)>] lane_ioctl+0x4b3/0x569
    [<(____ptrval____)>] do_vcc_ioctl+0x1ea/0x236
    [<(____ptrval____)>] svc_ioctl+0x17d/0x198
    [<(____ptrval____)>] sock_do_ioctl+0x47/0x12f
    [<(____ptrval____)>] sock_ioctl+0x2f9/0x322
    [<(____ptrval____)>] vfs_ioctl+0x1e/0x2b
    [<(____ptrval____)>] ksys_ioctl+0x61/0x80
    [<(____ptrval____)>] __x64_sys_ioctl+0x16/0x19
    [<(____ptrval____)>] do_syscall_64+0x57/0x65
    [<(____ptrval____)>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

Cc: Gengming Liu <l.dmxcsnsbh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoatm: fix a UAF in lec_arp_clear_vccs()
Cong Wang [Fri, 1 May 2020 18:11:08 +0000 (11:11 -0700)]
atm: fix a UAF in lec_arp_clear_vccs()

Gengming reported a UAF in lec_arp_clear_vccs(),
where we add a vcc socket to an entry in a per-device
list but free the socket without removing it from the
list when vcc->dev is NULL.

We need to call lec_vcc_close() to search and remove
those entries contain the vcc being destroyed. This can
be done by calling vcc->push(vcc, NULL) unconditionally
in vcc_destroy_socket().

Another issue discovered by Gengming's reproducer is
the vcc->dev may point to the static device lecatm_dev,
for which we don't need to register/unregister device,
so we can just check for vcc->dev->ops->owner.

Reported-by: Gengming Liu <l.dmxcsnsbh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: gmac5+: fix potential integer overflow on 32 bit multiply
Colin Ian King [Fri, 1 May 2020 14:10:16 +0000 (15:10 +0100)]
net: stmmac: gmac5+: fix potential integer overflow on 32 bit multiply

The multiplication of cfg->ctr[1] by 1000000000 is performed using a
32 bit multiplication (since cfg->ctr[1] is a u32) and this can lead
to a potential overflow. Fix this by making the constant a ULL to
ensure a 64 bit multiply occurs.

Fixes: 504723af0d85 ("net: stmmac: Add basic EST support for GMAC5+")
Addresses-Coverity: ("Unintentional integer overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: fix tcm_parent in tc filter dump
Cong Wang [Fri, 1 May 2020 03:53:49 +0000 (20:53 -0700)]
net_sched: fix tcm_parent in tc filter dump

When we tell kernel to dump filters from root (ffff:ffff),
those filters on ingress (ffff:0000) are matched, but their
true parents must be dumped as they are. However, kernel
dumps just whatever we tell it, that is either ffff:ffff
or ffff:0000:

 $ nl-cls-list --dev=dummy0 --parent=root
 cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all
 cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all
 $ nl-cls-list --dev=dummy0 --parent=ffff:
 cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all
 cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all

This is confusing and misleading, more importantly this is
a regression since 4.15, so the old behavior must be restored.

And, when tc filters are installed on a tc class, the parent
should be the classid, rather than the qdisc handle. Commit
edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto")
removed the classid we save for filters, we can just restore
this classid in tcf_block.

Steps to reproduce this:
 ip li set dev dummy0 up
 tc qd add dev dummy0 ingress
 tc filter add dev dummy0 parent ffff: protocol arp basic action pass
 tc filter show dev dummy0 root

Before this patch:
 filter protocol arp pref 49152 basic
 filter protocol arp pref 49152 basic handle 0x1
action order 1: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 1

After this patch:
 filter parent ffff: protocol arp pref 49152 basic
 filter parent ffff: protocol arp pref 49152 basic handle 0x1
  action order 1: gact action pass
   random type none pass val 0
 index 1 ref 1 bind 1

Fixes: a10fa20101ae ("net: sched: propagate q and parent from caller down to tcf_fill_node")
Fixes: edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4/chcr: avoid -Wreturn-local-addr warning
Arnd Bergmann [Thu, 30 Apr 2020 10:39:02 +0000 (12:39 +0200)]
cxgb4/chcr: avoid -Wreturn-local-addr warning

gcc-10 warns about functions that return a pointer to a stack
variable. In chcr_write_cpl_set_tcb_ulp(), this does not actually
happen, but it's too hard to see for the compiler:

drivers/crypto/chelsio/chcr_ktls.c: In function 'chcr_write_cpl_set_tcb_ulp.constprop':
drivers/crypto/chelsio/chcr_ktls.c:760:9: error: function may return address of local variable [-Werror=return-local-addr]
  760 |  return pos;
      |         ^~~
drivers/crypto/chelsio/chcr_ktls.c:712:5: note: declared here
  712 |  u8 buf[48] = {0};
      |     ^~~

Split the middle part of the function out into a helper to make
it easier to understand by both humans and compilers, which avoids
the warning.

Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agos390/qeth: fix cancelling of TX timer on dev_close()
Julian Wiedmann [Mon, 4 May 2020 17:39:42 +0000 (19:39 +0200)]
s390/qeth: fix cancelling of TX timer on dev_close()

With the introduction of TX coalescing, .ndo_start_xmit now potentially
starts the TX completion timer. So only kill the timer _after_ TX has
been disabled.

Fixes: ee1e52d1e4bb ("s390/qeth: add TX IRQ coalescing support for IQD devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'gcc-plugins-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 4 May 2020 18:20:32 +0000 (11:20 -0700)]
Merge tag 'gcc-plugins-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc-plugins fixes from Kees Cook:
 "GCC 10 fixes for gcc-plugins:

   - Adjust caller of cgraph_create_edge for GCC 10 argument usage

   - Update common headers to build under GCC 10 (Frédéric Pierret)"

* tag 'gcc-plugins-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-common.h: Update for GCC 10
  gcc-plugins/stackleak: Avoid assignment for unused macro argument

5 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Mon, 4 May 2020 18:10:24 +0000 (11:10 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "A couple of bug fixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: vsock: kick send_pkt worker once device is started
  virtio-blk: handle block_device_operations callbacks after hot unplug

5 years agonet: enetc: fix an issue about leak system resources
Dejin Zheng [Mon, 4 May 2020 12:01:27 +0000 (20:01 +0800)]
net: enetc: fix an issue about leak system resources

the related system resources were not released when enetc_hw_alloc()
return error in the enetc_pci_mdio_probe(), add iounmap() for error
handling label "err_hw_alloc" to fix it.

Fixes: 6517798dd3432a ("enetc: Make MDIO accessors more generic and export to include/linux/fsl")
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'flexible-array-member-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Mon, 4 May 2020 17:47:41 +0000 (10:47 -0700)]
Merge tag 'flexible-array-member-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull flex-array reverts from Gustavo Silva:
 "This reverts flexible array changes in include/uapi/

  These structures can get embedded in other structures in user-space
  and cause all sorts of warnings and problems[1]. So, we better don't
  take any chances and keep the zero-length arrays in place for now"

[1] https://lore.kernel.org/lkml/20200424121553.GE26002@ziepe.ca/

* tag 'flexible-array-member-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  uapi: revert flexible-array conversions

5 years agonet/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()
Tariq Toukan [Mon, 4 May 2020 08:36:02 +0000 (11:36 +0300)]
net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()

When ENOSPC is set the idx is still valid and gets set to the global
MLX4_SINK_COUNTER_INDEX.  However gcc's static analysis cannot tell that
ENOSPC is impossible from mlx4_cmd_imm() and gives this warning:

drivers/net/ethernet/mellanox/mlx4/main.c:2552:28: warning: 'idx' may be
used uninitialized in this function [-Wmaybe-uninitialized]
 2552 |    priv->def_counter[port] = idx;

Also, when ENOSPC is returned mlx4_allocate_default_counters should not
fail.

Fixes: 6de5f7f6a1fa ("net/mlx4_core: Allocate default counter per port")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Fix reporter's recovery condition
Aya Levin [Mon, 4 May 2020 08:27:46 +0000 (11:27 +0300)]
devlink: Fix reporter's recovery condition

Devlink health core conditions the reporter's recovery with the
expiration of the grace period. This is not relevant for the first
recovery. Explicitly demand that the grace period will only apply to
recoveries other than the first.

Fixes: c8e1da0bf923 ("devlink: Add health report functionality")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agostmmac: fix pointer check after utilization in stmmac_interrupt
Maxim Petrov [Mon, 4 May 2020 06:26:43 +0000 (09:26 +0300)]
stmmac: fix pointer check after utilization in stmmac_interrupt

The paranoidal pointer check in IRQ handler looks very strange - it
really protects us only against bogus drivers which request IRQ line
with null pointer dev_id. However, the code fragment is incorrect
because the dev pointer is used before the actual check which leads
to undefined behavior. Remove the check to avoid confusing people
with incorrect code.

Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: fix partial topology connection closure
Tuong Lien [Mon, 4 May 2020 04:15:54 +0000 (11:15 +0700)]
tipc: fix partial topology connection closure

When an application connects to the TIPC topology server and subscribes
to some services, a new connection is created along with some objects -
'tipc_subscription' to store related data correspondingly...
However, there is one omission in the connection handling that when the
connection or application is orderly shutdown (e.g. via SIGQUIT, etc.),
the connection is not closed in kernel, the 'tipc_subscription' objects
are not freed too.
This results in:
- The maximum number of subscriptions (65535) will be reached soon, new
subscriptions will be rejected;
- TIPC module cannot be removed (unless the objects  are somehow forced
to release first);

The commit fixes the issue by closing the connection if the 'recvmsg()'
returns '0' i.e. when the peer is shutdown gracefully. It also includes
the other unexpected cases.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: Do not make user port errors fatal
Florian Fainelli [Mon, 4 May 2020 03:50:57 +0000 (20:50 -0700)]
net: dsa: Do not make user port errors fatal

Prior to 1d27732f411d ("net: dsa: setup and teardown ports"), we would
not treat failures to set-up an user port as fatal, but after this
commit we would, which is a regression for some systems where interfaces
may be declared in the Device Tree, but the underlying hardware may not
be present (pluggable daughter cards for instance).

Fixes: 1d27732f411d ("net: dsa: setup and teardown ports")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agouapi: revert flexible-array conversions
Gustavo A. R. Silva [Fri, 24 Apr 2020 15:50:00 +0000 (10:50 -0500)]
uapi: revert flexible-array conversions

These structures can get embedded in other structures in user-space
and cause all sorts of warnings and problems. So, we better don't take
any chances and keep the zero-length arrays in place for now.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
5 years agogcc-10 warnings: fix low-hanging fruit
Linus Torvalds [Mon, 4 May 2020 16:16:37 +0000 (09:16 -0700)]
gcc-10 warnings: fix low-hanging fruit

Due to a bug-report that was compiler-dependent, I updated one of my
machines to gcc-10.  That shows a lot of new warnings.  Happily they
seem to be mostly the valid kind, but it's going to cause a round of
churn for getting rid of them..

This is the really low-hanging fruit of removing a couple of zero-sized
arrays in some core code.  We have had a round of these patches before,
and we'll have many more coming, and there is nothing special about
these except that they were particularly trivial, and triggered more
warnings than most.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5 years agoHID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
Hans de Goede [Sat, 2 May 2020 18:18:42 +0000 (20:18 +0200)]
HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock

Add a HID_QUIRK_NO_INIT_REPORTS quirk for the Dell K12A keyboard-dock,
which can be used with various Dell Venue 11 models.

Without this quirk the keyboard/touchpad combo works fine when connected
at boot, but when hotplugged 9 out of 10 times it will not work properly.
Adding the quirk fixes this.

Cc: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
5 years agonet: macb: fix an issue about leak related system resources
Dejin Zheng [Sun, 3 May 2020 12:32:26 +0000 (20:32 +0800)]
net: macb: fix an issue about leak related system resources

A call of the function macb_init() can fail in the function
fu540_c000_init. The related system resources were not released
then. use devm_platform_ioremap_resource() to replace ioremap()
to fix it.

Fixes: c218ad559020ff9 ("macb: Add support for SiFive FU540-C000")
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Yash Shah <yash.shah@sifive.com>
Suggested-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: usb: qmi_wwan: add support for DW5816e
Matt Jolly [Sat, 2 May 2020 15:52:28 +0000 (01:52 +1000)]
net: usb: qmi_wwan: add support for DW5816e

Add support for Dell Wireless 5816e to drivers/net/usb/qmi_wwan.c

Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet_sched: sch_skbprio: add message validation to skbprio_change()
Eric Dumazet [Sun, 3 May 2020 03:09:25 +0000 (20:09 -0700)]
net_sched: sch_skbprio: add message validation to skbprio_change()

Do not assume the attribute has the right size.

Fixes: aea5f654e6b7 ("net/sched: add skbprio scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoLinux 5.7-rc4 v5.7-rc4
Linus Torvalds [Sun, 3 May 2020 21:56:04 +0000 (14:56 -0700)]
Linux 5.7-rc4

5 years agoMerge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Sun, 3 May 2020 18:30:08 +0000 (11:30 -0700)]
Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs fixes from David Sterba:
 "A few more stability fixes, minor build warning fixes and git url
  fixup:

   - fix partial loss of prealloc extent past i_size after fsync

   - fix potential deadlock due to wrong transaction handle passing via
     journal_info

   - fix gcc 4.8 struct intialization warning

   - update git URL in MAINTAINERS entry"

* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  MAINTAINERS: btrfs: fix git repo URL
  btrfs: fix gcc-4.8 build warning for struct initializer
  btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
  btrfs: fix partial loss of prealloc extent past i_size after fsync

5 years agoMerge tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 3 May 2020 18:04:57 +0000 (11:04 -0700)]
Merge tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

 - Fix a memory leak when dev_iommu gets freed and a sub-pointer does
   not

 - Build dependency fixes for Mediatek, spapr_tce, and Intel IOMMU
   driver

 - Export iommu_group_get_for_dev() only for GPLed modules

 - Fix AMD IOMMU interrupt remapping when x2apic is enabled

 - Fix error path in the QCOM IOMMU driver probe function

* tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/qcom: Fix local_base status check
  iommu: Properly export iommu_group_get_for_dev()
  iommu/vt-d: Use right Kconfig option name
  iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
  iommu: spapr_tce: Disable compile testing to fix build on book3s_32 config
  iommu/mediatek: Fix MTK_IOMMU dependencies
  iommu: Fix the memory leak in dev_iommu_free()

5 years agoMAINTAINERS: btrfs: fix git repo URL
Eric Biggers [Fri, 1 May 2020 23:44:17 +0000 (16:44 -0700)]
MAINTAINERS: btrfs: fix git repo URL

The git repo listed for btrfs hasn't been updated in over a year.
List the current one instead.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
5 years agoMerge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Sat, 2 May 2020 20:45:30 +0000 (13:45 -0700)]
Merge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:

 - prevent the intel_pstate driver from printing excessive diagnostic
   messages in some cases (Chris Wilson)

 - make the hibernation restore kernel freeze kernel threads as well as
   user space tasks (Dexuan Cui)

 - fix the ACPI device PM disagnostic messages to include the correct
   power state name (Kai-Heng Feng).

* tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: ACPI: Output correct message on target power state
  PM: hibernate: Freeze kernel threads in software_resume()
  cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once

5 years agoMerge branches 'pm-cpufreq' and 'pm-sleep'
Rafael J. Wysocki [Sat, 2 May 2020 19:39:17 +0000 (21:39 +0200)]
Merge branches 'pm-cpufreq' and 'pm-sleep'

* pm-cpufreq:
  cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once

* pm-sleep:
  PM: hibernate: Freeze kernel threads in software_resume()

5 years agoMerge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 2 May 2020 18:31:12 +0000 (11:31 -0700)]
Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "Hoist the check for an unrepresentable FIBMAP return value into
  ioctl_fibmap.

  The internal kernel function can handle 64-bit values (and is needed
  to fix a regression on ext4 + jbd2). It is only the userspace ioctl
  that is so old that it cannot deal"

* tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fibmap: Warn and return an error in case of block > INT_MAX

5 years agoMerge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sat, 2 May 2020 18:24:01 +0000 (11:24 -0700)]
Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:
   - fix handling of backchannel binding in BIND_CONN_TO_SESSION

  Bugfixes:
   - Fix a credential use-after-free issue in pnfs_roc()
   - Fix potential posix_acl refcnt leak in nfs3_set_acl
   - defer slow parts of rpc_free_client() to a workqueue
   - Fix an Oopsable race in __nfs_list_for_each_server()
   - Fix trace point use-after-free race
   - Regression: the RDMA client no longer responds to server disconnect
     requests
   - Fix return values of xdr_stream_encode_item_{present, absent}
   - _pnfs_return_layout() must always wait for layoutreturn completion

  Cleanups:
   - Remove unreachable error conditions"

* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix a race in __nfs_list_for_each_server()
  NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
  SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
  NFSv4: Remove unreachable error condition due to rpc_run_task()
  SUNRPC: Remove unreachable error condition
  xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
  xprtrdma: Fix trace point use-after-free race
  xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
  nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
  NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
  NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion

5 years agoMerge tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Sat, 2 May 2020 18:16:14 +0000 (11:16 -0700)]
Merge tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "Core:
   - Documentation typo fixes
   - fix the channel indexes
   - dmatest: fixes for process hang and iterations

  Drivers:
   - hisilicon: build error fix without PCI_MSI
   - ti-k3: deadlock fix
   - uniphier-xdmac: fix for reg region
   - pch: fix data race
   - tegra: fix clock state"

* tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: dmatest: Fix process hang when reading 'wait' parameter
  dmaengine: dmatest: Fix iteration non-stop logic
  dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronization
  dmaengine: fix channel index enumeration
  dmaengine: mmp_tdma: Reset channel error on release
  dmaengine: mmp_tdma: Do not ignore slave config validation errors
  dmaengine: pch_dma.c: Avoid data race between probe and irq handler
  dt-bindings: dma: uniphier-xdmac: switch to single reg region
  include/linux/dmaengine: Typos fixes in API documentation
  dmaengine: xilinx_dma: Add missing check for empty list
  dmaengine: ti: k3-psil: fix deadlock on error path
  dmaengine: hisilicon: Fix build error without PCI_MSI

5 years agovhost: vsock: kick send_pkt worker once device is started
Jia He [Fri, 1 May 2020 04:38:40 +0000 (12:38 +0800)]
vhost: vsock: kick send_pkt worker once device is started

Ning Bo reported an abnormal 2-second gap when booting Kata container [1].
The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of
connecting from the client side. The vhost vsock client tries to connect
an initializing virtio vsock server.

The abnormal flow looks like:
host-userspace           vhost vsock                       guest vsock
==============           ===========                       ============
connect()     -------->  vhost_transport_send_pkt_work()   initializing
   |                     vq->private_data==NULL
   |                     will not be queued
   V
schedule_timeout(2s)
                         vhost_vsock_start()  <---------   device ready
                         set vq->private_data

wait for 2s and failed
connect() again          vq->private_data!=NULL         recv connecting pkt

Details:
1. Host userspace sends a connect pkt, at that time, guest vsock is under
   initializing, hence the vhost_vsock_start has not been called. So
   vq->private_data==NULL, and the pkt is not been queued to send to guest
2. Then it sleeps for 2s
3. After guest vsock finishes initializing, vq->private_data is set
4. When host userspace wakes up after 2s, send connecting pkt again,
   everything is fine.

As suggested by Stefano Garzarella, this fixes it by additional kicking the
send_pkt worker in vhost_vsock_start once the virtio device is started. This
makes the pending pkt sent again.

After this patch, kata-runtime (with vsock enabled) boot time is reduced
from 3s to 1s on a ThunderX2 arm64 server.

[1] https://github.com/kata-containers/runtime/issues/1917

Reported-by: Ning Bo <n.b@live.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jia He <justin.he@arm.com>
Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
5 years agovirtio-blk: handle block_device_operations callbacks after hot unplug
Stefan Hajnoczi [Thu, 30 Apr 2020 14:04:42 +0000 (15:04 +0100)]
virtio-blk: handle block_device_operations callbacks after hot unplug

A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug.  This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
  IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
  PGD 800000003a92f067 PUD 3a930067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G           OE  ------------   3.10.0-1062.el7.x86_64 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000
  RIP: 0010:[<ffffffffc00e5450>]  [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio]
  RSP: 0018:ffff9be5fa893dc8  EFLAGS: 00010246
  RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40
  RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680
  R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000
  FS:  00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   [<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk]
   [<ffffffff8d3f200d>] ? handle_mm_fault+0x39d/0x9b0
   [<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20
   [<ffffffff8d488771>] block_ioctl+0x41/0x50
   [<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0
   [<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0

A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).

Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.

Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature")
Reported-by: Lance Digby <ldigby@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
5 years agoMerge tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio
Linus Torvalds [Sat, 2 May 2020 00:19:15 +0000 (17:19 -0700)]
Merge tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - copy_*_user validity check for new vfio_dma_rw interface (Yan Zhao)

 - Fix a potential math overflow (Yan Zhao)

 - Use follow_pfn() for calculating PFNMAPs (Sean Christopherson)

* tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio:
  vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
  vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
  vfio: checking of validity of user vaddr in vfio_dma_rw