David S. Miller [Wed, 9 Sep 2020 18:31:38 +0000 (11:31 -0700)]
Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:
====================
wireguard fixes for 5.9-rc5
Yesterday, Eric reported a race condition found by syzbot. This series
contains two commits, one that fixes the direct issue, and another that
addresses the more general issue, as a defense in depth.
1) The basic problem syzbot unearthed was that one particular mutation
of handshake->entry was not protected by the handshake mutex like the
other cases, so this patch basically just reorders a line to make
sure the mutex is actually taken at the right point. Most of the work
here went into making sure the race was fully understood and making a
reproducer (which syzbot was unable to do itself, due to the rarity
of the race).
2) Eric's initial suggestion for fixing this was taking a spinlock
around the hash table replace function where the null ptr deref was
happening. This doesn't address the main problem in the most precise
possible way like (1) does, but it is a good suggestion for
defense-in-depth, in case related issues come up in the future, and
basically costs nothing from a performance perspective. I thought it
aided in implementing a good general rule: all mutators of that hash
table take the table lock. So that's part of this series as a
companion.
Both of these contain Fixes: tags and are good candidates for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld [Wed, 9 Sep 2020 11:58:15 +0000 (13:58 +0200)]
wireguard: peerlookup: take lock before checking hash in replace operation
Eric's suggested fix for the previous commit's mentioned race condition
was to simply take the table->lock in wg_index_hashtable_replace(). The
table->lock of the hash table is supposed to protect the bucket heads,
not the entires, but actually, since all the mutator functions are
already taking it, it makes sense to take it too for the test to
hlist_unhashed, as a defense in depth measure, so that it no longer
races with deletions, regardless of what other locks are protecting
individual entries. This is sensible from a performance perspective
because, as Eric pointed out, the case of being unhashed is already the
unlikely case, so this won't add common contention. And comparing
instructions, this basically doesn't make much of a difference other
than pushing and popping %r13, used by the new `bool ret`. More
generally, I like the idea of locking consistency across table mutator
functions, and this might let me rest slightly easier at night.
Jason A. Donenfeld [Wed, 9 Sep 2020 11:58:14 +0000 (13:58 +0200)]
wireguard: noise: take lock when removing handshake entry from table
Eric reported that syzkaller found a race of this variety:
CPU 1 CPU 2
-------------------------------------------|---------------------------------------
wg_index_hashtable_replace(old, ...) |
if (hlist_unhashed(&old->index_hash)) |
| wg_index_hashtable_remove(old)
| hlist_del_init_rcu(&old->index_hash)
| old->index_hash.pprev = NULL
hlist_replace_rcu(&old->index_hash, ...) |
*old->index_hash.pprev |
Syzbot wasn't actually able to reproduce this more than once or create a
reproducer, because the race window between checking "hlist_unhashed" and
calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
similar there helps make this demonstrable using this simple script:
#!/bin/bash
set -ex
trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
ip link add wg0 type wireguard
ip link add wg1 type wireguard
wg set wg0 private-key <(wg genkey) listen-port 9999
wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
wg set wg0 peer $(wg show wg1 public-key)
ip link set wg0 up
yes link set wg1 up | ip -force -batch - &
pid1=$!
yes link set wg1 down | ip -force -batch - &
pid2=$!
wait
The fundumental underlying problem is that we permit calls to wg_index_
hashtable_remove(handshake.entry) without requiring the caller to take
the handshake mutex that is intended to protect members of handshake
during mutations. This is consistently the case with calls to wg_index_
hashtable_insert(handshake.entry) and wg_index_hashtable_replace(
handshake.entry), but it's missing from a pertinent callsite of wg_
index_hashtable_remove(handshake.entry). So, this patch makes sure that
mutex is taken.
The original code was a little bit funky though, in the form of:
The original intention of that double removal pattern outside the lock
appears to be some attempt to prevent insertions that might happen while
locks are dropped during expensive crypto operations, but actually, all
callers of wg_index_hashtable_insert(handshake.entry) take the write
lock and then explicitly check handshake.state, as they should, which
the aforementioned memzero clears, which means an insertion should
already be impossible. And regardless, the original intention was
necessarily racy, since it wasn't guaranteed that something else would
run after the unlock() instead of after the remove(). So, from a
soundness perspective, it seems positive to remove what looks like a
hack at best.
The crash from both syzbot and from the script above is as follows:
Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/ Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ye Bin [Wed, 9 Sep 2020 09:38:21 +0000 (17:38 +0800)]
hsr: avoid newline at end of message in NL_SET_ERR_MSG_MOD
clean follow coccicheck warning:
net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 9 Sep 2020 08:27:39 +0000 (01:27 -0700)]
net: qrtr: check skb_put_padto() return value
If skb_put_padto() returns an error, skb has been freed.
Better not touch it anymore, as reported by syzbot [1]
Note to qrtr maintainers : this suggests qrtr_sendmsg()
should adjust sock_alloc_send_skb() second parameter
to account for the potential added alignment to avoid
reallocation.
[1]
BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline]
BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline]
BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline]
BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316
Wei Wang [Tue, 8 Sep 2020 21:09:34 +0000 (14:09 -0700)]
ip: fix tos reflection in ack and reset packets
Currently, in tcp_v4_reqsk_send_ack() and tcp_v4_send_reset(), we
echo the TOS value of the received packets in the response.
However, we do not want to echo the lower 2 ECN bits in accordance
with RFC 3168 6.1.5 robustness principles.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Sep 2020 03:12:58 +0000 (20:12 -0700)]
Merge tag 'ieee802154-for-davem-2020-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2020-09-08
An update from ieee802154 for your *net* tree.
A potential memory leak fix for ca8210 from Liu Jian,
a check on the return for a register read in adf7242
and finally a user after free fix in the softmac tx
function from Eric found by syzkaller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian Vazquez [Tue, 8 Sep 2020 16:18:12 +0000 (09:18 -0700)]
fib: fix fib_rule_ops indirect call wrappers when CONFIG_IPV6=m
If CONFIG_IPV6=m, the IPV6 functions won't be found by the linker:
ld: net/core/fib_rules.o: in function `fib_rules_lookup':
fib_rules.c:(.text+0x606): undefined reference to `fib6_rule_match'
ld: fib_rules.c:(.text+0x611): undefined reference to `fib6_rule_match'
ld: fib_rules.c:(.text+0x68c): undefined reference to `fib6_rule_action'
ld: fib_rules.c:(.text+0x693): undefined reference to `fib6_rule_action'
ld: fib_rules.c:(.text+0x6aa): undefined reference to `fib6_rule_suppress'
ld: fib_rules.c:(.text+0x6bc): undefined reference to `fib6_rule_suppress'
make: *** [Makefile:1166: vmlinux] Error 1
Reported-by: Sven Joachim <svenjoac@gmx.de> Fixes: b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops") Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 421842edeaf6 ("net/ipv6: Add fib6_null_entry") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 7 Sep 2020 23:48:42 +0000 (02:48 +0300)]
net: dsa: link interfaces with the DSA master to get rid of lockdep warnings
Since commit 845e0ebb4408 ("net: change addr_list_lock back to static
key"), cascaded DSA setups (DSA switch port as DSA master for another
DSA switch port) are emitting this lockdep warning:
============================================
WARNING: possible recursive locking detected 5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
--------------------------------------------
dhcpcd/323 is trying to acquire lock: ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
but task is already holding lock: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
other info that might help us debug this:
Possible unsafe locking scenario:
Since DSA never made use of the netdev API for describing links between
upper devices and lower devices, the dev->lower_level value of a DSA
switch interface would be 1, which would warn when it is a DSA master.
We can use netdev_upper_dev_link() to describe the relationship between
a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
is an "upper" to a DSA "master" (host port). The relationship is "many
uppers to one lower", like in the case of VLAN. So, for that reason, we
use the same function as VLAN uses.
There might be a chance that somebody will try to take hold of this
interface and use it immediately after register_netdev() and before
netdev_upper_dev_link(). To avoid that, we do the registration and
linkage while holding the RTNL, and we use the RTNL-locked cousin of
register_netdev(), which is register_netdevice().
Since this warning was not there when lockdep was using dynamic keys for
addr_list_lock, we are blaming the lockdep patch itself. The network
stack _has_ been using static lockdep keys before, and it _is_ likely
that stacked DSA setups have been triggering these lockdep warnings
since forever, however I can't test very old kernels on this particular
stacked DSA setup, to ensure I'm not in fact introducing regressions.
Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Memory state around the buggy address: ffff8880251a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880251a8b80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880251a8c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8880251a8c80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff8880251a8d00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: 409c3b0c5f03 ("mac802154: tx: move stats tx increment") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@datenfreihafen.org> Cc: linux-wpan@vger.kernel.org Link: https://lore.kernel.org/r/20200908104025.4009085-1-edumazet@google.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
netfilter: conntrack: nf_conncount_init is failing with IPv6 disabled
The openvswitch module fails initialization when used in a kernel
without IPv6 enabled. nf_conncount_init() fails because the ct code
unconditionally tries to initialize the netns IPv6 related bit,
regardless of the build option. The change below ignores the IPv6
part if not enabled.
Note that the corresponding _put() function already has this IPv6
configuration check.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Martin Willi [Tue, 1 Sep 2020 06:56:19 +0000 (08:56 +0200)]
netfilter: ctnetlink: fix mark based dump filtering regression
conntrack mark based dump filtering may falsely skip entries if a mask
is given: If the mask-based check does not filter out the entry, the
else-if check is always true and compares the mark without considering
the mask. The if/else-if logic seems wrong.
Given that the mask during filter setup is implicitly set to 0xffffffff
if not specified explicitly, the mark filtering flags seem to just
complicate things. Restore the previously used approach by always
matching against a zero mask is no filter mark is given.
Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Thu, 27 Aug 2020 17:28:42 +0000 (19:28 +0200)]
netfilter: nf_tables: coalesce multiple notifications into one skbuff
On x86_64, each notification results in one skbuff allocation which
consumes at least 768 bytes due to the skbuff overhead.
This patch coalesces several notifications into one single skbuff, so
each notification consumes at least ~211 bytes, that ~3.5 times less
memory consumption. As a result, this is reducing the chances to exhaust
the netlink socket receive buffer.
Rule of thumb is that each notification batch only contains netlink
messages whose report flag is the same, nfnetlink_send() requires this
to do appropriate delivery to userspace, either via unicast (echo
mode) or multicast (monitor mode).
The skbuff control buffer is used to annotate the report flag for later
handling at the new coalescing routine.
The batch skbuff notification size is NLMSG_GOODSIZE, using a larger
skbuff would allow for more socket receiver buffer savings (to amortize
the cost of the skbuff even more), however, going over that size might
break userspace applications, so let's be conservative and stick to
NLMSG_GOODSIZE.
Reported-by: Phil Sutter <phil@nwl.cc> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Will McVicker [Mon, 24 Aug 2020 19:38:32 +0000 (19:38 +0000)]
netfilter: ctnetlink: add a range check for l3/l4 protonum
The indexes to the nf_nat_l[34]protos arrays come from userspace. So
check the tuple's family, e.g. l3num, when creating the conntrack in
order to prevent an OOB memory access during setup. Here is an example
kernel panic on 4.14.180 when userspace passes in an index greater than
NFPROTO_NUMPROTO.
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:...
Process poc (pid: 5614, stack limit = 0x00000000a3933121)
CPU: 4 PID: 5614 Comm: poc Tainted: G S W O 4.14.180-g051355490483
Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM
task: 000000002a3dfffe task.stack: 00000000a3933121
pc : __cfi_check_fail+0x1c/0x24
lr : __cfi_check_fail+0x1c/0x24
...
Call trace:
__cfi_check_fail+0x1c/0x24
name_to_dev_t+0x0/0x468
nfnetlink_parse_nat_setup+0x234/0x258
ctnetlink_parse_nat_setup+0x4c/0x228
ctnetlink_new_conntrack+0x590/0xc40
nfnetlink_rcv_msg+0x31c/0x4d4
netlink_rcv_skb+0x100/0x184
nfnetlink_rcv+0xf4/0x180
netlink_unicast+0x360/0x770
netlink_sendmsg+0x5a0/0x6a4
___sys_sendmsg+0x314/0x46c
SyS_sendmsg+0xb4/0x108
el0_svc_naked+0x34/0x38
This crash is not happening since 5.4+, however, ctnetlink still
allows for creating entries with unsupported layer 3 protocol number.
Fixes: c1d10adb4a521 ("[NETFILTER]: Add ctnetlink port for nf_conntrack") Signed-off-by: Will McVicker <willmcvicker@google.com>
[pablo@netfilter.org: rebased original patch on top of nf.git] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Dexuan Cui [Mon, 7 Sep 2020 07:13:39 +0000 (00:13 -0700)]
hv_netvsc: Fix hibernation for mlx5 VF driver
mlx5_suspend()/resume() keep the network interface, so during hibernation
netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
netvsc_resume() should call netvsc_vf_changed() to switch the data path
back to the VF after hibernation. Note: after we close and re-open the
vmbus channel of the netvsc NIC in netvsc_suspend() and netvsc_resume(),
the data path is implicitly switched to the netvsc NIC. Similarly,
netvsc_suspend() should not call netvsc_unregister_vf(), otherwise the VF
can no longer be used after hibernation.
For mlx4, since the VF network interafce is explicitly destroyed and
re-created during hibernation (see mlx4_suspend()/resume()), hv_netvsc
already explicitly switches the data path from and to the VF automatically
via netvsc_register_vf() and netvsc_unregister_vf(), so mlx4 doesn't need
this fix. Note: mlx4 can still work with the fix because in
netvsc_suspend()/resume() ndev_ctx->vf_netdev is NULL for mlx4.
Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To protect netns id, the nsid_lock is used when netns id is being
allocated and removed by peernet2id_alloc() and unhash_nsid().
The nsid_lock can be used in BH context but only spin_lock() is used
in this code.
Using spin_lock() instead of spin_lock_bh() can result in a deadlock in
the following scenario reported by the lockdep.
In order to avoid a deadlock, the spin_lock_bh() should be used instead
of spin_lock() to acquire nsid_lock.
Test commands:
ip netns del nst
ip netns add nst
ip link add veth1 type veth peer name veth2
ip link set veth1 netns nst
ip netns exec nst ip link add name br1 type bridge vlan_filtering 1
ip netns exec nst ip link set dev br1 up
ip netns exec nst ip link set dev veth1 master br1
ip netns exec nst ip link set dev veth1 up
ip netns exec nst ip link add macvlan0 link br1 up type macvlan
../include/linux/netdevice.h:2158: warning: Function parameter or member 'proto_down_reason' not described in 'net_device'
Fixes: 829eb208e80d ("rtnetlink: add support for protodown reason") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 7 Sep 2020 17:08:37 +0000 (10:08 -0700)]
Merge branch 'bnxt_en-Two-bug-fixes'
Michael Chan says:
====================
bnxt_en: Two bug fixes.
The first patch fixes AER recovery by reducing the time from several
minutes to a more reasonable 20 - 30 seconds. The second patch fixes
a possible NULL pointer crash during firmware reset.
====================
bnxt_en: Fix NULL ptr dereference crash in bnxt_fw_reset_task()
bnxt_fw_reset_task() which runs from a workqueue can race with
bnxt_remove_one(). For example, if firmware reset and VF FLR are
happening at about the same time.
bnxt_remove_one() already cancels the workqueue and waits for it
to finish, but we need to do this earlier before the devlink
reporters are destroyed. This will guarantee that
the devlink reporters will always be valid when bnxt_fw_reset_task()
is still running.
Fixes: b148bb238c02 ("bnxt_en: Fix possible crash in bnxt_fw_reset_task().") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
bnxt_en: Avoid sending firmware messages when AER error is detected.
When the driver goes through PCIe AER reset in error state, all
firmware messages will timeout because the PCIe bus is no longer
accessible. This can lead to AER reset taking many minutes to
complete as each firmware command takes time to timeout.
Define a new macro BNXT_NO_FW_ACCESS() to skip these firmware messages
when either firmware is in fatal error state or when
pci_channel_offline() is true. It now takes a more reasonable 20 to
30 seconds to complete AER recovery.
Fixes: b4fff2079d10 ("bnxt_en: Do not send firmware messages if firmware is in error state.") Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
cxgb4: Fix offset when clearing filter byte counters
Pass the correct offset to clear the stale filter hit
bytes counter. Otherwise, the counter starts incrementing
from the stale information, instead of 0.
Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Signed-off-by: Ganji Aravind <ganji.aravind@chelsio.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 5 Sep 2020 22:24:50 +0000 (15:24 -0700)]
Merge branch 'hinic-BugFixes'
Luo bin says:
====================
hinic: BugFixes
The bugs fixed in this patchset have been present since the following
commits:
patch #1: Fixes: 00e57a6d4ad3 ("net-next/hinic: Add Tx operation")
patch #2: Fixes: 5e126e7c4e52 ("hinic: add firmware update support")
====================
Luo bin [Fri, 4 Sep 2020 08:37:29 +0000 (16:37 +0800)]
hinic: bump up the timeout of UPDATE_FW cmd
Firmware erases the entire flash region which may take several
seconds before flashing, so we bump up the timeout to ensure this
cmd won't return failure.
Fixes: 5e126e7c4e52 ("hinic: add firmware update support") Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luo bin [Fri, 4 Sep 2020 08:37:28 +0000 (16:37 +0800)]
hinic: bump up the timeout of SET_FUNC_STATE cmd
We free memory regardless of the return value of SET_FUNC_STATE
cmd in hinic_close function to avoid memory leak and this cmd may
timeout when fw is busy with handling other cmds, so we bump up the
timeout of this cmd to ensure it won't return failure.
Fixes: 00e57a6d4ad3 ("net-next/hinic: Add Tx operation") Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At this point, thread A is waiting for thread B to release RTNL
lock, while thread B is waiting for thread A to commit the IDR
change with tcf_idr_insert() later.
Break this deadlock situation by preloading ife modules earlier,
before tcf_idr_check_alloc(), this is fine because we only need
to load modules we need potentially.
Reported-and-tested-by: syzbot+80e32b5d1f9923f8ace6@syzkaller.appspotmail.com Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Vlad Buslov <vladbu@mellanox.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xie He [Thu, 3 Sep 2020 00:06:58 +0000 (17:06 -0700)]
drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices
PVC devices are virtual devices in this driver stacked on top of the
actual HDLC device. They are the devices normal users would use.
PVC devices have two types: normal PVC devices and Ethernet-emulating
PVC devices.
When transmitting data with PVC devices, the ndo_start_xmit function
will prepend a header of 4 or 10 bytes. Currently this driver requests
this headroom to be reserved for normal PVC devices by setting their
hard_header_len to 10. However, this does not work when these devices
are used with AF_PACKET/RAW sockets. Also, this driver does not request
this headroom for Ethernet-emulating PVC devices (but deals with this
problem by reallocating the skb when needed, which is not optimal).
This patch replaces hard_header_len with needed_headroom, and set
needed_headroom for Ethernet-emulating PVC devices, too. This makes
the driver to request headroom for all PVC devices in all cases.
Cc: Krzysztof Halasa <khc@pm.waw.pl> Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 's390-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix GENERIC_LOCKBREAK dependency on PREEMPTION in Kconfig broken
because of a typo
- Update defconfigs
* tag 's390-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
s390: fix GENERIC_LOCKBREAK dependency typo in Kconfig
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix the loading of modules built with binutils-2.35. This version
produces writable and executable .text.ftrace_trampoline section
which is rejected by the kernel.
- Remove the exporting of cpu_logical_map() as the Tegra driver has now
been fixed and no longer uses this function.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE
arm64: Remove exporting cpu_logical_map symbol
Merge tag 'kbuild-fixes-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix documents
- fix warning in 'make localmodconfig'
* tag 'kbuild-fixes-v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: remove redundant assignment prompt = prompt
kbuild: Documentation: clean up makefiles.rst
kconfig: streamline_config.pl: check defined(ENV variable) before using it
Documentation/llvm: Improve formatting of commands, variables, and arguments
Merge tag 'pm-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix reference counting in the operating performance points (OPP)
framework and address a few intel_pstate driver issues, mostly related
to switching driver operation modes and similar with hardware-managed
P-states (HWP) enabled.
- Address intel_pstate driver interface issues, mostly related to
switching operation modes and handling CPU offline and online and
system-wide suspend/resume with hardware-managed P-states (HWP)
enabled (Rafael Wysocki).
- Fix the maximum frequency computation in the intel_pstate driver
with turbo P-states disabled by the platform firmware and HWP
enabled (Francisco Jerez)"
* tag 'pm-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled
cpufreq: intel_pstate: Free memory only when turning off
cpufreq: intel_pstate: Add ->offline and ->online callbacks
cpufreq: intel_pstate: Tweak the EPP sysfs interface
cpufreq: intel_pstate: Update cached EPP in the active mode
cpufreq: intel_pstate: Refuse to turn off with HWP enabled
opp: Don't drop reference for an OPP table that was never parsed
Merge tag 'libata-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull libata fixes from Jens Axboe:
- improve Sandisks ATA_HORKAGE on NCQ (Tejun)
- link printk cleanup (Xu)
* tag 'libata-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks
ata: ahci: use ata_link_info() instead of ata_link_printk()
Merge tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A bit larger than usual this week, mostly due to the NVMe fixes
arriving late for -rc3 and hence didn't make last weeks pull request.
- NVMe:
- instance leak and io boundary fixes from Keith
- fc locking fix from Christophe
- various tcp/rdma reset during traffic fixes from Sagi
- pci use-after-free fix from Tong
- tcp target null deref fix from Ziye
- Locking fix for partition removal (Christoph)
- Ensure bdi->io_pages is always set (me)
- Fixup for hd struct reference (Ming)
- Fix for zero length bvecs (Ming)
- Two small blk-iocost fixes (Tejun)"
* tag 'block-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
block: allow for_each_bvec to support zero len bvec
blk-stat: make q->stats->lock irqsafe
blk-iocost: ioc_pd_free() shouldn't assume irq disabled
block: fix locking in bdev_del_partition
block: release disk reference in hd_struct_free_work
block: ensure bdi->io_pages is always initialized
nvme-pci: cancel nvme device request before disabling
nvme: only use power of two io boundaries
nvme: fix controller instance leak
nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
nvme: Fix NULL dereference for pci nvme controllers
nvme-rdma: fix reset hang if controller died in the middle of a reset
nvme-rdma: fix timeout handler
nvme-rdma: serialize controller teardown sequences
nvme-tcp: fix reset hang if controller died in the middle of a reset
nvme-tcp: fix timeout handler
nvme-tcp: serialize controller teardown sequences
nvme: have nvme_wait_freeze_timeout return if it timed out
nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
Merge tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- EAGAIN with O_NONBLOCK retry fix
- Two small fixes for registered files (Jiufei)
* tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
io_uring: no read/write-retry on -EAGAIN error and O_NONBLOCK marked file
io_uring: set table->files[i] to NULL when io_sqe_file_register failed
io_uring: fix removing the wrong file in __io_sqe_files_update()
Merge tag 'thermal-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal fixes from Daniel Lezcano:
- Fix bogus thermal shutdowns for omap4430 where bogus values resulting
from an incorrect ADC conversion are too high and fire an emergency
shutdown (Tony Lindgren)
- Don't suppress negative temp for qcom spmi as they are valid and
userspace needs them (Veera Vegivada)
- Fix use-after-free in thermal_zone_device_unregister reported by
Kasan (Dmitry Osipenko)
* tag 'thermal-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
thermal: core: Fix use-after-free in thermal_zone_device_unregister()
thermal: qcom-spmi-temp-alarm: Don't suppress negative temp
thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
Merge tag 'dmaengine-fix-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"A couple of core fixes and odd driver fixes for dmaengine subsystem:
Core:
- drop ACPI CSRT table reference after using it
- fix of_dma_router_xlate() error handling
Drivers fixes in idxd, at_hdmac, pl330, dw-edma and jz478"
* tag 'dmaengine-fix-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ti: k3-udma: Update rchan_oes_offset for am654 SYSFW ABI 3.0
drivers/dma/dma-jz4780: Fix race condition between probe and irq handler
dmaengine: dw-edma: Fix scatter-gather address calculation
dmaengine: ti: k3-udma: Fix the TR initialization for prep_slave_sg
dmaengine: pl330: Fix burst length if burst size is smaller than bus width
dmaengine: at_hdmac: add missing kfree() call in at_dma_xlate()
dmaengine: at_hdmac: add missing put_device() call in at_dma_xlate()
dmaengine: at_hdmac: check return value of of_find_device_by_node() in at_dma_xlate()
dmaengine: of-dma: Fix of_dma_router_xlate's of_dma_xlate handling
dmaengine: idxd: reset states after device disable or reset
dmaengine: acpi: Put the CSRT table after using it
Merge tag 'sound-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small changes, nothing intrusive:
- remaining tasklet API conversions, now all sound stuff have been
converted
- a few HD-audio and USB-audio quirks and minor fixes
- FireWire Tascam and Digi00xx fixes
- drop a kernel WARNING from PCM OSS for syzkaller"
* tag 'sound-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
ALSA: hda/realtek - Improved routing for Thinkpad X1 7th/8th Gen
ALSA: hda: use consistent HDAudio spelling in comments/docs
ALSA: hda: add dev_dbg log when driver is not selected
ALSA: hda: fix a runtime pm issue in SOF when integrated GPU is disabled
ALSA: hda: hdmi - add Rocketlake support
ALSA: ua101: convert tasklets to use new tasklet_setup() API
ALSA: usb-audio: convert tasklets to use new tasklet_setup() API
ASoC: txx9: convert tasklets to use new tasklet_setup() API
ASoC: siu: convert tasklets to use new tasklet_setup() API
ASoC: fsl_esai: convert tasklets to use new tasklet_setup() API
ALSA: hdsp: convert tasklets to use new tasklet_setup() API
ALSA: riptide: convert tasklets to use new tasklet_setup() API
ALSA: pci/asihpi: convert tasklets to use new tasklet_setup() API
ALSA: firewire: convert tasklets to use new tasklet_setup() API
ALSA: core: convert tasklets to use new tasklet_setup() API
ALSA: pcm: oss: Remove superfluous WARN_ON() for mulaw sanity check
ALSA: hda - Fix silent audio output and corrupted input on MSI X570-A PRO
ALSA: hda/hdmi: always check pin power status in i915 pin fixup
ALSA: hda/realtek: Add quirk for Samsung Galaxy Book Ion NT950XCJ-X716A
ALSA: usb-audio: Add basic capture support for Pioneer DJ DJM-250MK2
...
Merge tag 'drm-fixes-2020-09-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Not much going on this week, nouveau has a display hw bug workaround,
amdgpu has some PM fixes and CIK regression fixes, one single radeon
PLL fix, and a couple of i915 display fixes.
amdgpu:
- Fix for 32bit systems
- SW CTF fix
- Update for Sienna Cichlid
- CIK bug fixes
radeon:
- PLL fix
i915:
- Clang build warning fix
- HDCP fixes
nouveau:
- display fixes"
* tag 'drm-fixes-2020-09-04' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug
drm/nouveau/kms/nv50-gp1xx: disable notifies again after core update
drm/nouveau/kms/nv50-: add some whitespace before debug message
drm/nouveau/kms/gv100-: Include correct push header in crcc37d.c
drm/radeon: Prefer lower feedback dividers
drm/amdgpu: Fix bug in reporting voltage for CIK
drm/amdgpu: Specify get_argument function for ci_smu_funcs
drm/amd/pm: enable MP0 DPM for sienna_cichlid
drm/amd/pm: avoid false alarm due to confusing softwareshutdowntemp setting
drm/amd/pm: fix is_dpm_running() run error on 32bit system
drm/i915: Clear the repeater bit on HDCP disable
drm/i915: Fix sha_text population code
drm/i915/display: Ensure that ret is always initialized in icl_combo_phy_verify_state
Or Cohen [Fri, 4 Sep 2020 04:05:28 +0000 (21:05 -0700)]
net/packet: fix overflow in tpacket_rcv
Using tp_reserve to calculate netoff can overflow as
tp_reserve is unsigned int and netoff is unsigned short.
This may lead to macoff receving a smaller value then
sizeof(struct virtio_net_hdr), and if po->has_vnet_hdr
is set, an out-of-bounds write will occur when
calling virtio_net_hdr_from_skb.
The bug is fixed by converting netoff to unsigned int
and checking if it exceeds USHRT_MAX.
This addresses CVE-2020-14386
Fixes: 8913336a7e8d ("packet: add PACKET_RESERVE sockopt") Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge emailed patches from Peter Xu:
"This is a small series that I picked up from Linus's suggestion to
simplify cow handling (and also make it more strict) by checking
against page refcounts rather than mapcounts.
This makes uffd-wp work again (verified by running upmapsort)"
Note: this is horrendously bad timing, and making this kind of
fundamental vm change after -rc3 is not at all how things should work.
The saving grace is that it really is a a nice simplification:
The reason for the bad timing is that it turns out that commit 17839856fd58 ("gup: document and work around 'COW can break either way'
issue" broke not just UFFD functionality (as Peter noticed), but Mikulas
Patocka also reports that it caused issues for strace when running in a
DAX environment with ext4 on a persistent memory setup.
And we can't just revert that commit without re-introducing the original
issue that is a potential security hole, so making COW stricter (and in
the process much simpler) is a step to then undoing the forced COW that
broke other uses.
Rafael J. Wysocki [Fri, 4 Sep 2020 16:31:25 +0000 (18:31 +0200)]
Merge branch 'pm-cpufreq'
* pm-cpufreq:
cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled
cpufreq: intel_pstate: Free memory only when turning off
cpufreq: intel_pstate: Add ->offline and ->online callbacks
cpufreq: intel_pstate: Tweak the EPP sysfs interface
cpufreq: intel_pstate: Update cached EPP in the active mode
cpufreq: intel_pstate: Refuse to turn off with HWP enabled
Peter Xu [Fri, 21 Aug 2020 23:49:57 +0000 (19:49 -0400)]
mm/gup: Remove enfornced COW mechanism
With the more strict (but greatly simplified) page reuse logic in
do_wp_page(), we can safely go back to the world where cow is not
enforced with writes.
This essentially reverts commit 17839856fd58 ("gup: document and work
around 'COW can break either way' issue"). There are some context
differences due to some changes later on around it:
Dmitry Osipenko [Mon, 17 Aug 2020 23:58:54 +0000 (02:58 +0300)]
thermal: core: Fix use-after-free in thermal_zone_device_unregister()
The user-after-free bug in thermal_zone_device_unregister() is reported by
KASAN. It happens because struct thermal_zone_device is released during of
device_unregister() invocation, and hence the "tz" variable shouldn't be
touched by thermal_notify_tz_delete(tz->id).
Currently driver is suppressing the negative temperature
readings from the vadc. Consumers of the thermal zones need
to read the negative temperature too. Don't suppress the
readings.
Tony Lindgren [Mon, 6 Jul 2020 18:33:38 +0000 (11:33 -0700)]
thermal: ti-soc-thermal: Fix bogus thermal shutdowns for omap4430
We can sometimes get bogus thermal shutdowns on omap4430 at least with
droid4 running idle with a battery charger connected:
thermal thermal_zone0: critical temperature reached (143 C), shutting down
Dumping out the register values shows we can occasionally get a 0x7f value
that is outside the TRM listed values in the ADC conversion table. And then
we get a normal value when reading again after that. Reading the register
multiple times does not seem help avoiding the bogus values as they stay
until the next sample is ready.
Looking at the TRM chapter "18.4.10.2.3 ADC Codes Versus Temperature", we
should have values from 13 to 107 listed with a total of 95 values. But
looking at the omap4430_adc_to_temp array, the values are off, and the
end values are missing. And it seems that the 4430 ADC table is similar
to omap3630 rather than omap4460.
Let's fix the issue by using values based on the omap3630 table and just
ignoring invalid values. Compared to the 4430 TRM, the omap3630 table has
the missing values added while the TRM table only shows every second
value.
Note that sometimes the ADC register values within the valid table can
also be way off for about 1 out of 10 values. But it seems that those
just show about 25 C too low values rather than too high values. So those
do not cause a bogus thermal shutdown.
Fixes: 1a31270e54d7 ("staging: omap-thermal: add OMAP4 data structures") Cc: Merlijn Wajer <merlijn@wizzup.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200706183338.25622-1-tony@atomide.com
Merge tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools fixes from Arnaldo Carvalho de Melo:
- Use uintptr_t when casting numbers to pointers
- Keep output expected by 3rd parties: Turn off summary for interval
mode by default.
- BPF is in kernel space, make sure do_validate_kcore_modules() knows
about that.
- Explicitly call out event modifiers in the documentation.
- Fix jevents() allocation of space for regular expressions.
- Address libtraceevent build warnings on 32-bit arches.
- Fix checking of functions returns using ERR_PTR() in 'perf bench'.
* tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf tools: Add bpf image check to __map__is_kmodule
perf record/stat: Explicitly call out event modifiers in the documentation
perf bench: The do_run_multi_threaded() function must use IS_ERR(perf_session__new())
perf stat: Turn off summary for interval mode by default
libtraceevent: Fix build warning on 32-bit arches
perf jevents: Fix suspicious code in fixregex()
perf parse-events: Use uintptr_t when casting numbers to pointers
1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
Kivilinna.
2) Fix loss of RTT samples in rxrpc, from David Howells.
3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.
4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.
5) We disable BH for too lokng in sctp_get_port_local(), add a
cond_resched() here as well, from Xin Long.
6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.
7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
Yonghong Song.
8) Missing of_node_put() in mt7530 DSA driver, from Sumera
Priyadarsini.
9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.
10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.
11) Memory leak in rxkad_verify_response, from Dinghao Liu.
12) In tipc, don't use smp_processor_id() in preemptible context. From
Tuong Lien.
13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.
14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.
15) Fix ABI mismatch between driver and firmware in nfp, from Louis
Peens.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
net/smc: fix sock refcounting in case of termination
net/smc: reset sndbuf_desc if freed
net/smc: set rx_off for SMCR explicitly
net/smc: fix toleration of fake add_link messages
tg3: Fix soft lockup when tg3_reset_task() fails.
doc: net: dsa: Fix typo in config code sample
net: dp83867: Fix WoL SecureOn password
nfp: flower: fix ABI mismatch between driver and firmware
tipc: fix shutdown() of connectionless socket
ipv6: Fix sysctl max for fib_multipath_hash_policy
drivers/net/wan/hdlc: Change the default of hard_header_len to 0
net: gemini: Fix another missing clk_disable_unprepare() in probe
net: bcmgenet: fix mask check in bcmgenet_validate_flow()
amd-xgbe: Add support for new port mode
net: usb: dm9601: Add USB ID of Keenetic Plus DSL
vhost: fix typo in error message
net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
pktgen: fix error message with wrong function name
net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
cxgb4: fix thermal zone device registration
...
Merge branch 'gate-page-refcount' (patches from Dave Hansen)
Merge gate page refcount fix from Dave Hansen:
"During the conversion over to pin_user_pages(), gate pages were missed.
The fix is pretty simple, and is accompanied by a new test from Andy
which probably would have caught this earlier"
* emailed patches from Dave Hansen <dave.hansen@linux.intel.com>:
selftests/x86/test_vsyscall: Improve the process_vm_readv() test
mm: fix pin vs. gup mismatch with gate pages
Andy Lutomirski [Thu, 3 Sep 2020 20:40:30 +0000 (13:40 -0700)]
selftests/x86/test_vsyscall: Improve the process_vm_readv() test
The existing code accepted process_vm_readv() success or failure as long
as it didn't return garbage. This is too weak: if the vsyscall page is
readable, then process_vm_readv() should succeed and, if the page is not
readable, then it should fail.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Hansen [Thu, 3 Sep 2020 20:40:28 +0000 (13:40 -0700)]
mm: fix pin vs. gup mismatch with gate pages
Gate pages were missed when converting from get to pin_user_pages().
This can lead to refcount imbalances. This is reliably and quickly
reproducible running the x86 selftests when vsyscall=emulate is enabled
(the default). Fix by using try_grab_page() with appropriate flags
passed.
The long story:
Today, pin_user_pages() and get_user_pages() are similar interfaces for
manipulating page reference counts. However, "pins" use a "bias" value
and manipulate the actual reference count by 1024 instead of 1 used by
plain "gets".
That means that pin_user_pages() must be matched with unpin_user_pages()
and can't be mixed with a plain put_user_pages() or put_page().
Enter gate pages, like the vsyscall page. They are pages usually in the
kernel image, but which are mapped to userspace. Userspace is allowed
access to them, including interfaces using get/pin_user_pages(). The
refcount of these kernel pages is manipulated just like a normal user
page on the get/pin side so that the put/unpin side can work the same
for normal user pages or gate pages.
get_gate_page() uses try_get_page() which only bumps the refcount by
1, not 1024, even if called in the pin_user_pages() path. If someone
pins a gate page, this happens:
pin_user_pages()
get_gate_page()
try_get_page() // bump refcount +1
... some time later
unpin_user_pages()
page_ref_sub_and_test(page, 1024))
... and boom, we get a refcount off by 1023. This is reliably and
quickly reproducible running the x86 selftests when booted with
vsyscall=emulate (the default). The selftests use ptrace(), but I
suspect anything using pin_user_pages() on gate pages could hit this.
To fix it, simply use try_grab_page() instead of try_get_page(), and
pass 'gup_flags' in so that FOLL_PIN can be respected.
This bug traces back to the very beginning of the FOLL_PIN support in
commit 3faa52c03f44 ("mm/gup: track FOLL_PIN pages"), which showed up in
the 5.7 release.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages") Reported-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: x86@kernel.org Cc: Jann Horn <jannh@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Airlie [Fri, 4 Sep 2020 01:14:24 +0000 (11:14 +1000)]
Merge branch 'linux-5.9' of git://github.com/skeggsb/linux into drm-fixes
A couple of minor fixes to the display changes that went in for 5.9.
The most important of which is a workaround for a HW bug that was
exposed by better push buffer space management, leading to
random(ish...) display engine hangs.
David S. Miller [Thu, 3 Sep 2020 23:52:33 +0000 (16:52 -0700)]
Merge branch 'smc-fixes'
Karsten Graul says:
====================
net/smc: fixes 2020-09-03
Please apply the following patch series for smc to netdev's net tree.
Patch 1 fixes the toleration of older SMC implementations. Patch 2
takes care of a problem that happens when SMCR is used after SMCD
initialization failed. Patch 3 fixes a problem with freed send buffers,
and patch 4 corrects refcounting when SMC terminates due to device
removal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/smc: fix sock refcounting in case of termination
When an ISM device is removed, all its linkgroups are terminated,
i.e. all the corresponding connections are killed.
Connection killing invokes smc_close_active_abort(), which decreases
the sock refcount for certain states to simulate passive closing.
And it cancels the close worker and has to give up the sock lock for
this timeframe. This opens the door for a passive close worker or a
socket close to run in between. In this case smc_close_active_abort() and
passive close worker resp. smc_release() might do a sock_put for passive
closing. This causes:
The patch rechecks sock state in smc_close_active_abort() after
smc_close_cancel_work() to avoid duplicate decrease of sock
refcount for the same purpose.
Fixes: 611b63a12732 ("net/smc: cancel tx worker in case of socket aborts") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When an SMC connection is created, and there is a problem to
create an RMB or DMB, the previously created send buffer is
thrown away as well including buffer descriptor freeing.
Make sure the connection no longer references the freed
buffer descriptor, otherwise bugs like this are possible:
[71556.835148] =============================================================================
[71556.835168] BUG kmalloc-128 (Tainted: G B OE ): Poison overwritten
[71556.835172] -----------------------------------------------------------------------------
SMC tries to make use of SMCD first. If a problem shows up,
it tries to switch to SMCR. If the SMCD initializing problem shows
up after the SMCD connection has already been initialized, field
rx_off keeps the wrong SMCD value for SMCR, which results in corrupted
data at the receiver.
This patch adds an explicit (re-)setting of field rx_off to zero if the
connection uses SMCR.
Fixes: be244f28d22f ("net/smc: add SMC-D support in data transfer") Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Older SMCR implementations had no link failover support and used one
link only. Because the handshake protocol requires to try the
establishment of a second link the old code sent a fake add_link message
and declined any server response afterwards.
The current code supports multiple links and inspects the received fake
add_link message more closely. To tolerate the fake add_link messages
smc_llc_is_local_add_link() needs an improved check of the message to
be able to separate between locally enqueued and fake add_link messages.
And smc_llc_cli_add_link() needs to check if the provided qp_mtu size is
invalid and reject the add_link request in that case.
Fixes: c48254fa48e5 ("net/smc: move add link processing for new device into llc layer") Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 3 Sep 2020 18:28:54 +0000 (14:28 -0400)]
tg3: Fix soft lockup when tg3_reset_task() fails.
If tg3_reset_task() fails, the device state is left in an inconsistent
state with IFF_RUNNING still set but NAPI state not enabled. A
subsequent operation, such as ifdown or AER error can cause it to
soft lock up when it tries to disable NAPI state.
Fix it by bringing down the device to !IFF_RUNNING state when
tg3_reset_task() fails. tg3_reset_task() running from workqueue
will now call tg3_close() when the reset fails. We need to
modify tg3_reset_task_cancel() slightly to avoid tg3_close()
calling cancel_work_sync() to cancel tg3_reset_task(). Otherwise
cancel_work_sync() will wait forever for tg3_reset_task() to
finish.
Reported-by: David Christensen <drc@linux.vnet.ibm.com> Reported-by: Baptiste Covolato <baptiste@arista.com> Fixes: db2199737990 ("tg3: Schedule at most one tg3_reset_task run") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Olsa [Wed, 26 Aug 2020 21:30:17 +0000 (23:30 +0200)]
perf tools: Add bpf image check to __map__is_kmodule
When validating kcore modules the do_validate_kcore_modules function
checks on every kernel module dso against modules record. The
__map__is_kmodule check is used to get only kernel module dso objects
through.
Currently the bpf images are slipping through the check and making the
validation to fail, so report falls back from kcore usage to kallsyms.
Adding __map__is_bpf_image check for bpf image and adding it to
__map__is_kmodule check.
Fixes: 3c29d4483e85 ("perf annotate: Add basic support for bpf_image") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200826213017.818788-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kim Phillips [Tue, 1 Sep 2020 21:58:53 +0000 (16:58 -0500)]
perf record/stat: Explicitly call out event modifiers in the documentation
Event modifiers are not mentioned in the perf record or perf stat
manpages. Add them to orient new users more effectively by pointing
them to the perf list manpage for details.
Fixes: 2055fdaf8703 ("perf list: Document precise event sampling for AMD IBS") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tony Jones <tonyj@suse.de> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200901215853.276234-1-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bench: The do_run_multi_threaded() function must use IS_ERR(perf_session__new())
In case of error, the function perf_session__new() returns ERR_PTR() and
never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR()
Committer notes:
This wasn't compiling due to an extraneous '{' not matched by a '}', fix
it.
Fixes: 13edc237200c ("perf bench: Add a multi-threaded synthesize benchmark") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200902140526.26916-1-yuehaibing@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Thu, 3 Sep 2020 15:25:10 +0000 (00:25 +0900)]
perf jevents: Fix suspicious code in fixregex()
The new string should have enough space for the original string and the
back slashes IMHO.
Fixes: fbc2844e84038ce3 ("perf vendor events: Use more flexible pattern matching for CPU identification for mapfile.csv") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: William Cohen <wcohen@redhat.com> Link: http://lore.kernel.org/lkml/20200903152510.489233-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf parse-events: Use uintptr_t when casting numbers to pointers
To address these errors found when cross building from x86_64 to MIPS
little endian 32-bit:
CC /tmp/build/perf/util/parse-events-bison.o
util/parse-events.y: In function 'parse_events_parse':
util/parse-events.y:514:6: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
514 | (void *) $2, $6, $4);
| ^
util/parse-events.y:531:7: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
531 | (void *) $2, NULL, $4)) {
| ^
util/parse-events.y:547:6: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
547 | (void *) $2, $4, 0);
| ^
util/parse-events.y:564:7: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
564 | (void *) $2, NULL, 0)) {
| ^
Fixes: cabbf26821aa210f ("perf parse: Before yyabort-ing free components") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Paul Barker [Thu, 3 Sep 2020 08:49:25 +0000 (09:49 +0100)]
doc: net: dsa: Fix typo in config code sample
In the "single port" example code for configuring a DSA switch without
tagging support from userspace the command to bring up the "lan2" link
was typo'd.
Signed-off-by: Paul Barker <pbarker@konsulko.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'fixes-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull misc build failure fixes from Mike Rapoport:
"Fix min_low_pfn/max_low_pfn build errors on ia64 and microblaze.
Some configurations of ia64 and microblaze use min_low_pfn and
max_low_pfn in pfn_valid(). This causes build failures for modules
that use pfn_valid().
The fix is to add EXPORT_SYMBOL() for these variables on ia64 and
microblaze"
* tag 'fixes-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
ia64: fix min_low_pfn/max_low_pfn build errors
microblaze: fix min_low_pfn/max_low_pfn build errors
Merge tag 'media/v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a compilation fix issue with ti-vpe on arm 32 bits
- two Kconfig fixes for imx214 and max9286 drivers
- a kernel information leak at v4l2-core on time32 compat ioctls
- some fixes at rc core unbind logic
- a fix at mceusb driver for it to not use GFP_ATOMIC
- fixes at cedrus and vicodec drivers at the control handling logic
- a fix at gpio-ir-tx to avoid disabling interruts on a spinlock
* tag 'media/v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: mceusb: Avoid GFP_ATOMIC where it is not needed
media: gpio-ir-tx: spinlock is not needed to disable interrupts
media: rc: do not access device via sysfs after rc_unregister_device()
media: rc: uevent sysfs file races with rc_unregister_device()
media: max9286: Depend on OF_GPIO
media: i2c: imx214: select V4L2_FWNODE
media: cedrus: Add missing v4l2_ctrl_request_hdl_put()
media: vicodec: add missing v4l2_ctrl_request_hdl_put()
media: media/v4l2-core: Fix kernel-infoleak in video_put_user()
media: ti-vpe: cal: Fix compilation on 32-bit ARM
ALSA: hda/realtek - Improved routing for Thinkpad X1 7th/8th Gen
There've been quite a few regression reports about the lowered volume
(reduced to ca 65% from the previous level) on Lenovo Thinkpad X1
after the commit d2cd795c4ece ("ALSA: hda - fixup for the bass speaker
on Lenovo Carbon X1 7th gen"). Although the commit itself does the
right thing from HD-audio POV in order to have a volume control for
bass speakers, it seems that the machine has some secret recipe under
the hood.
Through experiments, Benjamin Poirier found out that the following
routing gives the best result:
* DAC1 (NID 0x02) -> Speaker pin (NID 0x14)
* DAC2 (NID 0x03) -> Shared by both Bass Speaker pin (NID 0x17) &
Headphone pin (0x21)
* DAC3 (NID 0x06) -> Unused
DAC1 seems to have some equalizer internally applied, and you'd get
again the output in a bad quality if you connect this to the
headphone pin. Hence the headphone is connected to DAC2, which is now
shared with the bass speaker pin. DAC3 has no volume amp, hence it's
not connected at all.
For achieving the routing above, this patch introduced a couple of
workarounds:
* The connection list of bass speaker pin (NID 0x17) is reduced not to
include DAC3 (NID 0x06)
* Pass preferred_pairs array to specify the fixed connection
Here, both workarounds are needed because the generic parser prefers
the individual DAC assignment over others.
When the routing above is applied, the generic parser creates the two
volume controls "Front" and "Bass Speaker". Since we have only two
DACs for three output pins, those are not fully controlling each
output individually, and it would confuse PulseAudio. For avoiding
the pitfall, in this patch, we rename those volume controls to some
unique ones ("DAC1" and "DAC2"). Then PulseAudio ignore them and
concentrate only on the still good-working "Master" volume control.
If a user still wants to control each DAC volume, they can still
change manually via "DAC1" and "DAC2" volume controls.
Thomas Bogendoerfer [Wed, 2 Sep 2020 21:32:14 +0000 (23:32 +0200)]
MIPS: SNI: Fix SCSI interrupt
On RM400(a20r) machines ISA and SCSI interrupts share the same interrupt
line. Commit 49e6e07e3c80 ("MIPS: pass non-NULL dev_id on shared
request_irq()") accidently dropped the IRQF_SHARED bit, which breaks
registering SCSI interrupt. Put back IRQF_SHARED and add dev_id for
ISA interrupt.
Fixes: 49e6e07e3c80 ("MIPS: pass non-NULL dev_id on shared request_irq()") Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
MIPS: add missing MSACSR and upper MSA initialization
In cc97ab235f3f ("MIPS: Simplify FP context initialization), init_fp_ctx
just initialize the fp/msa context, and own_fp_inatomic just restore
FCSR and 64bit FP regs from it, but miss MSACSR and upper MSA regs for
MSA, so MSACSR and MSA upper regs's value from previous task on current
cpu can leak into current task and cause unpredictable behavior when MSA
context not initialized.
Lyude Paul [Mon, 10 Aug 2020 21:18:37 +0000 (17:18 -0400)]
drm/nouveau/kms/gv100-: Include correct push header in crcc37d.c
Looks like when we converted everything over to Nvidia's class headers,
we mistakenly included the nvif/push507b.h instead of nvif/pushc37b.h,
which resulted in breaking CRC reporting for volta+:
nouveau 0000:1f:00.0: disp: chid 0 stat 10003361 reason 3
[RESERVED_METHOD] mthd 0d84 data 00000000 code 00000000
nouveau 0000:1f:00.0: disp: chid 0 stat 10003360 reason 3
[RESERVED_METHOD] mthd 0d80 data 00000000 code 00000000
nouveau 0000:1f:00.0: DRM: CRC notifier ctx for head 3 not finished
after 50ms
So, fix that.
Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: c4b27bc8682c ("drm/nouveau/kms/nv50-: convert core crc_set_src() to new push macros") Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Kai-Heng Feng [Tue, 25 Aug 2020 17:33:48 +0000 (01:33 +0800)]
drm/radeon: Prefer lower feedback dividers
Commit 2e26ccb119bd ("drm/radeon: prefer lower reference dividers")
fixed screen flicker for HP Compaq nx9420 but breaks other laptops like
Asus X50SL.
Turns out we also need to favor lower feedback dividers.
Users confirmed this change fixes the regression and doesn't regress the
original fix.
Fixes: 2e26ccb119bd ("drm/radeon: prefer lower reference dividers") BugLink: https://bugs.launchpad.net/bugs/1791312 BugLink: https://bugs.launchpad.net/bugs/1861554 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sandeep Raghuraman [Thu, 27 Aug 2020 13:13:37 +0000 (18:43 +0530)]
drm/amdgpu: Fix bug in reporting voltage for CIK
On my R9 390, the voltage was reported as a constant 1000 mV.
This was due to a bug in smu7_hwmgr.c, in the smu7_read_sensor()
function, where some magic constants were used in a condition,
to determine whether the voltage should be read from PLANE2_VID
or PLANE1_VID. The VDDC mask was incorrectly used, instead of
the VDDGFX mask.
This patch changes the code to use the correct defined constants
(and apply the correct bitshift), thus resulting in correct voltage reporting.
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sandeep Raghuraman [Thu, 27 Aug 2020 11:37:33 +0000 (17:07 +0530)]
drm/amdgpu: Specify get_argument function for ci_smu_funcs
Starting in Linux 5.8, the graphics and memory clock frequency were not being
reported for CIK cards. This is a regression, since they were reported correctly
in Linux 5.7.
After investigation, I discovered that the smum_send_msg_to_smc() function,
attempts to call the corresponding get_argument() function of ci_smu_funcs.
However, the get_argument() function is not defined in ci_smu_funcs.
This patch fixes the bug by specifying the correct get_argument() function.
Fixes: a0ec225633d9f6 ("drm/amd/powerplay: unified interfaces for message issuing and response checking") Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Evan Quan [Tue, 25 Aug 2020 05:51:29 +0000 (13:51 +0800)]
drm/amd/pm: avoid false alarm due to confusing softwareshutdowntemp setting
Normally softwareshutdowntemp should be greater than Thotspotlimit.
However, on some VEGA10 ASIC, the softwareshutdowntemp is 91C while
Thotspotlimit is 105C. This seems not right and may trigger some
false alarms.
Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org