Daniel Borkmann [Wed, 24 Mar 2021 13:52:31 +0000 (14:52 +0100)]
bpf: Update selftests to reflect new error states
Update various selftest error messages:
* The 'Rx tried to sub from different maps, paths, or prohibited types'
is reworked into more specific/differentiated error messages for better
guidance.
* The change into 'value -4294967168 makes map_value pointer be out of
bounds' is due to moving the mixed bounds check into the speculation
handling and thus occuring slightly later than above mentioned sanity
check.
* The change into 'math between map_value pointer and register with
unbounded min value' is similarly due to register sanity check coming
before the mixed bounds check.
* The case of 'map access: known scalar += value_ptr from different maps'
now loads fine given masks are the same from the different paths (despite
max map value size being different).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Wed, 24 Mar 2021 09:38:26 +0000 (10:38 +0100)]
bpf: Tighten speculative pointer arithmetic mask
This work tightens the offset mask we use for unprivileged pointer arithmetic
in order to mitigate a corner case reported by Piotr and Benedict where in
the speculative domain it is possible to advance, for example, the map value
pointer by up to value_size-1 out-of-bounds in order to leak kernel memory
via side-channel to user space.
Before this change, the computed ptr_limit for retrieve_ptr_limit() helper
represents largest valid distance when moving pointer to the right or left
which is then fed as aux->alu_limit to generate masking instructions against
the offset register. After the change, the derived aux->alu_limit represents
the largest potential value of the offset register which we mask against which
is just a narrower subset of the former limit.
For minimal complexity, we call sanitize_ptr_alu() from 2 observation points
in adjust_ptr_min_max_vals(), that is, before and after the simulated alu
operation. In the first step, we retieve the alu_state and alu_limit before
the operation as well as we branch-off a verifier path and push it to the
verification stack as we did before which checks the dst_reg under truncation,
in other words, when the speculative domain would attempt to move the pointer
out-of-bounds.
In the second step, we retrieve the new alu_limit and calculate the absolute
distance between both. Moreover, we commit the alu_state and final alu_limit
via update_alu_sanitation_state() to the env's instruction aux data, and bail
out from there if there is a mismatch due to coming from different verification
paths with different states.
Reported-by: Piotr Krysiuk <piotras@gmail.com> Reported-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Benedict Schlueter <benedict.schlueter@rub.de>
Daniel Borkmann [Wed, 24 Mar 2021 10:25:39 +0000 (11:25 +0100)]
bpf: Move sanitize_val_alu out of op switch
Add a small sanitize_needed() helper function and move sanitize_val_alu()
out of the main opcode switch. In upcoming work, we'll move sanitize_ptr_alu()
as well out of its opcode switch so this helps to streamline both.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 23 Mar 2021 07:51:02 +0000 (08:51 +0100)]
bpf: Ensure off_reg has no mixed signed bounds for all types
The mixed signed bounds check really belongs into retrieve_ptr_limit()
instead of outside of it in adjust_ptr_min_max_vals(). The reason is
that this check is not tied to PTR_TO_MAP_VALUE only, but to all pointer
types that we handle in retrieve_ptr_limit() and given errors from the latter
propagate back to adjust_ptr_min_max_vals() and lead to rejection of the
program, it's a better place to reside to avoid anything slipping through
for future types. The reason why we must reject such off_reg is that we
otherwise would not be able to derive a mask, see details in 9d7eceede769
("bpf: restrict unknown scalars of mixed signed bounds for unprivileged").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 23 Mar 2021 07:32:59 +0000 (08:32 +0100)]
bpf: Use correct permission flag for mixed signed bounds arithmetic
We forbid adding unknown scalars with mixed signed bounds due to the
spectre v1 masking mitigation. Hence this also needs bypass_spec_v1
flag instead of allow_ptr_leaks.
Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
Null pointer dereference happens on master->slaves dereference in
teql_destroy() as master is null-pointer.
When qdisc_create() calls teql_qdisc_init() it imediately fails after
check "if (m->dev == dev)" because both devices are teql0, and it does
not set qdisc_priv(sch)->m leaving it zero on error path, then
qdisc_create() imediately calls teql_destroy() which does not expect
zero master pointer and we get OOPS.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 Apr 2021 21:08:37 +0000 (14:08 -0700)]
Merge tag 'mac80211-for-net-2021-04-08.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes berg says:
====================
Various small fixes:
* S1G beacon validation
* potential leak in nl80211
* fast-RX confusion with 4-addr mode
* erroneous WARN_ON that userspace can trigger
* wrong time units in virt_wifi
* rfkill userspace API breakage
* TXQ AC confusing that led to traffic stopped forever
* connection monitoring time after/before confusion
* netlink beacon head validation buffer overrun
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger [Wed, 7 Apr 2021 15:59:12 +0000 (08:59 -0700)]
ipv6: report errors for iftoken via netlink extack
Setting iftoken can fail for several different reasons but there
and there was no report to user as to the cause. Add netlink
extended errors to the processing of the request.
This requires adding additional argument through rtnl_af_ops
set_link_af callback.
Reported-by: Hongren Zheng <li@zenithal.me> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 Apr 2021 20:47:34 +0000 (13:47 -0700)]
Merge branch 'net-sched-action-init-fixes'
Vlad Buslov says:
====================
Action initalization fixes
This series fixes reference counting of action instances and modules in
several parts of action init code. The first patch reverts previous fix
that didn't properly account for rollback from a failure in the middle of
the loop in tcf_action_init() which is properly fixed by the following
patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent changes that separated action module load from action
initialization tcf_action_init() function error handling code was modified
to manually release the loaded modules if loading/initialization of any
further action in same batch failed. For the case when all modules
successfully loaded and some of the actions were initialized before one of
them failed in init handler. In this case for all previous actions the
module will be released twice by the error handler: First time by the loop
that manually calls module_put() for all ops, and second time by the action
destroy code that puts the module after destroying the action.
Reproduction:
$ sudo tc actions add action simple sdata \"2\" index 2
$ sudo tc actions add action simple sdata \"1\" index 1 \
action simple sdata \"2\" index 2
RTNETLINK answers: File exists
We have an error talking to the kernel
$ sudo tc actions ls action simple
total acts 1
action order 0: Simple <"2">
index 2 ref 1 bind 0
$ sudo tc actions flush action simple
$ sudo tc actions ls action simple
$ sudo tc actions add action simple sdata \"2\" index 2
Error: Failed to load TC action module.
We have an error talking to the kernel
$ lsmod | grep simple
act_simple 20480 -1
Fix the issue by modifying module reference counting handling in action
initialization code:
- Get module reference in tcf_idr_create() and put it in tcf_idr_release()
instead of taking over the reference held by the caller.
- Modify users of tcf_action_init_1() to always release the module
reference which they obtain before calling init function instead of
assuming that created action takes over the reference.
- Finally, modify tcf_action_init_1() to not release the module reference
when overwriting existing action as this is no longer necessary since both
upper and lower layers obtain and manage their own module references
independently.
Fixes: d349f9976868 ("net_sched: fix RTNL deadlock again caused by request_module()") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Action init code increments reference counter when it changes an action.
This is the desired behavior for cls API which needs to obtain action
reference for every classifier that points to action. However, act API just
needs to change the action and releases the reference before returning.
This sequence breaks when the requested action doesn't exist, which causes
act API init code to create new action with specified index, but action is
still released before returning and is deleted (unless it was referenced
concurrently by cls API).
Reproduction:
$ sudo tc actions ls action gact
$ sudo tc actions change action gact drop index 1
$ sudo tc actions ls action gact
Extend tcf_action_init() to accept 'init_res' array and initialize it with
action->ops->init() result. In tcf_action_add() remove pointers to created
actions from actions array before passing it to tcf_action_put_many().
Fixes: cae422f379f3 ("net: sched: use reference counting action init") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Thu, 8 Apr 2021 13:45:20 +0000 (15:45 +0200)]
nl80211: fix beacon head validation
If the beacon head attribute (NL80211_ATTR_BEACON_HEAD)
is too short to even contain the frame control field,
we access uninitialized data beyond the buffer. Fix this
by checking the minimal required size first. We used to
do this until S1G support was added, where the fixed
data portion has a different size.
Reported-and-tested-by: syzbot+72b99dcf4607e8c770f3@syzkaller.appspotmail.com Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 1d47f1198d58 ("nl80211: correctly validate S1G beacon head") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210408154518.d9b06d39b4ee.Iff908997b2a4067e8d456b3cb96cab9771d252b8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Piotr Krysiuk [Tue, 6 Apr 2021 20:59:39 +0000 (21:59 +0100)]
bpf, x86: Validate computation of branch displacements for x86-32
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Piotr Krysiuk [Mon, 5 Apr 2021 21:52:15 +0000 (22:52 +0100)]
bpf, x86: Validate computation of branch displacements for x86-64
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
A. Cody Schuffelen [Thu, 18 Mar 2021 20:04:19 +0000 (13:04 -0700)]
virt_wifi: Return micros for BSS TSF values
cfg80211_inform_bss expects to receive a TSF value, but is given the
time since boot in nanoseconds. TSF values are expected to be at
microsecond scale rather than nanosecond scale.
Du Cheng [Wed, 7 Apr 2021 16:27:56 +0000 (00:27 +0800)]
cfg80211: remove WARN_ON() in cfg80211_sme_connect
A WARN_ON(wdev->conn) would trigger in cfg80211_sme_connect(), if multiple
send_msg(NL80211_CMD_CONNECT) system calls are made from the userland, which
should be anticipated and handled by the wireless driver. Remove this WARN_ON()
to prevent kernel panic if kernel is configured to "panic_on_warn".
Ben Greear [Tue, 30 Mar 2021 23:07:49 +0000 (16:07 -0700)]
mac80211: fix time-is-after bug in mlme
The incorrect timeout check caused probing to happen when it did
not need to happen. This in turn caused tx performance drop
for around 5 seconds in ath10k-ct driver. Possibly that tx drop
is due to a secondary issue, but fixing the probe to not happen
when traffic is running fixes the symptom.
Signed-off-by: Ben Greear <greearb@candelatech.com> Fixes: 9abf4e49830d ("mac80211: optimize station connection monitor") Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20210330230749.14097-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This makes sense, but ieee80211_ac_from_tid(16) is the same
as ieee80211_ac_from_tid(0) which is just IEEE80211_AC_BE.
Now, normally this is fine. However, if the netdev queues
were stopped, then the code in ieee80211_tx_dequeue() will
propagate the stop from the interface (vif->txqs_stopped[])
if the AC 2 (ieee80211_ac_from_tid(txq->tid)) is marked as
stopped. On wake, however, __ieee80211_wake_txqs() will wake
the TXQ if AC 0 (txq->ac) is woken up.
If a driver stops all queues with ieee80211_stop_tx_queues()
and then wakes them again with ieee80211_wake_tx_queues(),
the ieee80211_wake_txqs() tasklet will run to resync queue
and TXQ state. If all queues were woken, then what'll happen
is that _ieee80211_wake_txqs() will run in order of HW queues
0-3, typically (and certainly for iwlwifi) corresponding to
ACs 0-3, so it'll call __ieee80211_wake_txqs() for each AC in
order 0-3.
When __ieee80211_wake_txqs() is called for AC 0 (VO) that'll
wake up the management TXQ (remember its tid is 16), and the
driver's wake_tx_queue() will be called. That tries to get a
frame, which will immediately *stop* the TXQ again, because
now we check against AC 2, and AC 2 hasn't yet been marked as
woken up again in sdata->vif.txqs_stopped[] since we're only
in the __ieee80211_wake_txqs() call for AC 0.
Thus, the management TXQ will never be started again.
Fix this by checking txq->ac directly instead of calculating
the AC as ieee80211_ac_from_tid(txq->tid).
Johannes Berg [Fri, 19 Mar 2021 22:25:11 +0000 (23:25 +0100)]
rfkill: revert back to old userspace API by default
Recompiling with the new extended version of struct rfkill_event
broke systemd in *two* ways:
- It used "sizeof(struct rfkill_event)" to read the event, but
then complained if it actually got something != 8, this broke
it on new kernels (that include the updated API);
- It used sizeof(struct rfkill_event) to write a command, but
didn't implement the intended expansion protocol where the
kernel returns only how many bytes it accepted, and errored
out due to the unexpected smaller size on kernels that didn't
include the updated API.
Even though systemd has now been fixed, that fix may not be always
deployed, and other applications could potentially have similar
issues.
As such, in the interest of avoiding regressions, revert the
default API "struct rfkill_event" back to the original size.
Instead, add a new "struct rfkill_event_ext" that extends it by
the new field, and even more clearly document that applications
should be prepared for extensions in two ways:
* write might only accept fewer bytes on older kernels, and
will return how many to let userspace know which data may
have been ignored;
* read might return anything between 8 (the original size) and
whatever size the application sized its buffer at, indicating
how much event data was supported by the kernel.
Perhaps that will help avoid such issues in the future and we
won't have to come up with another version of the struct if we
ever need to extend it again.
Applications that want to take advantage of the new field will
have to be modified to use struct rfkill_event_ext instead now,
which comes with the danger of them having already been updated
to use it from 'struct rfkill_event', but I found no evidence
of that, and it's still relatively new.
Seevalamuthu Mariappan [Fri, 19 Mar 2021 14:18:52 +0000 (19:48 +0530)]
mac80211: clear sta->fast_rx when STA removed from 4-addr VLAN
In some race conditions, with more clients and traffic configuration,
below crash is seen when making the interface down. sta->fast_rx wasn't
cleared when STA gets removed from 4-addr AP_VLAN interface. The crash is
due to try accessing 4-addr AP_VLAN interface's net_device (fast_rx->dev)
which has been deleted already.
Resolve this by clearing sta->fast_rx pointer when STA removes
from a 4-addr VLAN.
[ 239.449529] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 239.449531] pgd = 80204000
...
[ 239.481496] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.60 #227
[ 239.481591] Hardware name: Generic DT based system
[ 239.487665] task: be05b700 ti: be08e000 task.ti: be08e000
[ 239.492360] PC is at get_rps_cpu+0x2d4/0x31c
[ 239.497823] LR is at 0xbe08fc54
...
[ 239.778574] [<80739740>] (get_rps_cpu) from [<8073cb10>] (netif_receive_skb_internal+0x8c/0xac)
[ 239.786722] [<8073cb10>] (netif_receive_skb_internal) from [<8073d578>] (napi_gro_receive+0x48/0xc4)
[ 239.795267] [<8073d578>] (napi_gro_receive) from [<c7b83e8c>] (ieee80211_mark_rx_ba_filtered_frames+0xbcc/0x12d4 [mac80211])
[ 239.804776] [<c7b83e8c>] (ieee80211_mark_rx_ba_filtered_frames [mac80211]) from [<c7b84d4c>] (ieee80211_rx_napi+0x7b8/0x8c8 [mac8
0211])
[ 239.815857] [<c7b84d4c>] (ieee80211_rx_napi [mac80211]) from [<c7f63d7c>] (ath11k_dp_process_rx+0x7bc/0x8c8 [ath11k])
[ 239.827757] [<c7f63d7c>] (ath11k_dp_process_rx [ath11k]) from [<c7f5b6c4>] (ath11k_dp_service_srng+0x2c0/0x2e0 [ath11k])
[ 239.838484] [<c7f5b6c4>] (ath11k_dp_service_srng [ath11k]) from [<7f55b7dc>] (ath11k_ahb_ext_grp_napi_poll+0x20/0x84 [ath11k_ahb]
)
[ 239.849419] [<7f55b7dc>] (ath11k_ahb_ext_grp_napi_poll [ath11k_ahb]) from [<8073ce1c>] (net_rx_action+0xe0/0x28c)
[ 239.860945] [<8073ce1c>] (net_rx_action) from [<80324868>] (__do_softirq+0xe4/0x228)
[ 239.871269] [<80324868>] (__do_softirq) from [<80324c48>] (irq_exit+0x98/0x108)
[ 239.879080] [<80324c48>] (irq_exit) from [<8035c59c>] (__handle_domain_irq+0x90/0xb4)
[ 239.886114] [<8035c59c>] (__handle_domain_irq) from [<8030137c>] (gic_handle_irq+0x50/0x94)
[ 239.894100] [<8030137c>] (gic_handle_irq) from [<803024c0>] (__irq_svc+0x40/0x74)
net: hso: fix null-ptr-deref during tty device unregistration
Multiple ttys try to claim the same the minor number causing a double
unregistration of the same device. The first unregistration succeeds
but the next one results in a null-ptr-deref.
The get_free_serial_index() function returns an available minor number
but doesn't assign it immediately. The assignment is done by the caller
later. But before this assignment, calls to get_free_serial_index()
would return the same minor number.
Fix this by modifying get_free_serial_index to assign the minor number
immediately after one is found to be and rename it to obtain_minor()
to better reflect what it does. Similary, rename set_serial_by_index()
to release_minor() and modify it to free up the minor number of the
given hso_serial. Every obtain_minor() should have corresponding
release_minor() call.
Fixes: 72dc1c096c705 ("HSO: add option hso driver") Reported-by: syzbot+c49fe6089f295a05e6f8@syzkaller.appspotmail.com Tested-by: syzbot+c49fe6089f295a05e6f8@syzkaller.appspotmail.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Apr 2021 22:04:55 +0000 (15:04 -0700)]
Merge tag 'ieee802154-for-davem-2021-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2021-04-07
An update from ieee802154 for your *net* tree.
Most of these are coming from the flood of syzkaller reports
lately got for the ieee802154 subsystem. There are likely to
come more for this, but this is a good batch to get out for now.
Alexander Aring created a patchset to avoid llsec handling on a
monitor interface, which we do not support.
Alex Shi removed a unused macro.
Pavel Skripkin fixed another protection fault found by syzkaller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Apr 2021 21:57:35 +0000 (14:57 -0700)]
Merge tag 'wireless-drivers-2021-04-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.12
Third, and last, set of fixes for v5.12. Small fixes, iwlwifi having
most of them. brcmfmac regression caused by cfg80211 changes is the
most important here.
iwlwifi
* fix a lockdep warning
* fix regulatory feature detection in certain firmware versions
* new hardware support
* fix lockdep warning
* mvm: fix beacon protection checks
mt76
* mt7921: fix airtime reporting
brcmfmac
* fix a deadlock regression
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, link_mode parameter derives 3 other link parameters, speed,
lanes and duplex, and the derived information is sent to user space.
Few bugs were found in that functionality.
First, some drivers clear the 'ethtool_link_ksettings' struct in their
get_link_ksettings() callback and cause receiving wrong link mode
information in user space. And also, some drivers can report random
values in the 'link_mode' field and cause general protection fault.
Second, the link parameters are only derived in netlink path so in ioctl
path, we don't any reasonable values.
Third, setting 'speed 10000 lanes 1' fails since the lanes parameter
wasn't set for ETHTOOL_LINK_MODE_10000baseR_FEC_BIT.
Patch #1 solves the first two problems by removing link_mode parameter
and deriving the link parameters in driver instead of ethtool.
Patch #2 solves the third one, by setting the lanes parameter for the
link_mode.
v3:
* Remove the link_mode parameter in the first patch to solve
both two issues from patch#1 and patch#2.
* Add the second patch to solve the third issue.
v2:
* Add patch #2.
* Introduce 'cap_link_mode_supported' instead of adding a
validity field to 'ethtool_link_ksettings' struct in patch #1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool: Add lanes parameter for ETHTOOL_LINK_MODE_10000baseR_FEC_BIT
Lanes field is missing for ETHTOOL_LINK_MODE_10000baseR_FEC_BIT
link mode and it causes a failure when trying to set
'speed 10000 lanes 1' on Spectrum-2 machines when autoneg is set to on.
Add the lanes parameter for ETHTOOL_LINK_MODE_10000baseR_FEC_BIT
link mode.
Fixes: c8907043c6ac9 ("ethtool: Get link mode in use instead of speed and duplex parameters") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool: Remove link_mode param and derive link params from driver
Some drivers clear the 'ethtool_link_ksettings' struct in their
get_link_ksettings() callback, before populating it with actual values.
Such drivers will set the new 'link_mode' field to zero, resulting in
user space receiving wrong link mode information given that zero is a
valid value for the field.
Another problem is that some drivers (notably tun) can report random
values in the 'link_mode' field. This can result in a general protection
fault when the field is used as an index to the 'link_mode_params' array
[1].
This happens because such drivers implement their set_link_ksettings()
callback by simply overwriting their private copy of
'ethtool_link_ksettings' struct with the one they get from the stack,
which is not always properly initialized.
Fix these problems by removing 'link_mode' from 'ethtool_link_ksettings'
and instead have drivers call ethtool_params_from_link_mode() with the
current link mode. The function will derive the link parameters (e.g.,
speed) from the link mode and fill them in the 'ethtool_link_ksettings'
struct.
v3:
* Remove link_mode parameter and derive the link parameters in
the driver instead of passing link_mode parameter to ethtool
and derive it there.
v2:
* Introduce 'cap_link_mode_supported' instead of adding a
validity field to 'ethtool_link_ksettings' struct.
Fixes: c8907043c6ac9 ("ethtool: Get link mode in use instead of speed and duplex parameters") Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
These patches fix a series of spelling errors in net/tipc module.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kurt Kanzenbach [Tue, 6 Apr 2021 07:35:09 +0000 (09:35 +0200)]
net: hsr: Reset MAC header for Tx path
Reset MAC header in HSR Tx path. This is needed, because direct packet
transmission, e.g. by specifying PACKET_QDISC_BYPASS does not reset the MAC
header.
This has been observed using the following setup:
|$ ip link add name hsr0 type hsr slave1 lan0 slave2 lan1 supervision 45 version 1
|$ ifconfig hsr0 up
|$ ./test hsr0
The test binary is using mmap'ed sockets and is specifying the
PACKET_QDISC_BYPASS socket option.
This patch resolves the following warning on a non-patched kernel:
The warning can be safely removed, because the other call sites of
hsr_forward_skb() make sure that the skb is prepared correctly.
Fixes: d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/rds: Avoid potential use after free in rds_send_remove_from_sock
In case of rs failure in rds_send_remove_from_sock(), the 'rm' resource
is freed and later under spinlock, causing potential use-after-free.
Set the free pointer to NULL to avoid undefined behavior.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to modify uplink port to follow only if the uplink_follow
capability is set as required by the HW spec. Failure to do so causes
traffic to the uplink representor net device to cease after switching to
switchdev mode.
Fixes: 7d0314b11cdd ("net/mlx5e: Modify uplink state on interface up/down") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Phillip Potter [Tue, 6 Apr 2021 17:45:54 +0000 (18:45 +0100)]
net: tun: set tun->dev->addr_len during TUNSETLINK processing
When changing type with TUNSETLINK ioctl command, set tun->dev->addr_len
to match the appropriate type, using new tun_get_addr_len utility function
which returns appropriate address length for given type. Fixes a
KMSAN-found uninit-value bug reported by syzbot at:
https://syzkaller.appspot.com/bug?id=0766d38c656abeace60621896d705743aeefed51
Reported-by: syzbot+001516d86dbe88862cec@syzkaller.appspotmail.com Diagnosed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wong Vee Khee [Tue, 6 Apr 2021 13:17:30 +0000 (21:17 +0800)]
ethtool: fix incorrect datatype in set_eee ops
The member 'tx_lpi_timer' is defined with __u32 datatype in the ethtool
header file. Hence, we should use ethnl_update_u32() in set_eee ops.
Fixes: fd77be7bd43c ("ethtool: set EEE settings with EEE_SET request") Cc: <stable@vger.kernel.org> # 5.10.x Cc: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: clear VF down state bit before request link status
Currently, the VF down state bit is cleared after VF sending
link status request command. There is problem that when VF gets
link status replied from PF, the down state bit may still set
as 1. In this case, the link status replied from PF will be
ignored and always set VF link status to down.
To fix this problem, clear VF down state bit before VF requests
link status.
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pcnet32: Use pci_resource_len to validate PCI resource
pci_resource_start() is not a good indicator to determine if a PCI
resource exists or not, since the resource may start at address 0.
This is seen when trying to instantiate the driver in qemu for riscv32
or riscv64.
This currently happens because on redirect case we do skb_set_owner_r()
with the original sock. This increments the fwd_alloc memory accounting
on the original sock. Then on redirect we may push this into the queue
of the psock we are redirecting to. When the skb is flushed from the
queue we give the memory back to the original sock. The problem is if
the original sock is destroyed/closed with skbs on another psocks queue
then the original sock will not have a way to reclaim the memory before
being destroyed. Then above warning will be thrown
At this point we have torn down our own psock, but have the outstanding
skb in psock_other. Note that SK_PASS doesn't have this problem because
the sk_psock_drop() logic releases the skb, its still associated with
our psock.
To resolve lets only account for sockets on the ingress queue that are
still associated with the current socket. On the redirect case we will
check memory limits per 6fa9201a89898, but will omit fwd_alloc accounting
until skb is actually enqueued. When the skb is sent via skb_send_sock_locked
or received with sk_psock_skb_ingress memory will be claimed on psock_other.
John Fastabend [Thu, 1 Apr 2021 22:00:19 +0000 (15:00 -0700)]
bpf, sockmap: Fix sk->prot unhash op reset
In '4da6a196f93b1' we fixed a potential unhash loop caused when
a TLS socket in a sockmap was removed from the sockmap. This
happened because the unhash operation on the TLS ctx continued
to point at the sockmap implementation of unhash even though the
psock has already been removed. The sockmap unhash handler when a
psock is removed does the following,
rcu_read_lock();
psock = sk_psock(sk);
if (unlikely(!psock)) {
rcu_read_unlock();
if (sk->sk_prot->unhash)
sk->sk_prot->unhash(sk);
return;
}
[...]
}
The unlikely() case is there to handle the case where psock is detached
but the proto ops have not been updated yet. But, in the above case
with TLS and removed psock we never fixed sk_prot->unhash() and unhash()
points back to sock_map_unhash resulting in a loop. To fix this we added
this bit of code,
This will set the sk_prot->unhash back to its saved value. This is the
correct callback for a TLS socket that has been removed from the sock_map.
Unfortunately, this also overwrites the unhash pointer for all psocks.
We effectively break sockmap unhash handling for any future socks.
Omitting the unhash operation will leave stale entries in the map if
a socket transition through unhash, but does not do close() op.
To fix set unhash correctly before calling into tls_update. This way the
TLS enabled socket will point to the saved unhash() handler.
Fixes: 4da6a196f93b1 ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: Lorenz Bauer <lmb@cloudflare.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/161731441904.68884.15593917809745631972.stgit@john-XPS-13-9370
This is caused by NULL returned by tipc_aead_get(), and then crashed when
dereferencing it later in tipc_crypto_rcv_complete(). This might happen
when tipc_crypto_rcv_complete() is called by two threads at the same time:
the tmp attached by tipc_crypto_key_attach() in one thread may be released
by the one attached by that in the other thread.
This patch is to fix it by incrementing the tmp's refcnt before attaching
it instead of calling tipc_aead_get() after attaching it.
Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 2 Apr 2021 13:26:02 +0000 (06:26 -0700)]
virtio_net: Do not pull payload in skb->head
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced
to chain sk_buff (using skb_shinfo(skb)->frag_list), which
uses more memory but also cause packet consumers to go over
a lot of overhead handling all the tiny skbs.
It turns out that virtio_net page_to_skb() has a wrong strategy :
It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
copies 128 bytes from the page, before feeding the packet to GRO stack.
This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize
under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
meaning we were not packing MSS with 100% efficiency.
Fix is to pull only the ethernet header in page_to_skb()
Then, we change virtio_net_hdr_to_skb() to pull the missing
headers, instead of assuming they were already pulled by callers.
This fixes the performance regression, but could also allow virtio_net
to accept packets with more than 128bytes of headers.
Many thanks to Xuan Zhuo for his report, and his tests/help.
Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://www.spinics.net/lists/netdev/msg731397.html Co-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtualization@lists.linux-foundation.org Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: broadcom: bcm4908enet: Fix a double free in bcm4908_enet_dma_alloc
In bcm4908_enet_dma_alloc, if callee bcm4908_dma_alloc_buf_descs() failed,
it will free the ring->cpu_addr by dma_free_coherent() and return error.
Then bcm4908_enet_dma_free() will be called, and free the same cpu_addr
by dma_free_coherent() again.
My patch set ring->cpu_addr to NULL after it is freed in
bcm4908_dma_alloc_buf_descs() to avoid the double free.
Fixes: 4feffeadbcb2e ("net: broadcom: bcm4908enet: add BCM4908 controller driver") Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 4 Mar 2021 15:21:25 +0000 (18:21 +0300)]
net: mac802154: Fix general protection fault
syzbot found general protection fault in crypto_destroy_tfm()[1].
It was caused by wrong clean up loop in llsec_key_alloc().
If one of the tfm array members is in IS_ERR() range it will
cause general protection fault in clean up function [1].
Alexander Aring [Mon, 5 Apr 2021 00:30:54 +0000 (20:30 -0400)]
net: ieee802154: stop dump llsec params for monitors
This patch stops dumping llsec params for monitors which we don't support
yet. Otherwise we will access llsec mib which isn't initialized for
monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:53 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for del llsec seclevel
This patch forbids to del llsec seclevel for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:52 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for add llsec seclevel
This patch forbids to add llsec seclevel for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:51 +0000 (20:30 -0400)]
net: ieee802154: stop dump llsec seclevels for monitors
This patch stops dumping llsec seclevels for monitors which we don't
support yet. Otherwise we will access llsec mib which isn't initialized
for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:50 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for del llsec devkey
This patch forbids to del llsec devkey for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:49 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for add llsec devkey
This patch forbids to add llsec devkey for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:48 +0000 (20:30 -0400)]
net: ieee802154: stop dump llsec devkeys for monitors
This patch stops dumping llsec devkeys for monitors which we don't support
yet. Otherwise we will access llsec mib which isn't initialized for
monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:47 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for del llsec dev
This patch forbids to del llsec dev for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:46 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for add llsec dev
This patch forbids to add llsec dev for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:44 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for del llsec key
This patch forbids to del llsec key for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Alexander Aring [Mon, 5 Apr 2021 00:30:43 +0000 (20:30 -0400)]
net: ieee802154: forbid monitor for add llsec key
This patch forbids to add llsec key for monitor interfaces which we
don't support yet. Otherwise we will access llsec mib which isn't
initialized for monitors.
Marc Kleine-Budde [Tue, 30 Mar 2021 09:46:07 +0000 (11:46 +0200)]
can: mcp251x: fix support for half duplex SPI host controllers
Some SPI host controllers do not support full-duplex SPI transfers.
The function mcp251x_spi_trans() does a full duplex transfer. It is
used in several places in the driver, where a TX half duplex transfer
is sufficient.
To fix support for half duplex SPI host controllers, this patch
introduces a new function mcp251x_spi_write() and changes all callers
that do a TX half duplex transfer to use mcp251x_spi_write().
Fixes: e0e25001d088 ("can: mcp251x: add support for half duplex controllers") Link: https://lore.kernel.org/r/20210330100246.1074375-1-mkl@pengutronix.de Cc: Tim Harvey <tharvey@gateworks.com> Tested-By: Tim Harvey <tharvey@gateworks.com> Reported-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Johannes Berg [Fri, 26 Mar 2021 10:57:23 +0000 (12:57 +0200)]
iwlwifi: mvm: fix beacon protection checks
Unfortunately, since beacon protection isn't fully available
yet, we didn't notice that there are problems with it and
that the replay detection isn't working correctly. We were
relying only on mac80211, since iwl_mvm_rx_crypto() exits
when !ieee80211_has_protected(), which is of course true for
protected (but not encrypted) management frames.
Fix this to properly detect protected (but not encrypted)
management frames and handle them - we continue to only care
about beacons since for others everything can and will be
checked in mac80211.
Johannes Berg [Fri, 26 Mar 2021 10:57:20 +0000 (12:57 +0200)]
iwlwifi: fw: fix notification wait locking
Since we now call iwl_notification_wait_notify() from the
NAPI poll in soft-IRQ, we get a (valid) lockdep complaint
that we could get a deadlock by taking the spinlock from
sleeping context and then getting the soft-IRQ that also
tries to take it (in NAPI polling).
Johannes Berg [Fri, 26 Mar 2021 10:57:18 +0000 (12:57 +0200)]
iwlwifi: pcie: properly set LTR workarounds on 22000 devices
As the context info gen3 code is only called for >=AX210 devices
(from iwl_trans_pcie_gen2_start_fw()) the code there to set LTR
on 22000 devices cannot actually do anything (22000 < AX210).
Fix this by moving the LTR code to iwl_trans_pcie_gen2_start_fw()
where it can handle both devices. This then requires that we kick
the firmware only after that rather than doing it from the context
info code.
Note that this again had a dead branch in gen3 code, which I've
removed here.
Luca Coelho [Fri, 26 Mar 2021 10:57:17 +0000 (12:57 +0200)]
iwlwifi: fix 11ax disabled bit in the regulatory capability flags
When version 2 of the regulatory capability flags API was implemented,
the flag to disable 11ax was defined as bit 13, but this was later
changed and the bit remained as bit 10, like in version 1. This was
never changed in the driver, so we were checking for the wrong bit in
newer devices. Fix it.
batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
KMSAN found uninitialized value at batadv_tt_prepare_tvlv_local_data()
[1], for commit ced72933a5e8ab52 ("batman-adv: use CRC32C instead of CRC16
in TT code") inserted 'reserved' field into "struct batadv_tvlv_tt_data"
and commit 7ea7b4a142758dea ("batman-adv: make the TT CRC logic VLAN
specific") moved that field to "struct batadv_tvlv_tt_vlan_data" but left
that field uninitialized.
Reported-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+50ee810676e6a089487b@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: ced72933a5e8ab52 ("batman-adv: use CRC32C instead of CRC16 in TT code") Fixes: 7ea7b4a142758dea ("batman-adv: make the TT CRC logic VLAN specific") Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: Remove un-necessary 'else-if' in the hclge_reset_event()
Code to defer the reset(which caps the frequency of the reset) schedules the
timer and returns. Hence, following 'else-if' looks un-necessary.
Fixes: 9de0b86f6444 ("net: hns3: Prevent to request reset frequently") Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: Remove the left over redundant check & assignment
This removes the left over check and assignment which is no longer used
anywhere in the function and should have been removed as part of the
below mentioned patch.
Fixes: 012fcb52f67c ("net: hns3: activate reset timer when calling reset_event") Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maciej Żenczykowski [Mon, 5 Apr 2021 07:06:52 +0000 (00:06 -0700)]
net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
Found by virtue of ipv6 raw sockets not honouring the per-socket
IP{,V6}_FREEBIND setting.
Based on hits found via:
git grep '[.]ip_nonlocal_bind'
We fix both raw ipv6 sockets to honour IP{,V6}_FREEBIND and IP{,V6}_TRANSPARENT,
and we fix sctp sockets to honour IP{,V6}_TRANSPARENT (they already honoured
FREEBIND), and not just the ipv6 'ip_nonlocal_bind' sysctl.
The helper is defined as:
static inline bool ipv6_can_nonlocal_bind(struct net *net, struct inet_sock *inet) {
return net->ipv6.sysctl.ip_nonlocal_bind || inet->freebind || inet->transparent;
}
so this change only widens the accepted opt-outs and is thus a clean bugfix.
I'm not entirely sure what 'fixes' tag to add, since this is AFAICT an ancient bug,
but IMHO this should be applied to stable kernels as far back as possible.
As such I'm adding a 'fixes' tag with the commit that originally added the helper,
which happened in 4.19. Backporting to older LTS kernels (at least 4.9 and 4.14)
would presumably require open-coding it or backporting the helper as well.
Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 83ba4645152d ("net: add helpers checking if socket can be bound to nonlocal address") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
openvswitch: fix send of uninitialized stack memory in ct limit reply
'struct ovs_zone_limit' has more members than initialized in
ovs_ct_limit_get_default_limit(). The rest of the memory is a random
kernel stack content that ends up being sent to userspace.
Fix that by using designated initializer that will clear all
non-specified fields.
Fixes: 11efd5cb04a1 ("openvswitch: Support conntrack zone limit") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: c1e85c6ce57ef ("net: macb: save/restore the remaining registers and features") Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunjian Wang [Thu, 1 Apr 2021 04:52:48 +0000 (12:52 +0800)]
net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb
The 'unlocked_driver_cb' struct field in 'bo' is not being initialized
in tcf_block_offload_init(). The uninitialized 'unlocked_driver_cb'
will be used when calling unlocked_driver_cb(). So initialize 'bo' to
zero to avoid the issue.
Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 1 Apr 2021 16:57:45 +0000 (18:57 +0200)]
mptcp: revert "mptcp: provide subflow aware release function"
This change reverts commit ad98dd37051e ("mptcp: provide subflow aware
release function"). The latter introduced a deadlock spotted by
syzkaller and is not needed anymore after the previous commit.
Fixes: ad98dd37051e ("mptcp: provide subflow aware release function") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 1 Apr 2021 16:57:44 +0000 (18:57 +0200)]
mptcp: forbit mcast-related sockopt on MPTCP sockets
Unrolling mcast state at msk dismantel time is bug prone, as
syzkaller reported:
======================================================
WARNING: possible circular locking dependency detected
5.11.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor905/8822 is trying to acquire lock: ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
but task is already holding lock: ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline] ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507
which lock already depends on the new lock.
Instead we can simply forbit any mcast-related setsockopt
Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 1 Apr 2021 13:27:52 +0000 (16:27 +0300)]
drivers: net: fix memory leak in peak_usb_create_dev
syzbot reported memory leak in peak_usb.
The problem was in case of failure after calling
->dev_init()[2] in peak_usb_create_dev()[1]. The data
allocated int dev_init() wasn't freed, so simple
->dev_free() call fix this problem.
Reported-by: syzbot+91adee8d9ebb9193d22d@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Norman Maurer [Thu, 1 Apr 2021 06:59:17 +0000 (08:59 +0200)]
net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);
Support for UDP_GRO was added in the past but the implementation for
getsockopt was missed which did lead to an error when we tried to
retrieve the setting for UDP_GRO. This patch adds the missing switch
case for UDP_GRO
Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Norman Maurer <norman_maurer@apple.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Skripkin [Thu, 1 Apr 2021 04:46:24 +0000 (07:46 +0300)]
drivers: net: fix memory leak in atusb_probe
syzbot reported memory leak in atusb_probe()[1].
The problem was in atusb_alloc_urbs().
Since urb is anchored, we need to release the reference
to correctly free the urb
Reported-by: syzbot+28a246747e0a465127f3@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series fixes some issues around socket creation for AF_XDP.
Patch 1 fixes a potential NULL pointer dereference in
xsk_socket__create_shared.
Patch 2 ensures that the umem passed to xsk_socket__create(_shared)
remains unchanged in event of failure.
Patch 3 makes it possible for xsk_socket__create(_shared) to
succeed even if the rx and tx XDP rings have already been set up by
introducing a new fields to struct xsk_umem which represent the ring
setup status for the xsk which shares the fd with the umem.
v3->v4:
* Reduced nesting in xsk_put_ctx as suggested by Alexei.
* Use bools instead of a u8 and flags to represent the
ring setup status as suggested by Björn.
v2->v3:
* Instead of ignoring the return values of the setsockopt calls, introduce
a new flag to determine whether or not to call them based on the ring
setup status as suggested by Alexei.
v1->v2:
* Simplified restoring the _save pointers as suggested by Magnus.
* Fixed the condition which determines whether to unmap umem rings
when socket create fails.
====================
Acked-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ciara Loftus [Wed, 31 Mar 2021 06:12:18 +0000 (06:12 +0000)]
libbpf: Only create rx and tx XDP rings when necessary
Prior to this commit xsk_socket__create(_shared) always attempted to create
the rx and tx rings for the socket. However this causes an issue when the
socket being setup is that which shares the fd with the UMEM. If a
previous call to this function failed with this socket after the rings were
set up, a subsequent call would always fail because the rings are not torn
down after the first call and when we try to set them up again we encounter
an error because they already exist. Solve this by remembering whether the
rings were set up by introducing new bools to struct xsk_umem which
represent the ring setup status and using them to determine whether or
not to set up the rings.