]> www.infradead.org Git - users/hch/dma-mapping.git/log
users/hch/dma-mapping.git
3 years agonet: ipv6: ensure we call ipv6_mc_down() at most once
j.nixdorf@avm.de [Thu, 24 Feb 2022 09:06:49 +0000 (10:06 +0100)]
net: ipv6: ensure we call ipv6_mc_down() at most once

There are two reasons for addrconf_notify() to be called with NETDEV_DOWN:
either the network device is actually going down, or IPv6 was disabled
on the interface.

If either of them stays down while the other is toggled, we repeatedly
call the code for NETDEV_DOWN, including ipv6_mc_down(), while never
calling the corresponding ipv6_mc_up() in between. This will cause a
new entry in idev->mc_tomb to be allocated for each multicast group
the interface is subscribed to, which in turn leaks one struct ifmcaddr6
per nontrivial multicast group the interface is subscribed to.

The following reproducer will leak at least $n objects:

ip addr add ff2e::4242/32 dev eth0 autojoin
sysctl -w net.ipv6.conf.eth0.disable_ipv6=1
for i in $(seq 1 $n); do
ip link set up eth0; ip link set down eth0
done

Joining groups with IPV6_ADD_MEMBERSHIP (unprivileged) or setting the
sysctl net.ipv6.conf.eth0.forwarding to 1 (=> subscribing to ff02::2)
can also be used to create a nontrivial idev->mc_list, which will the
leak objects with the right up-down-sequence.

Based on both sources for NETDEV_DOWN events the interface IPv6 state
should be considered:

 - not ready if the network interface is not ready OR IPv6 is disabled
   for it
 - ready if the network interface is ready AND IPv6 is enabled for it

The functions ipv6_mc_up() and ipv6_down() should only be run when this
state changes.

Implement this by remembering when the IPv6 state is ready, and only
run ipv6_mc_down() if it actually changed from ready to not ready.

The other direction (not ready -> ready) already works correctly, as:

 - the interface notification triggered codepath for NETDEV_UP /
   NETDEV_CHANGE returns early if ipv6 is disabled, and
 - the disable_ipv6=0 triggered codepath skips fully initializing the
   interface as long as addrconf_link_ready(dev) returns false
 - calling ipv6_mc_up() repeatedly does not leak anything

Fixes: 3ce62a84d53c ("ipv6: exit early in addrconf_notify() if IPv6 is disabled")
Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
David S. Miller [Sat, 26 Feb 2022 12:50:20 +0000 (12:50 +0000)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-02-25

This series contains updates to iavf driver only.

Slawomir fixes stability issues that can be seen when stressing the
driver using a large number of VFs with a multitude of operations.
Among the fixes are reworking mutexes to provide more effective locking,
ensuring initialization is complete before teardown, preventing
operations which could race while removing the driver, stopping certain
tasks from being queued when the device is down, and adding a missing
mutex unlock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge tag 'linux-can-fixes-for-5.17-20220225' of git://git.kernel.org/pub/scm/linux...
Jakub Kicinski [Fri, 25 Feb 2022 22:53:58 +0000 (14:53 -0800)]
Merge tag 'linux-can-fixes-for-5.17-20220225' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2022-02-25

The first 2 patches are by Vincent Mailhol and fix the error handling
of the ndo_open callbacks of the etas_es58x and the gs_usb CAN USB
drivers.

The last patch is by Lad Prabhakar and fixes a small race condition in
the rcar_canfd's rcar_canfd_channel_probe() function.

* tag 'linux-can-fixes-for-5.17-20220225' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
  can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready
  can: gs_usb: change active_channels's type from atomic_t to u8
  can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8
====================

Link: https://lore.kernel.org/r/20220225165622.3231809-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoiavf: Fix __IAVF_RESETTING state usage
Slawomir Laba [Wed, 23 Feb 2022 12:38:55 +0000 (13:38 +0100)]
iavf: Fix __IAVF_RESETTING state usage

The setup of __IAVF_RESETTING state in watchdog task had no
effect and could lead to slow resets in the driver as
the task for __IAVF_RESETTING state only requeues watchdog.
Till now the __IAVF_RESETTING was interpreted by reset task
as running state which could lead to errors with allocating
and resources disposal.

Make watchdog_task queue the reset task when it's necessary.
Do not update the state to __IAVF_RESETTING so the reset task
knows exactly what is the current state of the adapter.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Fix missing check for running netdev
Slawomir Laba [Wed, 23 Feb 2022 12:38:43 +0000 (13:38 +0100)]
iavf: Fix missing check for running netdev

The driver was queueing reset_task regardless of the netdev
state.

Do not queue the reset task in iavf_change_mtu if netdev
is not running.

Fixes: fdd4044ffdc8 ("iavf: Remove timer for work triggering, use delaying work instead")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Fix deadlock in iavf_reset_task
Slawomir Laba [Wed, 23 Feb 2022 12:38:31 +0000 (13:38 +0100)]
iavf: Fix deadlock in iavf_reset_task

There exists a missing mutex_unlock call on crit_lock in
iavf_reset_task call path.

Unlock the crit_lock before returning from reset task.

Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Fix race in init state
Slawomir Laba [Wed, 23 Feb 2022 12:38:01 +0000 (13:38 +0100)]
iavf: Fix race in init state

When iavf_init_version_check sends VIRTCHNL_OP_GET_VF_RESOURCES
message, the driver will wait for the response after requeueing
the watchdog task in iavf_init_get_resources call stack. The
logic is implemented this way that iavf_init_get_resources has
to be called in order to allocate adapter->vf_res. It is polling
for the AQ response in iavf_get_vf_config function. Expect a
call trace from kernel when adminq_task worker handles this
message first. adapter->vf_res will be NULL in
iavf_virtchnl_completion.

Make the watchdog task not queue the adminq_task if the init
process is not finished yet.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS
Slawomir Laba [Wed, 23 Feb 2022 12:37:50 +0000 (13:37 +0100)]
iavf: Fix locking for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS

iavf_virtchnl_completion is called under crit_lock but when
the code for VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is called,
this lock is released in order to obtain rtnl_lock to avoid
ABBA deadlock with unregister_netdev.

Along with the new way iavf_remove behaves, there exist
many risks related to the lock release and attmepts to regrab
it. The driver faces crashes related to races between
unregister_netdev and netdev_update_features. Yet another
risk is that the driver could already obtain the crit_lock
in order to destroy it and iavf_virtchnl_completion could
crash or block forever.

Make iavf_virtchnl_completion never relock crit_lock in it's
call paths.

Extract rtnl_lock locking logic to the driver for
unregister_netdev in order to set the netdev_registered flag
inside the lock.

Introduce a new flag that will inform adminq_task to perform
the code from VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS right after
it finishes processing messages. Guard this code with remove
flags so it's never called when the driver is in remove state.

Fixes: 5951a2b9812d ("iavf: Fix VLAN feature flags after VFR")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Fix init state closure on remove
Slawomir Laba [Wed, 23 Feb 2022 12:37:10 +0000 (13:37 +0100)]
iavf: Fix init state closure on remove

When init states of the adapter work, the errors like lack
of communication with the PF might hop in. If such events
occur the driver restores previous states in order to retry
initialization in a proper way. When remove task kicks in,
this situation could lead to races with unregistering the
netdevice as well as resources cleanup. With the commit
introducing the waiting in remove for init to complete,
this problem turns into an endless waiting if init never
recovers from errors.

Introduce __IAVF_IN_REMOVE_TASK bit to indicate that the
remove thread has started.

Make __IAVF_COMM_FAILED adapter state respect the
__IAVF_IN_REMOVE_TASK bit and set the __IAVF_INIT_FAILED
state and return without any action instead of trying to
recover.

Make __IAVF_INIT_FAILED adapter state respect the
__IAVF_IN_REMOVE_TASK bit and return without any further
actions.

Make the loop in the remove handler break when adapter has
__IAVF_INIT_FAILED state set.

Fixes: 898ef1cb1cb2 ("iavf: Combine init and watchdog state machines")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Add waiting so the port is initialized in remove
Slawomir Laba [Wed, 23 Feb 2022 12:36:56 +0000 (13:36 +0100)]
iavf: Add waiting so the port is initialized in remove

There exist races when port is being configured and remove is
triggered.

unregister_netdev is not and can't be called under crit_lock
mutex since it is calling ndo_stop -> iavf_close which requires
this lock. Depending on init state the netdev could be still
unregistered so unregister_netdev never cleans up, when shortly
after that the device could become registered.

Make iavf_remove wait until port finishes initialization.
All critical state changes are atomic (under crit_lock).
Crashes that come from iavf_reset_interrupt_capability and
iavf_free_traffic_irqs should now be solved in a graceful
manner.

Fixes: 605ca7c5c6707 ("iavf: Fix kernel BUG in free_msi_irqs")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agoiavf: Rework mutexes for better synchronisation
Slawomir Laba [Wed, 23 Feb 2022 12:35:49 +0000 (13:35 +0100)]
iavf: Rework mutexes for better synchronisation

The driver used to crash in multiple spots when put to stress testing
of the init, reset and remove paths.

The user would experience call traces or hangs when creating,
resetting, removing VFs. Depending on the machines, the call traces
are happening in random spots, like reset restoring resources racing
with driver remove.

Make adapter->crit_lock mutex a mandatory lock for guarding the
operations performed on all workqueues and functions dealing with
resource allocation and disposal.

Make __IAVF_REMOVE a final state of the driver respected by
workqueues that shall not requeue, when they fail to obtain the
crit_lock.

Make the IRQ handler not to queue the new work for adminq_task
when the __IAVF_REMOVE state is set.

Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Phani Burra <phani.r.burra@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 years agonet: stmmac: fix return value of __setup handler
Randy Dunlap [Thu, 24 Feb 2022 03:35:36 +0000 (19:35 -0800)]
net: stmmac: fix return value of __setup handler

__setup() handlers should return 1 on success, i.e., the parameter
has been handled. A return of 0 causes the "option=value" string to be
added to init's environment strings, polluting it.

Fixes: 47dd7a540b8a ("net: add support for STMicroelectronics Ethernet controllers.")
Fixes: f3240e2811f0 ("stmmac: remove warning when compile as built-in (V2)")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Link: https://lore.kernel.org/r/20220224033536.25056-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: sxgbe: fix return value of __setup handler
Randy Dunlap [Thu, 24 Feb 2022 03:35:28 +0000 (19:35 -0800)]
net: sxgbe: fix return value of __setup handler

__setup() handlers should return 1 on success, i.e., the parameter
has been handled. A return of 0 causes the "option=value" string to be
added to init's environment strings, polluting it.

Fixes: acc18c147b22 ("net: sxgbe: add EEE(Energy Efficient Ethernet) for Samsung sxgbe")
Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Siva Reddy <siva.kallam@samsung.com>
Cc: Girish K S <ks.giri@samsung.com>
Cc: Byungho An <bh74.an@samsung.com>
Link: https://lore.kernel.org/r/20220224033528.24640-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agocan: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready
Lad Prabhakar [Mon, 21 Feb 2022 22:59:35 +0000 (22:59 +0000)]
can: rcar_canfd: rcar_canfd_channel_probe(): register the CAN device when fully ready

Register the CAN device only when all the necessary initialization is
completed. This patch makes sure all the data structures and locks are
initialized before registering the CAN device.

Link: https://lore.kernel.org/all/20220221225935.12300-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Reported-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Pavel Machek <pavel@denx.de>
Reviewed-by: Ulrich Hecht <uli+renesas@fpond.eu>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agonet: sparx5: Fix add vlan when invalid operation
Casper Andersson [Fri, 25 Feb 2022 10:15:16 +0000 (11:15 +0100)]
net: sparx5: Fix add vlan when invalid operation

Check if operation is valid before changing any
settings in hardware. Otherwise it results in
changes being made despite it not being a valid
operation.

Fixes: 78eab33bb68b ("net: sparx5: add vlan support")
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: chelsio: cxgb3: check the return value of pci_find_capability()
Jia-Ju Bai [Fri, 25 Feb 2022 12:37:27 +0000 (04:37 -0800)]
net: chelsio: cxgb3: check the return value of pci_find_capability()

The function pci_find_capability() in t3_prep_adapter() can fail, so its
return value should be checked.

Fixes: 4d22de3e6cc4 ("Add support for the latest 1G/10G Chelsio adapter, T3")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'ibmvnic-fixes'
David S. Miller [Fri, 25 Feb 2022 10:57:47 +0000 (10:57 +0000)]
Merge branch 'ibmvnic-fixes'

Sukadev Bhattiprolu says:

====================
ibmvnic: Fix a race in ibmvnic_probe()

If we get a transport (reset) event right after a successful CRQ_INIT
during ibmvnic_probe() but before we set the adapter state to VNIC_PROBED,
we will throw away the reset assuming that the adapter is still in the
probing state. But since the adapter has completed the CRQ_INIT any
subsequent CRQs the we send will be ignored by the vnicserver until
we release/init the CRQ again. This can leave the adapter unconfigured.

While here fix a couple of other bugs that were observed (Patches 1,2,4).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: Allow queueing resets during probe
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:58 +0000 (22:23 -0800)]
ibmvnic: Allow queueing resets during probe

We currently don't allow queuing resets when adapter is in VNIC_PROBING
state - instead we throw away the reset and return EBUSY. The reasoning
is probably that during ibmvnic_probe() the ibmvnic_adapter itself is
being initialized so performing a reset during this time can lead us to
accessing fields in the ibmvnic_adapter that are not fully initialized.
A review of the code shows that all the adapter state neede to process a
reset is initialized before registering the CRQ so that should no longer
be a concern.

Further the expectation is that if we do get a reset (transport event)
during probe, the do..while() loop in ibmvnic_probe() will handle this
by reinitializing the CRQ.

While that is true to some extent, it is possible that the reset might
occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
such a reset will be thrown away. While the client assumes that the
adapter is functional, the vnic server will wait for the client to reinit
the adapter. This disconnect between the two leaves the adapter down
needing manual intervention.

Because ibmvnic_probe() has other work to do after initializing the CRQ
(such as registering the netdev at a minimum) and because the reset event
can occur at any instant after the CRQ is initialized, there will always
be a window between initializing the CRQ and considering the adapter
ready for resets (ie state == PROBED).

So rather than discarding resets during this window, allow queueing them
- but only process them after the adapter is fully initialized.

To do this, introduce a new completion state ->probe_done and have the
reset worker thread wait on this before processing resets.

This change brings up two new situations in or just after ibmvnic_probe().
First after one or more resets were queued, we encounter an error and
decide to retry the initialization.  At that point the queued resets are
no longer relevant since we could be talking to a new vnic server. So we
must purge/flush the queued resets before restarting the initialization.
As a side note, since we are still in the probing stage and we have not
registered the netdev, it will not be CHANGE_PARAM reset.

Second this change opens up a potential race between the worker thread
in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the
following sequence of events:

1. Register CRQ
2. Get transport event before CRQ_INIT completes.
3. Tasklet schedules reset:
a) add rwi to list
b) schedule_work() to start worker thread which runs
   and waits for ->probe_done.
4. ibmvnic_probe() decides to retry, purges rwi_list
5. Re-register crq and this time rest of probe succeeds - register
   netdev and complete(->probe_done).
6. Worker thread resumes in __ibmvnic_reset() from 3b.
7. Worker thread sets ->resetting bit
8. ibmvnic_open() comes in, notices ->resetting bit, sets state
   to IBMVNIC_OPEN and returns early expecting worker thread to
   finish the open.
9. Worker thread finds rwi_list empty and returns without
   opening the interface.

If this happens, the ->ndo_open() call is effectively lost and the
interface remains down. To address this, ensure that ->rwi_list is
not empty before setting the ->resetting  bit. See also comments in
__ibmvnic_reset().

Fixes: 6a2fb0e99f9c ("ibmvnic: driver initialization for kdump/kexec")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: clear fop when retrying probe
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:57 +0000 (22:23 -0800)]
ibmvnic: clear fop when retrying probe

Clear ->failover_pending flag that may have been set in the previous
pass of registering CRQ. If we don't clear, a subsequent ibmvnic_open()
call would be misled into thinking a failover is pending and assuming
that the reset worker thread would open the adapter. If this pass of
registering the CRQ succeeds (i.e there is no transport event), there
wouldn't be a reset worker thread.

This would leave the adapter unconfigured and require manual intervention
to bring it up during boot.

Fixes: 5a18e1e0c193 ("ibmvnic: Fix failover case for non-redundant configuration")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: init init_done_rc earlier
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:56 +0000 (22:23 -0800)]
ibmvnic: init init_done_rc earlier

We currently initialize the ->init_done completion/return code fields
before issuing a CRQ_INIT command. But if we get a transport event soon
after registering the CRQ the taskslet may already have recorded the
completion and error code. If we initialize here, we might overwrite/
lose that and end up issuing the CRQ_INIT only to timeout later.

If that timeout happens during probe, we will leave the adapter in the
DOWN state rather than retrying to register/init the CRQ.

Initialize the completion before registering the CRQ so we don't lose
the notification.

Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: register netdev after init of adapter
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:55 +0000 (22:23 -0800)]
ibmvnic: register netdev after init of adapter

Finish initializing the adapter before registering netdev so state
is consistent.

Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: complete init_done on transport events
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:54 +0000 (22:23 -0800)]
ibmvnic: complete init_done on transport events

If we get a transport event, set the error and mark the init as
complete so the attempt to send crq-init or login fail sooner
rather than wait for the timeout.

Fixes: bbd669a868bb ("ibmvnic: Fix completion structure initialization")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: define flush_reset_queue helper
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:53 +0000 (22:23 -0800)]
ibmvnic: define flush_reset_queue helper

Define and use a helper to flush the reset queue.

Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: initialize rc before completing wait
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:52 +0000 (22:23 -0800)]
ibmvnic: initialize rc before completing wait

We should initialize ->init_done_rc before calling complete(). Otherwise
the waiting thread may see ->init_done_rc as 0 before we have updated it
and may assume that the CRQ was successful.

Fixes: 6b278c0cb378 ("ibmvnic delay complete()")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoibmvnic: free reset-work-item when flushing
Sukadev Bhattiprolu [Fri, 25 Feb 2022 06:23:51 +0000 (22:23 -0800)]
ibmvnic: free reset-work-item when flushing

Fix a tiny memory leak when flushing the reset work queue.

Fixes: 2770a7984db5 ("ibmvnic: Introduce hard reset recovery")
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Fri, 25 Feb 2022 10:44:15 +0000 (10:44 +0000)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
1) Fix PMTU for IPv6 if the reported MTU minus the ESP overhead is
   smaller than 1280. From Jiri Bohac.

2) Fix xfrm interface ID and inter address family tunneling when
   migrating xfrm states. From Yan Yan.

3) Add missing xfrm intrerface ID initialization on xfrmi_changelink.
   From Antony Antony.

4) Enforce validity of xfrm offload input flags so that userspace can't
   send undefined flags to the offload driver.
   From Leon Romanovsky.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dcb: flush lingering app table entries for unregistered devices
Vladimir Oltean [Thu, 24 Feb 2022 16:01:54 +0000 (18:01 +0200)]
net: dcb: flush lingering app table entries for unregistered devices

If I'm not mistaken (and I don't think I am), the way in which the
dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
the application table with dynamically allocated struct dcb_app_type
entries that are kept in the module-global dcb_app_list.

However, nobody keeps exact track of these entries, and although
dcb_ieee_delapp() is supposed to remove them, nobody does so when the
interface goes away (example: driver unbinds from device). So the
dcb_app_list will contain lingering entries with an ifindex that no
longer matches any device in dcb_app_lookup().

Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
flushing the app table entries of interfaces that are now gone.

In fact something like this used to be done as part of the initial
commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
essentially at module_exit time. That became dead code after commit
7a6b6f515f77 ("DCB: fix kconfig option") which essentially merged
"tristate config DCB" and "bool config DCBNL" into a single "bool config
DCB", so net/dcb/dcbnl.c could not be built as a module anymore.

Commit 36b9ad8084bd ("net/dcb: make dcbnl.c explicitly non-modular")
recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
leaving us with the version we have today.

Since flushing application table entries can and should be done as soon
as the netdevice disappears, fundamentally the commit that is to blame
is the one that introduced the design of this API.

Fixes: 9ab933ab2cc8 ("dcbnl: add appliction tlv handlers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/smc: fix connection leak
D. Wythe [Thu, 24 Feb 2022 15:26:19 +0000 (23:26 +0800)]
net/smc: fix connection leak

There's a potential leak issue under following execution sequence :

smc_release   smc_connect_work
if (sk->sk_state == SMC_INIT)
send_clc_confirim
tcp_abort();
...
sk.sk_state = SMC_ACTIVE
smc_close_active
switch(sk->sk_state) {
...
case SMC_ACTIVE:
smc_close_final()
// then wait peer closed

Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
still in the tcp send buffer, in which case our connection token cannot
be delivered to the server side, which means that we cannot get a
passive close message at all. Therefore, it is impossible for the to be
disconnected at all.

This patch tries a very simple way to avoid this issue, once the state
has changed to SMC_ACTIVE after tcp_abort(), we can actively abort the
smc connection, considering that the state is SMC_INIT before
tcp_abort(), abandoning the complete disconnection process should not
cause too much problem.

In fact, this problem may exist as long as the CLC CONFIRM message is
not received by the server. Whether a timer should be added after
smc_close_final() needs to be discussed in the future. But even so, this
patch provides a faster release for connection in above case, it should
also be valuable.

Fixes: 39f41f367b08 ("net/smc: common release code for non-accepted sockets")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: stmmac: only enable DMA interrupts when ready
Vincent Whitchurch [Thu, 24 Feb 2022 11:38:29 +0000 (12:38 +0100)]
net: stmmac: only enable DMA interrupts when ready

In this driver's ->ndo_open() callback, it enables DMA interrupts,
starts the DMA channels, then requests interrupts with request_irq(),
and then finally enables napi.

If RX DMA interrupts are received before napi is enabled, no processing
is done because napi_schedule_prep() will return false.  If the network
has a lot of broadcast/multicast traffic, then the RX ring could fill up
completely before napi is enabled.  When this happens, no further RX
interrupts will be delivered, and the driver will fail to receive any
packets.

Fix this by only enabling DMA interrupts after all other initialization
is complete.

Fixes: 523f11b5d4fd72efb ("net: stmmac: move hardware setup for stmmac_open to new function")
Reported-by: Lars Persson <larper@axis.com>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoxen/netfront: destroy queues before real_num_tx_queues is zeroed
Marek Marczykowski-Górecki [Wed, 23 Feb 2022 21:19:54 +0000 (22:19 +0100)]
xen/netfront: destroy queues before real_num_tx_queues is zeroed

xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
("net-sysfs: update the queue counts in the unregistration path"),
unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
facts together means, that xennet_destroy_queues() called from
xennet_remove() cannot do its job, because it's called after
unregister_netdev(). This results in kfree-ing queues that are still
linked in napi, which ultimately crashes:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0 P4D 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
    RIP: 0010:free_netdev+0xa3/0x1a0
    Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
    RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
    RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
    RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
    R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
    FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
    Call Trace:
     <TASK>
     xennet_remove+0x13d/0x300 [xen_netfront]
     xenbus_dev_remove+0x6d/0xf0
     __device_release_driver+0x17a/0x240
     device_release_driver+0x24/0x30
     bus_remove_device+0xd8/0x140
     device_del+0x18b/0x410
     ? _raw_spin_unlock+0x16/0x30
     ? klist_iter_exit+0x14/0x20
     ? xenbus_dev_request_and_reply+0x80/0x80
     device_unregister+0x13/0x60
     xenbus_dev_changed+0x18e/0x1f0
     xenwatch_thread+0xc0/0x1a0
     ? do_wait_intr_irq+0xa0/0xa0
     kthread+0x16b/0x190
     ? set_kthread_struct+0x40/0x40
     ret_from_fork+0x22/0x30
     </TASK>

Fix this by calling xennet_destroy_queues() from xennet_uninit(),
when real_num_tx_queues is still available. This ensures that queues are
destroyed when real_num_tx_queues is set to 0, regardless of how
unregister_netdev() was called.

Originally reported at
https://github.com/QubesOS/qubes-issues/issues/7257

Fixes: d7dac083414eb5bb9 ("net-sysfs: update the queue counts in the unregistration path")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agocan: gs_usb: change active_channels's type from atomic_t to u8
Vincent Mailhol [Mon, 14 Feb 2022 23:48:14 +0000 (08:48 +0900)]
can: gs_usb: change active_channels's type from atomic_t to u8

The driver uses an atomic_t variable: gs_usb:active_channels to keep
track of the number of opened channels in order to only allocate
memory for the URBs when this count changes from zero to one.

However, the driver does not decrement the counter when an error
occurs in gs_can_open(). This issue is fixed by changing the type from
atomic_t to u8 and by simplifying the logic accordingly.

It is safe to use an u8 here because the network stack big kernel lock
(a.k.a. rtnl_mutex) is being hold. For details, please refer to [1].

[1] https://lore.kernel.org/linux-can/CAMZ6Rq+sHpiw34ijPsmp7vbUpDtJwvVtdV7CvRZJsLixjAFfrg@mail.gmail.com/T/#t

Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://lore.kernel.org/all/20220214234814.1321599-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agocan: etas_es58x: change opened_channel_cnt's type from atomic_t to u8
Vincent Mailhol [Sat, 12 Feb 2022 11:27:13 +0000 (20:27 +0900)]
can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8

The driver uses an atomic_t variable: struct
es58x_device::opened_channel_cnt to keep track of the number of opened
channels in order to only allocate memory for the URBs when this count
changes from zero to one.

While the intent was to prevent race conditions, the choice of an
atomic_t turns out to be a bad idea for several reasons:

- implementation is incorrect and fails to decrement
  opened_channel_cnt when the URB allocation fails as reported in
  [1].

- even if opened_channel_cnt were to be correctly decremented,
  atomic_t is insufficient to cover edge cases: there can be a race
  condition in which 1/ a first process fails to allocate URBs
  memory 2/ a second process enters es58x_open() before the first
  process does its cleanup and decrements opened_channed_cnt. In
  which case, the second process would successfully return despite
  the URBs memory not being allocated.

- actually, any kind of locking mechanism was useless here because
  it is redundant with the network stack big kernel lock
  (a.k.a. rtnl_lock) which is being hold by all the callers of
  net_device_ops:ndo_open() and net_device_ops:ndo_close(). c.f. the
  ASSERST_RTNL() calls in __dev_open() [2] and __dev_close_many()
  [3].

The atmomic_t is thus replaced by a simple u8 type and the logic to
increment and decrement es58x_device:opened_channel_cnt is simplified
accordingly fixing the bug reported in [1]. We do not check again for
ASSERST_RTNL() as this is already done by the callers.

[1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
[2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
[3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541

Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X CAN USB interfaces")
Link: https://lore.kernel.org/all/20220212112713.577957-1-mailhol.vincent@wanadoo.fr
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
3 years agoMerge branch 'mptcp-fixes-for-5-17'
Jakub Kicinski [Fri, 25 Feb 2022 05:54:56 +0000 (21:54 -0800)]
Merge branch 'mptcp-fixes-for-5-17'

Mat Martineau says:

====================
mptcp: Fixes for 5.17

Patch 1 fixes an issue with the SIOCOUTQ ioctl in MPTCP sockets that
have performed a fallback to TCP.

Patch 2 is a selftest fix to correctly remove temp files.

Patch 3 fixes a shift-out-of-bounds issue found by syzkaller.
====================

Link: https://lore.kernel.org/r/20220225005259.318898-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: Correctly set DATA_FIN timeout when number of retransmits is large
Mat Martineau [Fri, 25 Feb 2022 00:52:59 +0000 (16:52 -0800)]
mptcp: Correctly set DATA_FIN timeout when number of retransmits is large

Syzkaller with UBSAN uncovered a scenario where a large number of
DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
timeout calculation:

================================================================================
UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: events mptcp_worker
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
 __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
 mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
 __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
 mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
 process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
 worker_thread+0x95/0xe10 kernel/workqueue.c:2454
 kthread+0x2f4/0x3b0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
================================================================================

This change limits the maximum timeout by limiting the size of the
shift, which keeps all intermediate values in-bounds.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
Fixes: 6477dd39e62c ("mptcp: Retransmit DATA_FIN")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: mptcp: do complete cleanup at exit
Paolo Abeni [Fri, 25 Feb 2022 00:52:58 +0000 (16:52 -0800)]
selftests: mptcp: do complete cleanup at exit

After commit 05be5e273c84 ("selftests: mptcp: add disconnect tests")
the mptcp selftests leave behind a couple of tmp files after
each run. run_tests_disconnect() misnames a few variables used to
track them. Address the issue setting the appropriate global variables

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agomptcp: accurate SIOCOUTQ for fallback socket
Paolo Abeni [Fri, 25 Feb 2022 00:52:57 +0000 (16:52 -0800)]
mptcp: accurate SIOCOUTQ for fallback socket

The MPTCP SIOCOUTQ implementation is not very accurate in
case of fallback: it only measures the data in the MPTCP-level
write queue, but it does not take in account the subflow
write queue utilization. In case of fallback the first can be
empty, while the latter is not.

The above produces sporadic self-tests issues and can foul
legit user-space application.

Fix the issue additionally querying the subflow in case of fallback.

Fixes: 644807e3e462 ("mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/260
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluet...
Jakub Kicinski [Fri, 25 Feb 2022 02:13:30 +0000 (18:13 -0800)]
Merge tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix regression with RFCOMM
 - Fix regression with LE devices using Privacy (RPA)
 - Fix regression with LE devices not waiting proper timeout to
   establish connections
 - Fix race in smp

* tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_sync: Fix not using conn_timeout
  Bluetooth: hci_sync: Fix hci_update_accept_list_sync
  Bluetooth: assign len after null check
  Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
  Bluetooth: fix data races in smp_unregister(), smp_del_chan()
  Bluetooth: hci_core: Fix leaking sent_cmd skb
====================

Link: https://lore.kernel.org/r/20220224210838.197787-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Thu, 24 Feb 2022 21:19:57 +0000 (13:19 -0800)]
Merge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull pci fixes from Bjorn Helgaas:

 - Fix a merge error that broke PCI device enumeration on mvebu
   platforms, including Turris Omnia (Armada 385) (Pali Rohár)

 - Avoid using ATS on all AMD Navi10 and Navi14 GPUs because some
   VBIOSes don't account for "harvested" (disabled) parts of the chip
   when initializing caches (Alex Deucher)

* tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
  PCI: mvebu: Fix device enumeration regression

3 years agoMerge tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 24 Feb 2022 20:45:32 +0000 (12:45 -0800)]
Merge tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - bpf: fix crash due to out of bounds access into reg2btf_ids

   - mvpp2: always set port pcs ops, avoid null-deref

   - eth: marvell: fix driver load from initrd

   - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"

  Current release - new code bugs:

   - mptcp: fix race in overlapping signal events

  Previous releases - regressions:

   - xen-netback: revert hotplug-status changes causing devices to not
     be configured

   - dsa:
      - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
        held
      - fix panic when removing unoffloaded port from bridge

   - dsa: microchip: fix bridging with more than two member ports

  Previous releases - always broken:

   - bpf:
      - fix crash due to incorrect copy_map_value when both spin lock
        and timer are present in a single value
      - fix a bpf_timer initialization issue with clang
      - do not try bpf_msg_push_data with len 0
      - add schedule points in batch ops

   - nf_tables:
      - unregister flowtable hooks on netns exit
      - correct flow offload action array size
      - fix a couple of memory leaks

   - vsock: don't check owner in vhost_vsock_stop() while releasing

   - gso: do not skip outer ip header in case of ipip and net_failover

   - smc: use a mutex for locking "struct smc_pnettable"

   - openvswitch: fix setting ipv6 fields causing hw csum failure

   - mptcp: fix race in incoming ADD_ADDR option processing

   - sysfs: add check for netdevice being present to speed_show

   - sched: act_ct: fix flow table lookup after ct clear or switching
     zones

   - eth: intel: fixes for SR-IOV forwarding offloads

   - eth: broadcom: fixes for selftests and error recovery

   - eth: mellanox: flow steering and SR-IOV forwarding fixes

  Misc:

   - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
     friends not report freed skbs as drops

   - force inlining of checksum functions in net/checksum.h"

* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
  net: mv643xx_eth: process retval from of_get_mac_address
  ping: remove pr_err from ping_lookup
  Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
  openvswitch: Fix setting ipv6 fields causing hw csum failure
  ipv6: prevent a possible race condition with lifetimes
  net/smc: Use a mutex for locking "struct smc_pnettable"
  bnx2x: fix driver load from initrd
  Revert "xen-netback: Check for hotplug-status existence before watching"
  Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
  net/mlx5e: Fix VF min/max rate parameters interchange mistake
  net/mlx5e: Add missing increment of count
  net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
  net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
  net/mlx5e: Add feature check for set fec counters
  net/mlx5e: TC, Skip redundant ct clear actions
  net/mlx5e: TC, Reject rules with forward and drop actions
  net/mlx5e: TC, Reject rules with drop and modify hdr action
  net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
  net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
  net/mlx5: Fix possible deadlock on rule deletion
  ...

3 years agoBluetooth: hci_sync: Fix not using conn_timeout
Luiz Augusto von Dentz [Thu, 17 Feb 2022 21:10:38 +0000 (13:10 -0800)]
Bluetooth: hci_sync: Fix not using conn_timeout

When using hci_le_create_conn_sync it shall wait for the conn_timeout
since the connection complete may take longer than just 2 seconds.

Also fix the masking of HCI_EV_LE_ENHANCED_CONN_COMPLETE and
HCI_EV_LE_CONN_COMPLETE so they are never both set so we can predict
which one the controller will use in case of HCI_OP_LE_CREATE_CONN.

Fixes: 6cd29ec6ae5e3 ("Bluetooth: hci_sync: Wait for proper events when connecting LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: hci_sync: Fix hci_update_accept_list_sync
Luiz Augusto von Dentz [Thu, 24 Feb 2022 15:11:47 +0000 (07:11 -0800)]
Bluetooth: hci_sync: Fix hci_update_accept_list_sync

hci_update_accept_list_sync is returning the filter based on the error
but that gets overwritten by hci_le_set_addr_resolution_enable_sync
return instead of using the actual result of the likes of
hci_le_add_accept_list_sync which was intended.

Fixes: ad383c2c65a5b ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: assign len after null check
Wang Qing [Tue, 15 Feb 2022 02:01:56 +0000 (18:01 -0800)]
Bluetooth: assign len after null check

len should be assigned after a null check

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
Luiz Augusto von Dentz [Tue, 15 Feb 2022 01:59:38 +0000 (17:59 -0800)]
Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks

Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM it
shall return the partial chunks it could allocate instead of freeing
everything as otherwise it can cause problems like bellow.

Fixes: 81be03e026dc ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan)
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: fix data races in smp_unregister(), smp_del_chan()
Lin Ma [Wed, 16 Feb 2022 04:37:14 +0000 (12:37 +0800)]
Bluetooth: fix data races in smp_unregister(), smp_del_chan()

Previous commit e04480920d1e ("Bluetooth: defer cleanup of resources
in hci_unregister_dev()") defers all destructive actions to
hci_release_dev() to prevent cocurrent problems like NPD, UAF.

However, there are still some exceptions that are ignored.

The smp_unregister() in hci_dev_close_sync() (previously in
hci_dev_do_close) will release resources like the sensitive channel
and the smp_dev objects. Consider the situations the device is detaching
or power down while the kernel is still operating on it, the following
data race could take place.

thread-A  hci_dev_close_sync  | thread-B  read_local_oob_ext_data
                              |
hci_dev_unlock()              |
...                           | hci_dev_lock()
if (hdev->smp_data)           |
  chan = hdev->smp_data       |
                              | chan = hdev->smp_data (3)
                              |
  hdev->smp_data = NULL (1)   | if (!chan || !chan->data) (4)
  ...                         |
  smp = chan->data            | smp = chan->data
  if (smp)                    |
    chan->data = NULL (2)     |
    ...                       |
    kfree_sensitive(smp)      |
                              | // dereference smp trigger UFA

That is, the objects hdev->smp_data and chan->data both suffer from the
data races. In a preempt-enable kernel, the above schedule (when (3) is
before (1) and (4) is before (2)) leads to UAF bugs. It can be
reproduced in the latest kernel and below is part of the report:

[   49.097146] ================================================================
[   49.097611] BUG: KASAN: use-after-free in smp_generate_oob+0x2dd/0x570
[   49.097611] Read of size 8 at addr ffff888006528360 by task generate_oob/155
[   49.097611]
[   49.097611] Call Trace:
[   49.097611]  <TASK>
[   49.097611]  dump_stack_lvl+0x34/0x44
[   49.097611]  print_address_description.constprop.0+0x1f/0x150
[   49.097611]  ? smp_generate_oob+0x2dd/0x570
[   49.097611]  ? smp_generate_oob+0x2dd/0x570
[   49.097611]  kasan_report.cold+0x7f/0x11b
[   49.097611]  ? smp_generate_oob+0x2dd/0x570
[   49.097611]  smp_generate_oob+0x2dd/0x570
[   49.097611]  read_local_oob_ext_data+0x689/0xc30
[   49.097611]  ? hci_event_packet+0xc80/0xc80
[   49.097611]  ? sysvec_apic_timer_interrupt+0x9b/0xc0
[   49.097611]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   49.097611]  ? mgmt_init_hdev+0x1c/0x240
[   49.097611]  ? mgmt_init_hdev+0x28/0x240
[   49.097611]  hci_sock_sendmsg+0x1880/0x1e70
[   49.097611]  ? create_monitor_event+0x890/0x890
[   49.097611]  ? create_monitor_event+0x890/0x890
[   49.097611]  sock_sendmsg+0xdf/0x110
[   49.097611]  __sys_sendto+0x19e/0x270
[   49.097611]  ? __ia32_sys_getpeername+0xa0/0xa0
[   49.097611]  ? kernel_fpu_begin_mask+0x1c0/0x1c0
[   49.097611]  __x64_sys_sendto+0xd8/0x1b0
[   49.097611]  ? syscall_exit_to_user_mode+0x1d/0x40
[   49.097611]  do_syscall_64+0x3b/0x90
[   49.097611]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   49.097611] RIP: 0033:0x7f5a59f51f64
...
[   49.097611] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5a59f51f64
[   49.097611] RDX: 0000000000000007 RSI: 00007f5a59d6ac70 RDI: 0000000000000006
[   49.097611] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   49.097611] R10: 0000000000000040 R11: 0000000000000246 R12: 00007ffec26916ee
[   49.097611] R13: 00007ffec26916ef R14: 00007f5a59d6afc0 R15: 00007f5a59d6b700

To solve these data races, this patch places the smp_unregister()
function in the protected area by the hci_dev_lock(). That is, the
smp_unregister() function can not be concurrently executed when
operating functions (most of them are mgmt operations in mgmt.c) hold
the device lock.

This patch is tested with kernel LOCK DEBUGGING enabled. The price from
the extended holding time of the device lock is supposed to be low as the
smp_unregister() function is fairly short and efficient.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoBluetooth: hci_core: Fix leaking sent_cmd skb
Luiz Augusto von Dentz [Fri, 4 Feb 2022 21:12:35 +0000 (13:12 -0800)]
Bluetooth: hci_core: Fix leaking sent_cmd skb

sent_cmd memory is not freed before freeing hci_dev causing it to leak
it contents.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
3 years agoMerge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 24 Feb 2022 19:15:10 +0000 (11:15 -0800)]
Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request:
    - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
    - fix passthrough to namespaces with unsupported features (Christoph
      Hellwig)

 - Clear iocb->private at poll completion (Stefano)

* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata
  block: clear iocb->private in blkdev_bio_end_io_async()

3 years agoMerge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 24 Feb 2022 19:08:15 +0000 (11:08 -0800)]
Merge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Add a conditional schedule point in io_add_buffers() (Eric)

 - Fix for a quiesce speedup merged in this release (Dylan)

 - Don't convert to jiffies for event timeout waiting, it's way too
   coarse when we accept a timespec as input (me)

* tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
  io_uring: disallow modification of rsrc_data during quiesce
  io_uring: don't convert to jiffies for waiting on timeouts
  io_uring: add a schedule point in io_add_buffers()

3 years agoMerge tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 24 Feb 2022 18:42:20 +0000 (10:42 -0800)]
Merge tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull more x86 platform driver fixes from Hans de Goede:
 "Two more fixes:

   - Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16

   - Fix Microsoft Surface 3 battery readings"

* tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  surface: surface3_power: Fix battery readings on batteries without a serial number
  platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

3 years agonet: mv643xx_eth: process retval from of_get_mac_address
Mauri Sandberg [Wed, 23 Feb 2022 14:23:37 +0000 (16:23 +0200)]
net: mv643xx_eth: process retval from of_get_mac_address

Obtaining a MAC address may be deferred in cases when the MAC is stored
in an NVMEM block, for example, and it may not be ready upon the first
retrieval attempt and return EPROBE_DEFER.

It is also possible that a port that does not rely on NVMEM has been
already created when getting the defer request. Thus, also the resources
allocated previously must be freed when doing a roll-back.

Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoping: remove pr_err from ping_lookup
Xin Long [Thu, 24 Feb 2022 03:41:08 +0000 (22:41 -0500)]
ping: remove pr_err from ping_lookup

As Jakub noticed, prints should be avoided on the datapath.
Also, as packets would never come to the else branch in
ping_lookup(), remove pr_err() from ping_lookup().

Fixes: 35a79e64de29 ("ping: fix the dif and sdif check in ping_lookup")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/1ef3f2fcd31bd681a193b1fcf235eee1603819bd.1645674068.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoRevert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
Mateusz Palczewski [Wed, 23 Feb 2022 17:53:47 +0000 (09:53 -0800)]
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"

Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.

Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to  configure TC map 0 for VSI 391

This reverts commit 3d2504663c41104b4359a15f35670cfa82de1bbf.

Fixes: 3d2504663c41 (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoopenvswitch: Fix setting ipv6 fields causing hw csum failure
Paul Blakey [Wed, 23 Feb 2022 16:34:16 +0000 (18:34 +0200)]
openvswitch: Fix setting ipv6 fields causing hw csum failure

Ipv6 ttl, label and tos fields are modified without first
pulling/pushing the ipv6 header, which would have updated
the hw csum (if available). This might cause csum validation
when sending the packet to the stack, as can be seen in
the trace below.

Fix this by updating skb->csum if available.

Trace resulted by ipv6 ttl dec and then sending packet
to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
[295241.900063] s_pf0vf2: hw csum failure
[295241.923191] Call Trace:
[295241.925728]  <IRQ>
[295241.927836]  dump_stack+0x5c/0x80
[295241.931240]  __skb_checksum_complete+0xac/0xc0
[295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
[295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
[295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
[295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
[295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
[295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
[295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
[295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
[295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
[295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
[295242.047498]  netif_receive_skb_internal+0x3e/0xc0
[295242.052291]  napi_gro_receive+0xba/0xe0
[295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
[295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
[295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
[295242.077958]  net_rx_action+0x149/0x3b0
[295242.086762]  __do_softirq+0xd7/0x2d6
[295242.090427]  irq_exit+0xf7/0x100
[295242.093748]  do_IRQ+0x7f/0xd0
[295242.096806]  common_interrupt+0xf/0xf
[295242.100559]  </IRQ>
[295242.102750] RIP: 0033:0x7f9022e88cbd
[295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
[295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
[295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
[295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
[295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
[295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40

Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoipv6: prevent a possible race condition with lifetimes
Niels Dossche [Wed, 23 Feb 2022 13:19:56 +0000 (14:19 +0100)]
ipv6: prevent a possible race condition with lifetimes

valid_lft, prefered_lft and tstamp are always accessed under the lock
"lock" in other places. Reading these without taking the lock may result
in inconsistencies regarding the calculation of the valid and preferred
variables since decisions are taken on these fields for those variables.

Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet/smc: Use a mutex for locking "struct smc_pnettable"
Fabio M. De Francesco [Wed, 23 Feb 2022 10:02:52 +0000 (11:02 +0100)]
net/smc: Use a mutex for locking "struct smc_pnettable"

smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
which, in turn, calls mutex_lock(&smc_ib_devices.mutex).

read_lock() disables preemption. Therefore, the code acquires a mutex while in
atomic context and it leads to a SAC bug.

Fix this bug by replacing the rwlock with a mutex.

Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com
Fixes: 64e28b52c7a6 ("net/smc: add pnet table namespace support")
Confirmed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agobnx2x: fix driver load from initrd
Manish Chopra [Wed, 23 Feb 2022 08:57:20 +0000 (00:57 -0800)]
bnx2x: fix driver load from initrd

Commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0") added
new firmware support in the driver with maintaining older firmware
compatibility. However, older firmware was not added in MODULE_FIRMWARE()
which caused missing firmware files in initrd image leading to driver load
failure from initrd. This patch adds MODULE_FIRMWARE() for older firmware
version to have firmware files included in initrd.

Fixes: b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoRevert "xen-netback: Check for hotplug-status existence before watching"
Marek Marczykowski-Górecki [Tue, 22 Feb 2022 00:18:17 +0000 (01:18 +0100)]
Revert "xen-netback: Check for hotplug-status existence before watching"

This reverts commit 2afeec08ab5c86ae21952151f726bfe184f6b23d.

The reasoning in the commit was wrong - the code expected to setup the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for non-existing
path (which is the case if hotplug script hasn't been executed yet),
made the backend not waiting for the hotplug script to execute. This in
turns, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on xen-netback side might not be
configured yet).

This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.

More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoRevert "xen-netback: remove 'hotplug-status' once it has served its purpose"
Marek Marczykowski-Górecki [Tue, 22 Feb 2022 00:18:16 +0000 (01:18 +0100)]
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"

This reverts commit 1f2565780e9b7218cf92c7630130e82dcc0fe9c2.

The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise the xen-netback would wait for
re-running the network script even if it was already called (in case of
the frontent re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise hotplug script would not configure the device whenever it
re-appear.

Moving removal of the 'hotplug-status' node was a workaround for nothing
calling network script after xen-netback module is reloaded. But when
vif interface is re-created (on xen-netback unbind/bind for example),
the script should be called, regardless of who does that - currently
this case is not handled by the toolstack, and requires manual
script call. Keeping hotplug-status=connected to skip the call is wrong
and leads to not configured interface.

More discussion at
https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17
Jens Axboe [Thu, 24 Feb 2022 14:02:15 +0000 (07:02 -0700)]
Merge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.17

 - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
 - fix passthrough to namespaces with unsupported features (me)"

* tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme:
  nvme-tcp: send H2CData PDUs based on MAXH2CDATA
  nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
  nvme: don't return an error from nvme_configure_metadata

3 years agosurface: surface3_power: Fix battery readings on batteries without a serial number
Hans de Goede [Thu, 24 Feb 2022 10:18:48 +0000 (11:18 +0100)]
surface: surface3_power: Fix battery readings on batteries without a serial number

The battery on the 2nd hand Surface 3 which I recently bought appears to
not have a serial number programmed in. This results in any I2C reads from
the registers containing the serial number failing with an I2C NACK.

This was causing mshw0011_bix() to fail causing the battery readings to
not work at all.

Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and
continue with an empty serial number to fix this.

Fixes: b1f81b496b0d ("platform/x86: surface3_power: MSHW0011 rev-eng implementation")
BugLink: https://github.com/linux-surface/linux-surface/issues/608
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220224101848.7219-1-hdegoede@redhat.com
3 years agoplatform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
Mario Limonciello [Wed, 23 Feb 2022 17:52:37 +0000 (11:52 -0600)]
platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup

commit 59348401ebed ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.

When this feature was used with 5.16-rc1 or later some OEM laptops with the
matching firmware requirements from that commit would shutdown instead of
program a timer based wakeup.

This was bisected to commit 8d89835b0467 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path").  This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.

It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it don't allow setting
CPUs into the deepest state while using CZN timer wakeup path.

If later it's discovered that this also occurs from "regular" suspends
without a timer as well or on other silicon, this may be later expanded to
run in the suspend path for more scenarios.

Cc: stable@vger.kernel.org # 5.16+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e
Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Fixes: 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
Fixes: 59348401ebed ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup"
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
3 years agoMerge tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 24 Feb 2022 04:30:00 +0000 (20:30 -0800)]
Merge tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-02-22

This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.

* tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix VF min/max rate parameters interchange mistake
  net/mlx5e: Add missing increment of count
  net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
  net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
  net/mlx5e: Add feature check for set fec counters
  net/mlx5e: TC, Skip redundant ct clear actions
  net/mlx5e: TC, Reject rules with forward and drop actions
  net/mlx5e: TC, Reject rules with drop and modify hdr action
  net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
  net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
  net/mlx5: Fix possible deadlock on rule deletion
  net/mlx5: Fix tc max supported prio for nic mode
  net/mlx5: Fix wrong limitation of metadata match on ecpf
  net/mlx5: Update log_max_qp value to be 17 at most
  net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
  net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
  net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte
  net/mlx5: DR, Cache STE shadow memory
  net/mlx5: Update the list of the PCI supported devices
====================

Link: https://lore.kernel.org/r/20220224001123.365265-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 24 Feb 2022 01:25:22 +0000 (17:25 -0800)]
Merge tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Update some maintainers email addresses

 - Fix handling of elfcorehdr reservation for crash dump kernel

 - Fix unittest expected warnings text

* tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: update Roger Quadros email
  MAINTAINERS: sifive: drop Yash Shah
  of/fdt: move elfcorehdr reservation early for crash dump kernel
  of: unittest: update text of expected warnings

3 years agoMerge tag 'selinux-pr-20220223' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 24 Feb 2022 01:19:55 +0000 (17:19 -0800)]
Merge tag 'selinux-pr-20220223' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "A second small SELinux fix which addresses an incorrect
  mutex_is_locked() check"

* tag 'selinux-pr-20220223' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix misuse of mutex_is_locked()

3 years agonet/mlx5e: Fix VF min/max rate parameters interchange mistake
Gal Pressman [Mon, 21 Feb 2022 15:54:34 +0000 (17:54 +0200)]
net/mlx5e: Fix VF min/max rate parameters interchange mistake

The VF min and max rate were passed incorrectly and resulted in wrongly
interchanging them. Fix the order of parameters in
mlx5_esw_qos_set_vport_rate().

Fixes: d7df09f5e7b4 ("net/mlx5: E-switch, Enable vport QoS on demand")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Add missing increment of count
Lama Kayal [Mon, 21 Feb 2022 10:26:11 +0000 (12:26 +0200)]
net/mlx5e: Add missing increment of count

Add mistakenly missing increment of count variable when looping over
output buffer in mlx5e_self_test().

This resolves the issue of garbage values output when querying with self
test via ethtool.

before:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test        0
Speed Test       1768697188
Health Test      758528120
Loopback Test    3288687

after:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test        0
Speed Test       0
Health Test      0
Loopback Test    0

Fixes: 7990b1b5e8bd ("net/mlx5e: loopback test is not supported in switchdev mode")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: MPLSoUDP decap, fix check for unsupported matches
Maor Dickman [Thu, 6 Jan 2022 12:46:24 +0000 (14:46 +0200)]
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches

Currently offload of rule on bareudp device require tunnel key
in order to match on mpls fields and without it the mpls fields
are ignored, this is incorrect due to the fact udp tunnel doesn't
have key to match on.

Fix by returning error in case flow is matching on tunnel key.

Fixes: 72046a91d134 ("net/mlx5e: Allow to match on mpls parameters")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix MPLSoUDP encap to use MPLS action information
Maor Dickman [Thu, 6 Jan 2022 12:10:18 +0000 (14:10 +0200)]
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information

Currently the MPLSoUDP encap builds the MPLS header using encap action
information (tunnel id, ttl and tos) instead of the MPLS action
information (label, ttl, tc and bos) which is wrong.

Fix by storing the MPLS action information during the flow action
parse and later using it to create the encap MPLS header.

Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Add feature check for set fec counters
Lama Kayal [Tue, 1 Feb 2022 09:24:41 +0000 (11:24 +0200)]
net/mlx5e: Add feature check for set fec counters

Fec counters support is checked via the PCAM feature_cap_mask,
bit 0: PPCNT_counter_group_Phy_statistical_counter_group.
Add feature check to avoid faulty behavior.

Fixes: 0a1498ebfa55 ("net/mlx5e: Expose FEC counters via ethtool")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Skip redundant ct clear actions
Roi Dayan [Thu, 3 Feb 2022 07:42:19 +0000 (09:42 +0200)]
net/mlx5e: TC, Skip redundant ct clear actions

Offload of ct clear action is just resetting the reg_c register.
It's done by allocating modify hdr resources which is limited.
Doing it multiple times is redundant and wasting modify hdr resources
and if resources depleted the driver will fail offloading the rule.
Ignore redundant ct clear actions after the first one.

Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Reject rules with forward and drop actions
Roi Dayan [Mon, 17 Jan 2022 13:00:30 +0000 (15:00 +0200)]
net/mlx5e: TC, Reject rules with forward and drop actions

Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.

Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: TC, Reject rules with drop and modify hdr action
Roi Dayan [Tue, 4 Jan 2022 08:38:02 +0000 (10:38 +0200)]
net/mlx5e: TC, Reject rules with drop and modify hdr action

This kind of action is not supported by firmware and generates a
syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
Tariq Toukan [Mon, 31 Jan 2022 08:26:19 +0000 (10:26 +0200)]
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets

For RX TLS device-offloaded packets, the HW spec guarantees checksum
validation for the offloaded packets, but does not define whether the
CQE.checksum field matches the original packet (ciphertext) or
the decrypted one (plaintext). This latitude allows architetctural
improvements between generations of chips, resulting in different decisions
regarding the value type of CQE.checksum.

Hence, for these packets, the device driver should not make use of this CQE
field. Here we block CHECKSUM_COMPLETE usage for RX TLS device-offloaded
packets, and use CHECKSUM_UNNECESSARY instead.

Value of the packet's tcp_hdr.csum is not modified by the HW, and it always
matches the original ciphertext.

Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5e: Fix wrong return value on ioctl EEPROM query failure
Gal Pressman [Wed, 2 Feb 2022 14:07:21 +0000 (16:07 +0200)]
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure

The ioctl EEPROM query wrongly returns success on read failures, fix
that by returning the appropriate error code.

Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix possible deadlock on rule deletion
Maor Gottlieb [Mon, 24 Jan 2022 19:25:04 +0000 (21:25 +0200)]
net/mlx5: Fix possible deadlock on rule deletion

Add missing call to up_write_ref_node() which releases the semaphore
in case the FTE doesn't have destinations, such in drop rule case.

Fixes: 465e7baab6d9 ("net/mlx5: Fix deletion of duplicate rules")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix tc max supported prio for nic mode
Chris Mi [Tue, 14 Dec 2021 01:52:53 +0000 (03:52 +0200)]
net/mlx5: Fix tc max supported prio for nic mode

Only prio 1 is supported if firmware doesn't support ignore flow
level for nic mode. The offending commit removed the check wrongly.
Add it back.

Fixes: 9a99c8f1253a ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Fix wrong limitation of metadata match on ecpf
Ariel Levkovich [Fri, 28 Jan 2022 23:39:24 +0000 (01:39 +0200)]
net/mlx5: Fix wrong limitation of metadata match on ecpf

Match metadata support check returns false for ecpf device.
However, this support does exist for ecpf and therefore this
limitation should be removed to allow feature such as stacked
devices and internal port offloaded to be supported.

Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Update log_max_qp value to be 17 at most
Maher Sanalla [Wed, 16 Feb 2022 09:01:04 +0000 (11:01 +0200)]
net/mlx5: Update log_max_qp value to be 17 at most

Currently, log_max_qp value is dependent on what FW reports as its max capability.
In reality, due to a bug, some FWs report a value greater than 17, even though they
don't support log_max_qp > 17.

This FW issue led the driver to exhaust memory on startup.
Thus, log_max_qp value is set to be no more than 17 regardless
of what FW reports, as it was before the cited commit.

Fixes: f79a609ea6bf ("net/mlx5: Update log_max_qp value to FW max capability")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Fix the threshold that defines when pool sync is initiated
Yevgeny Kliteynik [Wed, 29 Dec 2021 20:22:05 +0000 (22:22 +0200)]
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated

When deciding whether to start syncing and actually free all the "hot"
ICM chunks, we need to consider the type of the ICM chunks that we're
dealing with. For instance, the amount of available ICM for MODIFY_ACTION
is significantly lower than the usual STE ICM, so the threshold should
account for that - otherwise we can deplete MODIFY_ACTION memory just by
creating and deleting the same modify header action in a continuous loop.

This patch replaces the hard-coded threshold with a dynamic value.

Fixes: 1c58651412bb ("net/mlx5: DR, ICM memory pools sync optimization")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
Yevgeny Kliteynik [Thu, 13 Jan 2022 12:52:48 +0000 (14:52 +0200)]
net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version

Currently SMFS allows adding rule with matching on src/dst IP w/o matching
on full ethertype or ip_version, which is not supported by HW.
This patch fixes this issue and adds the check as it is done in DMFS.

Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte
Yevgeny Kliteynik [Tue, 11 Jan 2022 01:00:03 +0000 (03:00 +0200)]
net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte

When adding a rule with 32 destinations, we hit the following out-of-band
access issue:

  BUG: KASAN: slab-out-of-bounds in mlx5_cmd_dr_create_fte+0x18ee/0x1e70

This patch fixes the issue by both increasing the allocated buffers to
accommodate for the needed actions and by checking the number of actions
to prevent this issue when a rule with too many actions is provided.

Fixes: 1ffd498901c1 ("net/mlx5: DR, Increase supported num of actions to 32")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: DR, Cache STE shadow memory
Yevgeny Kliteynik [Thu, 23 Dec 2021 23:07:30 +0000 (01:07 +0200)]
net/mlx5: DR, Cache STE shadow memory

During rule insertion on each ICM memory chunk we also allocate shadow memory
used for management. This includes the hw_ste, dr_ste and miss list per entry.
Since the scale of these allocations is large we noticed a performance hiccup
that happens once malloc and free are stressed.
In extreme usecases when ~1M chunks are freed at once, it might take up to 40
seconds to complete this, up to the point the kernel sees this as self-detected
stall on CPU:

 rcu: INFO: rcu_sched self-detected stall on CPU

To resolve this we will increase the reuse of shadow memory.
Doing this we see that a time in the aforementioned usecase dropped from ~40
seconds to ~8-10 seconds.

Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator")
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Update the list of the PCI supported devices
Meir Lichtinger [Mon, 10 Jan 2022 08:14:41 +0000 (10:14 +0200)]
net/mlx5: Update the list of the PCI supported devices

Add the upcoming BlueField-4 and ConnectX-8 device IDs.

Fixes: 2e9d3e83ab82 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoMerge tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Wed, 23 Feb 2022 20:06:23 +0000 (12:06 -0800)]
Merge tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc unaligned handler fixes from Helge Deller:
 "Two patches which fix a few bugs in the unalignment handlers.

  The fldd and fstd instructions weren't handled at all on 32-bit
  kernels, the stw instruction didn't check for fault errors and the
  fldw_l and ldw_m were handled wrongly as integer vs floating point
  instructions.

  Both patches are tagged for stable series"

* tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc/unaligned: Fix ldw() and stw() unalignment handlers
  parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel

3 years agoMerge tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 23 Feb 2022 19:51:35 +0000 (11:51 -0800)]
Merge tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fix two old bugs and one new bug in the hwmon subsystem:

   - In pmbus core, clear pmbus fault/warning status bits after read to
     follow PMBus standard

   - In hwmon core, handle failure to register sensor with thermal zone
     correctly

   - In ntc_thermal driver, use valid thermistor names for Samsung
     thermistors"

* tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus) Clear pmbus fault/warning bits after read
  hwmon: Handle failure to register sensor with thermal zone correctly
  hwmon: (ntc_thermistor) Underscore Samsung thermistor

3 years agoMerge tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka...
Linus Torvalds [Wed, 23 Feb 2022 19:33:12 +0000 (11:33 -0800)]
Merge tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:

 - Build fix (workaround) for clang.

 - Fix a /proc/kcore based slabinfo script broken by struct slab changes
   in 5.17-rc1.

* tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  tools/cgroup/slabinfo: update to work with struct slab
  slab: remove __alloc_size attribute from __kmalloc_track_caller

3 years agoPCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
Alex Deucher [Tue, 22 Feb 2022 16:08:01 +0000 (11:08 -0500)]
PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken

There are enough VBIOS escapes without the proper workaround that some
users still hit this.  Microsoft never productized ATS on Windows so OEM
platforms that were Windows-only didn't always validate ATS.

The advantages of ATS are not worth it compared to the potential
instabilities on harvested boards.  Disable ATS on all Navi10 and Navi14
boards.

Symptoms include:

  amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xffffc02000 flags=0x0000]
  AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x0007 address=0xffffc02000 flags=0x0000]
  [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=6047, emitted seq=6049
  amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
  amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
  amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
  [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
  amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed

Related commits:

  e8946a53e2a6 ("PCI: Mark AMD Navi14 GPU ATS as broken")
  a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
  45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
  5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
  d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
  9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")

[bhelgaas: add symptoms and related commits]
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1760
Link: https://lore.kernel.org/r/20220222160801.841643-1-alexander.deucher@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
3 years agoparisc/unaligned: Fix ldw() and stw() unalignment handlers
Helge Deller [Fri, 18 Feb 2022 22:40:14 +0000 (23:40 +0100)]
parisc/unaligned: Fix ldw() and stw() unalignment handlers

Fix 3 bugs:

a) emulate_stw() doesn't return the error code value, so faulting
instructions are not reported and aborted.

b) Tell emulate_ldw() to handle fldw_l as floating point instruction

c) Tell emulate_ldw() to handle ldw_m as integer instruction

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
3 years agoparisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Helge Deller [Fri, 18 Feb 2022 08:25:20 +0000 (09:25 +0100)]
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel

Usually the kernel provides fixup routines to emulate the fldd and fstd
floating-point instructions if they load or store 8-byte from/to a not
natuarally aligned memory location.

On a 32-bit kernel I noticed that those unaligned handlers didn't worked and
instead the application got a SEGV.
While checking the code I found two problems:

First, the OPCODE_FLDD_L and OPCODE_FSTD_L cases were ifdef'ed out by the
CONFIG_PA20 option, and as such those weren't built on a pure 32-bit kernel.
This is now fixed by moving the CONFIG_PA20 #ifdef to prevent the compilation
of OPCODE_LDD_L and OPCODE_FSTD_L only, and handling the fldd and fstd
instructions.

The second problem are two bugs in the 32-bit inline assembly code, where the
wrong registers where used. The calculation of the natural alignment used %2
(vall) instead of %3 (ior), and the first word was stored back to address %1
(valh) instead of %3 (ior).

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
3 years agonvme-tcp: send H2CData PDUs based on MAXH2CDATA
Varun Prakash [Sat, 22 Jan 2022 16:57:44 +0000 (22:27 +0530)]
nvme-tcp: send H2CData PDUs based on MAXH2CDATA

As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3)
Maximum Host to Controller Data length (MAXH2CDATA): Specifies the
maximum number of PDU-Data bytes per H2CData PDU in bytes. This value
is a multiple of dwords and should be no less than 4,096.

Current code sets H2CData PDU data_length to r2t_length,
it does not check MAXH2CDATA value. Fix this by setting H2CData PDU
data_length to min(req->h2cdata_left, queue->maxh2cdata).

Also validate MAXH2CDATA value returned by target in ICResp PDU,
if it is not a multiple of dword or if it is less than 4096 return
-EINVAL from nvme_tcp_init_connection().

Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
Christoph Hellwig [Wed, 16 Feb 2022 13:14:58 +0000 (14:14 +0100)]
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info

Commit e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
introduced the NVME_NS_READY flag, which nvme_path_is_disabled() uses
to check if a path can be used or not.  We also need to set this flag
for devices that fail the ZNS feature validation and which are available
through passthrough devices only to that they can be used in multipathing
setups.

Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
Reported-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
3 years agonvme: don't return an error from nvme_configure_metadata
Christoph Hellwig [Wed, 16 Feb 2022 14:07:15 +0000 (15:07 +0100)]
nvme: don't return an error from nvme_configure_metadata

When a fabrics controller claims to support an invalidate metadata
configuration we already warn and disable metadata support.  No need to
also return an error during revalidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
3 years agoMerge branch 'ftgmac100-fixes'
David S. Miller [Wed, 23 Feb 2022 12:50:19 +0000 (12:50 +0000)]
Merge branch 'ftgmac100-fixes'

Heyi Guo says:

====================
drivers/net/ftgmac100: fix occasional DHCP failure

This patch set is to fix the issues discussed in the mail thread:
https://lore.kernel.org/netdev/51f5b7a7-330f-6b3c-253d-10e45cdb6805@linux.alibaba.com/
and follows the advice from Andrew Lunn.

The first 2 patches refactors the code to enable adjust_link calling reset
function directly.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers/net/ftgmac100: fix DHCP potential failure with systemd
Heyi Guo [Wed, 23 Feb 2022 03:14:36 +0000 (11:14 +0800)]
drivers/net/ftgmac100: fix DHCP potential failure with systemd

DHCP failures were observed with systemd 247.6. The issue could be
reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
down/up.

It is caused by below procedures in the driver:

1. ftgmac100_open() enables net interface and call phy_start()
2. When PHY is link up, it calls netif_carrier_on() and then
adjust_link callback
3. ftgmac100_adjust_link() will schedule the reset task
4. ftgmac100_reset_task() will then reset the MAC in another schedule

After step 2, systemd will be notified to send DHCP discover packet,
while the packet might be corrupted by MAC reset operation in step 4.

Call ftgmac100_reset() directly instead of scheduling task to fix the
issue.

Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers/net/ftgmac100: adjust code place for function call dependency
Heyi Guo [Wed, 23 Feb 2022 03:14:35 +0000 (11:14 +0800)]
drivers/net/ftgmac100: adjust code place for function call dependency

This is to prepare for ftgmac100_adjust_link() to call
ftgmac100_reset() directly. Only code places are changed.

Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call
Heyi Guo [Wed, 23 Feb 2022 03:14:34 +0000 (11:14 +0800)]
drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call

This is to prepare for ftgmac100_adjust_link() to call reset function
directly, instead of task schedule.

Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD
Wan Jiabing [Wed, 23 Feb 2022 02:34:19 +0000 (10:34 +0800)]
net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD

Fix following coccicheck warning:
./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMAINTAINERS: add myself as co-maintainer for Realtek DSA switch drivers
Alvin Šipraga [Tue, 22 Feb 2022 16:14:08 +0000 (17:14 +0100)]
MAINTAINERS: add myself as co-maintainer for Realtek DSA switch drivers

Adding myself (Alvin Šipraga) as another maintainer for the Realtek DSA
switch drivers. I intend to help Linus out with reviewing and testing
changes to these drivers, particularly the rtl8365mb driver which I
authored and have hardware access to.

Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotipc: Fix end of loop tests for list_for_each_entry()
Dan Carpenter [Tue, 22 Feb 2022 13:43:12 +0000 (16:43 +0300)]
tipc: Fix end of loop tests for list_for_each_entry()

These tests are supposed to check if the loop exited via a break or not.
However the tests are wrong because if we did not exit via a break then
"p" is not a valid pointer.  In that case, it's the equivalent of
"if (*(u32 *)sr == *last_key) {".  That's going to work most of the time,
but there is a potential for those to be equal.

Fixes: 1593123a6a49 ("tipc: add name table dump to new netlink api")
Fixes: 1a1a143daf84 ("tipc: add publication dump to new netlink api")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoudp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()
Dan Carpenter [Tue, 22 Feb 2022 13:42:51 +0000 (16:42 +0300)]
udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()

This test is checking if we exited the list via break or not.  However
if it did not exit via a break then "node" does not point to a valid
udp_tunnel_nic_shared_node struct.  It will work because of the way
the structs are laid out it's the equivalent of
"if (info->shared->udp_tunnel_nic_info != dev)" which will always be
true, but it's not the right way to test.

Fixes: 74cc6d182d03 ("udp_tunnel: add the ability to share port tables")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agovhost/vsock: don't check owner in vhost_vsock_stop() while releasing
Stefano Garzarella [Tue, 22 Feb 2022 09:47:42 +0000 (10:47 +0100)]
vhost/vsock: don't check owner in vhost_vsock_stop() while releasing

vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.

vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.

Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.

When invoked from release we can not fail so we don't check return
code of vhost_vsock_stop(). We need to stop vsock even if it's not
the owner.

Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>