]> www.infradead.org Git - users/dwmw2/linux.git/log
users/dwmw2/linux.git
5 years agoNFSv4: pnfs_roc() must use cred_fscmp() to compare creds
Trond Myklebust [Sun, 26 Jan 2020 22:31:13 +0000 (17:31 -0500)]
NFSv4: pnfs_roc() must use cred_fscmp() to compare creds

commit 387122478775be5d9816c34aa29de53d0b926835 upstream.

When comparing two 'struct cred' for equality w.r.t. behaviour under
filesystem access, we need to use cred_fscmp().

Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoNFS: Fix fix of show_nfs_errors
Trond Myklebust [Mon, 6 Jan 2020 20:25:06 +0000 (15:25 -0500)]
NFS: Fix fix of show_nfs_errors

commit 118b6292195cfb86a9f43cb65610fc6d980c65f4 upstream.

Casting a negative value to an unsigned long is not the same as
converting it to its absolute value.

Fixes: 96650e2effa2 ("NFS: Fix show_nfs_errors macros again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoNFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()
Trond Myklebust [Mon, 6 Jan 2020 20:25:04 +0000 (15:25 -0500)]
NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes()

commit 221203ce6406273cf00e5c6397257d986c003ee6 upstream.

Instead of making assumptions about the commit verifier contents, change
the commit code to ensure we always check that the verifier was set
by the XDR code.

Fixes: f54bcf2ecee9 ("pnfs: Prepare for flexfiles by pulling out common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoNFS: Revalidate the file size on a fatal write error
Trond Myklebust [Mon, 6 Jan 2020 20:25:00 +0000 (15:25 -0500)]
NFS: Revalidate the file size on a fatal write error

commit 0df68ced55443243951d02cc497be31fadf28173 upstream.

If we suffer a fatal error upon writing a file, which causes us to
need to revalidate the entire mapping, then we should also revalidate
the file size.

Fixes: d2ceb7e57086 ("NFS: Don't use page_file_mapping after removing the page")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonfs: NFS_SWAP should depend on SWAP
Geert Uytterhoeven [Mon, 30 Dec 2019 15:32:38 +0000 (16:32 +0100)]
nfs: NFS_SWAP should depend on SWAP

commit 474c4f306eefbb21b67ebd1de802d005c7d7ecdc upstream.

If CONFIG_SWAP=n, it does not make much sense to offer the user the
option to enable support for swapping over NFS, as that will still fail
at run time:

    # swapon /swap
    swapon: /swap: swapon failed: Function not implemented

Fix this by adding a dependency on CONFIG_SWAP.

Fixes: a564b8f0398636ba ("nfs: enable swap on NFS")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoNFSv4.x recover from pre-mature loss of openstateid
Olga Kornievskaia [Wed, 18 Dec 2019 21:50:42 +0000 (16:50 -0500)]
NFSv4.x recover from pre-mature loss of openstateid

commit d826e5b827641ae1bebb33d23a774f4e9bb8e94f upstream.

Ever since the commit 0e0cb35b417f, it's possible to lose an open stateid
while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens,
operations that require openstateid fail with EAGAIN which is propagated
to the application then tests like generic/446 and generic/168 fail with
"Resource temporarily unavailable".

Instead of returning this error, initiate state recovery when possible to
recover the open stateid and then try calling nfs4_select_rw_stateid()
again.

Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonetfilter: flowtable: Fix missing flush hardware on table free
Paul Blakey [Thu, 30 Jan 2020 16:04:36 +0000 (18:04 +0200)]
netfilter: flowtable: Fix missing flush hardware on table free

commit 0f34f30a1be80f3f59efeaab596396bc698e7337 upstream.

If entries exist when freeing a hardware offload enabled table,
we queue work for hardware while running the gc iteration.

Execute it (flush) after queueing.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonetfilter: flowtable: Fix hardware flush order on nf_flow_table_cleanup
Paul Blakey [Thu, 30 Jan 2020 16:04:35 +0000 (18:04 +0200)]
netfilter: flowtable: Fix hardware flush order on nf_flow_table_cleanup

commit 91bfaa15a379e9af24f71fb4ee08d8019b6e8ec7 upstream.

On netdev down event, nf_flow_table_cleanup() is called for the relevant
device and it cleans all the tables that are on that device.
If one of those tables has hardware offload flag,
nf_flow_table_iterate_cleanup flushes hardware and then runs the gc.
But the gc can queue more hardware work, which will take time to execute.

Instead first add the work, then flush it, to execute it now.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonetfilter: flowtable: restrict flow dissector match on meta ingress device
Pablo Neira Ayuso [Mon, 6 Jan 2020 11:42:55 +0000 (12:42 +0100)]
netfilter: flowtable: restrict flow dissector match on meta ingress device

commit a7521a60a5f3e1f58a015fedb6e69aed40455feb upstream.

Set on FLOW_DISSECTOR_KEY_META meta key using flow tuple ingress interface.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonetfilter: flowtable: fetch stats only if flow is still alive
Pablo Neira Ayuso [Sun, 5 Jan 2020 21:26:38 +0000 (22:26 +0100)]
netfilter: flowtable: fetch stats only if flow is still alive

commit 79b9b685dde1d1bf43cf84163c76953dc3781c85 upstream.

Do not fetch statistics if flow has expired since it might not in
hardware anymore. After this update, remove the FLOW_OFFLOAD_HW_DYING
check from nf_flow_offload_stats() since this flag is never set on.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiwlwifi: mvm: fix TDLS discovery with the new firmware API
Emmanuel Grumbach [Fri, 31 Jan 2020 13:45:29 +0000 (15:45 +0200)]
iwlwifi: mvm: fix TDLS discovery with the new firmware API

commit b5b878e36c1836c0195575132cc7c199e5a34a7b upstream.

I changed the API for asking for a session protection but
I omitted the TDLS flows. Fix that now.
Note that for the TDLS flow, we need to block until the
session protection actually starts, so add this option
to iwl_mvm_schedule_session_protection.
This patch fixes a firmware assert in the TDLS flow since
the old TIME_EVENT_CMD is not supported anymore by newer
firwmare versions.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Fixes: fe959c7b2049 ("iwlwifi: mvm: use the new session protection command")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoiwlwifi: mvm: avoid use after free for pmsr request
Avraham Stern [Fri, 31 Jan 2020 13:45:27 +0000 (15:45 +0200)]
iwlwifi: mvm: avoid use after free for pmsr request

commit cc4255eff523f25187bb95561642941de0e57497 upstream.

When a FTM request is aborted, the driver sends the abort command to
the fw and waits for a response. When the response arrives, the driver
calls cfg80211_pmsr_complete() for that request.
However, cfg80211 frees the requested data immediately after sending
the abort command, so this may lead to use after free.

Fix it by clearing the request data in the driver when the abort
command arrives and ignoring the fw notification that will come
afterwards.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Fixes: fc36ffda3267 ("iwlwifi: mvm: support FTM initiator")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI/AER: Initialize aer_fifo
Dongdong Liu [Thu, 23 Jan 2020 08:26:31 +0000 (16:26 +0800)]
PCI/AER: Initialize aer_fifo

commit d95f20c4f07020ebc605f3b46af4b6db9eb5fc99 upstream.

Previously we did not call INIT_KFIFO() for aer_fifo.  This leads to
kfifo_put() sometimes returning 0 (queue full) when in fact it is not.

It is easy to reproduce the problem by using aer-inject:

  $ aer-inject -s :82:00.0 multiple-corr-nonfatal

The content of the multiple-corr-nonfatal file is as below:

  AER
  COR RCVR
  HL 0 1 2 3
  AER
  UNCOR POISON_TLP
  HL 4 5 6 7

Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it")
Link: https://lore.kernel.org/r/1579767991-103898-1-git-send-email-liudongdong3@huawei.com
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI: Don't disable bridge BARs when assigning bus resources
Logan Gunthorpe [Wed, 8 Jan 2020 21:32:08 +0000 (14:32 -0700)]
PCI: Don't disable bridge BARs when assigning bus resources

commit 9db8dc6d0785225c42a37be7b44d1b07b31b8957 upstream.

Some PCI bridges implement BARs in addition to bridge windows.  For
example, here's a PLX switch:

  04:00.0 PCI bridge: PLX Technology, Inc. PEX 8724 24-Lane, 6-Port PCI
            Express Gen 3 (8 GT/s) Switch, 19 x 19mm FCBGA (rev ca)
    (prog-if 00 [Normal decode])
      Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0
      Memory at 90a00000 (32-bit, non-prefetchable) [size=256K]
      Bus: primary=04, secondary=05, subordinate=0a, sec-latency=0
      I/O behind bridge: 00002000-00003fff
      Memory behind bridge: 90000000-909fffff
      Prefetchable memory behind bridge: 0000380000800000-0000380000bfffff

Previously, when the kernel assigned resource addresses (with the
pci=realloc command line parameter, for example) it could clear the struct
resource corresponding to the BAR.  When this happened, lspci would report
this BAR as "ignored":

   Region 0: Memory at <ignored> (32-bit, non-prefetchable) [size=256K]

This is because the kernel reports a zero start address and zero flags
in the corresponding sysfs resource file and in /proc/bus/pci/devices.
Investigation with 'lspci -x', however, shows the BIOS-assigned address
will still be programmed in the device's BAR registers.

It's clearly a bug that the kernel lost track of the BAR value, but in most
cases, this still won't result in a visible issue because nothing uses the
memory, so nothing is affected.  However, when an IOMMU is in use, it will
not reserve this space in the IOVA because the kernel no longer thinks the
range is valid.  (See dmar_init_reserved_ranges() for the Intel
implementation of this.)

Without the proper reserved range, a DMA mapping may allocate an IOVA that
matches a bridge BAR, which results in DMA accesses going to the BAR
instead of the intended RAM.

The problem was in pci_assign_unassigned_root_bus_resources().  When any
resource from a bridge device fails to get assigned, the code set the
resource's flags to zero.  This makes sense for bridge windows, as they
will be re-enabled later, but for regular BARs, it makes the kernel
permanently lose track of the fact that they decode address space.

Change pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() so they only clear "res->flags"
for bridge *windows*, not bridge BARs.

Fixes: da7822e5ad71 ("PCI: update bridge resources to get more big ranges when allocating space (again)")
Link: https://lore.kernel.org/r/20200108213208.4612-1-logang@deltatee.com
[bhelgaas: commit log, check for pci_is_bridge()]
Reported-by: Kit Chow <kchow@gigaio.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI: tegra: Fix afi_pex2_ctrl reg offset for Tegra30
Marcel Ziswiler [Tue, 7 Jan 2020 08:14:02 +0000 (09:14 +0100)]
PCI: tegra: Fix afi_pex2_ctrl reg offset for Tegra30

commit 21a92676e1fe292acb077b13106b08c22ed36b14 upstream.

Fix AFI_PEX2_CTRL reg offset for Tegra30 by moving it from the Tegra20
SoC struct where it erroneously got added. This fixes the AFI_PEX2_CTRL
reg offset being uninitialised subsequently failing to bring up the
third PCIe port.

Fixes: adb2653b3d2e ("PCI: tegra: Add AFI_PEX2_CTRL reg offset as part of SoC struct")
Signed-off-by: Marcel Ziswiler <marcel@ziswiler.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI/switchtec: Fix vep_vector_number ioread width
Logan Gunthorpe [Mon, 6 Jan 2020 19:03:27 +0000 (12:03 -0700)]
PCI/switchtec: Fix vep_vector_number ioread width

commit 9375646b4cf03aee81bc6c305aa18cc80b682796 upstream.

vep_vector_number is actually a 16 bit register which should be read with
ioread16() instead of ioread32().

Fixes: 080b47def5e5 ("MicroSemi Switchtec management interface driver")
Link: https://lore.kernel.org/r/20200106190337.2428-3-logang@deltatee.com
Reported-by: Doug Meyer <dmeyer@gigaio.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI/switchtec: Use dma_set_mask_and_coherent()
Wesley Sheng [Mon, 6 Jan 2020 19:03:26 +0000 (12:03 -0700)]
PCI/switchtec: Use dma_set_mask_and_coherent()

commit aa82130a22f77c1aa5794703730304d035a0c1f4 upstream.

Use dma_set_mask_and_coherent() instead of dma_set_coherent_mask() as the
Switchtec hardware fully supports 64bit addressing and we should set both
the streaming and coherent masks the same.

[logang@deltatee.com: reworked commit message]
Fixes: aff614c6339c ("switchtec: Set DMA coherent mask")
Link: https://lore.kernel.org/r/20200106190337.2428-2-logang@deltatee.com
Signed-off-by: Wesley Sheng <wesley.sheng@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoath10k: pci: Only dump ATH10K_MEM_REGION_TYPE_IOREG when safe
Bryan O'Donoghue [Thu, 19 Dec 2019 13:15:38 +0000 (13:15 +0000)]
ath10k: pci: Only dump ATH10K_MEM_REGION_TYPE_IOREG when safe

commit d239380196c4e27a26fa4bea73d2bf994c14ec2d upstream.

ath10k_pci_dump_memory_reg() will try to access memory of type
ATH10K_MEM_REGION_TYPE_IOREG however, if a hardware restart is in progress
this can crash a system.

Individual ioread32() time has been observed to jump from 15-20 ticks to >
80k ticks followed by a secure-watchdog bite and a system reset.

Work around this corner case by only issuing the read transaction when the
driver state is ATH10K_STATE_ON.

Tested-on: QCA9988 PCI 10.4-3.9.0.2-00044

Fixes: 219cc084c6706 ("ath10k: add memory dump support QCA9984")
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoPCI/IOV: Fix memory leak in pci_iov_add_virtfn()
Navid Emamdoost [Mon, 25 Nov 2019 19:52:52 +0000 (13:52 -0600)]
PCI/IOV: Fix memory leak in pci_iov_add_virtfn()

commit 8c386cc817878588195dde38e919aa6ba9409d58 upstream.

In the implementation of pci_iov_add_virtfn() the allocated virtfn is
leaked if pci_setup_device() fails. The error handling is not calling
pci_stop_and_remove_bus_device(). Change the goto label to failed2.

Fixes: 156c55325d30 ("PCI: Check for pci_setup_device() failure in pci_iov_add_virtfn()")
Link: https://lore.kernel.org/r/20191125195255.23740-1-navid.emamdoost@gmail.com
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoscsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails
Bean Huo [Mon, 20 Jan 2020 13:08:13 +0000 (14:08 +0100)]
scsi: ufs: Fix ufshcd_probe_hba() reture value in case ufshcd_scsi_add_wlus() fails

commit b9fc5320212efdfb4e08b825aaa007815fd11d16 upstream.

A non-zero error value likely being returned by ufshcd_scsi_add_wlus() in
case of failure of adding the WLs, but ufshcd_probe_hba() doesn't use this
value, and doesn't report this failure to upper caller.  This patch is to
fix this issue.

Fixes: 2a8fa600445c ("ufs: manually add well known logical units")
Link: https://lore.kernel.org/r/20200120130820.1737-2-huobean@gmail.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/umem: Fix ib_umem_find_best_pgsz()
Artemy Kovalyov [Tue, 28 Jan 2020 13:56:12 +0000 (15:56 +0200)]
RDMA/umem: Fix ib_umem_find_best_pgsz()

commit 36798d5ae1af62e830c5e045b2e41ce038690c61 upstream.

Except for the last entry, the ending iova alignment sets the maximum
possible page size as the low bits of the iova must be zero when starting
the next chunk.

Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/20200128135612.174820-1-leon@kernel.org
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/cma: Fix unbalanced cm_id reference count during address resolve
Parav Pandit [Sun, 26 Jan 2020 14:26:46 +0000 (16:26 +0200)]
RDMA/cma: Fix unbalanced cm_id reference count during address resolve

commit b4fb4cc5ba83b20dae13cef116c33648e81d2f44 upstream.

Below commit missed the AF_IB and loopback code flow in
rdma_resolve_addr().  This leads to an unbalanced cm_id refcount in
cma_work_handler() which puts the refcount which was not incremented prior
to queuing the work.

A call trace is observed with such code flow:

 BUG: unable to handle kernel NULL pointer dereference at (null)
 [<ffffffff96b67e16>] __mutex_lock_slowpath+0x166/0x1d0
 [<ffffffff96b6715f>] mutex_lock+0x1f/0x2f
 [<ffffffffc0beabb5>] cma_work_handler+0x25/0xa0
 [<ffffffff964b9ebf>] process_one_work+0x17f/0x440
 [<ffffffff964baf56>] worker_thread+0x126/0x3c0

Hence, hold the cm_id reference when scheduling the resolve work item.

Fixes: 722c7b2bfead ("RDMA/{cma, core}: Avoid callback on rdma_addr_cancel()")
Link: https://lore.kernel.org/r/20200126142652.104803-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
Jason Gunthorpe [Wed, 15 Jan 2020 20:20:44 +0000 (20:20 +0000)]
RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence

commit 6b3712c0246ca7b2b8fa05eab2362cf267410f7e upstream.

The set of entry->driver_removed is missing locking, protect it with
xa_lock() which is held by the only reader.

Otherwise readers may continue to see driver_removed = false after
rdma_user_mmap_entry_remove() returns and may continue to try and
establish new mmaps.

Fixes: 3411f9f01b76 ("RDMA/core: Create mmap database and cookie helper functions")
Link: https://lore.kernel.org/r/20200115202041.GA17199@ziepe.ca
Reviewed-by: Gal Pressman <galpress@amazon.com>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/mlx5: Fix handling of IOVA != user_va in ODP paths
Jason Gunthorpe [Wed, 15 Jan 2020 12:43:37 +0000 (14:43 +0200)]
RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths

commit 8ffc32485158528f870b62707077ab494ba31deb upstream.

Till recently it was not possible for userspace to specify a different
IOVA, but with the new ibv_reg_mr_iova() library call this can be done.

To compute the user_va we must compute:
  user_va = (iova - iova_start) + user_va_start

while being cautious of overflow and other math problems.

The iova is not reliably stored in the mmkey when the MR is created. Only
the cached creation path (the common one) set it, so it must also be set
when creating uncached MRs.

Fix the weird use of iova when computing the starting page index in the
MR. In the normal case, when iova == umem.address:
  iova & (~(BIT(page_shift) - 1)) ==
  ALIGN_DOWN(umem.address, odp->page_size) ==
  ib_umem_start(odp)

And when iova is different using it in math with a user_va is wrong.

Finally, do not allow an implicit ODP to be created with a non-zero IOVA
as we have no support for that.

Fixes: 7bdf65d411c1 ("IB/mlx5: Handle page faults")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/uverbs: Verify MR access flags
Michael Guralnik [Wed, 8 Jan 2020 18:05:35 +0000 (20:05 +0200)]
RDMA/uverbs: Verify MR access flags

commit ca95c1411198c2d87217c19d44571052cdc94725 upstream.

Verify that MR access flags that are passed from user are all supported
ones, otherwise an error is returned.

Fixes: 4fca03778351 ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi")
Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/core: Fix locking in ib_uverbs_event_read
Jason Gunthorpe [Wed, 8 Jan 2020 17:22:03 +0000 (19:22 +0200)]
RDMA/core: Fix locking in ib_uverbs_event_read

commit 14e23bd6d22123f6f3b2747701fa6cd4c6d05873 upstream.

This should not be using ib_dev to test for disassociation, during
disassociation is_closed is set under lock and the waitq is triggered.

Instead check is_closed and be sure to re-obtain the lock to test the
value after the wait_event returns.

Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications")
Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/i40iw: fix a potential NULL pointer dereference
Xiyu Yang [Mon, 30 Dec 2019 02:24:28 +0000 (10:24 +0800)]
RDMA/i40iw: fix a potential NULL pointer dereference

commit 04db1580b5e48a79e24aa51ecae0cd4b2296ec23 upstream.

A NULL pointer can be returned by in_dev_get(). Thus add a corresponding
check so that a NULL pointer dereference will be avoided at this place.

Fixes: 8e06af711bf2 ("i40iw: add main, hdr, status")
Link: https://lore.kernel.org/r/1577672668-46499-1-git-send-email-xiyuyang19@fudan.edu.cn
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoRDMA/netlink: Do not always generate an ACK for some netlink operations
HÃ¥kon Bugge [Mon, 16 Dec 2019 12:04:36 +0000 (13:04 +0100)]
RDMA/netlink: Do not always generate an ACK for some netlink operations

commit a242c36951ecd24bc16086940dbe6b522205c461 upstream.

In rdma_nl_rcv_skb(), the local variable err is assigned the return value
of the supplied callback function, which could be one of
ib_nl_handle_resolve_resp(), ib_nl_handle_set_timeout(), or
ib_nl_handle_ip_res_resp(). These three functions all return skb->len on
success.

rdma_nl_rcv_skb() is merely a copy of netlink_rcv_skb(). The callback
functions used by the latter have the convention: "Returns 0 on success or
a negative error code".

In particular, the statement (equal for both functions):

   if (nlh->nlmsg_flags & NLM_F_ACK || err)

implies that rdma_nl_rcv_skb() always will ack a message, independent of
the NLM_F_ACK being set in nlmsg_flags or not.

The fix could be to change the above statement, but it is better to keep
the two *_rcv_skb() functions equal in this respect and instead change the
three callback functions in the rdma subsystem to the correct convention.

Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink")
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Link: https://lore.kernel.org/r/20191216120436.3204814-1-haakon.bugge@oracle.com
Suggested-by: Mark Haywood <mark.haywood@oracle.com>
Signed-off-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Tested-by: Mark Haywood <mark.haywood@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoIB/mlx4: Fix leak in id_map_find_del
HÃ¥kon Bugge [Thu, 23 Jan 2020 15:55:21 +0000 (16:55 +0100)]
IB/mlx4: Fix leak in id_map_find_del

commit ea660ad7c1c476fd6e5e3b17780d47159db71dea upstream.

Using CX-3 virtual functions, either from a bare-metal machine or
pass-through from a VM, MAD packets are proxied through the PF driver.

Since the VF drivers have separate name spaces for MAD Transaction Ids
(TIDs), the PF driver has to re-map the TIDs and keep the book keeping in
a cache.

Following the RDMA Connection Manager (CM) protocol, it is clear when an
entry has to evicted from the cache. When a DREP is sent from
mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar when
a REJ is received by the mlx4_ib_demux_cm_handler(), id_map_find_del() is
called.

This function wipes out the TID in use from the IDR or XArray and removes
the id_map_entry from the table.

In short, it does everything except the topping of the cake, which is to
remove the entry from the list and free it. In other words, for the REJ
case enumerated above, one id_map_entry will be leaked.

For the other case above, a DREQ has been received first. The reception of
the DREQ will trigger queuing of a delayed work to delete the
id_map_entry, for the case where the VM doesn't send back a DREP.

In the normal case, the VM _will_ send back a DREP, and id_map_find_del()
will be called.

But this scenario introduces a secondary leak. First, when the DREQ is
received, a delayed work is queued. The VM will then return a DREP, which
will call id_map_find_del(). As stated above, this will free the TID used
from the XArray or IDR. Now, there is window where that particular TID can
be re-allocated, lets say by an outgoing REQ. This TID will later be wiped
out by the delayed work, when the function id_map_ent_timeout() is
called. But the id_map_entry allocated by the outgoing REQ will not be
de-allocated, and we have a leak.

Both leaks are fixed by removing the id_map_find_del() function and only
using schedule_delayed(). Of course, a check in schedule_delayed() to see
if the work already has been queued, has been added.

Another benefit of always using the delayed version for deleting entries,
is that we do get a TimeWait effect; a TID no longer in use, will occupy
the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability
of being re-used for that time period.

Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization")
Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com
Signed-off-by: HÃ¥kon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoIB/mlx5: Return the administrative GUID if exists
Danit Goldberg [Thu, 16 Jan 2020 12:00:48 +0000 (14:00 +0200)]
IB/mlx5: Return the administrative GUID if exists

commit 4bbd4923d1f5627b0c47a9d7dfb5cc91224cfe0c upstream.

A user can change the operational GUID (a.k.a affective GUID) through
link/infiniband. Therefore it is preferred to return the currently set
GUID if it exists instead of the operational.

This way the PF can query which VF GUID will be set in the next bind.  In
order to align with MAC address, zero is returned if administrative GUID
is not set.

For example, before setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off

Then:

 $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00

After setting administrative GUID:
 $ ip link show
 ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
 vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
 spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off

Fixes: 9c0015ef0928 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
Signed-off-by: Danit Goldberg <danitg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoIB/srp: Never use immediate data if it is disabled by a user
Sergey Gorenko [Wed, 15 Jan 2020 13:30:55 +0000 (13:30 +0000)]
IB/srp: Never use immediate data if it is disabled by a user

commit 0fbb37dd82998b5c83355997b3bdba2806968ac7 upstream.

Some SRP targets that do not support specification SRP-2, put the garbage
to the reserved bits of the SRP login response.  The problem was not
detected for a long time because the SRP initiator ignored those bits. But
now one of them is used as SRP_LOGIN_RSP_IMMED_SUPP. And it causes a
critical error on the target when the initiator sends immediate data.

The ib_srp module has a use_imm_date parameter to enable or disable
immediate data manually. But it does not help in the above case, because
use_imm_date is ignored at handling the SRP login response. The problem is
definitely caused by a bug on the target side, but the initiator's
behavior also does not look correct.  The initiator should not use
immediate data if use_imm_date is disabled by a user.

This commit adds an additional checking of use_imm_date at the handling of
SRP login response to avoid unexpected use of immediate data.

Fixes: 882981f4a411 ("RDMA/srp: Add support for immediate data")
Link: https://lore.kernel.org/r/20200115133055.30232-1-sergeygo@mellanox.com
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoIB/mlx4: Fix memory leak in add_gid error flow
Jack Morgenstein [Wed, 15 Jan 2020 08:50:50 +0000 (10:50 +0200)]
IB/mlx4: Fix memory leak in add_gid error flow

commit eaad647e5cc27f7b46a27f3b85b14c4c8a64bffa upstream.

In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW
gid table, there is a memory leak in the driver's copy of the gid table:
the gid entry's context buffer is not freed.

If such an error occurs, free the entry's context buffer, and mark the
entry as available (by setting its context pointer to NULL).

Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks")
Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoLinux 5.5.3 v5.5.3
Greg Kroah-Hartman [Tue, 11 Feb 2020 12:37:31 +0000 (04:37 -0800)]
Linux 5.5.3

5 years agocompat: ARM64: always include asm-generic/compat.h
Arnd Bergmann [Mon, 9 Dec 2019 15:16:20 +0000 (16:16 +0100)]
compat: ARM64: always include asm-generic/compat.h

commit 556d687a4ccd54ab50a721ddde42c820545effd9 upstream.

In order to use compat_* type defininitions in device drivers
outside of CONFIG_COMPAT, move the inclusion of asm-generic/compat.h
ahead of the #ifdef.

All other architectures already do this.

Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agopowerpc/kuap: Fix set direction in allow/prevent_user_access()
Christophe Leroy [Fri, 24 Jan 2020 11:54:41 +0000 (11:54 +0000)]
powerpc/kuap: Fix set direction in allow/prevent_user_access()

[ Upstream commit 1d8f739b07bd538f272f60bf53f10e7e6248d295 ]

__builtin_constant_p() always return 0 for pointers, so on RADIX
we always end up opening both direction (by writing 0 in SPR29):

  0000000000000170 <._copy_to_user>:
  ...
   1b0: 4c 00 01 2c  isync
   1b4: 39 20 00 00  li      r9,0
   1b8: 7d 3d 03 a6  mtspr   29,r9
   1bc: 4c 00 01 2c  isync
   1c0: 48 00 00 01  bl      1c0 <._copy_to_user+0x50>
   1c0: R_PPC64_REL24 .__copy_tofrom_user
  ...
  0000000000000220 <._copy_from_user>:
  ...
   2ac: 4c 00 01 2c  isync
   2b0: 39 20 00 00  li      r9,0
   2b4: 7d 3d 03 a6  mtspr   29,r9
   2b8: 4c 00 01 2c  isync
   2bc: 7f c5 f3 78  mr      r5,r30
   2c0: 7f 83 e3 78  mr      r3,r28
   2c4: 48 00 00 01  bl      2c4 <._copy_from_user+0xa4>
   2c4: R_PPC64_REL24 .__copy_tofrom_user
  ...

Use an explicit parameter for direction selection, so that GCC
is able to see it is a constant:

  00000000000001b0 <._copy_to_user>:
  ...
   1f0: 4c 00 01 2c  isync
   1f4: 3d 20 40 00  lis     r9,16384
   1f8: 79 29 07 c6  rldicr  r9,r9,32,31
   1fc: 7d 3d 03 a6  mtspr   29,r9
   200: 4c 00 01 2c  isync
   204: 48 00 00 01  bl      204 <._copy_to_user+0x54>
   204: R_PPC64_REL24 .__copy_tofrom_user
  ...
  0000000000000260 <._copy_from_user>:
  ...
   2ec: 4c 00 01 2c  isync
   2f0: 39 20 ff ff  li      r9,-1
   2f4: 79 29 00 04  rldicr  r9,r9,0,0
   2f8: 7d 3d 03 a6  mtspr   29,r9
   2fc: 4c 00 01 2c  isync
   300: 7f c5 f3 78  mr      r5,r30
   304: 7f 83 e3 78  mr      r3,r28
   308: 48 00 00 01  bl      308 <._copy_from_user+0xa8>
   308: R_PPC64_REL24 .__copy_tofrom_user
  ...

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Spell out the directions, s/KUAP_R/KUAP_READ/ etc.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f4e88ec4941d5facb35ce75026b0112f980086c3.1579866752.git.christophe.leroy@c-s.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoregulator fix for "regulator: core: Add regulator_is_equal() helper"
Stephen Rothwell [Wed, 15 Jan 2020 01:02:58 +0000 (12:02 +1100)]
regulator fix for "regulator: core: Add regulator_is_equal() helper"

[ Upstream commit 0468e667a5bead9c1b7ded92861b5a98d8d78745 ]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20200115120258.0e535fcb@canb.auug.org.au
Acked-by: Marek Vasut <marex@denx.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agocrypto: atmel-tdes - Map driver data flags to Mode Register
Tudor Ambarus [Thu, 5 Dec 2019 09:53:56 +0000 (09:53 +0000)]
crypto: atmel-tdes - Map driver data flags to Mode Register

[ Upstream commit 848572f817721499c05b66553afc7ce0c08b1723 ]

Simplifies the configuration of the TDES IP.

Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agocrypto: atmel-aes - Fix CTR counter overflow when multiple fragments
Tudor Ambarus [Fri, 13 Dec 2019 14:45:44 +0000 (14:45 +0000)]
crypto: atmel-aes - Fix CTR counter overflow when multiple fragments

[ Upstream commit 3907ccfaec5d9965e306729936fc732c94d2c1e7 ]

The CTR transfer works in fragments of data of maximum 1 MByte because
of the 16 bit CTR counter embedded in the IP. Fix the CTR counter
overflow handling for messages larger than 1 MByte.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 781a08d9740a ("crypto: atmel-aes - Fix counter overflow in CTR mode")
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agocrypto: atmel-aes - Fix saving of IV for CTR mode
Tudor Ambarus [Thu, 5 Dec 2019 09:54:03 +0000 (09:54 +0000)]
crypto: atmel-aes - Fix saving of IV for CTR mode

[ Upstream commit 371731ec2179d5810683406e7fc284b41b127df7 ]

The req->iv of the skcipher_request is expected to contain the
last used IV. Update the req->iv for CTR mode.

Fixes: bd3c7b5c2aba ("crypto: atmel - add Atmel AES driver")
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agocrypto: atmel-{aes,tdes} - Do not save IV for ECB mode
Tudor Ambarus [Thu, 5 Dec 2019 09:54:00 +0000 (09:54 +0000)]
crypto: atmel-{aes,tdes} - Do not save IV for ECB mode

[ Upstream commit c65d123742a7bf2a5bc9fa8398e1fd2376eb4c43 ]

ECB mode does not use IV.

Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoIB/core: Fix build failure without hugepages
Arnd Bergmann [Thu, 9 Jan 2020 08:47:30 +0000 (09:47 +0100)]
IB/core: Fix build failure without hugepages

[ Upstream commit 74f75cda754eb69a77f910ceb5bc85f8e9ba56a5 ]

HPAGE_SHIFT is only defined on architectures that support hugepages:

drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_get':
drivers/infiniband/core/umem_odp.c:245:26: error: 'HPAGE_SHIFT' undeclared (first use in this function); did you mean 'PAGE_SHIFT'?

Enclose this in an #ifdef.

Fixes: 9ff1b6466a29 ("IB/core: Fix ODP with IB_ACCESS_HUGETLB handling")
Link: https://lore.kernel.org/r/20200109084740.2872079-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agorxrpc: Fix service call disconnection
David Howells [Thu, 6 Feb 2020 13:55:01 +0000 (13:55 +0000)]
rxrpc: Fix service call disconnection

[ Upstream commit b39a934ec72fa2b5a74123891f25273a38378b90 ]

The recent patch that substituted a flag on an rxrpc_call for the
connection pointer being NULL as an indication that a call was disconnected
puts the set_bit in the wrong place for service calls.  This is only a
problem if a call is implicitly terminated by a new call coming in on the
same connection channel instead of a terminating ACK packet.

In such a case, rxrpc_input_implicit_end_call() calls
__rxrpc_disconnect_call(), which is now (incorrectly) setting the
disconnection bit, meaning that when rxrpc_release_call() is later called,
it doesn't call rxrpc_disconnect_call() and so the call isn't removed from
the peer's error distribution list and the list gets corrupted.

KASAN finds the issue as an access after release on a call, but the
position at which it occurs is confusing as it appears to be related to a
different call (the call site is where the latter call is being removed
from the error distribution list and either the next or pprev pointer
points to a previously released call).

Fix this by moving the setting of the flag from __rxrpc_disconnect_call()
to rxrpc_disconnect_call() in the same place that the connection pointer
was being cleared.

Fixes: 5273a191dca6 ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoKVM: Play nice with read-only memslots when querying host page size
Sean Christopherson [Wed, 8 Jan 2020 20:24:38 +0000 (12:24 -0800)]
KVM: Play nice with read-only memslots when querying host page size

[ Upstream commit 42cde48b2d39772dba47e680781a32a6c4b7dc33 ]

Avoid the "writable" check in __gfn_to_hva_many(), which will always fail
on read-only memslots due to gfn_to_hva() assuming writes.  Functionally,
this allows x86 to create large mappings for read-only memslots that
are backed by HugeTLB mappings.

Note, the changelog for commit 05da45583de9 ("KVM: MMU: large page
support") states "If the largepage contains write-protected pages, a
large pte is not used.", but "write-protected" refers to pages that are
temporarily read-only, e.g. read-only memslots didn't even exist at the
time.

Fixes: 4d8b81abc47b ("KVM: introduce readonly memslot")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Redone using kvm_vcpu_gfn_to_memslot_prot. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoKVM: Use vcpu-specific gva->hva translation when querying host page size
Sean Christopherson [Wed, 8 Jan 2020 20:24:37 +0000 (12:24 -0800)]
KVM: Use vcpu-specific gva->hva translation when querying host page size

[ Upstream commit f9b84e19221efc5f493156ee0329df3142085f28 ]

Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the
correct set of memslots is used when handling x86 page faults in SMM.

Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoKVM: nVMX: vmread should not set rflags to specify success in case of #PF
Miaohe Lin [Sat, 28 Dec 2019 06:25:24 +0000 (14:25 +0800)]
KVM: nVMX: vmread should not set rflags to specify success in case of #PF

[ Upstream commit a4d956b9390418623ae5d07933e2679c68b6f83c ]

In case writing to vmread destination operand result in a #PF, vmread
should not call nested_vmx_succeed() to set rflags to specify success.
Similar to as done in VMPTRST (See handle_vmptrst()).

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoKVM: x86: Protect exit_reason from being used in Spectre-v1/L1TF attacks
Marios Pomonis [Wed, 11 Dec 2019 20:47:51 +0000 (12:47 -0800)]
KVM: x86: Protect exit_reason from being used in Spectre-v1/L1TF attacks

[ Upstream commit c926f2f7230b1a29e31914b51db680f8cbf3103f ]

This fixes a Spectre-v1/L1TF vulnerability in vmx_handle_exit().
While exit_reason is set by the hardware and therefore should not be
attacker-influenced, an unknown exit_reason could potentially be used to
perform such an attack.

Fixes: 55d2375e58a6 ("KVM: nVMX: Move nested code to dedicated files")
Signed-off-by: Marios Pomonis <pomonis@google.com>
Signed-off-by: Nick Finco <nifi@google.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoio_uring: prevent potential eventfd recursion on poll
Jens Axboe [Sun, 2 Feb 2020 04:30:11 +0000 (21:30 -0700)]
io_uring: prevent potential eventfd recursion on poll

[ Upstream commit f0b493e6b9a8959356983f57112229e69c2f7b8c ]

If we have nested or circular eventfd wakeups, then we can deadlock if
we run them inline from our poll waitqueue wakeup handler. It's also
possible to have very long chains of notifications, to the extent where
we could risk blowing the stack.

Check the eventfd recursion count before calling eventfd_signal(). If
it's non-zero, then punt the signaling to async context. This is always
safe, as it takes us out-of-line in terms of stack and locking context.

Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoio_uring: enable option to only trigger eventfd for async completions
Sasha Levin [Sun, 9 Feb 2020 18:30:59 +0000 (13:30 -0500)]
io_uring: enable option to only trigger eventfd for async completions

[ Upstream commit f2842ab5b72d7ee5f7f8385c2d4f32c133f5837b ]

If an application is using eventfd notifications with poll to know when
new SQEs can be issued, it's expecting the following read/writes to
complete inline. And with that, it knows that there are events available,
and don't want spurious wakeups on the eventfd for those requests.

This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like
IORING_REGISTER_EVENTFD, except it only triggers notifications for events
that happen from async completions (IRQ, or io-wq worker completions).
Any completions inline from the submission itself will not trigger
notifications.

Suggested-by: Mark Papadakis <markuspapadakis@icloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agodrm/dp_mst: Remove VCPI while disabling topology mgr
Wayne Lin [Thu, 5 Dec 2019 09:00:43 +0000 (17:00 +0800)]
drm/dp_mst: Remove VCPI while disabling topology mgr

[ Upstream commit 64e62bdf04ab8529f45ed0a85122c703035dec3a ]

[Why]

This patch is trying to address the issue observed when hotplug DP
daisy chain monitors.

e.g.
src-mstb-mstb-sst -> src (unplug) mstb-mstb-sst -> src-mstb-mstb-sst
(plug in again)

Once unplug a DP MST capable device, driver will call
drm_dp_mst_topology_mgr_set_mst() to disable MST. In this function,
it cleans data of topology manager while disabling mst_state. However,
it doesn't clean up the proposed_vcpis of topology manager.
If proposed_vcpi is not reset, once plug in MST daisy chain monitors
later, code will fail at checking port validation while trying to
allocate payloads.

When MST capable device is plugged in again and try to allocate
payloads by calling drm_dp_update_payload_part1(), this
function will iterate over all proposed virtual channels to see if
any proposed VCPI's num_slots is greater than 0. If any proposed
VCPI's num_slots is greater than 0 and the port which the
specific virtual channel directed to is not in the topology, code then
fails at the port validation. Since there are stale VCPI allocations
from the previous topology enablement in proposed_vcpi[], code will fail
at port validation and reurn EINVAL.

[How]

Clean up the data of stale proposed_vcpi[] and reset mgr->proposed_vcpis
to NULL while disabling mst in drm_dp_mst_topology_mgr_set_mst().

Changes since v1:
*Add on more details in commit message to describe the issue which the
patch is trying to fix

Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
[added cc to stable]
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205090043.7580-1-Wayne.Lin@amd.com
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Sasha Levin <sashal@kernel.org>
5 years agoperf/cgroups: Install cgroup events to correct cpuctx
Song Liu [Wed, 22 Jan 2020 19:50:27 +0000 (11:50 -0800)]
perf/cgroups: Install cgroup events to correct cpuctx

commit 07c5972951f088094776038006a0592a46d14bbc upstream.

cgroup events are always installed in the cpuctx. However, when it is not
installed via IPI, list_update_cgroup_event() adds it to cpuctx of current
CPU, which triggers list corruption:

  [] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0.

To reproduce this, we can simply run:

  # perf stat -e cs -a &
  # perf stat -e cs -G anycgroup

Fix this by installing it to cpuctx that contains event->ctx, and the
proper cgrp_cpuctx_list.

Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200122195027.2112449-1-songliubraving@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoperf/core: Fix mlock accounting in perf_mmap()
Song Liu [Thu, 23 Jan 2020 18:11:46 +0000 (10:11 -0800)]
perf/core: Fix mlock accounting in perf_mmap()

commit 003461559ef7a9bd0239bae35a22ad8924d6e9ad upstream.

Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of
a perf ring buffer may lead to an integer underflow in locked memory
accounting. This may lead to the undesired behaviors, such as failures in
BPF map creation.

Address this by adjusting the accounting logic to take into account the
possibility that the amount of already locked memory may exceed the
current limit.

Fixes: c4b75479741c ("perf/core: Make the mlock accounting simple again")
Suggested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://lkml.kernel.org/r/20200123181146.2238074-1-songliubraving@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoclocksource: Prevent double add_timer_on() for watchdog_timer
Konstantin Khlebnikov [Fri, 31 Jan 2020 16:08:59 +0000 (19:08 +0300)]
clocksource: Prevent double add_timer_on() for watchdog_timer

commit febac332a819f0e764aa4da62757ba21d18c182b upstream.

Kernel crashes inside QEMU/KVM are observed:

  kernel BUG at kernel/time/timer.c:1154!
  BUG_ON(timer_pending(timer) || !timer->function) in add_timer_on().

At the same time another cpu got:

  general protection fault: 0000 [#1] SMP PTI of poinson pointer 0xdead000000000200 in:

  __hlist_del at include/linux/list.h:681
  (inlined by) detach_timer at kernel/time/timer.c:818
  (inlined by) expire_timers at kernel/time/timer.c:1355
  (inlined by) __run_timers at kernel/time/timer.c:1686
  (inlined by) run_timer_softirq at kernel/time/timer.c:1699

Unfortunately kernel logs are badly scrambled, stacktraces are lost.

Printing the timer->function before the BUG_ON() pointed to
clocksource_watchdog().

The execution of clocksource_watchdog() can race with a sequence of
clocksource_stop_watchdog() .. clocksource_start_watchdog():

expire_timers()
 detach_timer(timer, true);
  timer->entry.pprev = NULL;
 raw_spin_unlock_irq(&base->lock);
 call_timer_fn
  clocksource_watchdog()

clocksource_watchdog_kthread() or
clocksource_unbind()

spin_lock_irqsave(&watchdog_lock, flags);
clocksource_stop_watchdog();
 del_timer(&watchdog_timer);
 watchdog_running = 0;
spin_unlock_irqrestore(&watchdog_lock, flags);

spin_lock_irqsave(&watchdog_lock, flags);
clocksource_start_watchdog();
 add_timer_on(&watchdog_timer, ...);
 watchdog_running = 1;
spin_unlock_irqrestore(&watchdog_lock, flags);

  spin_lock(&watchdog_lock);
  add_timer_on(&watchdog_timer, ...);
   BUG_ON(timer_pending(timer) || !timer->function);
    timer_pending() -> true
    BUG()

I.e. inside clocksource_watchdog() watchdog_timer could be already armed.

Check timer_pending() before calling add_timer_on(). This is sufficient as
all operations are synchronized by watchdog_lock.

Fixes: 75c5158f70c0 ("timekeeping: Update clocksource with stop_machine")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/158048693917.4378.13823603769948933793.stgit@buzz
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agox86/apic/msi: Plug non-maskable MSI affinity race
Thomas Gleixner [Fri, 31 Jan 2020 14:26:52 +0000 (15:26 +0100)]
x86/apic/msi: Plug non-maskable MSI affinity race

commit 6f1a4891a5928a5969c87fa5a584844c983ec823 upstream.

Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.

   - Write address low 32bits
   - Write address high 32bits (If supported by device)
   - Write data

When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.

On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.

Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:

 If MSI is disabled the PCI device is routing an interrupt to the legacy
 INTx mechanism. The INTx delivery can be disabled, but the disablement is
 not working on all devices.

 Some devices lose interrupts when both MSI and INTx delivery are disabled.

Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.

Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.

That makes it possible to solve it with a two step update:

  1) Target the MSI msg to the new vector on the current target CPU

  2) Target the MSI msg to the new vector on the new target CPU

In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.

After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.

This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.

This can cause spurious interrupts on both the local and the new target
CPU.

 1) If the new vector is not in use on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then interrupt entry code will
    ignore that spurious interrupt. The vector is marked so that the
    'No irq handler for vector' warning is supressed once.

 2) If the new vector is in use already on the local CPU then the IRR check
    might see an pending interrupt from the device which is using this
    vector. The IPI to the new target CPU will then invoke the handler of
    the device, which got the affinity change, even if that device did not
    issue an interrupt

 3) If the new vector is in use already on the local CPU and the device
    affected by the affinity change raised an interrupt during the
    transitional state (step #1 above) then the handler of the device which
    uses that vector on the local CPU will be invoked.

expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.

Reported-by: Evan Green <evgreen@chromium.org>
Debugged-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Evan Green <evgreen@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agocifs: fix mode bits from dir listing when mounted with modefromsid
Aurelien Aptel [Thu, 6 Feb 2020 17:16:55 +0000 (18:16 +0100)]
cifs: fix mode bits from dir listing when mounted with modefromsid

commit e3e056c35108661e418c803adfc054bf683426e7 upstream.

When mounting with -o modefromsid, the mode bits are stored in an
ACE. Directory enumeration (e.g. ls -l /mnt) triggers an SMB Query Dir
which does not include ACEs in its response. The mode bits in this
case are silently set to a default value of 755 instead.

This patch marks the dentry created during the directory enumeration
as needing re-evaluation (i.e. additional Query Info with ACEs) so
that the mode bits can be properly extracted.

Quick repro:

$ mount.cifs //win19.test/data /mnt -o ...,modefromsid
$ touch /mnt/foo && chmod 751 /mnt/foo
$ stat /mnt/foo
  # reports 751 (OK)
$ sleep 2
  # dentry older than 1s by default get invalidated
$ ls -l /mnt
  # since dentry invalid, ls does a Query Dir
  # and reports foo as 755 (WRONG)

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agocifs: fail i/o on soft mounts if sessionsetup errors out
Ronnie Sahlberg [Wed, 5 Feb 2020 01:08:01 +0000 (11:08 +1000)]
cifs: fail i/o on soft mounts if sessionsetup errors out

commit b0dd940e582b6a60296b9847a54012a4b080dc72 upstream.

RHBZ: 1579050

If we have a soft mount we should fail commands for session-setup
failures (such as the password having changed/ account being deleted/ ...)
and return an error back to the application.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5e: TX, Error completion is for last WQE in batch
Tariq Toukan [Thu, 9 Jan 2020 13:53:37 +0000 (15:53 +0200)]
net/mlx5e: TX, Error completion is for last WQE in batch

[ Upstream commit b57e66ad42d051ed31319c28ed1b62b191299a29 ]

For a cyclic work queue, when not requesting a completion per WQE,
a single CQE might indicate the completion of several WQEs.
However, in case some WQE in the batch causes an error, then an error
completion is issued, breaking the batch, and pointing to the offending
WQE in the wqe_counter field.

Hence, WQE-specific error CQE handling (like printing, breaking, etc...)
should be performed only for the last WQE in batch.

Fixes: 130c7b46c93d ("net/mlx5e: TX, Dump WQs wqe descriptors on CQE with error events")
Fixes: fd9b4be8002c ("net/mlx5e: RX, Support multiple outstanding UMR posts")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agor8169: fix performance regression related to PCIe max read request size
Heiner Kallweit [Wed, 5 Feb 2020 20:22:46 +0000 (21:22 +0100)]
r8169: fix performance regression related to PCIe max read request size

[ Upstream commit 21b5f672fb2eb1366dedc4ac9d32431146b378d3 ]

It turned out that on low performance systems the original change can
cause lower tx performance. On a N3450-based mini-PC tx performance
in iperf3 was reduced from 950Mbps to ~900Mbps. Therefore effectively
revert the original change, just use pcie_set_readrq() now instead of
changing the PCIe capability register directly.

Fixes: 2df49d365498 ("r8169: remove fiddling with the PCIe max read request size")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: Deprecate usage of generic TLS HW capability bit
Tariq Toukan [Mon, 27 Jan 2020 12:18:14 +0000 (14:18 +0200)]
net/mlx5: Deprecate usage of generic TLS HW capability bit

[ Upstream commit 61c00cca41aeeaa8e5263c2f81f28534bc1efafb ]

Deprecate the generic TLS cap bit, use the new TX-specific
TLS cap bit instead.

Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: Fix deadlock in fs_core
Maor Gottlieb [Mon, 27 Jan 2020 07:27:51 +0000 (09:27 +0200)]
net/mlx5: Fix deadlock in fs_core

[ Upstream commit c1948390d78b5183ee9b7dd831efd7f6ac496ab0 ]

free_match_list could be called when the flow table is already
locked. We need to pass this notation to tree_put_node.

It fixes the following lockdep warnning:

[ 1797.268537] ============================================
[ 1797.276837] WARNING: possible recursive locking detected
[ 1797.285101] 5.5.0-rc5+ #10 Not tainted
[ 1797.291641] --------------------------------------------
[ 1797.299917] handler10/9296 is trying to acquire lock:
[ 1797.307885] ffff889ad399a0a0 (&node->lock){++++}, at:
tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.319694]
[ 1797.319694] but task is already holding lock:
[ 1797.330904] ffff889ad399a0a0 (&node->lock){++++}, at:
nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
[ 1797.344707]
[ 1797.344707] other info that might help us debug this:
[ 1797.356952]  Possible unsafe locking scenario:
[ 1797.356952]
[ 1797.368333]        CPU0
[ 1797.373357]        ----
[ 1797.378364]   lock(&node->lock);
[ 1797.384222]   lock(&node->lock);
[ 1797.390031]
[ 1797.390031]  *** DEADLOCK ***
[ 1797.390031]
[ 1797.403003]  May be due to missing lock nesting notation
[ 1797.403003]
[ 1797.414691] 3 locks held by handler10/9296:
[ 1797.421465]  #0: ffff889cf2c5a110 (&block->cb_lock){++++}, at:
tc_setup_cb_add+0x70/0x250
[ 1797.432810]  #1: ffff88a030081490 (&comp->sem){++++}, at:
mlx5_devcom_get_peer_data+0x4c/0xb0 [mlx5_core]
[ 1797.445829]  #2: ffff889ad399a0a0 (&node->lock){++++}, at:
nested_down_write_ref_node.part.33+0x1a/0x60 [mlx5_core]
[ 1797.459913]
[ 1797.459913] stack backtrace:
[ 1797.469436] CPU: 1 PID: 9296 Comm: handler10 Kdump: loaded Not
tainted 5.5.0-rc5+ #10
[ 1797.480643] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS
2.4.3 01/17/2017
[ 1797.491480] Call Trace:
[ 1797.496701]  dump_stack+0x96/0xe0
[ 1797.502864]  __lock_acquire.cold.63+0xf8/0x212
[ 1797.510301]  ? lockdep_hardirqs_on+0x250/0x250
[ 1797.517701]  ? mark_held_locks+0x55/0xa0
[ 1797.524547]  ? quarantine_put+0xb7/0x160
[ 1797.531422]  ? lockdep_hardirqs_on+0x17d/0x250
[ 1797.538913]  lock_acquire+0xd6/0x1f0
[ 1797.545529]  ? tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.553701]  down_write+0x94/0x140
[ 1797.560206]  ? tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.568464]  ? down_write_killable_nested+0x170/0x170
[ 1797.576925]  ? del_hw_flow_group+0xde/0x1f0 [mlx5_core]
[ 1797.585629]  tree_put_node+0x1d5/0x210 [mlx5_core]
[ 1797.593891]  ? free_match_list.part.25+0x147/0x170 [mlx5_core]
[ 1797.603389]  free_match_list.part.25+0xe0/0x170 [mlx5_core]
[ 1797.612654]  _mlx5_add_flow_rules+0x17e2/0x20b0 [mlx5_core]
[ 1797.621838]  ? lock_acquire+0xd6/0x1f0
[ 1797.629028]  ? esw_get_prio_table+0xb0/0x3e0 [mlx5_core]
[ 1797.637981]  ? alloc_insert_flow_group+0x420/0x420 [mlx5_core]
[ 1797.647459]  ? try_to_wake_up+0x4c7/0xc70
[ 1797.654881]  ? lock_downgrade+0x350/0x350
[ 1797.662271]  ? __mutex_unlock_slowpath+0xb1/0x3f0
[ 1797.670396]  ? find_held_lock+0xac/0xd0
[ 1797.677540]  ? mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
[ 1797.686467]  mlx5_add_flow_rules+0xdc/0x360 [mlx5_core]
[ 1797.695134]  ? _mlx5_add_flow_rules+0x20b0/0x20b0 [mlx5_core]
[ 1797.704270]  ? irq_exit+0xa5/0x170
[ 1797.710764]  ? retint_kernel+0x10/0x10
[ 1797.717698]  ? mlx5_eswitch_set_rule_source_port.isra.9+0x122/0x230
[mlx5_core]
[ 1797.728708]  mlx5_eswitch_add_offloaded_rule+0x465/0x6d0 [mlx5_core]
[ 1797.738713]  ? mlx5_eswitch_get_prio_range+0x30/0x30 [mlx5_core]
[ 1797.748384]  ? mlx5_fc_stats_work+0x670/0x670 [mlx5_core]
[ 1797.757400]  mlx5e_tc_offload_fdb_rules.isra.27+0x24/0x90 [mlx5_core]
[ 1797.767665]  mlx5e_tc_add_fdb_flow+0xaf8/0xd40 [mlx5_core]
[ 1797.776886]  ? mlx5e_encap_put+0xd0/0xd0 [mlx5_core]
[ 1797.785562]  ? mlx5e_alloc_flow.isra.43+0x18c/0x1c0 [mlx5_core]
[ 1797.795353]  __mlx5e_add_fdb_flow+0x2e2/0x440 [mlx5_core]
[ 1797.804558]  ? mlx5e_tc_update_neigh_used_value+0x8c0/0x8c0
[mlx5_core]
[ 1797.815093]  ? wait_for_completion+0x260/0x260
[ 1797.823272]  mlx5e_configure_flower+0xe94/0x1620 [mlx5_core]
[ 1797.832792]  ? __mlx5e_add_fdb_flow+0x440/0x440 [mlx5_core]
[ 1797.842096]  ? down_read+0x11a/0x2e0
[ 1797.849090]  ? down_write+0x140/0x140
[ 1797.856142]  ? mlx5e_rep_indr_setup_block_cb+0xc0/0xc0 [mlx5_core]
[ 1797.866027]  tc_setup_cb_add+0x11a/0x250
[ 1797.873339]  fl_hw_replace_filter+0x25e/0x320 [cls_flower]
[ 1797.882385]  ? fl_hw_destroy_filter+0x1c0/0x1c0 [cls_flower]
[ 1797.891607]  fl_change+0x1d54/0x1fb6 [cls_flower]
[ 1797.899772]  ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
[cls_flower]
[ 1797.910728]  ? lock_downgrade+0x350/0x350
[ 1797.918187]  ? __radix_tree_lookup+0xa5/0x130
[ 1797.926046]  ? fl_set_key+0x1590/0x1590 [cls_flower]
[ 1797.934611]  ? __rhashtable_insert_fast.constprop.50+0x9f0/0x9f0
[cls_flower]
[ 1797.945673]  tc_new_tfilter+0xcd1/0x1240
[ 1797.953138]  ? tc_del_tfilter+0xb10/0xb10
[ 1797.960688]  ? avc_has_perm_noaudit+0x92/0x320
[ 1797.968721]  ? avc_has_perm_noaudit+0x1df/0x320
[ 1797.976816]  ? avc_has_extended_perms+0x990/0x990
[ 1797.985090]  ? mark_lock+0xaa/0x9e0
[ 1797.991988]  ? match_held_lock+0x1b/0x240
[ 1797.999457]  ? match_held_lock+0x1b/0x240
[ 1798.006859]  ? find_held_lock+0xac/0xd0
[ 1798.014045]  ? symbol_put_addr+0x40/0x40
[ 1798.021317]  ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1798.029460]  ? tc_del_tfilter+0xb10/0xb10
[ 1798.036810]  rtnetlink_rcv_msg+0x4d5/0x620
[ 1798.044236]  ? rtnl_bridge_getlink+0x460/0x460
[ 1798.052034]  ? lockdep_hardirqs_on+0x250/0x250
[ 1798.059837]  ? match_held_lock+0x1b/0x240
[ 1798.067146]  ? find_held_lock+0xac/0xd0
[ 1798.074246]  netlink_rcv_skb+0xc6/0x1f0
[ 1798.081339]  ? rtnl_bridge_getlink+0x460/0x460
[ 1798.089104]  ? netlink_ack+0x440/0x440
[ 1798.096061]  netlink_unicast+0x2d4/0x3b0
[ 1798.103189]  ? netlink_attachskb+0x3f0/0x3f0
[ 1798.110724]  ? _copy_from_iter_full+0xda/0x370
[ 1798.118415]  netlink_sendmsg+0x3ba/0x6a0
[ 1798.125478]  ? netlink_unicast+0x3b0/0x3b0
[ 1798.132705]  ? netlink_unicast+0x3b0/0x3b0
[ 1798.139880]  sock_sendmsg+0x94/0xa0
[ 1798.146332]  ____sys_sendmsg+0x36c/0x3f0
[ 1798.153251]  ? copy_msghdr_from_user+0x165/0x230
[ 1798.160941]  ? kernel_sendmsg+0x30/0x30
[ 1798.167738]  ___sys_sendmsg+0xeb/0x150
[ 1798.174411]  ? sendmsg_copy_msghdr+0x30/0x30
[ 1798.181649]  ? lock_downgrade+0x350/0x350
[ 1798.188559]  ? rcu_read_lock_sched_held+0xd0/0xd0
[ 1798.196239]  ? __fget+0x21d/0x320
[ 1798.202335]  ? do_dup2+0x2a0/0x2a0
[ 1798.208499]  ? lock_downgrade+0x350/0x350
[ 1798.215366]  ? __fget_light+0xd6/0xf0
[ 1798.221808]  ? syscall_trace_enter+0x369/0x5d0
[ 1798.229112]  __sys_sendmsg+0xd3/0x160
[ 1798.235511]  ? __sys_sendmsg_sock+0x60/0x60
[ 1798.242478]  ? syscall_trace_enter+0x233/0x5d0
[ 1798.249721]  ? syscall_slow_exit_work+0x280/0x280
[ 1798.257211]  ? do_syscall_64+0x1e/0x2e0
[ 1798.263680]  do_syscall_64+0x72/0x2e0
[ 1798.269950]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodrop_monitor: Do not cancel uninitialized work item
Ido Schimmel [Fri, 7 Feb 2020 17:29:28 +0000 (19:29 +0200)]
drop_monitor: Do not cancel uninitialized work item

[ Upstream commit dfa7f709596be5ca46c070d4f8acbb344322056a ]

Drop monitor uses a work item that takes care of constructing and
sending netlink notifications to user space. In case drop monitor never
started to monitor, then the work item is uninitialized and not
associated with a function.

Therefore, a stop command from user space results in canceling an
uninitialized work item which leads to the following warning [1].

Fix this by not processing a stop command if drop monitor is not
currently monitoring.

[1]
[   31.735402] ------------[ cut here ]------------
[   31.736470] WARNING: CPU: 0 PID: 143 at kernel/workqueue.c:3032 __flush_work+0x89f/0x9f0
...
[   31.738120] CPU: 0 PID: 143 Comm: dwdump Not tainted 5.5.0-custom-09491-g16d4077796b8 #727
[   31.741968] RIP: 0010:__flush_work+0x89f/0x9f0
...
[   31.760526] Call Trace:
[   31.771689]  __cancel_work_timer+0x2a6/0x3b0
[   31.776809]  net_dm_cmd_trace+0x300/0xef0
[   31.777549]  genl_rcv_msg+0x5c6/0xd50
[   31.781005]  netlink_rcv_skb+0x13b/0x3a0
[   31.784114]  genl_rcv+0x29/0x40
[   31.784720]  netlink_unicast+0x49f/0x6a0
[   31.787148]  netlink_sendmsg+0x7cf/0xc80
[   31.790426]  ____sys_sendmsg+0x620/0x770
[   31.793458]  ___sys_sendmsg+0xfd/0x170
[   31.802216]  __sys_sendmsg+0xdf/0x1a0
[   31.806195]  do_syscall_64+0xa0/0x540
[   31.806885]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 8e94c3bc922e ("drop_monitor: Allow user to start monitoring hardware drops")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoqed: Fix timestamping issue for L2 unicast ptp packets.
Sudarsana Reddy Kalluru [Wed, 5 Feb 2020 13:10:55 +0000 (05:10 -0800)]
qed: Fix timestamping issue for L2 unicast ptp packets.

[ Upstream commit 0202d293c2faecba791ba4afc5aec086249c393d ]

commit cedeac9df4b8 ("qed: Add support for Timestamping the unicast
PTP packets.") handles the timestamping of L4 ptp packets only.
This patch adds driver changes to detect/timestamp both L2/L4 unicast
PTP packets.

Fixes: cedeac9df4b8 ("qed: Add support for Timestamping the unicast PTP packets.")
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoipv6/addrconf: fix potential NULL deref in inet6_set_link_af()
Eric Dumazet [Fri, 7 Feb 2020 15:16:37 +0000 (07:16 -0800)]
ipv6/addrconf: fix potential NULL deref in inet6_set_link_af()

[ Upstream commit db3fa271022dacb9f741b96ea4714461a8911bb9 ]

__in6_dev_get(dev) called from inet6_set_link_af() can return NULL.

The needed check has been recently removed, let's add it back.

While do_setlink() does call validate_linkmsg() :
...
err = validate_linkmsg(dev, tb); /* OK at this point */
...

It is possible that the following call happening before the
->set_link_af() removes IPv6 if MTU is less than 1280 :

if (tb[IFLA_MTU]) {
    err = dev_set_mtu_ext(dev, nla_get_u32(tb[IFLA_MTU]), extack);
    if (err < 0)
          goto errout;
    status |= DO_SETLINK_MODIFIED;
}
...

if (tb[IFLA_AF_SPEC]) {
   ...
   err = af_ops->set_link_af(dev, af);
      ->inet6_set_link_af() // CRASH because idev is NULL

Please note that IPv4 is immune to the bug since inet_set_link_af() does :

struct in_device *in_dev = __in_dev_get_rcu(dev);
if (!in_dev)
    return -EAFNOSUPPORT;

This problem has been mentioned in commit cf7afbfeb8ce ("rtnl: make
link af-specific updates atomic") changelog :

    This method is not fail proof, while it is currently sufficient
    to make set_link_af() inerrable and thus 100% atomic, the
    validation function method will not be able to detect all error
    scenarios in the future, there will likely always be errors
    depending on states which are f.e. not protected by rtnl_mutex
    and thus may change between validation and setting.

IPv6: ADDRCONF(NETDEV_CHANGE): lo: link becomes ready
general protection fault, probably for non-canonical address 0xdffffc0000000056: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000002b0-0x00000000000002b7]
CPU: 0 PID: 9698 Comm: syz-executor712 Not tainted 5.5.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
FS:  0000000000c46880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f0494ca0d0 CR3: 000000009e4ac000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 do_setlink+0x2a9f/0x3720 net/core/rtnetlink.c:2754
 rtnl_group_changelink net/core/rtnetlink.c:3103 [inline]
 __rtnl_newlink+0xdd1/0x1790 net/core/rtnetlink.c:3257
 rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3377
 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5438
 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5456
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0x59e/0x7e0 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg+0xd7/0x130 net/socket.c:672
 ____sys_sendmsg+0x753/0x880 net/socket.c:2343
 ___sys_sendmsg+0x100/0x170 net/socket.c:2397
 __sys_sendmsg+0x105/0x1d0 net/socket.c:2430
 __do_sys_sendmsg net/socket.c:2439 [inline]
 __se_sys_sendmsg net/socket.c:2437 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2437
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4402e9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffd62fbcf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402e9
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000008 R09: 00000000004002c8
R10: 0000000000000005 R11: 0000000000000246 R12: 0000000000401b70
R13: 0000000000401c00 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace cfa7664b8fdcdff3 ]---
RIP: 0010:inet6_set_link_af+0x66e/0xae0 net/ipv6/addrconf.c:5733
Code: 38 d0 7f 08 84 c0 0f 85 20 03 00 00 48 8d bb b0 02 00 00 45 0f b6 64 24 04 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 1a 03 00 00 44 89 a3 b0 02 00
RSP: 0018:ffffc90005b06d40 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff86df39a6
RDX: 0000000000000056 RSI: ffffffff86df3e74 RDI: 00000000000002b0
RBP: ffffc90005b06e70 R08: ffff8880a2ac0380 R09: ffffc90005b06db0
R10: fffff52000b60dbe R11: ffffc90005b06df7 R12: 0000000000000000
R13: 0000000000000000 R14: ffff8880a1fcc424 R15: dffffc0000000000
FS:  0000000000c46880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000004 CR3: 000000009e4ac000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 7dc2bccab0ee ("Validate required parameters in inet6_validate_link_af")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-and-reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotaprio: Fix dropping packets when using taprio + ETF offloading
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:10 +0000 (13:46 -0800)]
taprio: Fix dropping packets when using taprio + ETF offloading

[ Upstream commit bfabd41da34180d05382312533a3adc2e012dee0 ]

When using taprio offloading together with ETF offloading, configured
like this, for example:

$ tc qdisc replace dev $IFACE parent root handle 100 taprio \
   num_tc 4 \
        map 2 2 1 0 3 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 1@2 1@3 \
base-time $BASE_TIME \
sched-entry S 01 1000000 \
sched-entry S 0e 1000000 \
flags 0x2

$ tc qdisc replace dev $IFACE parent 100:1 etf \
      offload delta 300000 clockid CLOCK_TAI

During enqueue, it works out that the verification added for the
"txtime" assisted mode is run when using taprio + ETF offloading, the
only thing missing is initializing the 'next_txtime' of all the cycle
entries. (if we don't set 'next_txtime' all packets from SO_TXTIME
sockets are dropped)

Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotaprio: Use taprio_reset_tc() to reset Traffic Classes configuration
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:09 +0000 (13:46 -0800)]
taprio: Use taprio_reset_tc() to reset Traffic Classes configuration

[ Upstream commit 7c16680a08ee1e444a67d232c679ccf5b30fad16 ]

When destroying the current taprio instance, which can happen when the
creation of one fails, we should reset the traffic class configuration
back to the default state.

netdev_reset_tc() is a better way because in addition to setting the
number of traffic classes to zero, it also resets the priority to
traffic classes mapping to the default value.

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotaprio: Add missing policy validation for flags
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:08 +0000 (13:46 -0800)]
taprio: Add missing policy validation for flags

[ Upstream commit 49c684d79cfdc3032344bf6f3deeea81c4efedbf ]

netlink policy validation for the 'flags' argument was missing.

Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotaprio: Fix still allowing changing the flags during runtime
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:07 +0000 (13:46 -0800)]
taprio: Fix still allowing changing the flags during runtime

[ Upstream commit a9d6227436f32142209f4428f2dc616761485112 ]

Because 'q->flags' starts as zero, and zero is a valid value, we
aren't able to detect the transition from zero to something else
during "runtime".

The solution is to initialize 'q->flags' with an invalid value, so we
can detect if 'q->flags' was set by the user or not.

To better solidify the behavior, 'flags' handling is moved to a
separate function. The behavior is:
 - 'flags' if unspecified by the user, is assumed to be zero;
 - 'flags' cannot change during "runtime" (i.e. a change() request
 cannot modify it);

With this new function we can remove taprio_flags, which should reduce
the risk of future accidents.

Allowing flags to be changed was causing the following RCU stall:

[ 1730.558249] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1730.558258] rcu:    6-...0: (190 ticks this GP) idle=922/0/0x1 softirq=25580/25582 fqs=16250
[ 1730.558264]    (detected by 2, t=65002 jiffies, g=33017, q=81)
[ 1730.558269] Sending NMI from CPU 2 to CPUs 6:
[ 1730.559277] NMI backtrace for cpu 6
[ 1730.559277] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G            E     5.5.0-rc6+ #35
[ 1730.559278] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019
[ 1730.559278] RIP: 0010:__hrtimer_run_queues+0xe2/0x440
[ 1730.559278] Code: 48 8b 43 28 4c 89 ff 48 8b 75 c0 48 89 45 c8 e8 f4 bb 7c 00 0f 1f 44 00 00 65 8b 05 40 31 f0 68 89 c0 48 0f a3 05 3e 5c 25 01 <0f> 82 fc 01 00 00 48 8b 45 c8 48 89 df ff d0 89 45 c8 0f 1f 44 00
[ 1730.559279] RSP: 0018:ffff9970802d8f10 EFLAGS: 00000083
[ 1730.559279] RAX: 0000000000000006 RBX: ffff8b31645bff38 RCX: 0000000000000000
[ 1730.559280] RDX: 0000000000000000 RSI: ffffffff9710f2ec RDI: ffffffff978daf0e
[ 1730.559280] RBP: ffff9970802d8f68 R08: 0000000000000000 R09: 0000000000000000
[ 1730.559280] R10: 0000018336d7944e R11: 0000000000000001 R12: ffff8b316e39f9c0
[ 1730.559281] R13: ffff8b316e39f940 R14: ffff8b316e39f998 R15: ffff8b316e39f7c0
[ 1730.559281] FS:  0000000000000000(0000) GS:ffff8b316e380000(0000) knlGS:0000000000000000
[ 1730.559281] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1730.559281] CR2: 00007f1105303760 CR3: 0000000227210005 CR4: 00000000003606e0
[ 1730.559282] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1730.559282] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1730.559282] Call Trace:
[ 1730.559282]  <IRQ>
[ 1730.559283]  ? taprio_dequeue_soft+0x2d0/0x2d0 [sch_taprio]
[ 1730.559283]  hrtimer_interrupt+0x104/0x220
[ 1730.559283]  ? irqtime_account_irq+0x34/0xa0
[ 1730.559283]  smp_apic_timer_interrupt+0x6d/0x230
[ 1730.559284]  apic_timer_interrupt+0xf/0x20
[ 1730.559284]  </IRQ>
[ 1730.559284] RIP: 0010:cpu_idle_poll+0x35/0x1a0
[ 1730.559285] Code: 88 82 ff 65 44 8b 25 12 7d 73 68 0f 1f 44 00 00 e8 90 c3 89 ff fb 65 48 8b 1c 25 c0 7e 01 00 48 8b 03 a8 08 74 0b eb 1c f3 90 <48> 8b 03 a8 08 75 13 8b 05 be a8 a8 00 85 c0 75 ed e8 75 48 84 ff
[ 1730.559285] RSP: 0018:ffff997080137ea8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 1730.559285] RAX: 0000000000000001 RBX: ffff8b316bc3c580 RCX: 0000000000000000
[ 1730.559286] RDX: 0000000000000001 RSI: 000000002819aad9 RDI: ffffffff978da730
[ 1730.559286] RBP: ffff997080137ec0 R08: 0000018324a6d387 R09: 0000000000000000
[ 1730.559286] R10: 0000000000000400 R11: 0000000000000001 R12: 0000000000000006
[ 1730.559286] R13: ffff8b316bc3c580 R14: 0000000000000000 R15: 0000000000000000
[ 1730.559287]  ? cpu_idle_poll+0x20/0x1a0
[ 1730.559287]  ? cpu_idle_poll+0x20/0x1a0
[ 1730.559287]  do_idle+0x4d/0x1f0
[ 1730.559287]  ? complete+0x44/0x50
[ 1730.559288]  cpu_startup_entry+0x1b/0x20
[ 1730.559288]  start_secondary+0x142/0x180
[ 1730.559288]  secondary_startup_64+0xb6/0xc0
[ 1776.686313] nvme nvme0: I/O 96 QID 1 timeout, completion polled

Fixes: 4cfd5779bd6e ("taprio: Add support for txtime-assist mode")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agotaprio: Fix enabling offload with wrong number of traffic classes
Vinicius Costa Gomes [Thu, 6 Feb 2020 21:46:06 +0000 (13:46 -0800)]
taprio: Fix enabling offload with wrong number of traffic classes

[ Upstream commit 5652e63df3303c2a702bac25fbf710b9cb64dfba ]

If the driver implementing taprio offloading depends on the value of
the network device number of traffic classes (dev->num_tc) for
whatever reason, it was going to receive the value zero. The value was
only set after the offloading function is called.

So, moving setting the number of traffic classes to before the
offloading function is called fixes this issue. This is safe because
this only happens when taprio is instantiated (we don't allow this
configuration to be changed without first removing taprio).

Fixes: 9c66d1564676 ("taprio: Add support for hardware offloading")
Reported-by: Po Liu <po.liu@nxp.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: stmmac: update pci platform data to use phy_interface
Voon Weifeng [Fri, 7 Feb 2020 07:34:28 +0000 (15:34 +0800)]
net: stmmac: update pci platform data to use phy_interface

[ Upstream commit 909c1dde67c433f1e4122f2619cbd8ac370fcf0a ]

The recent patch to support passive mode converter did not take care the
phy interface configuration in PCI platform data. Hence, converting all
the PCI platform data from plat->interface to plat->phy_interface as the
default mode is meant for PHY.

Fixes: 0060c8783330 ("net: stmmac: implement support for passive mode converters via dt")
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Tested-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter
Tan, Tee Min [Fri, 7 Feb 2020 07:34:15 +0000 (15:34 +0800)]
net: stmmac: xgmac: fix missing IFF_MULTICAST checki in dwxgmac2_set_filter

[ Upstream commit 2f633d5820e4ed870f408957322acb9263bce2f4 ]

Without checking for IFF_MULTICAST flag, it is wrong to assume multicast
filtering is always enabled. By checking against IFF_MULTICAST, now
the driver behaves correctly when the multicast support is toggled by below
command:-
  ip link set <devname> multicast off|on

Fixes: 0efedbf11f07a ("net: stmmac: xgmac: Fix XGMAC selftests")
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter
Verma, Aashish [Fri, 7 Feb 2020 07:33:54 +0000 (15:33 +0800)]
net: stmmac: fix missing IFF_MULTICAST check in dwmac4_set_filter

[ Upstream commit 2ba31cd93784b61813226d259fd94a221ecd9d61 ]

Without checking for IFF_MULTICAST flag, it is wrong to assume multicast
filtering is always enabled. By checking against IFF_MULTICAST, now
the driver behaves correctly when the multicast support is toggled by below
command:-
  ip link set <devname> multicast off|on

Fixes: 477286b53f55 ("stmmac: add GMAC4 core support")
Signed-off-by: Verma, Aashish <aashishx.verma@intel.com>
Tested-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: stmmac: xgmac: fix incorrect XGMAC_VLAN_TAG register writting
Ong Boon Leong [Fri, 7 Feb 2020 07:33:40 +0000 (15:33 +0800)]
net: stmmac: xgmac: fix incorrect XGMAC_VLAN_TAG register writting

[ Upstream commit 907a076881f171254219faad05f46ac5baabedfb ]

We should always do a read of current value of XGMAC_VLAN_TAG instead of
directly overwriting the register value.

Fixes: 3cd1cfcba26e2 ("net: stmmac: Implement VLAN Hash Filtering in XGMAC")
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: stmmac: fix incorrect GMAC_VLAN_TAG register writting in GMAC4+
Tan, Tee Min [Fri, 7 Feb 2020 07:33:20 +0000 (15:33 +0800)]
net: stmmac: fix incorrect GMAC_VLAN_TAG register writting in GMAC4+

[ Upstream commit 9eeeb3c9de4e3aeaa2bec097162f09305dd9f4c3 ]

It should always do a read of current value of GMAC_VLAN_TAG instead of
directly overwriting the register value.

Fixes: c1be0022df0d ("net: stmmac: Add VLAN HASH filtering support in GMAC4+")
Signed-off-by: Tan, Tee Min <tee.min.tan@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: macb: Limit maximum GEM TX length in TSO
Harini Katakam [Wed, 5 Feb 2020 12:38:12 +0000 (18:08 +0530)]
net: macb: Limit maximum GEM TX length in TSO

[ Upstream commit f822e9c4ffa511a5c681cf866287d9383a3b6f1b ]

GEM_MAX_TX_LEN currently resolves to 0x3FF8 for any IP version supporting
TSO with full 14bits of length field in payload descriptor. But an IP
errata causes false amba_error (bit 6 of ISR) when length in payload
descriptors is specified above 16387. The error occurs because the DMA
falsely concludes that there is not enough space in SRAM for incoming
payload. These errors were observed continuously under stress of large
packets using iperf on a version where SRAM was 16K for each queue. This
errata will be documented shortly and affects all versions since TSO
functionality was added. Hence limit the max length to 0x3FC0 (rounded).

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: macb: Remove unnecessary alignment check for TSO
Harini Katakam [Wed, 5 Feb 2020 12:38:11 +0000 (18:08 +0530)]
net: macb: Remove unnecessary alignment check for TSO

[ Upstream commit 41c1ef978c8d0259c6636e6d2d854777e92650eb ]

The IP TSO implementation does NOT require the length to be a
multiple of 8. That is only a requirement for UFO as per IP
documentation. Hence, exit macb_features_check function in the
beginning if the protocol is not UDP. Only when it is UDP,
proceed further to the alignment checks. Update comments to
reflect the same. Also remove dead code checking for protocol
TCP when calculating header length.

Fixes: 1629dd4f763c ("cadence: Add LSO support.")
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx
Raed Salem [Wed, 23 Oct 2019 13:41:21 +0000 (16:41 +0300)]
net/mlx5: IPsec, fix memory leak at mlx5_fpga_ipsec_delete_sa_ctx

[ Upstream commit 08db2cf577487f5123aebcc2f913e0b8a2c14b43 ]

SA context is allocated at mlx5_fpga_ipsec_create_sa_ctx,
however the counterpart mlx5_fpga_ipsec_delete_sa_ctx function
nullifies sa_ctx pointer without freeing the memory allocated,
hence the memory leak.

Fix by free SA context when the SA is released.

Fixes: d6c4f0298cec ("net/mlx5: Refactor accel IPSec code")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet/mlx5: IPsec, Fix esp modify function attribute
Raed Salem [Tue, 24 Dec 2019 07:54:45 +0000 (09:54 +0200)]
net/mlx5: IPsec, Fix esp modify function attribute

[ Upstream commit 0dc2c534f17c05bed0622b37a744bc38b48ca88a ]

The function mlx5_fpga_esp_validate_xfrm_attrs is wrongly used
with negative negation as zero value indicates success but it
used as failure return value instead.

Fix by remove the unary not negation operator.

Fixes: 05564d0ae075 ("net/mlx5: Add flow-steering commands for FPGA IPSec implementation")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: systemport: Avoid RBUF stuck in Wake-on-LAN mode
Florian Fainelli [Wed, 5 Feb 2020 20:32:04 +0000 (12:32 -0800)]
net: systemport: Avoid RBUF stuck in Wake-on-LAN mode

[ Upstream commit 263a425a482fc495d6d3f9a29b9103a664c38b69 ]

After a number of suspend and resume cycles, it is possible for the RBUF
to be stuck in Wake-on-LAN mode, despite the MPD enable bit being
cleared which instructed the RBUF to exit that mode.

Avoid creating that problematic condition by clearing the RX_EN and
TX_EN bits in the UniMAC prior to disable the Magic Packet Detector
logic which is guaranteed to make the RBUF exit Wake-on-LAN mode.

Fixes: 83e82f4c706b ("net: systemport: add Wake-on-LAN support")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: stmmac: fix a possible endless loop
Dejin Zheng [Thu, 6 Feb 2020 15:29:17 +0000 (23:29 +0800)]
net: stmmac: fix a possible endless loop

[ Upstream commit 7d10f0774f9e32aa2f2e012f7fcb312a2ce422b9 ]

It forgot to reduce the value of the variable retry in a while loop
in the ethqos_configure() function. It may cause an endless loop and
without timeout.

Fixes: a7c30e62d4b8 ("net: stmmac: Add driver for Qualcomm ethqos")
Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet_sched: fix a resource leak in tcindex_set_parms()
Cong Wang [Tue, 4 Feb 2020 19:10:12 +0000 (11:10 -0800)]
net_sched: fix a resource leak in tcindex_set_parms()

[ Upstream commit 52b5ae501c045010aeeb1d5ac0373ff161a88291 ]

Jakub noticed there is a potential resource leak in
tcindex_set_parms(): when tcindex_filter_result_init() fails
and it jumps to 'errout1' which doesn't release the memory
and resources allocated by tcindex_alloc_perfect_hash().

We should just jump to 'errout_alloc' which calls
tcindex_free_perfect_hash().

Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: mvneta: move rx_dropped and rx_errors in per-cpu stats
Lorenzo Bianconi [Thu, 6 Feb 2020 09:14:39 +0000 (10:14 +0100)]
net: mvneta: move rx_dropped and rx_errors in per-cpu stats

[ Upstream commit c35947b8ff8acca33134ee39c31708233765c31a ]

Move rx_dropped and rx_errors counters in mvneta_pcpu_stats in order to
avoid possible races updating statistics

Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management")
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: dsa: microchip: enable module autoprobe
Razvan Stefanescu [Fri, 7 Feb 2020 15:44:04 +0000 (17:44 +0200)]
net: dsa: microchip: enable module autoprobe

[ Upstream commit f8c2afa66d5397b0b9293c4347dac6dabb327685 ]

This matches /sys/devices/.../spi1.0/modalias content.

Fixes: 9b2d9f05cddf ("net: dsa: microchip: add ksz9567 to ksz9477 driver")
Fixes: d9033ae95cf4 ("net: dsa: microchip: add KSZ8563 compatibility string")
Fixes: 8c29bebb1f8a ("net: dsa: microchip: add KSZ9893 switch support")
Fixes: 45316818371d ("net: dsa: add support for ksz9897 ethernet switch")
Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Razvan Stefanescu <razvan.stefanescu@microchip.com>
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port
Florian Fainelli [Thu, 6 Feb 2020 19:23:52 +0000 (11:23 -0800)]
net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port

[ Upstream commit de34d7084edd069dac5aa010cfe32bd8c4619fa6 ]

The 7445 switch clocking profiles do not allow us to run the IMP port at
2Gb/sec in a way that it is reliable and consistent. Make sure that the
setting is only applied to the 7278 family.

Fixes: 8f1880cbe8d0 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agonet: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan()
Florian Fainelli [Thu, 6 Feb 2020 19:07:45 +0000 (11:07 -0800)]
net: dsa: b53: Always use dev->vlan_enabled in b53_configure_vlan()

[ Upstream commit df373702bc0f8f2d83980ea441e71639fc1efcf8 ]

b53_configure_vlan() is called by the bcm_sf2 driver upon setup and
indirectly through resume as well. During the initial setup, we are
guaranteed that dev->vlan_enabled is false, so there is no change in
behavior, however during suspend, we may have enabled VLANs before, so we
do want to restore that setting.

Fixes: dad8d7c6452b ("net: dsa: b53: Properly account for VLAN filtering")
Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodpaa_eth: support all modes with rate adapting PHYs
Madalin Bucur [Tue, 4 Feb 2020 10:08:58 +0000 (12:08 +0200)]
dpaa_eth: support all modes with rate adapting PHYs

[ Upstream commit 73a21fa817f0cc8022dc6226250a86bca727a56d ]

Stop removing modes that are not supported on the system interface
when the connected PHY is capable of rate adaptation. This addresses
an issue with the LS1046ARDB board 10G interface no longer working
with an 1G link partner after autonegotiation support was added
for the Aquantia PHY on board in

commit 09c4c57f7bc4 ("net: phy: aquantia: add support for auto-negotiation configuration")

Before this commit the values advertised by the PHY were not
influenced by the dpaa_eth driver removal of system-side unsupported
modes as the aqr_config_aneg() was basically a no-op. After this
commit, the modes removed by the dpaa_eth driver were no longer
advertised thus autonegotiation with 1G link partners failed.

Reported-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agodevlink: report 0 after hitting end in region read
Jacob Keller [Tue, 4 Feb 2020 23:59:50 +0000 (15:59 -0800)]
devlink: report 0 after hitting end in region read

[ Upstream commit d5b90e99e1d51b7b5d2b74fbc4c2db236a510913 ]

commit fdd41ec21e15 ("devlink: Return right error code in case of errors
for region read") modified the region read code to report errors
properly in unexpected cases.

In the case where the start_offset and ret_offset match, it unilaterally
converted this into an error. This causes an issue for the "dump"
version of the command. In this case, the devlink region dump will
always report an invalid argument:

000000000000ffd0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
000000000000ffe0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
devlink answers: Invalid argument
000000000000fff0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

This occurs because the expected flow for the dump is to return 0 after
there is no further data.

The simplest fix would be to stop converting the error code to -EINVAL
if start_offset == ret_offset. However, avoid unnecessary work by
checking for when start_offset is larger than the region size and
returning 0 upfront.

Fixes: fdd41ec21e15 ("devlink: Return right error code in case of errors for region read")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobonding/alb: properly access headers in bond_alb_xmit()
Eric Dumazet [Wed, 5 Feb 2020 03:26:05 +0000 (19:26 -0800)]
bonding/alb: properly access headers in bond_alb_xmit()

[ Upstream commit 38f88c45404293bbc027b956def6c10cbd45c616 ]

syzbot managed to send an IPX packet through bond_alb_xmit()
and af_packet and triggered a use-after-free.

First, bond_alb_xmit() was using ipx_hdr() helper to reach
the IPX header, but ipx_hdr() was using the transport offset
instead of the network offset. In the particular syzbot
report transport offset was 0xFFFF

This patch removes ipx_hdr() since it was only (mis)used from bonding.

Then we need to make sure IPv4/IPv6/IPX headers are pulled
in skb->head before dereferencing anything.

BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
 (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)

Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline]
 [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53
 [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282
 [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline]
 [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline]
 [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
 [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
 [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
 [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
 [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
 [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
 [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
 [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline]
 [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
 [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
 [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
 [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline]
 [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
 [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline]
 [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684
 [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996
 [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline]
 [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agox86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode
Thomas Gleixner [Thu, 23 Jan 2020 11:54:53 +0000 (12:54 +0100)]
x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode

commit 979923871f69a4dc926658f9f9a1a4c1bde57552 upstream.

Tony reported a boot regression caused by the recent workaround for systems
which have a disabled (clock gate off) PIT.

On his machine the kernel fails to initialize the PIT because
apic_needs_pit() does not take into account whether the local APIC
interrupt delivery mode will actually allow to setup and use the local
APIC timer. This should be easy to reproduce with acpi=off on the
command line which also disables HPET.

Due to the way the PIT/HPET and APIC setup ordering works (APIC setup can
require working PIT/HPET) the information is not available at the point
where apic_needs_pit() makes this decision.

To address this, split out the interrupt mode selection from
apic_intr_mode_init(), invoke the selection before making the decision
whether PIT is required or not, and add the missing checks into
apic_needs_pit().

Fixes: c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets")
Reported-by: Anthony Buckley <tony.buckley000@gmail.com>
Tested-by: Anthony Buckley <tony.buckley000@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Daniel Drake <drake@endlessm.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206125
Link: https://lore.kernel.org/r/87sgk6tmk2.fsf@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agolibbpf: Extract and generalize CPU mask parsing logic
Andrii Nakryiko [Thu, 12 Dec 2019 01:35:48 +0000 (17:35 -0800)]
libbpf: Extract and generalize CPU mask parsing logic

commit 6803ee25f0ead1e836808acb14396bb9a9849113 upstream.

This logic is re-used for parsing a set of online CPUs. Having it as an
isolated piece of code working with input string makes it conveninent to test
this logic as well. While refactoring, also improve the robustness of original
implementation.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191212013548.1690564-1-andriin@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agobpf: Fix trampoline usage in preempt
Alexei Starovoitov [Tue, 21 Jan 2020 03:22:31 +0000 (19:22 -0800)]
bpf: Fix trampoline usage in preempt

commit 05d57f1793fb250c85028c9952c3720010baa853 upstream.

Though the second half of trampoline page is unused a task could be
preempted in the middle of the first half of trampoline and two
updates to trampoline would change the code from underneath the
preempted task. Hence wait for tasks to voluntarily schedule or go
to userspace. Add similar wait before freeing the trampoline.

Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/bpf/20200121032231.3292185-1-ast@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomfd: ab8500: Fix ab8500-clk typo
Linus Walleij [Wed, 11 Dec 2019 11:46:39 +0000 (12:46 +0100)]
mfd: ab8500: Fix ab8500-clk typo

commit 702204c22c53f94b2e25a3bdcda759cce7b26305 upstream.

Commit f4d41ad84433 ("mfd: ab8500: Example using new OF_MFD_CELL MACRO")
has a typo error renaming "ab8500-clk" to "abx500-clk"
with the result att ALSA SoC audio broke as the clock
driver was not probing anymore. Fixed it up.

Fixes: f4d41ad84433 ("mfd: ab8500: Example using new OF_MFD_CELL MACRO")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomfd: bd70528: Fix hour register mask
Matti Vaittinen [Mon, 20 Jan 2020 13:45:11 +0000 (15:45 +0200)]
mfd: bd70528: Fix hour register mask

commit 6c883472e1c11cb05561b6dd0c28bb037c2bf2de upstream.

When RTC is used in 24H mode (and it is by this driver) the maximum
hour value is 24 in BCD. This occupies bits [5:0] - which means
correct mask for HOUR register is 0x3f not 0x1f. Fix the mask

Fixes: 32a4a4ebf768 ("rtc: bd70528: Initial support for ROHM bd70528 RTC")
Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomfd: rn5t618: Mark ADC control register volatile
Andreas Kemnade [Fri, 17 Jan 2020 21:59:22 +0000 (22:59 +0100)]
mfd: rn5t618: Mark ADC control register volatile

commit 2f3dc25c0118de03a00ddc88b61f7216854f534d upstream.

There is a bit which gets cleared after conversion.

Fixes: 9bb9e29c78f8 ("mfd: Add Ricoh RN5T618 PMIC core driver")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agomfd: da9062: Fix watchdog compatible string
Marco Felsch [Wed, 8 Jan 2020 09:57:02 +0000 (10:57 +0100)]
mfd: da9062: Fix watchdog compatible string

commit 1112ba02ff1190ca9c15a912f9269e54b46d2d82 upstream.

The watchdog driver compatible is "dlg,da9062-watchdog" and not
"dlg,da9062-wdt". Therefore the mfd-core can't populate the of_node and
fwnode. As result the watchdog driver can't parse the devicetree.

Fixes: 9b40b030c4ad ("mfd: da9062: Supply core driver")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoASoC: Intel: skl_hda_dsp_common: Fix global-out-of-bounds bug
Cezary Rojewski [Wed, 22 Jan 2020 18:12:54 +0000 (19:12 +0100)]
ASoC: Intel: skl_hda_dsp_common: Fix global-out-of-bounds bug

commit 15adb20f64c302b31e10ad50f22bb224052ce1df upstream.

Definitions for idisp snd_soc_dai_links within skl_hda_dsp_common are
missing platform component. Add it to address following bug reported by
KASAN:

[   10.538502] BUG: KASAN: global-out-of-bounds in skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538509] Write of size 8 at addr ffffffffc0606840 by task systemd-udevd/299
(...)
[   10.538519] Call Trace:
[   10.538524]  dump_stack+0x62/0x95
[   10.538528]  print_address_description+0x2f5/0x3b0
[   10.538532]  ? skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538535]  __kasan_report+0x134/0x191
[   10.538538]  ? skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538542]  ? skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538544]  kasan_report+0x12/0x20
[   10.538546]  __asan_store8+0x57/0x90
[   10.538550]  skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538553]  platform_drv_probe+0x51/0xb0
[   10.538556]  really_probe+0x311/0x600
[   10.538559]  driver_probe_device+0x87/0x1b0
[   10.538562]  device_driver_attach+0x8f/0xa0
[   10.538565]  ? device_driver_attach+0xa0/0xa0
[   10.538567]  __driver_attach+0x102/0x1a0
[   10.538569]  ? device_driver_attach+0xa0/0xa0
[   10.538572]  bus_for_each_dev+0xe8/0x160
[   10.538574]  ? subsys_dev_iter_exit+0x10/0x10
[   10.538577]  ? preempt_count_sub+0x18/0xc0
[   10.538580]  ? _raw_write_unlock+0x1f/0x40
[   10.538582]  driver_attach+0x2b/0x30
[   10.538585]  bus_add_driver+0x251/0x340
[   10.538588]  driver_register+0xd3/0x1c0
[   10.538590]  __platform_driver_register+0x6c/0x80
[   10.538592]  ? 0xffffffffc03e8000
[   10.538595]  skl_hda_audio_init+0x1c/0x1000 [snd_soc_skl_hda_dsp]
[   10.538598]  do_one_initcall+0xd0/0x36a
[   10.538600]  ? trace_event_raw_event_initcall_finish+0x160/0x160
[   10.538602]  ? kasan_unpoison_shadow+0x36/0x50
[   10.538605]  ? __kasan_kmalloc+0xcc/0xe0
[   10.538607]  ? kasan_unpoison_shadow+0x36/0x50
[   10.538609]  ? kasan_poison_shadow+0x2f/0x40
[   10.538612]  ? __asan_register_globals+0x65/0x80
[   10.538615]  do_init_module+0xf9/0x36f
[   10.538619]  load_module+0x398e/0x4590
[   10.538625]  ? module_frob_arch_sections+0x20/0x20
[   10.538628]  ? __kasan_check_write+0x14/0x20
[   10.538630]  ? kernel_read+0x9a/0xc0
[   10.538632]  ? __kasan_check_write+0x14/0x20
[   10.538634]  ? kernel_read_file+0x1d3/0x3c0
[   10.538638]  ? cap_capable+0xca/0x110
[   10.538642]  __do_sys_finit_module+0x190/0x1d0
[   10.538644]  ? __do_sys_finit_module+0x190/0x1d0
[   10.538646]  ? __x64_sys_init_module+0x50/0x50
[   10.538649]  ? expand_files+0x380/0x380
[   10.538652]  ? __kasan_check_write+0x14/0x20
[   10.538654]  ? fput_many+0x20/0xc0
[   10.538658]  __x64_sys_finit_module+0x43/0x50
[   10.538660]  do_syscall_64+0xce/0x700
[   10.538662]  ? syscall_return_slowpath+0x230/0x230
[   10.538665]  ? __do_page_fault+0x51e/0x640
[   10.538668]  ? __kasan_check_read+0x11/0x20
[   10.538670]  ? prepare_exit_to_usermode+0xc7/0x200
[   10.538673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a78959f407e6 ("ASoC: Intel: skl_hda_dsp_common: use modern dai_link style")
Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20200122181254.22801-1-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoASoC: sgtl5000: Fix VDDA and VDDIO comparison
Marek Vasut [Fri, 20 Dec 2019 16:44:50 +0000 (17:44 +0100)]
ASoC: sgtl5000: Fix VDDA and VDDIO comparison

commit e19ecbf105b236a6334fab64d8fd5437b12ee019 upstream.

Comparing the voltage of VDDA and VDDIO to determine whether or not to
enable VDDC manual override is insufficient. This is a problem in case
the VDDA is supplied from different regulator than VDDIO, while both
report the same voltage to the regulator framework. In that case where
VDDA and VDDIO is supplied by different regulators, the VDDC manual
override must not be applied.

Fixes: b6319b061ba2 ("ASoC: sgtl5000: Fix charge pump source assignment")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Igor Opaniuk <igor.opaniuk@toradex.com>
Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Link: https://lore.kernel.org/r/20191220164450.1395038-2-marex@denx.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoregulator: core: Add regulator_is_equal() helper
Marek Vasut [Fri, 20 Dec 2019 16:44:49 +0000 (17:44 +0100)]
regulator: core: Add regulator_is_equal() helper

commit b059b7e0ec3208ff1e17cff6387d75a9fbab4e02 upstream.

Add regulator_is_equal() helper to compare whether two regulators are
the same. This is useful for checking whether two separate regulators
in a driver are actually the same supply.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: Igor Opaniuk <igor.opaniuk@toradex.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com>
Link: https://lore.kernel.org/r/20191220164450.1395038-1-marex@denx.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoubifs: Fix memory leak from c->sup_node
Quanyang Wang [Tue, 14 Jan 2020 05:43:11 +0000 (13:43 +0800)]
ubifs: Fix memory leak from c->sup_node

commit ff90bdfb206e49c8b418811efbdd0c77380fa8c2 upstream.

The c->sup_node is allocated in function ubifs_read_sb_node but
is not freed. This will cause memory leak as below:

unreferenced object 0xbc9ce000 (size 4096):
  comm "mount", pid 500, jiffies 4294952946 (age 315.820s)
  hex dump (first 32 bytes):
    31 18 10 06 06 7b f1 11 02 00 00 00 00 00 00 00  1....{..........
    00 10 00 00 06 00 00 00 00 00 00 00 08 00 00 00  ................
  backtrace:
    [<d1c503cd>] ubifs_read_superblock+0x48/0xebc
    [<a20e14bd>] ubifs_mount+0x974/0x1420
    [<8589ecc3>] legacy_get_tree+0x2c/0x50
    [<5f1fb889>] vfs_get_tree+0x28/0xfc
    [<bbfc7939>] do_mount+0x4f8/0x748
    [<4151f538>] ksys_mount+0x78/0xa0
    [<d59910a9>] ret_fast_syscall+0x0/0x54
    [<1cc40005>] 0x7ea02790

Free it in ubifs_umount and in the error path of mount_ubifs.

Fixes: fd6150051bec ("ubifs: Store read superblock node")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoubi: Fix an error pointer dereference in error handling code
Dan Carpenter [Mon, 13 Jan 2020 13:23:46 +0000 (16:23 +0300)]
ubi: Fix an error pointer dereference in error handling code

commit 5d3805af279c93ef49a64701f35254676d709622 upstream.

If "seen_pebs = init_seen(ubi);" fails then "seen_pebs" is an error pointer
and we try to kfree() it which results in an Oops.

This patch re-arranges the error handling so now it only frees things
which have been allocated successfully.

Fixes: daef3dd1f0ae ("UBI: Fastmap: Add self check to detect absent PEBs")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agoubi: fastmap: Fix inverted logic in seen selfcheck
Sascha Hauer [Wed, 23 Oct 2019 09:58:12 +0000 (11:58 +0200)]
ubi: fastmap: Fix inverted logic in seen selfcheck

commit ef5aafb6e4e9942a28cd300bdcda21ce6cbaf045 upstream.

set_seen() sets the bit corresponding to the PEB number in the bitmap,
so when self_check_seen() wants to find PEBs that haven't been seen we
have to print the PEBs that have their bit cleared, not the ones which
have it set.

Fixes: 5d71afb00840 ("ubi: Use bitmaps in Fastmap self-check code")
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
5 years agovirtio_balloon: Fix memory leaks on errors in virtballoon_probe()
David Hildenbrand [Wed, 5 Feb 2020 16:34:01 +0000 (17:34 +0100)]
virtio_balloon: Fix memory leaks on errors in virtballoon_probe()

commit 1ad6f58ea9364b0a5d8ae06249653ac9304a8578 upstream.

We forget to put the inode and unmount the kernfs used for compaction.

Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Cc: Liang Li <liang.z.li@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200205163402.42627-3-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>