]> www.infradead.org Git - users/dwmw2/linux.git/log
users/dwmw2/linux.git
4 years agoclk: qcom: gcc-sdm660: Fix wrong parent_map
Konrad Dybcio [Tue, 22 Sep 2020 12:09:09 +0000 (14:09 +0200)]
clk: qcom: gcc-sdm660: Fix wrong parent_map

[ Upstream commit d46e5a39f9be9288f1ce2170c4c7f8098f4e7f68 ]

This was likely overlooked while porting the driver upstream.

Reported-by: Pavel Dubrova <pashadubrova@gmail.com>
Signed-off-by: Konrad Dybcio <konradybcio@gmail.com>
Link: https://lore.kernel.org/r/20200922120909.97203-1-konradybcio@gmail.com
Fixes: f2a76a2955c0 ("clk: qcom: Add Global Clock controller (GCC) driver for SDM660")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agovfio/type1: fix dirty bitmap calculation in vfio_dma_rw
Yan Zhao [Wed, 16 Sep 2020 02:30:05 +0000 (10:30 +0800)]
vfio/type1: fix dirty bitmap calculation in vfio_dma_rw

[ Upstream commit 2c5af98592f65517170c7bcc714566590d3f7397 ]

The count of dirtied pages is not only determined by count of copied
pages, but also by the start offset.

e.g. if offset = PAGE_SIZE - 1, and *copied=2, the dirty pages count
is 2, instead of 1 or 0.

Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agovfio: fix a missed vfio group put in vfio_pin_pages
Yan Zhao [Wed, 16 Sep 2020 02:29:27 +0000 (10:29 +0800)]
vfio: fix a missed vfio group put in vfio_pin_pages

[ Upstream commit 28b130244061863cf0437b7af1625fb45ec1a71e ]

When error occurs, need to put vfio group after a successful get.

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agovfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn
Matthew Rosato [Thu, 10 Sep 2020 14:59:57 +0000 (10:59 -0400)]
vfio/pci: Decouple PCI_COMMAND_MEMORY bit checks from is_virtfn

[ Upstream commit 515ecd5368f1510152fa4f9b9ce55b66ac56c334 ]

While it is true that devices with is_virtfn=1 will have a Memory Space
Enable bit that is hard-wired to 0, this is not the only case where we
see this behavior -- For example some bare-metal hypervisors lack
Memory Space Enable bit emulation for devices not setting is_virtfn
(s390). Fix this by instead checking for the newly-added
no_command_memory bit which directly denotes the need for
PCI_COMMAND_MEMORY emulation in vfio.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agos390/pci: Mark all VFs as not implementing PCI_COMMAND_MEMORY
Matthew Rosato [Thu, 10 Sep 2020 14:59:56 +0000 (10:59 -0400)]
s390/pci: Mark all VFs as not implementing PCI_COMMAND_MEMORY

[ Upstream commit 08b6e22b850c28b6032da1e4d767a33116e23dfb ]

For s390 we can have VFs that are passed-through without the associated
PF. Firmware provides an emulation layer to allow these devices to
operate independently, but is missing emulation of the Memory Space
Enable bit.  For these as well as linked VFs, set no_command_memory
which specifies these devices do not implement PCI_COMMAND_MEMORY.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agovfio: add a singleton check for vfio_group_pin_pages
Yan Zhao [Wed, 16 Sep 2020 02:28:33 +0000 (10:28 +0800)]
vfio: add a singleton check for vfio_group_pin_pages

[ Upstream commit 7ef32e52368f62a4e041a4f0abefb4fb64e7fd4a ]

Page pinning is used both to translate and pin device mappings for DMA
purpose, as well as to indicate to the IOMMU backend to limit the dirty
page scope to those pages that have been pinned, in the case of an IOMMU
backed device.
To support this, the vfio_pin_pages() interface limits itself to only
singleton groups such that the IOMMU backend can consider dirty page
scope only at the group level.  Implement the same requirement for the
vfio_group_pin_pages() interface.

Fixes: 95fc87b44104 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoPCI/IOV: Mark VFs as not implementing PCI_COMMAND_MEMORY
Matthew Rosato [Thu, 10 Sep 2020 14:59:55 +0000 (10:59 -0400)]
PCI/IOV: Mark VFs as not implementing PCI_COMMAND_MEMORY

[ Upstream commit 12856e7acde4702b7c3238c15fcba86ff6aa507f ]

For VFs, the Memory Space Enable bit in the Command Register is
hard-wired to 0.

Add a new bit to signify devices where the Command Register Memory
Space Enable bit does not control the device's response to MMIO
accesses.

Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoremoteproc: stm32: Fix pointer assignement
Mathieu Poirier [Mon, 31 Aug 2020 21:37:58 +0000 (15:37 -0600)]
remoteproc: stm32: Fix pointer assignement

[ Upstream commit cb2d8d5b196c2e96e29343383c8c8d8db68b934e ]

Fix the assignment of the @state pointer - it is obviously wrong.

Acked-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Fixes: 376ffdc04456 ("remoteproc: stm32: Properly set co-processor state when attaching")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200831213758.206690-1-mathieu.poirier@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agorpmsg: Avoid double-free in mtk_rpmsg_register_device
Nicolas Boichat [Thu, 3 Sep 2020 00:05:58 +0000 (08:05 +0800)]
rpmsg: Avoid double-free in mtk_rpmsg_register_device

[ Upstream commit 231331b2dbd71487159a0400d9ffd967eb0d0e08 ]

If rpmsg_register_device fails, it will call
mtk_rpmsg_release_device which already frees mdev.

Fixes: 7017996951fd ("rpmsg: add rpmsg support for mt8183 SCP.")
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200903080547.v3.1.I56cf27cd59f4013bd074dc622c8b8248b034a4cc@changeid
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agorpmsg: smd: Fix a kobj leak in in qcom_smd_parse_edge()
Dan Carpenter [Tue, 8 Sep 2020 07:18:41 +0000 (10:18 +0300)]
rpmsg: smd: Fix a kobj leak in in qcom_smd_parse_edge()

[ Upstream commit e69ee0cf655e8e0c4a80f4319e36019b74f17639 ]

We need to call of_node_put(node) on the error paths for this function.

Fixes: 53e2822e56c7 ("rpmsg: Introduce Qualcomm SMD backend")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200908071841.GA294938@mwanda
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoPCI: iproc: Set affinity mask on MSI interrupts
Mark Tomlinson [Mon, 3 Aug 2020 03:52:40 +0000 (15:52 +1200)]
PCI: iproc: Set affinity mask on MSI interrupts

[ Upstream commit eb7eacaa5b9e4f665bd08d416c8f88e63d2f123c ]

The core interrupt code expects the irq_set_affinity call to update the
effective affinity for the interrupt. This was not being done, so update
iproc_msi_irq_set_affinity() to do so.

Link: https://lore.kernel.org/r/20200803035241.7737-1-mark.tomlinson@alliedtelesis.co.nz
Fixes: 3bc2b2348835 ("PCI: iproc: Add iProc PCIe MSI support")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoPCI: aardvark: Check for errors from pci_bridge_emul_init() call
Pali Rohár [Mon, 7 Sep 2020 11:10:35 +0000 (13:10 +0200)]
PCI: aardvark: Check for errors from pci_bridge_emul_init() call

[ Upstream commit 7862a6134456c8b4f8c39e8c94aa97e5c2f7f2b7 ]

Function pci_bridge_emul_init() may fail so correctly check for errors.

Link: https://lore.kernel.org/r/20200907111038.5811-3-pali@kernel.org
Fixes: 8a3ebd8de328 ("PCI: aardvark: Implement emulated root PCI bridge config space")
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Marek Behún <marek.behun@nic.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoPCI: aardvark: Fix compilation on s390
Pali Rohár [Mon, 7 Sep 2020 11:10:34 +0000 (13:10 +0200)]
PCI: aardvark: Fix compilation on s390

[ Upstream commit b32c012e4b98f0126aa327be2d1f409963057643 ]

Include linux/gpio/consumer.h instead of linux/gpio.h, as is said in the
latter file.

This was reported by kernel test bot when compiling for s390.

  drivers/pci/controller/pci-aardvark.c:350:2: error: implicit declaration of function 'gpiod_set_value_cansleep' [-Werror,-Wimplicit-function-declaration]
  drivers/pci/controller/pci-aardvark.c:1074:21: error: implicit declaration of function 'devm_gpiod_get_from_of_node' [-Werror,-Wimplicit-function-declaration]
  drivers/pci/controller/pci-aardvark.c:1076:14: error: use of undeclared identifier 'GPIOD_OUT_LOW'

Link: https://lore.kernel.org/r/202006211118.LxtENQfl%25lkp@intel.com
Link: https://lore.kernel.org/r/20200907111038.5811-2-pali@kernel.org
Fixes: 5169a9851daa ("PCI: aardvark: Issue PERST via GPIO")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Marek Behún <marek.behun@nic.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoPCI: designware-ep: Fix the Header Type check
Hou Zhiqiang [Tue, 18 Aug 2020 09:27:46 +0000 (17:27 +0800)]
PCI: designware-ep: Fix the Header Type check

[ Upstream commit 16270a92355722e387e9ca19627c5a4d7bae1354 ]

The current check will result in the multiple function device
fails to initialize. So fix the check by masking out the
multiple function bit.

Link: https://lore.kernel.org/r/20200818092746.24366-1-Zhiqiang.Hou@nxp.com
Fixes: 0b24134f7888 ("PCI: dwc: Add validation that PCIe core is set to correct mode")
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoclk: meson: g12a: mark fclk_div2 as critical
Stefan Agner [Fri, 28 Aug 2020 15:52:05 +0000 (17:52 +0200)]
clk: meson: g12a: mark fclk_div2 as critical

[ Upstream commit 2c4e80e06790cb49ad2603855d30c5aac2209c47 ]

On Amlogic Meson G12b platform, similar to fclk_div3, the fclk_div2
seems to be necessary for the system to operate correctly as well.

Typically, the clock also gets chosen by the eMMC peripheral. This
probably masked the problem so far. However, when booting from a SD
card the clock seems to get disabled which leads to a system freeze.

Let's mark this clock as critical, fixing boot from SD card on G12b
platforms.

Fixes: 085a4ea93d54 ("clk: meson: g12a: add peripheral clock controller")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/577e0129e8ee93972d92f13187ff4e4286182f67.1598629915.git.stefan@agner.ch
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoi2c: rcar: Auto select RESET_CONTROLLER
Dirk Behme [Thu, 27 Aug 2020 09:23:30 +0000 (11:23 +0200)]
i2c: rcar: Auto select RESET_CONTROLLER

[ Upstream commit 5b9bacf28a973a6b16510493416baeefa2c06289 ]

The i2c-rcar driver utilizes the Generic Reset Controller kernel
feature, so select the RESET_CONTROLLER option when the I2C_RCAR
option is selected with a Gen3 SoC.

Fixes: 2b16fd63059ab9 ("i2c: rcar: handle RXDMA HW behaviour on Gen3")
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Andy Lowe <andy_lowe@mentor.com>
[erosca: Add "if ARCH_RCAR_GEN3" per Wolfram's request]
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agortc: ds1307: Clear OSF flag on DS1388 when setting time
Chris Packham [Tue, 18 Aug 2020 01:35:43 +0000 (13:35 +1200)]
rtc: ds1307: Clear OSF flag on DS1388 when setting time

[ Upstream commit f471b05f76e4b1b6ba07ebc7681920a5c5b97c5d ]

Ensure the OSF flag is cleared on the DS1388 when the clock is set.

Fixes: df11b323b16f ("rtc: ds1307: handle oscillator failure flags for ds1388 variant")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20200818013543.4283-1-chris.packham@alliedtelesis.co.nz
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoclk: meson: axg-audio: separate axg and g12a regmap tables
Jerome Brunet [Wed, 29 Jul 2020 15:43:58 +0000 (17:43 +0200)]
clk: meson: axg-audio: separate axg and g12a regmap tables

[ Upstream commit cdabb1ffc7c2349b8930f752df1edcafc1d37cc1 ]

There are more differences than what we initially thought.
Let's keeps things clear and separate the axg and g12a regmap tables of the
audio clock controller.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20200729154359.1983085-3-jbrunet@baylibre.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomailbox: avoid timer start from callback
Jassi Brar [Fri, 16 Oct 2020 17:20:56 +0000 (12:20 -0500)]
mailbox: avoid timer start from callback

[ Upstream commit c7dacf5b0f32957b24ef29df1207dc2cd8307743 ]

If the txdone is done by polling, it is possible for msg_submit() to start
the timer while txdone_hrtimer() callback is running. If the timer needs
recheduling, it could already be enqueued by the time hrtimer_forward_now()
is called, leading hrtimer to loudly complain.

WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
Hardware name: Libre Computer AML-S805X-AC (DT)
Workqueue: events_freezable_power_ thermal_zone_device_check
pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
pc : hrtimer_forward+0xc4/0x110
lr : txdone_hrtimer+0xf8/0x118
[...]

This can be fixed by not starting the timer from the callback path. Which
requires the timer reloading as long as any message is queued on the
channel, and not just when current tx is not done yet.

Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
Reported-by: Da Xue <da@libre.computer>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agorapidio: fix the missed put_device() for rio_mport_add_riodev
Jing Xiangfeng [Fri, 16 Oct 2020 03:13:18 +0000 (20:13 -0700)]
rapidio: fix the missed put_device() for rio_mport_add_riodev

[ Upstream commit 85094c05eeb47d195a74a25366a2db066f1c9d47 ]

rio_mport_add_riodev() misses to call put_device() when the device already
exists.  Add the missed function call to fix it.

Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Link: https://lkml.kernel.org/r/20200922072525.42330-1-jingxiangfeng@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agorapidio: fix error handling path
Souptick Joarder [Fri, 16 Oct 2020 03:13:15 +0000 (20:13 -0700)]
rapidio: fix error handling path

[ Upstream commit fa63f083b3492b5ed5332b8d7c90b03b5ef24a1d ]

rio_dma_transfer() attempts to clamp the return value of
pin_user_pages_fast() to be >= 0.  However, the attempt fails because
nr_pages is overridden a few lines later, and restored to the undesirable
-ERRNO value.

The return value is ultimately stored in nr_pages, which in turn is passed
to unpin_user_pages(), which expects nr_pages >= 0, else, disaster.

Fix this by fixing the nesting of the assignment to nr_pages: nr_pages
should be clamped to zero if pin_user_pages_fast() returns -ERRNO, or set
to the return value of pin_user_pages_fast(), otherwise.

[jhubbard@nvidia.com: new changelog]

Fixes: e8de370188d09 ("rapidio: add mport char device driver")
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lkml.kernel.org/r/1600227737-20785-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoramfs: fix nommu mmap with gaps in the page cache
Matthew Wilcox (Oracle) [Fri, 16 Oct 2020 03:13:04 +0000 (20:13 -0700)]
ramfs: fix nommu mmap with gaps in the page cache

[ Upstream commit 50b7d85680086126d7bd91dae81d57d4cb1ab6b7 ]

ramfs needs to check that pages are both physically contiguous and
contiguous in the file.  If the page cache happens to have, eg, page A for
index 0 of the file, no page for index 1, and page A+1 for index 2, then
an mmap of the first two pages of the file will succeed when it should
fail.

Fixes: 642fb4d1f1dd ("[PATCH] NOMMU: Provide shared-writable mmap support on ramfs")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Link: https://lkml.kernel.org/r/20200914122239.GO6583@casper.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agolib/crc32.c: fix trivial typo in preprocessor condition
Tobias Jordan [Fri, 16 Oct 2020 03:11:38 +0000 (20:11 -0700)]
lib/crc32.c: fix trivial typo in preprocessor condition

[ Upstream commit 904542dc56524f921a6bab0639ff6249c01e775f ]

Whether crc32_be needs a lookup table is chosen based on CRC_LE_BITS.
Obviously, the _be function should be governed by the _BE_ define.

This probably never pops up as it's hard to come up with a configuration
where CRC_BE_BITS isn't the same as CRC_LE_BITS and as nobody is using
bitwise CRC anyway.

Fixes: 46c5801eaf86 ("crc32: bolt on crc32c")
Signed-off-by: Tobias Jordan <kernel@cdqe.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lkml.kernel.org/r/20200923182122.GA3338@agrajag.zerfleddert.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomm: fix a race during THP splitting
Huang Ying [Fri, 16 Oct 2020 03:06:07 +0000 (20:06 -0700)]
mm: fix a race during THP splitting

[ Upstream commit c4f9c701f9b44299e6adbc58d1a4bb2c40383494 ]

It is reported that the following bug is triggered if the HDD is used as
swap device,

[ 5758.157556] BUG: kernel NULL pointer dereference, address: 0000000000000007
[ 5758.165331] #PF: supervisor write access in kernel mode
[ 5758.171161] #PF: error_code(0x0002) - not-present page
[ 5758.176894] PGD 0 P4D 0
[ 5758.179721] Oops: 0002 [#1] SMP PTI
[ 5758.183614] CPU: 10 PID: 316 Comm: kswapd1 Kdump: loaded Tainted: G S               --------- ---  5.9.0-0.rc3.1.tst.el8.x86_64 #1
[ 5758.196717] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
[ 5758.208176] RIP: 0010:split_swap_cluster+0x47/0x60
[ 5758.213522] Code: c1 e3 06 48 c1 eb 0f 48 8d 1c d8 48 89 df e8 d0 20 6a 00 80 63 07 fb 48 85 db 74 16 48 89 df c6 07 00 66 66 66 90 31 c0 5b c3 <80> 24 25 07 00 00 00 fb 31 c0 5b c3 b8 f0 ff ff ff 5b c3 66 0f 1f
[ 5758.234478] RSP: 0018:ffffb147442d7af0 EFLAGS: 00010246
[ 5758.240309] RAX: 0000000000000000 RBX: 000000000014b217 RCX: ffffb14779fd9000
[ 5758.248281] RDX: 000000000014b217 RSI: ffff9c52f2ab1400 RDI: 000000000014b217
[ 5758.256246] RBP: ffffe00c51168080 R08: ffffe00c5116fe08 R09: ffff9c52fffd3000
[ 5758.264208] R10: ffffe00c511537c8 R11: ffff9c52fffd3c90 R12: 0000000000000000
[ 5758.272172] R13: ffffe00c51170000 R14: ffffe00c51170000 R15: ffffe00c51168040
[ 5758.280134] FS:  0000000000000000(0000) GS:ffff9c52f2a80000(0000) knlGS:0000000000000000
[ 5758.289163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5758.295575] CR2: 0000000000000007 CR3: 0000000022a0e003 CR4: 00000000000606e0
[ 5758.303538] Call Trace:
[ 5758.306273]  split_huge_page_to_list+0x88b/0x950
[ 5758.311433]  deferred_split_scan+0x1ca/0x310
[ 5758.316202]  do_shrink_slab+0x12c/0x2a0
[ 5758.320491]  shrink_slab+0x20f/0x2c0
[ 5758.324482]  shrink_node+0x240/0x6c0
[ 5758.328469]  balance_pgdat+0x2d1/0x550
[ 5758.332652]  kswapd+0x201/0x3c0
[ 5758.336157]  ? finish_wait+0x80/0x80
[ 5758.340147]  ? balance_pgdat+0x550/0x550
[ 5758.344525]  kthread+0x114/0x130
[ 5758.348126]  ? kthread_park+0x80/0x80
[ 5758.352214]  ret_from_fork+0x22/0x30
[ 5758.356203] Modules linked in: fuse zram rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 iTCO_wdt crct10dif_pclmul iTCO_vendor_support drm_kms_helper crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops cec rapl joydev intel_cstate ipmi_si ipmi_devintf drm intel_uncore i2c_i801 ipmi_msghandler pcspkr lpc_ich mei_me i2c_smbus mei ioatdma ip_tables xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg igb ahci libahci i2c_algo_bit crc32c_intel libata dca wmi dm_mirror dm_region_hash dm_log dm_mod
[ 5758.412673] CR2: 0000000000000007
[    0.000000] Linux version 5.9.0-0.rc3.1.tst.el8.x86_64 (mockbuild@x86-vm-15.build.eng.bos.redhat.com) (gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), GNU ld version 2.30-79.el8) #1 SMP Wed Sep 9 16:03:34 EDT 2020

After further digging it's found that the following race condition exists in the
original implementation,

CPU1                                                             CPU2
----                                                             ----
deferred_split_scan()
  split_huge_page(page) /* page isn't compound head */
    split_huge_page_to_list(page, NULL)
      __split_huge_page(page, )
        ClearPageCompound(head)
        /* unlock all subpages except page (not head) */
                                                                 add_to_swap(head)  /* not THP */
                                                                   get_swap_page(head)
                                                                   add_to_swap_cache(head, )
                                                                     SetPageSwapCache(head)
     if PageSwapCache(head)
       split_swap_cluster(/* swap entry of head */)
         /* Deref sis->cluster_info: NULL accessing! */

So, in split_huge_page_to_list(), PageSwapCache() is called for the already
split and unlocked "head", which may be added to swap cache in another CPU.  So
split_swap_cluster() may be called wrongly.

To fix the race, the call to split_swap_cluster() is moved to
__split_huge_page() before all subpages are unlocked.  So that the
PageSwapCache() is stable.

Fixes: 59807685a7e77 ("mm, THP, swap: support splitting THP for THP swap out")
Reported-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Link: https://lkml.kernel.org/r/20201009073647.1531083-1-ying.huang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomm/huge_memory: fix split assumption of page size
Kirill A. Shutemov [Fri, 16 Oct 2020 03:05:36 +0000 (20:05 -0700)]
mm/huge_memory: fix split assumption of page size

[ Upstream commit 8cce54756806e5777069c46011c5f54f9feac717 ]

File THPs may now be of arbitrary size, and we can't rely on that size
after doing the split so remember the number of pages before we start the
split.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Cc: Huang Ying <ying.huang@intel.com>
Link: https://lkml.kernel.org/r/20200908195539.25896-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomm/page_owner: change split_page_owner to take a count
Matthew Wilcox (Oracle) [Fri, 16 Oct 2020 03:05:29 +0000 (20:05 -0700)]
mm/page_owner: change split_page_owner to take a count

[ Upstream commit 8fb156c9ee2db94f7127c930c89917634a1a9f56 ]

The implementation of split_page_owner() prefers a count rather than the
old order of the page.  When we support a variable size THP, we won't
have the order at this point, but we will have the number of pages.
So change the interface to what the caller and callee would prefer.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: SeongJae Park <sjpark@amazon.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Link: https://lkml.kernel.org/r/20200908195539.25896-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/rxe: Handle skb_clone() failure in rxe_recv.c
Bob Pearson [Tue, 13 Oct 2020 18:42:37 +0000 (13:42 -0500)]
RDMA/rxe: Handle skb_clone() failure in rxe_recv.c

[ Upstream commit 71abf20b28ff87fee6951ec2218d5ce7969c4e87 ]

If skb_clone() is unable to allocate memory for a new sk_buff this is not
detected by the current code.

Check for a NULL return and continue. This is similar to other errors in
this loop over QPs attached to the multicast address and consistent with
the unreliable UD transport.

Fixes: e7ec96fc7932f ("RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()")
Addresses-Coverity-ID: 1497804: Null pointer dereferences (NULL_RETURNS)
Link: https://lore.kernel.org/r/20201013184236.5231-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoafs: Fix cell removal
David Howells [Fri, 16 Oct 2020 12:21:14 +0000 (13:21 +0100)]
afs: Fix cell removal

[ Upstream commit 1d0e850a49a5b56f8f3cb51e74a11e2fedb96be6 ]

Fix cell removal by inserting a more final state than AFS_CELL_FAILED that
indicates that the cell has been unpublished in case the manager is already
requeued and will go through again.  The new AFS_CELL_REMOVED state will
just immediately leave the manager function.

Going through a second time in the AFS_CELL_FAILED state will cause it to
try to remove the cell again, potentially leading to the proc list being
removed.

Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+b994ecf2b023f14832c1@syzkaller.appspotmail.com
Reported-by: syzbot+0e0db88e1eb44a91ae8d@syzkaller.appspotmail.com
Reported-by: syzbot+2d0585e5efcd43d113c2@syzkaller.appspotmail.com
Reported-by: syzbot+1ecc2f9d3387f1d79d42@syzkaller.appspotmail.com
Reported-by: syzbot+18d51774588492bf3f69@syzkaller.appspotmail.com
Reported-by: syzbot+a5e4946b04d6ca8fa5f3@syzkaller.appspotmail.com
Suggested-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoafs: Fix cell purging with aliases
David Howells [Thu, 15 Oct 2020 10:05:01 +0000 (11:05 +0100)]
afs: Fix cell purging with aliases

[ Upstream commit 286377f6bdf71568a4cf07104fe44006ae0dba6d ]

When the afs module is removed, one of the things that has to be done is to
purge the cell database.  afs_cell_purge() cancels the management timer and
then starts the cell manager work item to do the purging.  This does a
single run through and then assumes that all cells are now purged - but
this is no longer the case.

With the introduction of alias detection, a later cell in the database can
now be holding an active count on an earlier cell (cell->alias_of).  The
purge scan passes by the earlier cell first, but this can't be got rid of
until it has discarded the alias.  Ordinarily, afs_unuse_cell() would
handle this by setting the management timer to trigger another pass - but
afs_set_cell_timer() doesn't do anything if the namespace is being removed
(net->live == false).  rmmod then hangs in the wait on cells_outstanding in
afs_cell_purge().

Fix this by making afs_set_cell_timer() directly queue the cell manager if
net->live is false.  This causes additional management passes.

Queueing the cell manager increments cells_outstanding to make sure the
wait won't complete until all cells are destroyed.

Fixes: 8a070a964877 ("afs: Detect cell aliases 1 - Cells with root volumes")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoafs: Fix cell refcounting by splitting the usage counter
David Howells [Tue, 23 Jul 2019 10:24:59 +0000 (11:24 +0100)]
afs: Fix cell refcounting by splitting the usage counter

[ Upstream commit 88c853c3f5c0a07c5db61b494ee25152535cfeee ]

Management of the lifetime of afs_cell struct has some problems due to the
usage counter being used to determine whether objects of that type are in
use in addition to whether anyone might be interested in the structure.

This is made trickier by cell objects being cached for a period of time in
case they're quickly reused as they hold the result of a setup process that
may be slow (DNS lookups, AFS RPC ops).

Problems include the cached root volume from alias resolution pinning its
parent cell record, rmmod occasionally hanging and occasionally producing
assertion failures.

Fix this by splitting the count of active users from the struct reference
count.  Things then work as follows:

 (1) The cell cache keeps +1 on the cell's activity count and this has to
     be dropped before the cell can be removed.  afs_manage_cell() tries to
     exchange the 1 to a 0 with the cells_lock write-locked, and if
     successful, the record is removed from the net->cells.

 (2) One struct ref is 'owned' by the activity count.  That is put when the
     active count is reduced to 0 (final_destruction label).

 (3) A ref can be held on a cell whilst it is queued for management on a
     work queue without confusing the active count.  afs_queue_cell() is
     added to wrap this.

 (4) The queue's ref is dropped at the end of the management.  This is
     split out into a separate function, afs_manage_cell_work().

 (5) The root volume record is put after a cell is removed (at the
     final_destruction label) rather then in the RCU destruction routine.

 (6) Volumes hold struct refs, but aren't active users.

 (7) Both counts are displayed in /proc/net/afs/cells.

There are some management function changes:

 (*) afs_put_cell() now just decrements the refcount and triggers the RCU
     destruction if it becomes 0.  It no longer sets a timer to have the
     manager do this.

 (*) afs_use_cell() and afs_unuse_cell() are added to increase and decrease
     the active count.  afs_unuse_cell() sets the management timer.

 (*) afs_queue_cell() is added to queue a cell with approprate refs.

There are also some other fixes:

 (*) Don't let /proc/net/afs/cells access a cell's vllist if it's NULL.

 (*) Make sure that candidate cells in lookups are properly destroyed
     rather than being simply kfree'd.  This ensures the bits it points to
     are destroyed also.

 (*) afs_dec_cells_outstanding() is now called in cell destruction rather
     than at "final_destruction".  This ensures that cell->net is still
     valid to the end of the destructor.

 (*) As a consequence of the previous two changes, move the increment of
     net->cells_outstanding that was at the point of insertion into the
     tree to the allocation routine to correctly balance things.

Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoafs: Fix rapid cell addition/removal by not using RCU on cells tree
David Howells [Fri, 9 Oct 2020 13:11:58 +0000 (14:11 +0100)]
afs: Fix rapid cell addition/removal by not using RCU on cells tree

[ Upstream commit 92e3cc91d8f51ce64a8b7c696377180953dd316e ]

There are a number of problems that are being seen by the rapidly mounting
and unmounting an afs dynamic root with an explicit cell and volume
specified (which should probably be rejected, but that's a separate issue):

What the tests are doing is to look up/create a cell record for the name
given and then tear it down again without actually using it to try to talk
to a server.  This is repeated endlessly, very fast, and the new cell
collides with the old one if it's not quick enough to reuse it.

It appears (as suggested by Hillf Danton) that the search through the RB
tree under a read_seqbegin_or_lock() under RCU conditions isn't safe and
that it's not blocking the write_seqlock(), despite taking two passes at
it.  He suggested that the code should take a ref on the cell it's
attempting to look at - but this shouldn't be necessary until we've
compared the cell names.  It's possible that I'm missing a barrier
somewhere.

However, using an RCU search for this is overkill, really - we only need to
access the cell name in a few places, and they're places where we're may
end up sleeping anyway.

Fix this by switching to an R/W semaphore instead.

Additionally, draw the down_read() call inside the function (renamed to
afs_find_cell()) since all the callers were taking the RCU read lock (or
should've been[*]).

[*] afs_probe_cell_name() should have been, but that doesn't appear to be
involved in the bug reports.

The symptoms of this look like:

general protection fault, probably for non-canonical address 0xf27d208691691fdb: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x93e924348b48fed8-0x93e924348b48fedf]
...
RIP: 0010:strncasecmp lib/string.c:52 [inline]
RIP: 0010:strncasecmp+0x5f/0x240 lib/string.c:43
 afs_lookup_cell_rcu+0x313/0x720 fs/afs/cell.c:88
 afs_lookup_cell+0x2ee/0x1440 fs/afs/cell.c:249
 afs_parse_source fs/afs/super.c:290 [inline]
...

Fixes: 989782dcdc91 ("afs: Overhaul cell database management")
Reported-by: syzbot+459a5dce0b4cb70fd076@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
cc: syzkaller-bugs@googlegroups.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agof2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
Jamie Iles [Mon, 12 Oct 2020 13:09:48 +0000 (14:09 +0100)]
f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info

[ Upstream commit ae284d87abade58c8db7760c808f311ef1ce693c ]

syzkaller found that with CONFIG_DEBUG_KOBJECT_RELEASE=y, unmounting an
f2fs filesystem could result in the following splat:

  kobject: 'loop5' ((____ptrval____)): kobject_release, parent 0000000000000000 (delayed 250)
  kobject: 'f2fs_xattr_entry-7:5' ((____ptrval____)): kobject_release, parent 0000000000000000 (delayed 750)
  ------------[ cut here ]------------
  ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x98
  WARNING: CPU: 0 PID: 699 at lib/debugobjects.c:485 debug_print_object+0x180/0x240
  Kernel panic - not syncing: panic_on_warn set ...
  CPU: 0 PID: 699 Comm: syz-executor.5 Tainted: G S                5.9.0-rc8+ #101
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x4d8
   show_stack+0x34/0x48
   dump_stack+0x174/0x1f8
   panic+0x360/0x7a0
   __warn+0x244/0x2ec
   report_bug+0x240/0x398
   bug_handler+0x50/0xc0
   call_break_hook+0x160/0x1d8
   brk_handler+0x30/0xc0
   do_debug_exception+0x184/0x340
   el1_dbg+0x48/0xb0
   el1_sync_handler+0x170/0x1c8
   el1_sync+0x80/0x100
   debug_print_object+0x180/0x240
   debug_check_no_obj_freed+0x200/0x430
   slab_free_freelist_hook+0x190/0x210
   kfree+0x13c/0x460
   f2fs_put_super+0x624/0xa58
   generic_shutdown_super+0x120/0x300
   kill_block_super+0x94/0xf8
   kill_f2fs_super+0x244/0x308
   deactivate_locked_super+0x104/0x150
   deactivate_super+0x118/0x148
   cleanup_mnt+0x27c/0x3c0
   __cleanup_mnt+0x28/0x38
   task_work_run+0x10c/0x248
   do_notify_resume+0x9d4/0x1188
   work_pending+0x8/0x34c

Like the error handling for f2fs_register_sysfs(), we need to wait for
the kobject to be destroyed before returning to prevent a potential
use-after-free.

Fixes: bf9e697ecd42 ("f2fs: expose features to sysfs entry")
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoselftests/powerpc: Fix eeh-basic.sh exit codes
Oliver O'Halloran [Wed, 14 Oct 2020 02:47:11 +0000 (13:47 +1100)]
selftests/powerpc: Fix eeh-basic.sh exit codes

[ Upstream commit 996f9e0f93f16211945c8d5f18f296a88cb32f91 ]

The kselftests test running infrastructure expects tests to finish with an
exit code of 4 if the test decided it should be skipped. Currently
eeh-basic.sh exits with the number of devices that failed to recover, so if
four devices didn't recover we'll report a skip instead of a fail.

Fix this by checking if the return code is non-zero and report success
and failure by returning 0 or 1 respectively. For the cases where should
actually skip return 4.

Fixes: 85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201014024711.1138386-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoperf trace: Fix off by ones in memset() after realloc() in arches using libaudit
Jiri Slaby [Thu, 1 Oct 2020 09:34:19 +0000 (11:34 +0200)]
perf trace: Fix off by ones in memset() after realloc() in arches using libaudit

[ Upstream commit f3013f7ed465479e60c1ab1921a5718fc541cc3b ]

'perf trace ls' started crashing after commit d21cb73a9025 on
!HAVE_SYSCALL_TABLE_SUPPORT configs (armv7l here) like this:

  0  strlen () at ../sysdeps/arm/armv6t2/strlen.S:126
  1  0xb6800780 in __vfprintf_internal (s=0xbeff9908, s@entry=0xbeff9900, format=0xa27160 "]: %s()", ap=..., mode_flags=<optimized out>) at vfprintf-internal.c:1688
  ...
  5  0x0056ecdc in fprintf (__fmt=0xa27160 "]: %s()", __stream=<optimized out>) at /usr/include/bits/stdio2.h:100
  6  trace__sys_exit (trace=trace@entry=0xbeffc710, evsel=evsel@entry=0xd968d0, event=<optimized out>, sample=sample@entry=0xbeffc3e8) at builtin-trace.c:2475
  7  0x00566d40 in trace__handle_event (sample=0xbeffc3e8, event=<optimized out>, trace=0xbeffc710) at builtin-trace.c:3122
  ...
  15 main (argc=2, argv=0xbefff6e8) at perf.c:538

It is because memset in trace__read_syscall_info zeroes wrong memory:

1) when initializing for the first time, it does not reset the last id.

2) in other cases, it resets the last id of previous buffer.

ad 1) it causes the crash above as sc->name used in the fprintf above
      contains garbage.

ad 2) it sets nonexistent from true back to false for id 11 here. Not
      sure, what the consequences are.

So fix it by introducing a special case for the initial initialization
and do the right +1 in both cases.

Fixes: d21cb73a9025 ("perf trace: Grow the syscall table as needed when using libaudit")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201001093419.15761-1-jslaby@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomaiblox: mediatek: Fix handling of platform_get_irq() error
Krzysztof Kozlowski [Thu, 27 Aug 2020 07:31:28 +0000 (09:31 +0200)]
maiblox: mediatek: Fix handling of platform_get_irq() error

[ Upstream commit 558e4c36ec9f2722af4fe8ef84dc812bcdb5c43a ]

platform_get_irq() returns -ERRNO on error.  In such case casting to u32
and comparing to 0 would pass the check.

Fixes: 623a6143a845 ("mailbox: mediatek: Add Mediatek CMDQ driver")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agothermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp()
Jing Xiangfeng [Tue, 29 Sep 2020 08:26:52 +0000 (16:26 +0800)]
thermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp()

[ Upstream commit 48b458591749d35c927351b4960b49e35af30fe6 ]

thermal_genl_sampling_temp() misses to call nlmsg_free() in an error path.

Jump to out_free to fix it.

Fixes: 1ce50e7d408ef2 ("thermal: core: genetlink support for events/cmd/sampling")
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200929082652.59876-1-jingxiangfeng@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoum: time-travel: Fix IRQ handling in time_travel_handle_message()
Johannes Berg [Thu, 10 Sep 2020 09:31:12 +0000 (11:31 +0200)]
um: time-travel: Fix IRQ handling in time_travel_handle_message()

[ Upstream commit ebef8ea2ba967026192a26f4529890893919bc57 ]

As the comment here indicates, we need to do the polling in the
idle loop without blocking interrupts, since interrupts can be
vhost-user messages that we must process even while in our idle
loop.

I don't know why I explained one thing and implemented another,
but we have indeed observed random hangs due to this, depending
on the timing of the messages.

Fixes: 88ce64249233 ("um: Implement time-travel=ext")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoum: vector: Use GFP_ATOMIC under spin lock
Tiezhu Yang [Fri, 19 Jun 2020 05:20:07 +0000 (13:20 +0800)]
um: vector: Use GFP_ATOMIC under spin lock

[ Upstream commit e4e721fe4ccb504a29d1e8d4047667557281d932 ]

Use GFP_ATOMIC instead of GFP_KERNEL under spin lock to fix possible
sleep-in-atomic-context bugs.

Fixes: 9807019a62dc ("um: Loadable BPF "Firmware" for vector drivers")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agof2fs: reject CASEFOLD inode flag without casefold feature
Eric Biggers [Thu, 8 Oct 2020 19:15:22 +0000 (12:15 -0700)]
f2fs: reject CASEFOLD inode flag without casefold feature

[ Upstream commit f6322f3f1212e005e7e6aa82ceb62be53030a64b ]

syzbot reported:

    general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
    CPU: 0 PID: 6860 Comm: syz-executor835 Not tainted 5.9.0-rc8-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:utf8_casefold+0x43/0x1b0 fs/unicode/utf8-core.c:107
    [...]
    Call Trace:
     f2fs_init_casefolded_name fs/f2fs/dir.c:85 [inline]
     __f2fs_setup_filename fs/f2fs/dir.c:118 [inline]
     f2fs_prepare_lookup+0x3bf/0x640 fs/f2fs/dir.c:163
     f2fs_lookup+0x10d/0x920 fs/f2fs/namei.c:494
     __lookup_hash+0x115/0x240 fs/namei.c:1445
     filename_create+0x14b/0x630 fs/namei.c:3467
     user_path_create fs/namei.c:3524 [inline]
     do_mkdirat+0x56/0x310 fs/namei.c:3664
     do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [...]

The problem is that an inode has F2FS_CASEFOLD_FL set, but the
filesystem doesn't have the casefold feature flag set, and therefore
super_block::s_encoding is NULL.

Fix this by making sanity_check_inode() reject inodes that have
F2FS_CASEFOLD_FL when the filesystem doesn't have the casefold feature.

Reported-by: syzbot+05139c4039d0679e19ff@syzkaller.appspotmail.com
Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
Bob Pearson [Thu, 8 Oct 2020 20:36:52 +0000 (15:36 -0500)]
RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()

[ Upstream commit e7ec96fc7932f48a6d6cdd05bf82004a1a04285b ]

The changes referenced below replaced sbk_clone)_ by taking additional
references, passing the skb along and then freeing the skb. This
deleted the packets before they could be processed and additionally
passed bad data in each packet. Since pkt is stored in skb->cb
changing pkt->qp changed it for all the packets.

Replace skb_get() by sbk_clone() in rxe_rcv_mcast_pkt() for cases where
multiple QPs are receiving multicast packets on the same address.

Delete kfree_skb() because the packets need to live until they have been
processed by each QP. They are freed later.

Fixes: 86af61764151 ("IB/rxe: remove unnecessary skb_clone")
Fixes: fe896ceb5772 ("IB/rxe: replace refcount_inc with skb_get")
Link: https://lore.kernel.org/r/20201008203651.256958-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoIB/rdmavt: Fix sizeof mismatch
Colin Ian King [Thu, 8 Oct 2020 09:52:04 +0000 (10:52 +0100)]
IB/rdmavt: Fix sizeof mismatch

[ Upstream commit 8e71f694e0c819db39af2336f16eb9689f1ae53f ]

An incorrect sizeof is being used, struct rvt_ibport ** is not correct, it
should be struct rvt_ibport *. Note that since ** is the same size as
* this is not causing any issues.  Improve this fix by using
sizeof(*rdi->ports) as this allows us to not even reference the type
of the pointer.  Also remove line breaks as the entire statement can
fit on one line.

Link: https://lore.kernel.org/r/20201008095204.82683-1-colin.king@canonical.com
Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: ff6acd69518e ("IB/rdmavt: Add device structure allocation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agocpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
Srikar Dronamraju [Tue, 22 Sep 2020 08:02:54 +0000 (13:32 +0530)]
cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier

[ Upstream commit a2d0230b91f7e23ceb5d8fb6a9799f30517ec33a ]

The patch avoids allocating cpufreq_policy on stack hence fixing frame
size overflow in 'powernv_cpufreq_reboot_notifier':

  drivers/cpufreq/powernv-cpufreq.c: In function powernv_cpufreq_reboot_notifier:
  drivers/cpufreq/powernv-cpufreq.c:906:1: error: the frame size of 2064 bytes is larger than 2048 bytes

Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec")
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200922080254.41497-1-srikar@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/papr_scm: Add PAPR command family to pass-through command-set
Vaibhav Jain [Sun, 13 Sep 2020 21:19:04 +0000 (02:49 +0530)]
powerpc/papr_scm: Add PAPR command family to pass-through command-set

[ Upstream commit 13135b461cf205941308984bd3271ec7d403dc40 ]

Add NVDIMM_FAMILY_PAPR to the list of valid 'dimm_family_mask'
acceptable by papr_scm. This is needed as since commit
92fe2aa859f5 ("libnvdimm: Validate command family indices") libnvdimm
performs a validation of 'nd_cmd_pkg.nd_family' received as part of
ND_CMD_CALL processing to ensure only known command families can use
the general ND_CMD_CALL pass-through functionality.

Without this change the ND_CMD_CALL pass-through targeting
NVDIMM_FAMILY_PAPR error out with -EINVAL.

Fixes: 92fe2aa859f5 ("libnvdimm: Validate command family indices")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200913211904.24472-1-vaibhav@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/book3s64/radix: Make radix_mem_block_size 64bit
Aneesh Kumar K.V [Wed, 7 Oct 2020 11:48:35 +0000 (17:18 +0530)]
powerpc/book3s64/radix: Make radix_mem_block_size 64bit

[ Upstream commit 950805f4d90eda14325ceab56b0f00d034baa8bc ]

Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

Fixes: af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007114836.282468-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/security: Fix link stack flush instruction
Nicholas Piggin [Wed, 7 Oct 2020 08:06:05 +0000 (18:06 +1000)]
powerpc/security: Fix link stack flush instruction

[ Upstream commit 792254a77201453d9a77479e63dc216ad90462d2 ]

The inline execution path for the hardware assisted branch flush
instruction failed to set CTR to the correct value before bcctr,
causing a crash when the feature is enabled.

Fixes: 4d24e21cc694 ("powerpc/security: Allow for processors that flush the link stack using the special bcctr")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201007080605.64423-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoi3c: master: Fix error return in cdns_i3c_master_probe()
Jing Xiangfeng [Fri, 11 Sep 2020 03:33:50 +0000 (11:33 +0800)]
i3c: master: Fix error return in cdns_i3c_master_probe()

[ Upstream commit abea14bfdebbe9bd02f2ad24a1f3a878ed21c8f0 ]

Fix to return negative error code -ENOMEM from the error handling
case instead of 0.

Fixes: 603f2bee2c54 ("i3c: master: Add driver for Cadence IP")
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/linux-i3c/20200911033350.23904-1-jingxiangfeng@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoperf stat: Fix out of bounds CPU map access when handling armv8_pmu events
Namhyung Kim [Wed, 7 Oct 2020 08:13:11 +0000 (17:13 +0900)]
perf stat: Fix out of bounds CPU map access when handling armv8_pmu events

[ Upstream commit bef69bd7cfc363ab94b84ea29102f3e913ed3c6c ]

It was reported that 'perf stat' crashed when using with armv8_pmu (CPU)
events with the task mode.  As 'perf stat' uses an empty cpu map for
task mode but armv8_pmu has its own cpu mask, it has confused which map
it should use when accessing file descriptors and this causes segfaults:

  (gdb) bt
  #0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
      cpu=<optimized out>) at evsel.c:122
  #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
  #2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
  #3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
      argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
      at builtin-stat.c:929
  #4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
      argc=1) at builtin-stat.c:947
  #5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
  #6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
      argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
  #7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
      argv=argv@entry=0xfffffaea2f90) at perf.c:364
  #8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
      argv=<synthetic pointer>) at perf.c:408
  #9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538

To fix this, I simply used the given cpu map unless the evsel actually
is not a system-wide event (like uncore events).

Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
Reported-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201007081311.1831003-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/perf/hv-gpci: Fix starting index value
Kajol Jain [Sat, 3 Oct 2020 07:49:39 +0000 (13:19 +0530)]
powerpc/perf/hv-gpci: Fix starting index value

[ Upstream commit 0f9866f7e85765bbda86666df56c92f377c3bc10 ]

Commit 9e9f60108423f ("powerpc/perf/{hv-gpci, hv-common}: generate
requests with counters annotated") adds a framework for defining
gpci counters.
In this patch, they adds starting_index value as '0xffffffffffffffff'.
which is wrong as starting_index is of size 32 bits.

Because of this, incase we try to run hv-gpci event we get error.

In power9 machine:

command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
          -C 0 -I 1000
event syntax error: '..bie_count_and_time_tlbie_instructions_issued/'
                                  \___ value too big for format, maximum is 4294967295

This patch fix this issue and changes starting_index value to '0xffffffff'

After this patch:

command#: perf stat -e hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/ -C 0 -I 1000
     1.000085786              1,024      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.000287818              1,024      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/
     2.439113909             17,408      hv_gpci/system_tlbie_count_and_time_tlbie_instructions_issued/

Fixes: 9e9f60108423 ("powerpc/perf/{hv-gpci, hv-common}: generate requests with counters annotated")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201003074943.338618-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints
Athira Rajeev [Mon, 21 Sep 2020 07:10:04 +0000 (03:10 -0400)]
powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints

[ Upstream commit 3b6c3adbb2fa42749c3d38cfc4d4d0b7e096bb7b ]

PMU counter support functions enforces event constraints for group of
events to check if all events in a group can be monitored. Incase of
event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ), not
all constraints are applicable, say the threshold or sample bits. But
current code includes pmc5 and pmc6 in some group constraints (like
IC_DC Qualifier bits) which is actually not applicable and hence
results in those events not getting counted when scheduled along with
group of other events. Patch fixes this by excluding PMC5/6 from
constraints which are not relevant for it.

Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1600672204-1610-1-git-send-email-atrajeev@linux.vnet.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc: PPC_SECURE_BOOT should not require PowerNV
Daniel Axtens [Thu, 24 Sep 2020 01:49:22 +0000 (11:49 +1000)]
powerpc: PPC_SECURE_BOOT should not require PowerNV

[ Upstream commit 5c5e46dad939b2bf4df04293ab9ac68abd7c1f55 ]

In commit 61f879d97ce4 ("powerpc/pseries: Detect secure and trusted
boot state of the system.") we taught the kernel how to understand the
secure-boot parameters used by a pseries guest.

However, CONFIG_PPC_SECURE_BOOT still requires PowerNV. I didn't
catch this because pseries_le_defconfig includes support for
PowerNV and so everything still worked. Indeed, most configs will.
Nonetheless, technically PPC_SECURE_BOOT doesn't require PowerNV
any more.

The secure variables support (PPC_SECVAR_SYSFS) doesn't do anything
on pSeries yet, but I don't think it's worth adding a new condition -
at some stage we'll want to add a backend for pSeries anyway.

Fixes: 61f879d97ce4 ("powerpc/pseries: Detect secure and trusted boot state of the system.")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200924014922.172914-1-dja@axtens.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/64: fix irq replay pt_regs->softe value
Nicholas Piggin [Tue, 15 Sep 2020 11:46:46 +0000 (21:46 +1000)]
powerpc/64: fix irq replay pt_regs->softe value

[ Upstream commit 2b48e96be2f9f7151197fd25dc41487054bc6f5b ]

Replayed interrupts get an "artificial" struct pt_regs constructed to
pass to interrupt handler functions. This did not get the softe field
set correctly, it's as though the interrupt has hit while irqs are
disabled. It should be IRQS_ENABLED.

This is possibly harmless, asynchronous handlers should not be testing
if irqs were disabled, but it might be possible for example some code
is shared with synchronous or NMI handlers, and it makes more sense if
debug output looks at this.

Fixes: 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200915114650.3980244-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/64: fix irq replay missing preempt
Nicholas Piggin [Tue, 15 Sep 2020 11:46:45 +0000 (21:46 +1000)]
powerpc/64: fix irq replay missing preempt

[ Upstream commit 903fd31d3212ab72d564c68f6cfb5d04db68773e ]

Prior to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt
replay in C"), replayed interrupts returned by the regular interrupt
exit code, which performs preemption in case an interrupt had set
need_resched.

This logic was missed by the conversion. Adding preempt_disable/enable
around the interrupt replay and final irq enable will reschedule if
needed.

Fixes: 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200915114650.3980244-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
Kamal Heib [Sun, 4 Oct 2020 13:29:48 +0000 (16:29 +0300)]
RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces

[ Upstream commit 5ce2dced8e95e76ff7439863a118a053a7fc6f91 ]

Report the "ipoib pkey", "mode" and "umcast" netlink attributes for every
IPoiB interface type, not just children created with 'ip link add'.

After setting the rtnl_link_ops for the parent interface, implement the
dellink() callback to block users from trying to remove it.

Fixes: 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
Link: https://lore.kernel.org/r/20201004132948.26669-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: parsers: bcm63xx: Do not make it modular
Florian Fainelli [Tue, 29 Sep 2020 17:27:21 +0000 (10:27 -0700)]
mtd: parsers: bcm63xx: Do not make it modular

[ Upstream commit b597cc75f7fe76708bc6ab3f0e09bbff6f09ae4a ]

With commit 91e81150d388 ("mtd: parsers: bcm63xx: simplify CFE
detection"), we generate a reference to fw_arg3 which is the fourth
firmware/command line argument on MIPS platforms. That symbol is not
exported and would cause a linking failure.

The parser is typically necessary to boot a BCM63xx-based system anyway
so having it be part of the kernel image makes sense, therefore make it
'bool' instead of 'tristate'.

Fixes: 91e81150d388 ("mtd: parsers: bcm63xx: simplify CFE detection")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20200929172726.30469-1-f.fainelli@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agooverflow: Include header file with SIZE_MAX declaration
Leon Romanovsky [Sun, 13 Sep 2020 10:29:28 +0000 (13:29 +0300)]
overflow: Include header file with SIZE_MAX declaration

[ Upstream commit a4947e84f23474803b62a2759b5808147e4e15f9 ]

The various array_size functions use SIZE_MAX define, but missed limits.h
causes to failure to compile code that needs overflow.h.

 In file included from drivers/infiniband/core/uverbs_std_types_device.c:6:
 ./include/linux/overflow.h: In function 'array_size':
 ./include/linux/overflow.h:258:10: error: 'SIZE_MAX' undeclared (first use in this function)
   258 |   return SIZE_MAX;
       |          ^~~~~~~~

Fixes: 610b15c50e86 ("overflow.h: Add allocation size calculation helpers")
Link: https://lore.kernel.org/r/20200913102928.134985-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agokdb: Fix pager search for multi-line strings
Daniel Thompson [Wed, 9 Sep 2020 14:17:08 +0000 (15:17 +0100)]
kdb: Fix pager search for multi-line strings

[ Upstream commit d081a6e353168f15e63eb9e9334757f20343319f ]

Currently using forward search doesn't handle multi-line strings correctly.
The search routine replaces line breaks with \0 during the search and, for
regular searches ("help | grep Common\n"), there is code after the line
has been discarded or printed to replace the break character.

However during a pager search ("help\n" followed by "/Common\n") when the
string is matched we will immediately return to normal output and the code
that should restore the \n becomes unreachable. Fix this by restoring the
replaced character when we disable the search mode and update the comment
accordingly.

Fixes: fb6daa7520f9d ("kdb: Provide forward search at more prompt")
Link: https://lore.kernel.org/r/20200909141708.338273-1-daniel.thompson@linaro.org
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: rawnand: ams-delta: Fix non-OF build warning
Janusz Krzysztofik [Sat, 19 Sep 2020 08:04:03 +0000 (10:04 +0200)]
mtd: rawnand: ams-delta: Fix non-OF build warning

[ Upstream commit 6d11178762f7c8338a028b428198383b8978b280 ]

Commit 7c2f66a960fc ("mtd: rawnand: ams-delta: Add module device
tables") introduced an OF module device table but wrapped a reference
to it with of_match_ptr() which resolves to NULL in non-OF configs.
That resulted in a clang compiler warning on unused variable in non-OF
builds.  Fix it.

drivers/mtd/nand/raw/ams-delta.c:373:34: warning: unused variable 'gpio_nand_of_id_table' [-Wunused-const-variable]
   static const struct of_device_id gpio_nand_of_id_table[] = {
                                    ^
   1 warning generated.

Fixes: 7c2f66a960fc ("mtd: rawnand: ams-delta: Add module device tables")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20200919080403.17520-1-jmkrzyszt@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: spinand: gigadevice: Add QE Bit
Hauke Mehrtens [Thu, 20 Aug 2020 16:51:20 +0000 (18:51 +0200)]
mtd: spinand: gigadevice: Add QE Bit

[ Upstream commit aea7687e77bebce5b67fab9d03347bd8df7933c7 ]

The following GigaDevice chips have the QE BIT in the feature flags, I
checked the datasheets, but did not try this.
* GD5F1GQ4xExxG
* GD5F1GQ4xFxxG
* GD5F1GQ4UAYIG
* GD5F4GQ4UAYIG

The Quad operations like 0xEB mention that the QE bit has to be set.

Fixes: c93c613214ac ("mtd: spinand: add support for GigaDevice GD5FxGQ4xA")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Tested-by: Chuanhong Guo <gch981213@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20200820165121.3192-3-hauke@hauke-m.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: spinand: gigadevice: Only one dummy byte in QUADIO
Hauke Mehrtens [Thu, 20 Aug 2020 16:51:19 +0000 (18:51 +0200)]
mtd: spinand: gigadevice: Only one dummy byte in QUADIO

[ Upstream commit 6387ad9caf8f09747a8569e5876086b72ee9382c ]

The datasheet only lists one dummy byte in the 0xEH operation for the
following chips:
* GD5F1GQ4xExxG
* GD5F1GQ4xFxxG
* GD5F1GQ4UAYIG
* GD5F4GQ4UAYIG

Fixes: c93c613214ac ("mtd: spinand: add support for GigaDevice GD5FxGQ4xA")
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Tested-by: Chuanhong Guo <gch981213@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20200820165121.3192-2-hauke@hauke-m.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: rawnand: vf610: disable clk on error handling path in probe
Evgeny Novikov [Thu, 6 Aug 2020 07:26:34 +0000 (10:26 +0300)]
mtd: rawnand: vf610: disable clk on error handling path in probe

[ Upstream commit cb7dc3178a9862614b1e7567d77f4679f027a074 ]

vf610_nfc_probe() does not invoke clk_disable_unprepare() on one error
handling path. The patch fixes that.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 6f0ce4dfc5a3 ("mtd: rawnand: vf610: Avoid a potential NULL pointer dereference")
Signed-off-by: Evgeny Novikov <novikov@ispras.ru>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20200806072634.23528-1-novikov@ispras.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: rawnand: stm32_fmc2: fix a buffer overflow
Christophe Kerello [Tue, 21 Jul 2020 09:52:07 +0000 (11:52 +0200)]
mtd: rawnand: stm32_fmc2: fix a buffer overflow

[ Upstream commit ab16f54ef3cdb6bbc06a36f636a89e6db8a6cea3 ]

This patch solves following static checker warning:
drivers/mtd/nand/raw/stm32_fmc2_nand.c:350 stm32_fmc2_nfc_select_chip()
error: buffer overflow 'nfc->data_phys_addr' 2 <= 2

The CS value can only be 0 or 1.

Signed-off-by: Christophe Kerello <christophe.kerello@st.com>
Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/1595325127-32693-1-git-send-email-christophe.kerello@st.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agomtd: hyperbus: hbmc-am654: Fix direct mapping setup flash access
Vignesh Raghavendra [Thu, 24 Sep 2020 08:12:12 +0000 (13:42 +0530)]
mtd: hyperbus: hbmc-am654: Fix direct mapping setup flash access

[ Upstream commit aca31ce96814c84d1a41aaa109c15abe61005af7 ]

Setting up of direct mapping should be done with flash node's IO
address space and not with controller's IO region.

Fixes: b6fe8bc67d2d3 ("mtd: hyperbus: move direct mapping setup to AM654 HBMC driver")
Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com>
Link: https://lore.kernel.org/r/20200924081214.16934-3-vigneshr@ti.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()
Liu Shixin [Thu, 17 Sep 2020 08:13:54 +0000 (16:13 +0800)]
RDMA/mlx5: Fix type warning of sizeof in __mlx5_ib_alloc_counters()

[ Upstream commit b942fc0319a72b83146b79619eb578e989062911 ]

sizeof() when applied to a pointer typed expression should give the size
of the pointed data, even if the data is a pointer.

Fixes: e1f24a79f424 ("IB/mlx5: Support congestion related counters")
Link: https://lore.kernel.org/r/20200917081354.2083293-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/hns: Fix missing sq_sig_type when querying QP
Weihang Li [Sat, 19 Sep 2020 10:03:22 +0000 (18:03 +0800)]
RDMA/hns: Fix missing sq_sig_type when querying QP

[ Upstream commit 05df49279f8926178ecb3ce88e61b63104cd6293 ]

The sq_sig_type field should be filled when querying QP, or the users may
get a wrong value.

Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-9-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/hns: Fix configuration of ack_req_freq in QPC
Weihang Li [Sat, 19 Sep 2020 10:03:21 +0000 (18:03 +0800)]
RDMA/hns: Fix configuration of ack_req_freq in QPC

[ Upstream commit fbed9d2be292504e04caa2057e3a9477a1e1d040 ]

The hardware will add AckReq flag in BTH header according to the value of
ack_req_freq to request ACK from responder for the packets with this flag.
It should be greater than or equal to lp_pktn_ini instead of using a fixed
value.

Fixes: 7b9bd73ed13d ("RDMA/hns: Fix wrong assignment of lp_pktn_ini in QPC")
Link: https://lore.kernel.org/r/1600509802-44382-8-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/hns: Fix the wrong value of rnr_retry when querying qp
Wenpeng Liang [Sat, 19 Sep 2020 10:03:20 +0000 (18:03 +0800)]
RDMA/hns: Fix the wrong value of rnr_retry when querying qp

[ Upstream commit 99fcf82521d91468ee6115a3c253aa032dc63cbc ]

The rnr_retry returned to the user is not correct, it should be got from
another fields in QPC.

Fixes: bfe860351e31 ("RDMA/hns: Fix cast from or to restricted __le32 for driver")
Link: https://lore.kernel.org/r/1600509802-44382-7-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/hns: Solve the overflow of the calc_pg_sz()
Jiaran Zhang [Sat, 19 Sep 2020 10:03:19 +0000 (18:03 +0800)]
RDMA/hns: Solve the overflow of the calc_pg_sz()

[ Upstream commit 768202a0825d447de785e87ff1ea1d3c86a71727 ]

calc_pg_sz() may gets a data calculation overflow if the PAGE_SIZE is 64 KB
and hop_num is 2. It is because that all variables involved in calculation
are defined in type of int. So change the type of bt_chunk_size,
buf_chunk_size and obj_per_chunk_default to u64.

Fixes: ba6bb7e97421 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
Link: https://lore.kernel.org/r/1600509802-44382-6-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/hns: Add check for the validity of sl configuration
Jiaran Zhang [Sat, 19 Sep 2020 10:03:18 +0000 (18:03 +0800)]
RDMA/hns: Add check for the validity of sl configuration

[ Upstream commit 172505cfa3a8ee98acaa569fd3be97697b333958 ]

According to the RoCE v1 specification, the sl (service level) 0-7 are
mapped directly to priorities 0-7 respectively, sl 8-15 are reserved. The
driver should verify whether the the value of sl is larger than 7, if so,
an exception should be returned.

Fixes: 926a01dc000d ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1600509802-44382-5-git-send-email-liweihang@huawei.com
Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoperf stat: Skip duration_time in setup_system_wide
Jin Yao [Tue, 22 Sep 2020 01:50:04 +0000 (09:50 +0800)]
perf stat: Skip duration_time in setup_system_wide

[ Upstream commit 002a3d690f95804bdef6b70b26154103518e13d9 ]

Some metrics (such as DRAM_BW_Use) consists of uncore events and
duration_time. For uncore events, counter->core.system_wide is true. But
for duration_time, counter->core.system_wide is false so
target.system_wide is set to false.

Then 'enable_on_exec' is set in perf_event_attr of uncore event.  Kernel
will return error when trying to open the uncore event.

This patch skips the duration_time in setup_system_wide then
target.system_wide will be set to true for the evlist of uncore events +
duration_time.

Before (tested on skylake desktop):

  # perf stat -M DRAM_BW_Use -- sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (arb/event=0x84,umask=0x1/).
  /bin/dmesg | grep -i perf may provide additional information.

After:

  # perf stat -M DRAM_BW_Use -- sleep 1

   Performance counter stats for 'system wide':

                169      arb/event=0x84,umask=0x1/ #     0.00 DRAM_BW_Use
             40,427      arb/event=0x81,umask=0x1/
      1,000,902,197 ns   duration_time

        1.000902197 seconds time elapsed

Fixes: e3ba76deef23064f ("perf tools: Force uncore events to system wide monitoring")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200922015004.30114-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoi40iw: Add support to make destroy QP synchronous
Sindhu, Devale [Wed, 16 Sep 2020 13:18:12 +0000 (08:18 -0500)]
i40iw: Add support to make destroy QP synchronous

[ Upstream commit f2334964e969762e266a616acf9377f6046470a2 ]

Occasionally ib_write_bw crash is seen due to access of a pd object in
i40iw_sc_qp_destroy after it is freed. Destroy qp is not synchronous in
i40iw and thus the iwqp object could be referencing a pd object that is
freed by ib core as a result of successful return from i40iw_destroy_qp.

Wait in i40iw_destroy_qp till all QP references are released and destroy
the QP and its associated resources before returning.  Switch to use the
refcount API vs atomic API for lifetime management of the qp.

 RIP: 0010:i40iw_sc_qp_destroy+0x4b/0x120 [i40iw]
 [...]
 RSP: 0018:ffffb4a7042e3ba8 EFLAGS: 00010002
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: dead000000000122
 RDX: ffffb4a7042e3bac RSI: ffff8b7ef9b1e940 RDI: ffff8b7efbf09080
 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
 R10: 8080808080808080 R11: 0000000000000010 R12: ffff8b7efbf08050
 R13: 0000000000000001 R14: ffff8b7f15042928 R15: ffff8b7ef9b1e940
 FS:  0000000000000000(0000) GS:ffff8b7f2fa00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000400 CR3: 000000020d60a006 CR4: 00000000001606e0
 Call Trace:
  i40iw_exec_cqp_cmd+0x4d3/0x5c0 [i40iw]
  ? try_to_wake_up+0x1ea/0x5d0
  ? __switch_to_asm+0x40/0x70
  i40iw_process_cqp_cmd+0x95/0xa0 [i40iw]
  i40iw_handle_cqp_op+0x42/0x1a0 [i40iw]
  ? cm_event_handler+0x13c/0x1f0 [iw_cm]
  i40iw_rem_ref+0xa0/0xf0 [i40iw]
  cm_work_handler+0x99c/0xd10 [iw_cm]
  process_one_work+0x1a1/0x360
  worker_thread+0x30/0x380
  ? process_one_work+0x360/0x360
  kthread+0x10c/0x130
  ? kthread_park+0x80/0x80
  ret_from_fork+0x35/0x40

Fixes: d37498417947 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20200916131811.2077-1-shiraz.saleem@intel.com
Reported-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Sindhu, Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work
Jason Gunthorpe [Mon, 14 Sep 2020 11:26:52 +0000 (14:26 +0300)]
RDMA/mlx5: Disable IB_DEVICE_MEM_MGT_EXTENSIONS if IB_WR_REG_MR can't work

[ Upstream commit 0ec52f0194638e2d284ad55eba5a7aff753de1b9 ]

set_reg_wr() always fails if !umr_modify_entity_size_disabled because
mlx5_ib_can_use_umr() always fails. Without set_reg_wr() IB_WR_REG_MR
doesn't work and that means the device should not advertise
IB_DEVICE_MEM_MGT_EXTENSIONS.

Fixes: 841b07f99a47 ("IB/mlx5: Block MR WR if UMR is not possible")
Link: https://lore.kernel.org/r/20200914112653.345244-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabled
Jason Gunthorpe [Mon, 14 Sep 2020 11:26:51 +0000 (14:26 +0300)]
RDMA/mlx5: Make mkeys always owned by the kernel's PD when not enabled

[ Upstream commit 5eb29f0d13a66502b91954597270003c90fb66c5 ]

Any mkey that is not enabled and assigned to userspace should have the PD
set to a kernel owned PD.

When cache entries are created for the first time the PDN is set to 0,
which is probably a kernel PD, but be explicit.

When a MR is registered using the hybrid reg_create with UMR xlt & enable
the disabled mkey is pointing at the user PD, keep it pointing at the
kernel until a UMR enables it and sets the user PD.

Fixes: 9ec4483a3f0f ("IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache")
Link: https://lore.kernel.org/r/20200914112653.345244-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create()
Jason Gunthorpe [Mon, 14 Sep 2020 11:26:50 +0000 (14:26 +0300)]
RDMA/mlx5: Use set_mkc_access_pd_addr_fields() in reg_create()

[ Upstream commit 1c97ca3da0d12e0156a177f48ed3184c3f202002 ]

reg_create() open codes this helper, use the shared code.

Link: https://lore.kernel.org/r/20200914112653.345244-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/hns: Set the unsupported wr opcode
Lijun Ou [Thu, 17 Sep 2020 13:50:15 +0000 (21:50 +0800)]
RDMA/hns: Set the unsupported wr opcode

[ Upstream commit 22d3e1ed2cc837af87f76c3c8a4ccf4455e225c5 ]

hip06 does not support IB_WR_LOCAL_INV, so the ps_opcode should be set to
an invalid value instead of being left uninitialized.

Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver")
Fixes: a2f3d4479fe9 ("RDMA/hns: Avoid unncessary initialization")
Link: https://lore.kernel.org/r/1600350615-115217-1-git-send-email-oulijun@huawei.com
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/qedr: Fix resource leak in qedr_create_qp
Keita Suzuki [Fri, 11 Sep 2020 12:51:59 +0000 (12:51 +0000)]
RDMA/qedr: Fix resource leak in qedr_create_qp

[ Upstream commit 3e45410fe3c202ffb619f301beff0644f717e132 ]

When xa_insert() fails, the acquired resource in qedr_create_qp should
also be freed. However, current implementation does not handle the error.

Fix this by adding a new goto label that calls qedr_free_qp_resources.

Fixes: 1212767e23bb ("qedr: Add wrapping generic structure for qpidr and adjust idr routines.")
Link: https://lore.kernel.org/r/20200911125159.4577-1-keitasuzuki.park@sslab.ics.keio.ac.jp
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoperf metricgroup: Fix uncore metric expressions
Ian Rogers [Thu, 17 Sep 2020 20:18:07 +0000 (13:18 -0700)]
perf metricgroup: Fix uncore metric expressions

[ Upstream commit dcc81be0fc4e66943041e6e19a5faf8f8704a27e ]

A metric like DRAM_BW_Use has on SkylakeX events uncore_imc/cas_count_read/
and uncore_imc/case_count_write/.

These events open 6 events per socket with pmu names of
uncore_imc_[0-5].

The current metric setup code in find_evsel_group assumes one ID will
map to 1 event to be recorded in metric_events.

For events with multiple matches, the first event is recorded in
metric_events (avoiding matching >1 event with the same name) and the
evlist_used updated so that duplicate events aren't removed when the
evlist has unused events removed.

Before this change:

  $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1

   Performance counter stats for 'system wide':

               41.14 MiB  uncore_imc/cas_count_read/
       1,002,614,251 ns   duration_time

         1.002614251 seconds time elapsed

After this change:

  $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1

   Performance counter stats for 'system wide':

              157.47 MiB  uncore_imc/cas_count_read/ #     0.00 DRAM_BW_Use
              126.97 MiB  uncore_imc/cas_count_write/
       1,003,019,728 ns   duration_time

Erroneous duplication introduced in:
commit 2440689d62e9 ("perf metricgroup: Remove duped metric group events").

Fixes: ded80bda8bc9 ("perf expr: Migrate expr ids table to a hashmap").
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20200917201807.4090224-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoperf intel-pt: Fix "context_switch event has no tid" error
Adrian Hunter [Wed, 9 Sep 2020 08:49:23 +0000 (11:49 +0300)]
perf intel-pt: Fix "context_switch event has no tid" error

[ Upstream commit 7d537a8d2e76bc4fc71e34545ceaa463ac2cd928 ]

A context_switch event can have no tid because pids can be detached from
a task while the task is still running (in do_exit()). Note this won't
happen with per-task contexts because then tracing stops at
perf_event_exit_task()

If a task with no tid gets preempted, or a dying task gets preempted and
its parent releases it, when it subsequently gets switched back in,
Intel PT will not be able to determine what task is running and prints
an error "context_switch event has no tid". However, it is not really an
error because the task is in kernel space and the decoder can continue
to decode successfully. Fix by changing the error to be only a logged
message, and make allowance for tid == -1.

Example:

  Using 5.9-rc4 with Preemptible Kernel (Low-Latency Desktop) e.g.
  $ uname -r
  5.9.0-rc4
  $ grep PREEMPT .config
  # CONFIG_PREEMPT_NONE is not set
  # CONFIG_PREEMPT_VOLUNTARY is not set
  CONFIG_PREEMPT=y
  CONFIG_PREEMPT_COUNT=y
  CONFIG_PREEMPTION=y
  CONFIG_PREEMPT_RCU=y
  CONFIG_PREEMPT_NOTIFIERS=y
  CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
  CONFIG_DEBUG_PREEMPT=y
  # CONFIG_PREEMPT_TRACER is not set
  # CONFIG_PREEMPTIRQ_DELAY_TEST is not set

Before:

  $ cat forkit.c

  #include <sys/types.h>
  #include <unistd.h>
  #include <sys/wait.h>

  int main()
  {
          pid_t child;
          int status = 0;

          child = fork();
          if (child == 0)
                  return 123;
          wait(&status);
          return 0;
  }

  $ gcc -o forkit forkit.c
  $ sudo ~/bin/perf record --kcore -a -m,64M -e intel_pt/cyc/k &
  [1] 11016
  $ taskset 2 ./forkit
  $ sudo pkill perf
  $ [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 17.262 MB perf.data ]

  [1]+  Terminated              sudo ~/bin/perf record --kcore -a -m,64M -e intel_pt/cyc/k
  $ sudo ~/bin/perf script --show-task-events --show-switch-events --itrace=iqqe-o -C 1 --ns | grep -C 2 forkit
  context_switch event has no tid
           taskset 11019 [001] 66663.270045029:          1 instructions:k:  ffffffffb1d9f844 strnlen_user+0xb4 ([kernel.kallsyms])
           taskset 11019 [001] 66663.270201816:          1 instructions:k:  ffffffffb1a83121 unmap_page_range+0x561 ([kernel.kallsyms])
            forkit 11019 [001] 66663.270327553: PERF_RECORD_COMM exec: forkit:11019/11019
            forkit 11019 [001] 66663.270420028:          1 instructions:k:  ffffffffb1db9537 __clear_user+0x27 ([kernel.kallsyms])
            forkit 11019 [001] 66663.270648704:          1 instructions:k:  ffffffffb18829e6 do_user_addr_fault+0xf6 ([kernel.kallsyms])
            forkit 11019 [001] 66663.270833163:          1 instructions:k:  ffffffffb230a825 irqentry_exit_to_user_mode+0x15 ([kernel.kallsyms])
            forkit 11019 [001] 66663.271092359:          1 instructions:k:  ffffffffb1aea3d9 lock_page_memcg+0x9 ([kernel.kallsyms])
            forkit 11019 [001] 66663.271207092: PERF_RECORD_FORK(11020:11020):(11019:11019)
            forkit 11019 [001] 66663.271234775: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid: 11020/11020
            forkit 11020 [001] 66663.271238407: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 11019/11019
            forkit 11020 [001] 66663.271312066:          1 instructions:k:  ffffffffb1a88140 handle_mm_fault+0x10 ([kernel.kallsyms])
            forkit 11020 [001] 66663.271476225: PERF_RECORD_EXIT(11020:11020):(11019:11019)
            forkit 11020 [001] 66663.271497488: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 11019/11019
            forkit 11019 [001] 66663.271500523: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 11020/11020
            forkit 11019 [001] 66663.271517241:          1 instructions:k:  ffffffffb24012cd error_entry+0x6d ([kernel.kallsyms])
            forkit 11019 [001] 66663.271664080: PERF_RECORD_EXIT(11019:11019):(1386:1386)

After:

  $ sudo ~/bin/perf script --show-task-events --show-switch-events --itrace=iqqe-o -C 1 --ns | grep -C 2 forkit
           taskset 11019 [001] 66663.270045029:          1 instructions:k:  ffffffffb1d9f844 strnlen_user+0xb4 ([kernel.kallsyms])
           taskset 11019 [001] 66663.270201816:          1 instructions:k:  ffffffffb1a83121 unmap_page_range+0x561 ([kernel.kallsyms])
            forkit 11019 [001] 66663.270327553: PERF_RECORD_COMM exec: forkit:11019/11019
            forkit 11019 [001] 66663.270420028:          1 instructions:k:  ffffffffb1db9537 __clear_user+0x27 ([kernel.kallsyms])
            forkit 11019 [001] 66663.270648704:          1 instructions:k:  ffffffffb18829e6 do_user_addr_fault+0xf6 ([kernel.kallsyms])
            forkit 11019 [001] 66663.270833163:          1 instructions:k:  ffffffffb230a825 irqentry_exit_to_user_mode+0x15 ([kernel.kallsyms])
            forkit 11019 [001] 66663.271092359:          1 instructions:k:  ffffffffb1aea3d9 lock_page_memcg+0x9 ([kernel.kallsyms])
            forkit 11019 [001] 66663.271207092: PERF_RECORD_FORK(11020:11020):(11019:11019)
            forkit 11019 [001] 66663.271234775: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid: 11020/11020
            forkit 11020 [001] 66663.271238407: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 11019/11019
            forkit 11020 [001] 66663.271312066:          1 instructions:k:  ffffffffb1a88140 handle_mm_fault+0x10 ([kernel.kallsyms])
            forkit 11020 [001] 66663.271476225: PERF_RECORD_EXIT(11020:11020):(11019:11019)
            forkit 11020 [001] 66663.271497488: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt  next pid/tid: 11019/11019
            forkit 11019 [001] 66663.271500523: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 11020/11020
            forkit 11019 [001] 66663.271517241:          1 instructions:k:  ffffffffb24012cd error_entry+0x6d ([kernel.kallsyms])
            forkit 11019 [001] 66663.271664080: PERF_RECORD_EXIT(11019:11019):(1386:1386)
            forkit 11019 [001] 66663.271688752: PERF_RECORD_SWITCH_CPU_WIDE OUT          next pid/tid:    -1/-1
               :-1    -1 [001] 66663.271692086: PERF_RECORD_SWITCH_CPU_WIDE IN           prev pid/tid: 11019/11019
                :-1    -1 [001] 66663.271707466:          1 instructions:k:  ffffffffb18eb096 update_load_avg+0x306 ([kernel.kallsyms])

Fixes: 86c2786994bd7c ("perf intel-pt: Add support for PERF_RECORD_SWITCH")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lore.kernel.org/lkml/20200909084923.9096-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/cma: Fix use after free race in roce multicast join
Jason Gunthorpe [Wed, 2 Sep 2020 08:11:22 +0000 (11:11 +0300)]
RDMA/cma: Fix use after free race in roce multicast join

[ Upstream commit b5de0c60cc30c2a3513c7188c73f3f29acc29234 ]

The roce path triggers a work queue that continues to touch the id_priv
but doesn't hold any reference on it. Futher, unlike in the IB case, the
work queue is not fenced during rdma_destroy_id().

This can trigger a use after free if a destroy is triggered in the
incredibly narrow window after the queue_work and the work starting and
obtaining the handler_mutex.

The only purpose of this work queue is to run the ULP event callback from
the standard context, so switch the design to use the existing
cma_work_handler() scheme. This simplifies quite a lot of the flow:

- Use the cma_work_handler() callback to launch the work for roce. This
  requires generating the event synchronously inside the
  rdma_join_multicast(), which in turn means the dummy struct
  ib_sa_multicast can become a simple stack variable.

- cm_work_handler() used the id_priv kref, so we can entirely eliminate
  the kref inside struct cma_multicast. Since the cma_multicast never
  leaks into an unprotected work queue the kfree can be done at the same
  time as for IB.

- Eliminating the general multicast.ib requires using cma_set_mgid() in a
  few places to recompute the mgid.

Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200902081122.745412-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/cma: Consolidate the destruction of a cma_multicast in one place
Jason Gunthorpe [Wed, 2 Sep 2020 08:11:21 +0000 (11:11 +0300)]
RDMA/cma: Consolidate the destruction of a cma_multicast in one place

[ Upstream commit 3788d2997bc0150ea911a964d5b5a2e11808a936 ]

Two places were open coding this sequence, and also pull in
cma_leave_roce_mc_group() which was called only once.

Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/cma: Remove dead code for kernel rdmacm multicast
Jason Gunthorpe [Wed, 2 Sep 2020 08:11:20 +0000 (11:11 +0300)]
RDMA/cma: Remove dead code for kernel rdmacm multicast

[ Upstream commit 1bb5091def706732c749df9aae45fbca003696f2 ]

There is no kernel user of RDMA CM multicast so this code managing the
multicast subscription of the kernel-only internal QP is dead. Remove it.

This makes the bug fixes in the next patches much simpler.

Link: https://lore.kernel.org/r/20200902081122.745412-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/cma: Combine cma_ndev_work with cma_work
Jason Gunthorpe [Wed, 2 Sep 2020 08:11:19 +0000 (11:11 +0300)]
RDMA/cma: Combine cma_ndev_work with cma_work

[ Upstream commit 7e85bcda8bfe883f4244672ed79f81b7762a1a7e ]

These are the same thing, except that cma_ndev_work doesn't have a state
transition. Signal no state transition by setting old_state and new_state
== 0.

In all cases the handler function should not be called once
rdma_destroy_id() has progressed passed setting the state.

Link: https://lore.kernel.org/r/20200902081122.745412-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/papr_scm: Fix warning triggered by perf_stats_show()
Vaibhav Jain [Sat, 12 Sep 2020 08:14:51 +0000 (13:44 +0530)]
powerpc/papr_scm: Fix warning triggered by perf_stats_show()

[ Upstream commit ca78ef2f08ccfa29b711d644964cdf9d7ace15e5 ]

A warning is reported by the kernel in case perf_stats_show() returns
an error code. The warning is of the form below:

 papr_scm ibm,persistent-memory:ibm,pmemory@44100001:
    Failed to query performance stats, Err:-10
 dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
 fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count

On investigation it looks like that the compiler is silently
truncating the return value of drc_pmem_query_stats() from 'long' to
'int', since the variable used to store the return code 'rc' is an
'int'. This truncated value is then returned back as a 'ssize_t' back
from perf_stats_show() to 'dev_attr_show()' which thinks of it as a
large unsigned number and triggers this warning..

To fix this we update the type of variable 'rc' from 'int' to
'ssize_t' that prevents the compiler from truncating the return value
of drc_pmem_query_stats() and returning correct signed value back from
perf_stats_show().

Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from PHYP")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200912081451.66225-1-vaibhav@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm
Nicholas Piggin [Mon, 14 Sep 2020 04:52:19 +0000 (14:52 +1000)]
powerpc/64s/radix: Fix mm_cpumask trimming race vs kthread_use_mm

[ Upstream commit a665eec0a22e11cdde708c1c256a465ebe768047 ]

Commit 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of
single-threaded mm_cpumask") added a mechanism to trim the mm_cpumask of
a process under certain conditions. One of the assumptions is that
mm_users would not be incremented via a reference outside the process
context with mmget_not_zero() then go on to kthread_use_mm() via that
reference.

That invariant was broken by io_uring code (see previous sparc64 fix),
but I'll point Fixes: to the original powerpc commit because we are
changing that assumption going forward, so this will make backports
match up.

Fix this by no longer relying on that assumption, but by having each CPU
check the mm is not being used, and clearing their own bit from the mask
only if it hasn't been switched-to by the time the IPI is processed.

This relies on commit 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB
invalidate") and ARCH_WANT_IRQS_OFF_ACTIVATE_MM to disable irqs over mm
switch sequences.

Fixes: 0cef77c7798a7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Depends-on: 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB invalidate")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200914045219.3736466-5-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/kasan: Fix CONFIG_KASAN_VMALLOC for 8xx
Christophe Leroy [Fri, 11 Sep 2020 05:05:38 +0000 (05:05 +0000)]
powerpc/kasan: Fix CONFIG_KASAN_VMALLOC for 8xx

[ Upstream commit 4c42dc5c69a8f24c467a6c997909d2f1d4efdc7f ]

Before the commit identified below, pages tables allocation was
performed after the allocation of final shadow area for linear memory.
But that commit switched the order, leading to page tables being
already allocated at the time 8xx kasan_init_shadow_8M() is called.
Due to this, kasan_init_shadow_8M() doesn't map the needed
shadow entries because there are already page tables.

kasan_init_shadow_8M() installs huge PMD entries instead of page
tables. We could at that time free the page tables, but there is no
point in creating page tables that get freed before being used.

Only book3s/32 hash needs early allocation of page tables. For other
variants, we can keep the initial order and create remaining page
tables after the allocation of final shadow memory for linear mem.

Move back the allocation of shadow page tables for
CONFIG_KASAN_VMALLOC into kasan_init() after the loop which creates
final shadow memory for linear mem.

Fixes: 41ea93cf7ba4 ("powerpc/kasan: Fix shadow pages allocation failure")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ae4554357da4882612644a74387ae05525b2aaa.1599800716.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/tau: Disable TAU between measurements
Finn Thain [Fri, 4 Sep 2020 23:02:20 +0000 (09:02 +1000)]
powerpc/tau: Disable TAU between measurements

[ Upstream commit e63d6fb5637e92725cf143559672a34b706bca4f ]

Enabling CONFIG_TAU_INT causes random crashes:

Unrecoverable exception 1700 at c0009414 (msr=1000)
Oops: Unrecoverable exception, sig: 6 [#1]
BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac-00043-gd5f545e1a8593 #5
NIP:  c0009414 LR: c0009414 CTR: c00116fc
REGS: c0799eb8 TRAP: 1700   Not tainted  (5.7.0-pmac-00043-gd5f545e1a8593)
MSR:  00001000 <ME>  CR: 22000228  XER: 00000100

GPR00: 00000000 c0799f70 c076e300 00800000 0291c0ac 00e00000 c076e300 00049032
GPR08: 00000001 c00116fc 00000000 dfbd3200 ffffffff 007f80a8 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 c075ce04
GPR24: c075ce04 dfff8880 c07b0000 c075ce04 00080000 00000001 c079ef98 c079ef5c
NIP [c0009414] arch_cpu_idle+0x24/0x6c
LR [c0009414] arch_cpu_idle+0x24/0x6c
Call Trace:
[c0799f70] [00000001] 0x1 (unreliable)
[c0799f80] [c0060990] do_idle+0xd8/0x17c
[c0799fa0] [c0060ba4] cpu_startup_entry+0x20/0x28
[c0799fb0] [c072d220] start_kernel+0x434/0x44c
[c0799ff0] [00003860] 0x3860
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX 3d20c07b XXXXXXXX XXXXXXXX XXXXXXXX 7c0802a6
XXXXXXXX XXXXXXXX XXXXXXXX 4e800421 XXXXXXXX XXXXXXXX XXXXXXXX 7d2000a6
---[ end trace 3a0c9b5cb216db6b ]---

Resolve this problem by disabling each THRMn comparator when handling
the associated THRMn interrupt and by disabling the TAU entirely when
updating THRMn thresholds.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5a0ba3dc5612c7aac596727331284a3676c08472.1599260540.git.fthain@telegraphics.com.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/tau: Check processor type before enabling TAU interrupt
Finn Thain [Fri, 4 Sep 2020 23:02:20 +0000 (09:02 +1000)]
powerpc/tau: Check processor type before enabling TAU interrupt

[ Upstream commit 5e3119e15fed5b9a9a7e528665ff098a4a8dbdbc ]

According to Freescale's documentation, MPC74XX processors have an
erratum that prevents the TAU interrupt from working, so don't try to
use it when running on those processors.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c281611544768e758bd58fe812cf702a5bd2d042.1599260540.git.fthain@telegraphics.com.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/tau: Remove duplicated set_thresholds() call
Finn Thain [Fri, 4 Sep 2020 23:02:20 +0000 (09:02 +1000)]
powerpc/tau: Remove duplicated set_thresholds() call

[ Upstream commit 420ab2bc7544d978a5d0762ee736412fe9c796ab ]

The commentary at the call site seems to disagree with the code. The
conditional prevents calling set_thresholds() via the exception handler,
which appears to crash. Perhaps that's because it immediately triggers
another TAU exception. Anyway, calling set_thresholds() from TAUupdate()
is redundant because tau_timeout() does so.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d7c7ee33232cf72a6a6bbb6ef05838b2e2b113c0.1599260540.git.fthain@telegraphics.com.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/tau: Convert from timer to workqueue
Finn Thain [Fri, 4 Sep 2020 23:02:20 +0000 (09:02 +1000)]
powerpc/tau: Convert from timer to workqueue

[ Upstream commit b1c6a0a10bfaf36ec82fde6f621da72407fa60a1 ]

Since commit 19dbdcb8039cf ("smp: Warn on function calls from softirq
context") the Thermal Assist Unit driver causes a warning like the
following when CONFIG_SMP is enabled.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at kernel/smp.c:428 smp_call_function_many_cond+0xf4/0x38c
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-pmac #3
  NIP:  c00b37a8 LR: c00b3abc CTR: c001218c
  REGS: c0799c60 TRAP: 0700   Not tainted  (5.7.0-pmac)
  MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 42000224  XER: 00000000
  GPR00: c00b3abc c0799d18 c076e300 c079ef5c c0011fec 00000000 00000000 00000000
  GPR08: 00000100 00000100 00008000 ffffffff 42000224 00000000 c079d040 c079d044
  GPR16: 00000001 00000000 00000004 c0799da0 c079f054 c07a0000 c07a0000 00000000
  GPR24: c0011fec 00000000 c079ef5c c079ef5c 00000000 00000000 00000000 00000000
  NIP [c00b37a8] smp_call_function_many_cond+0xf4/0x38c
  LR [c00b3abc] on_each_cpu+0x38/0x68
  Call Trace:
  [c0799d18] [ffffffff] 0xffffffff (unreliable)
  [c0799d68] [c00b3abc] on_each_cpu+0x38/0x68
  [c0799d88] [c0096704] call_timer_fn.isra.26+0x20/0x7c
  [c0799d98] [c0096b40] run_timer_softirq+0x1d4/0x3fc
  [c0799df8] [c05b4368] __do_softirq+0x118/0x240
  [c0799e58] [c0039c44] irq_exit+0xc4/0xcc
  [c0799e68] [c000ade8] timer_interrupt+0x1b0/0x230
  [c0799ea8] [c0013520] ret_from_except+0x0/0x14
  --- interrupt: 901 at arch_cpu_idle+0x24/0x6c
      LR = arch_cpu_idle+0x24/0x6c
  [c0799f70] [00000001] 0x1 (unreliable)
  [c0799f80] [c0060990] do_idle+0xd8/0x17c
  [c0799fa0] [c0060ba8] cpu_startup_entry+0x24/0x28
  [c0799fb0] [c072d220] start_kernel+0x434/0x44c
  [c0799ff0] [00003860] 0x3860
  Instruction dump:
  8129f204 2f890000 40beff98 3d20c07a 8929eec4 2f890000 40beff88 0fe00000
  81220000 552805de 550802ef 4182ff84 <0fe000003860ffff 7f65db78 7f44d378
  ---[ end trace 34a886e47819c2eb ]---

Don't call on_each_cpu() from a timer callback, call it from a worker
thread instead.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bb61650bea4f4c91fb8e24b9a6f130a1438651a7.1599260540.git.fthain@telegraphics.com.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/tau: Use appropriate temperature sample interval
Finn Thain [Fri, 4 Sep 2020 23:02:20 +0000 (09:02 +1000)]
powerpc/tau: Use appropriate temperature sample interval

[ Upstream commit 66943005cc41f48e4d05614e8f76c0ca1812f0fd ]

According to the MPC750 Users Manual, the SITV value in Thermal
Management Register 3 is 13 bits long. The present code calculates the
SITV value as 60 * 500 cycles. This would overflow to give 10 us on
a 500 MHz CPU rather than the intended 60 us. (But according to the
Microprocessor Datasheet, there is also a factor of 266 that has to be
applied to this value on certain parts i.e. speed sort above 266 MHz.)
Always use the maximum cycle count, as recommended by the Datasheet.

Fixes: 1da177e4c3f41 ("Linux-2.6.12-rc2")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/896f542e5f0f1d6cf8218524c2b67d79f3d69b3c.1599260540.git.fthain@telegraphics.com.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/book3s64/hash/4k: Support large linear mapping range with 4K
Aneesh Kumar K.V [Mon, 8 Jun 2020 07:09:03 +0000 (12:39 +0530)]
powerpc/book3s64/hash/4k: Support large linear mapping range with 4K

[ Upstream commit 7746406baa3bc9e23fdd7b7da2f04d86e25ab837 ]

With commit: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel
regions in the same 0xc range"), we now split the 64TB address range
into 4 contexts each of 16TB. That implies we can do only 16TB linear
mapping.

On some systems, eg. Power9, memory attached to nodes > 0 will appear
above 16TB in the linear mapping. This resulted in kernel crash when
we boot such systems in hash translation mode with 4K PAGE_SIZE.

This patch updates the kernel mapping such that we now start supporting upto
61TB of memory with 4K. The kernel mapping now looks like below 4K PAGE_SIZE
and hash translation.

    vmalloc start     = 0xc0003d0000000000
    IO start          = 0xc0003e0000000000
    vmemmap start     = 0xc0003f0000000000

Our MAX_PHYSMEM_BITS for 4K is still 64TB even though we can only map 61TB.
We prevent bolt mapping anything outside 61TB range by checking against
H_VMALLOC_START.

Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Reported-by: Cameron Berkenpas <cam@neo-zeon.de>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200608070904.387440-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/watchpoint: Add hw_len wherever missing
Ravi Bangoria [Wed, 2 Sep 2020 04:29:43 +0000 (09:59 +0530)]
powerpc/watchpoint: Add hw_len wherever missing

[ Upstream commit 58da5984d2ea6d95f3f9d9e8dd9f7e1b0dddfb3c ]

There are couple of places where we set len but not hw_len. For
ptrace/perf watchpoints, when CONFIG_HAVE_HW_BREAKPOINT=Y, hw_len
will be calculated and set internally while parsing watchpoint.
But when CONFIG_HAVE_HW_BREAKPOINT=N, we need to manually set
'hw_len'. Similarly for xmon as well, hw_len needs to be set
directly.

Fixes: b57aeab811db ("powerpc/watchpoint: Fix length calculation for unaligned target")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-7-ravi.bangoria@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/watchpoint: Fix handling of vector instructions
Ravi Bangoria [Wed, 2 Sep 2020 04:29:39 +0000 (09:59 +0530)]
powerpc/watchpoint: Fix handling of vector instructions

[ Upstream commit 4441eb02333a9b46a0d919aa7a6d3b137b5f2562 ]

Vector load/store instructions are special because they are always
aligned. Thus unaligned EA needs to be aligned down before comparing
it with watch ranges. Otherwise we might consider valid event as
invalid.

Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-3-ravi.bangoria@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/watchpoint: Fix quadword instruction handling on p10 predecessors
Ravi Bangoria [Wed, 2 Sep 2020 04:29:38 +0000 (09:59 +0530)]
powerpc/watchpoint: Fix quadword instruction handling on p10 predecessors

[ Upstream commit 4759c11ed20454b7b36db4ec15f7d5aa1519af4a ]

On p10 predecessors, watchpoint with quadword access is compared at
quadword length. If the watch range is doubleword or less than that
in a first half of quadword aligned 16 bytes, and if there is any
unaligned quadword access which will access only the 2nd half, the
handler should consider it as extraneous and emulate/single-step it
before continuing.

Fixes: 74c6881019b7 ("powerpc/watchpoint: Prepare handler to handle more than one watchpoint")
Reported-by: Pedro Miraglia Franco de Carvalho <pedromfc@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-2-ravi.bangoria@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agopowerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory
Thiago Jung Bauermann [Tue, 18 Aug 2020 22:11:26 +0000 (19:11 -0300)]
powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory

[ Upstream commit eae9eec476d13fad9af6da1f44a054ee02b7b161 ]

POWER secure guests (i.e., guests which use the Protected Execution
Facility) need to use SWIOTLB to be able to do I/O with the
hypervisor, but they don't need the SWIOTLB memory to be in low
addresses since the hypervisor doesn't have any addressing limitation.

This solves a SWIOTLB initialization problem we are seeing in secure
guests with 128 GB of RAM: they are configured with 4 GB of
crashkernel reserved memory, which leaves no space for SWIOTLB in low
addresses.

To do this, we use mostly the same code as swiotlb_init(), but
allocate the buffer using memblock_alloc() instead of
memblock_alloc_low().

Fixes: 2efbc58f157a ("powerpc/pseries/svm: Force SWIOTLB for secure guests")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200818221126.391073-1-bauerman@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/qedr: Fix inline size returned for iWARP
Michal Kalderon [Wed, 2 Sep 2020 16:57:40 +0000 (19:57 +0300)]
RDMA/qedr: Fix inline size returned for iWARP

[ Upstream commit fbf58026b2256e9cd5f241a4801d79d3b2b7b89d ]

commit 59e8970b3798 ("RDMA/qedr: Return max inline data in QP query
result") changed query_qp max_inline size to return the max roce inline
size.  When iwarp was introduced, this should have been modified to return
the max inline size based on protocol.  This size is cached in the device
attributes

Fixes: 69ad0e7fe845 ("RDMA/qedr: Add support for iWARP in user space")
Link: https://lore.kernel.org/r/20200902165741.8355-8-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/qedr: Fix return code if accept is called on a destroyed qp
Michal Kalderon [Wed, 2 Sep 2020 16:57:37 +0000 (19:57 +0300)]
RDMA/qedr: Fix return code if accept is called on a destroyed qp

[ Upstream commit 8a5a10a1a74465065c75d9de1aa6685e1f1aa117 ]

In iWARP, accept could be called after a QP is already destroyed.  In this
case an error should be returned and not success.

Fixes: 82af6d19d8d9 ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr")
Link: https://lore.kernel.org/r/20200902165741.8355-5-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/qedr: Fix use of uninitialized field
Michal Kalderon [Wed, 2 Sep 2020 16:57:36 +0000 (19:57 +0300)]
RDMA/qedr: Fix use of uninitialized field

[ Upstream commit a379ad54e55a12618cae7f6333fd1b3071de9606 ]

dev->attr.page_size_caps was used uninitialized when setting device
attributes

Fixes: ec72fce401c6 ("qedr: Add support for RoCE HW init")
Link: https://lore.kernel.org/r/20200902165741.8355-4-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/qedr: Fix doorbell setting
Michal Kalderon [Wed, 2 Sep 2020 16:57:35 +0000 (19:57 +0300)]
RDMA/qedr: Fix doorbell setting

[ Upstream commit 0b1eddc1964351cd5ce57aff46853ed4ce9ebbff ]

Change the doorbell setting so that the maximum value between the last and
current value is set. This is to avoid doorbells being lost.

Fixes: a7efd7773e31 ("qedr: Add support for PD,PKEY and CQ verbs")
Link: https://lore.kernel.org/r/20200902165741.8355-3-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/qedr: Fix qp structure memory leak
Michal Kalderon [Wed, 2 Sep 2020 16:57:34 +0000 (19:57 +0300)]
RDMA/qedr: Fix qp structure memory leak

[ Upstream commit 098e345a1a8faaad6e4e54d138773466cecc45d4 ]

The qedr_qp structure wasn't freed when the protocol was RoCE.  kmemleak
output when running basic RoCE scenario.

unreferenced object 0xffff927ad7e22c00 (size 1024):
  comm "ib_send_bw", pid 7082, jiffies 4384133693 (age 274.698s)
  hex dump (first 32 bytes):
    00 b0 cd a2 79 92 ff ff 00 3f a1 a2 79 92 ff ff  ....y....?..y...
    00 ee 5c dd 80 92 ff ff 00 f6 5c dd 80 92 ff ff  ..\.......\.....
  backtrace:
    [<00000000b2ba0f35>] qedr_create_qp+0xb3/0x6c0 [qedr]
    [<00000000e85a43dd>] ib_uverbs_handler_UVERBS_METHOD_QP_CREATE+0x555/0xad0 [ib_uverbs]
    [<00000000fee4d029>] ib_uverbs_cmd_verbs+0xa5a/0xb80 [ib_uverbs]
    [<000000005d622660>] ib_uverbs_ioctl+0xa4/0x110 [ib_uverbs]
    [<00000000eb4cdc71>] ksys_ioctl+0x87/0xc0
    [<00000000abe6b23a>] __x64_sys_ioctl+0x16/0x20
    [<0000000046e7cef4>] do_syscall_64+0x4d/0x90
    [<00000000c6948f76>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 1212767e23bb ("qedr: Add wrapping generic structure for qpidr and adjust idr routines.")
Link: https://lore.kernel.org/r/20200902165741.8355-2-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
4 years agoRDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()
Jason Gunthorpe [Fri, 4 Sep 2020 22:41:43 +0000 (19:41 -0300)]
RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()

[ Upstream commit 10c75ccb54e4fe548cb16d7ed426d7d709e6ae76 ]

rdma_for_each_block() makes assumptions about how the SGL is constructed
that don't work if the block size is below the page size used to to build
the SGL.

The rules for umem SGL construction require that the SG's all be PAGE_SIZE
aligned and we don't encode the actual byte offset of the VA range inside
the SGL using offset and length. So rdma_for_each_block() has no idea
where the actual starting/ending point is to compute the first/last block
boundary if the starting address should be within a SGL.

Fixing the SGL construction turns out to be really hard, and will be the
subject of other patches. For now block smaller pages.

Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
Link: https://lore.kernel.org/r/2-v2-270386b7e60b+28f4-umem_1_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>