Miquel Raynal [Mon, 5 Dec 2022 14:40:59 +0000 (15:40 +0100)]
Merge tag 'spi-nor/for-6.2' into mtd/next
SPI NOR core changes:
* Add support for flash reset using the dt reset-gpios property.
* Update hwcaps.mask to include 8D-8D-8D read and page program ops
when xSPI profile 1.0 table is defined.
* Bypass zero erase size in spi_nor_find_best_erase_type().
* Fix select_uniform_erase to skip 0 erase size
* Add generic flash driver. If a flash is not found in the flash_info
array, fall back to the generic flash driver which is described solely
by the flash's SFDP tables.
* Fix the number of bytes for the dummy cycles in
spi_nor_spimem_check_readop().
* Introduce SPI_NOR_QUAD_PP flag, as PP_1_1_4 is not SFDP discoverable.
SPI NOR manufacturer drivers changes:
* Spansion:
- use PARSE_SFDP for s28hs512t,
- add support for s28hl512t, s28hl01gt, and s28hs01gt.
* Gigadevice: Replace default_init() with post_bfpt() for gd25q256.
* Micron - ST: Enable locking for mt25qu256a.
* Winbond: Add support for W25Q512NW-IQ.
* ISSI: Use PARSE_SFDP and SPI_NOR_QUAD_PP.
Miquel Raynal [Mon, 5 Dec 2022 14:37:27 +0000 (15:37 +0100)]
Merge tag 'nand/for-6.2' into mtd/next
Raw NAND core changes:
* Drop obsolete dependencies on COMPILE_TEST
* MAINTAINERS: rectify entry for MESON NAND controller bindings
* Drop EXPORT_SYMBOL_GPL for nanddev_erase()
Raw NAND driver changes:
* marvell: Enable NFC/DEVBUS arbiter
* gpmi: Use pm_runtime_resume_and_get instead of pm_runtime_get_sync
* mpc5121: Replace NO_IRQ by 0
* lpc32xx_{slc,mlc}:
- Switch to using pm_ptr()
- Switch to using gpiod API
* lpc32xx_mlc: Switch to using pm_ptr()
* cadence: Support 64-bit slave dma interface
* rockchip: Describe rk3128-nfc in the bindings
* brcmnand: Update interrupts description in the bindings
Jean Delvare [Thu, 24 Nov 2022 10:59:46 +0000 (11:59 +0100)]
mtd: rawnand: Drop obsolete dependencies on COMPILE_TEST
Since commit 0166dc11be91 ("of: make CONFIG_OF user selectable"), it
is possible to test-build any driver which depends on OF on any
architecture by explicitly selecting OF. Therefore depending on
COMPILE_TEST as an alternative is no longer needed.
It is actually better to always build such drivers with OF enabled,
so that the test builds are closer to how each driver will actually be
built on its intended target. Building them without OF may not test
much as the compiler will optimize out potentially large parts of the
code. In the worst case, this could even pop false positive warnings.
Dropping COMPILE_TEST here improves the quality of our testing and
avoids wasting time on non-existent issues.
Shang XiaoJing [Sat, 19 Nov 2022 06:39:15 +0000 (14:39 +0800)]
mtd: core: Fix refcount error in del_mtd_device()
del_mtd_device() calls of_node_put() on mtd_get_of_node(mtd), which
is mtd->dev.of_node. However, memset(&mtd->dev, 0) is called before
of_node_put(). As a result, of_node_put() does nothing in
del_mtd_device(), and the node reference is leaked.
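The ordering problem described above can be reproduced in a small userspace sketch. The types and helpers here (fake_node, fake_dev, fake_node_put) are hypothetical stand-ins rather than kernel API. The sketch only shows why dropping the reference has to happen before the structure is cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct device_node and struct device. */
struct fake_node { int refcount; };
struct fake_dev  { struct fake_node *of_node; };

/* Like of_node_put(), a NULL node is silently ignored. */
static void fake_node_put(struct fake_node *np)
{
	if (np)
		np->refcount--;
}

/* Buggy order: the pointer is wiped before the reference is dropped. */
static void teardown_buggy(struct fake_dev *dev)
{
	assert(dev != NULL);
	memset(dev, 0, sizeof(*dev));
	fake_node_put(dev->of_node);	/* of_node is NULL here: no-op, leak */
}

/* Fixed order: drop the reference first, then clear the structure. */
static void teardown_fixed(struct fake_dev *dev)
{
	assert(dev != NULL);
	fake_node_put(dev->of_node);
	memset(dev, 0, sizeof(*dev));
}
```

With a node whose count starts at 1, teardown_buggy() leaves the count at 1 while teardown_fixed() brings it down to 0.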
mtd: spi-nor: add SFDP fixups for Quad Page Program
The SFDP tables of some flash chips do not advertise support for Quad
Input Page Program even though the chips support it. Use flags to add
the hardware cap for these chips.
mtd: spi-nor: issi: is25wp256: Init flash based on SFDP
The datasheet of is25wp256 says it supports SFDP. Get rid of the static
initialization of the flash parameters and init them when parsing SFDP.
Testing showed the flash using SPINOR_OP_READ_1_1_4_4B 0x6c,
SPINOR_OP_PP_4B 0x12 and SPINOR_OP_BE_4K_4B 0x21 before enabling SFDP.
After this patch, it parses the SFDP information and still uses the
same opcodes.
Set sector_size and n_sectors to zero as they will be discovered when
parsing SFDP.
Eliav Farber [Thu, 20 Oct 2022 09:20:58 +0000 (09:20 +0000)]
mtd: spi-nor: micron-st: Enable locking for mt25qu256a
mt25qu256a [1] uses the 4 bit Block Protection scheme and supports
Top/Bottom protection via the BP and TB bits of the Status Register.
BP3 is located in bit 6 of the Status Register.
Tested on MT25QU256ABA8ESF-0SIT.
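As an illustration of the 4 bit Block Protection layout, the sketch below packs and unpacks a BP value with BP3 in bit 6 of the Status Register. The positions of BP0..BP2 (bits 2..4) and of TB (bit 5) are assumptions made for this example, and the helper names are hypothetical; check the datasheet for the actual layout.

```c
#include <assert.h>
#include <stdint.h>

#define SR_BP0		(1u << 2)	/* assumed position */
#define SR_BP1		(1u << 3)	/* assumed position */
#define SR_BP2		(1u << 4)	/* assumed position */
#define SR_TB		(1u << 5)	/* assumed position */
#define SR_BP3_BIT6	(1u << 6)	/* BP3 in bit 6, as stated above */

/* Extract the 4 bit block protection value from a status register byte. */
static unsigned int sr_to_bp(uint8_t sr)
{
	unsigned int bp = (sr & (SR_BP0 | SR_BP1 | SR_BP2)) >> 2;

	if (sr & SR_BP3_BIT6)
		bp |= 1u << 3;
	return bp;
}

/* Build the protection bits of a status register byte from a BP value. */
static uint8_t bp_to_sr(unsigned int bp, int top_bottom)
{
	uint8_t sr;

	assert(bp <= 0xf);
	sr = (uint8_t)((bp & 0x7) << 2);
	if (bp & 0x8)
		sr |= SR_BP3_BIT6;
	if (top_bottom)
		sr |= SR_TB;
	return sr;
}
```

Because BP3 is not adjacent to BP2, the value cannot be obtained with a single mask and shift, which is why the two helpers treat bit 6 separately.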
Allen-KH Cheng [Mon, 31 Oct 2022 12:46:33 +0000 (20:46 +0800)]
mtd: spi-nor: Fix the number of bytes for the dummy cycles
The number of bytes used by spi_nor_spimem_check_readop() may be
incorrect for the dummy cycles, since nor->read_dummy is not initialized
before spi_nor_spimem_adjust_hwcaps() is called.
Use both the mode and wait state clock cycles instead of nor->read_dummy.
Fixes: 0e30f47232ab ("mtd: spi-nor: add support for DTR protocol")
Co-developed-by: Bayi Cheng <bayi.cheng@mediatek.com>
Signed-off-by: Bayi Cheng <bayi.cheng@mediatek.com>
Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Tested-by: Dhruva Gole <d-gole@ti.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Link: https://lore.kernel.org/r/20221031124633.13189-1-allen-kh.cheng@mediatek.com
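A minimal sketch of the dummy byte arithmetic, assuming the byte count is derived from the sum of mode and wait state clock cycles, scaled by the bus width, and doubled for DTR protocols. The function name and parameters are hypothetical, and this is not the driver code itself.

```c
#include <assert.h>

/*
 * Convert dummy clock cycles into a byte count.
 * mode_clocks and wait_states are cycle counts, buswidth is the number
 * of data lines used during the dummy phase, and is_dtr selects a
 * double transfer rate protocol (two transfers per clock cycle).
 */
static unsigned int dummy_nbytes(unsigned int mode_clocks,
				 unsigned int wait_states,
				 unsigned int buswidth, int is_dtr)
{
	unsigned int nbytes;

	assert(buswidth > 0);
	nbytes = (mode_clocks + wait_states) * buswidth / 8;
	if (is_dtr)
		nbytes *= 2;
	return nbytes;
}
```

For example, 8 wait states on a single line give 1 byte, while 2 mode clocks plus 4 wait states on four lines give 3 bytes.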
Yaliang Wang [Sun, 16 Oct 2022 17:19:01 +0000 (01:19 +0800)]
mtd: spi-nor: gigadevice: gd25q256: replace gd25q256_default_init with gd25q256_post_bfpt
When utilizing PARSE_SFDP to initialize the flash parameters, the
deprecated initializing method spi_nor_init_params_deprecated() and the
function spi_nor_manufacturer_init_params() within it will never be
executed. As a result, the default_init hook function will also never
be executed.
This is okay for the 'D' generation of GD25Q256: it implements the
JESD216B standard and has the QER field defined in BFPT, so parsing the
SFDP properly sets the quad_enable function. The 'E' generation also
implements the JESD216B standard and has the same status register
definitions as the 'D' generation, so parsing the SFDP to set the
quad_enable function should also work for it.
However, the same does not apply to the 'C' generation. The GD25Q256C
implements the JESD216 standard and does not have the QER field defined
in BFPT. Since it does have a QE bit in status register 1, the
quad_enable hook needs to be tweaked to properly set the quad_enable
function. This can be done in the post_bfpt fixup hook.
Fixes: 047275f7de18 ("mtd: spi-nor: gigadevice: gd25q256: Init flash based on SFDP")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yaliang Wang <Yaliang.Wang@windriver.com>
[tudor.ambarus@microchip.com: Update comment in gd25q256_post_bfpt]
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221016171901.1483542-2-yaliang.wang@windriver.com
Michael Walle [Wed, 10 Aug 2022 22:06:53 +0000 (00:06 +0200)]
mtd: spi-nor: add generic flash driver
Our SFDP parsing is everything we need to support all basic operations
of a flash device. If the flash isn't found in our in-kernel flash
database, gracefully fall back to a driver described solely by its SFDP
tables.
Michael Walle [Wed, 10 Aug 2022 22:06:52 +0000 (00:06 +0200)]
mtd: spi-nor: fix select_uniform_erase to skip 0 erase size
4bait will set the erase size to 0 if there is no corresponding
opcode for the 4byte erase. Fix spi_nor_select_uniform_erase to skip
the 0 erase size to avoid mtd device registration failure cases.
Michael Walle [Wed, 10 Aug 2022 22:06:50 +0000 (00:06 +0200)]
mtd: spi-nor: remember full JEDEC flash ID
At the moment, we print the JEDEC ID that is stored in our database. The
generic flash support won't have such an entry in our database. To find
out the JEDEC ID later we will have to cache it. There is also another
advantage: If the flash is found in the database, the ID could be
truncated because the ID of the entry is used which can be shorter. Some
flashes still hold valuable information in the bytes after the JEDEC ID,
which come in handy during debugging or when coping with INFO6() entries.
These are not accessible for now.
Save a copy of the ID bytes after reading and display it via debugfs.
Alexander Sverdlin [Fri, 19 Nov 2021 08:14:12 +0000 (09:14 +0100)]
mtd: spi-nor: Check for zero erase size in spi_nor_find_best_erase_type()
Erase can be zeroed in spi_nor_parse_4bait() or
spi_nor_init_non_uniform_erase_map(). In practice it happened with
mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands,
but only 4K and 64K erase with 4b address commands.
Fixes: dc92843159a7 ("mtd: spi-nor: fix erase_type array to indicate current map conf")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211119081412.29732-1-alexander.sverdlin@nokia.com
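The zero erase size case can be sketched as follows. This is a simplified, hypothetical selection routine, not the spi-nor implementation: it assumes the table is sorted by ascending size, with zeroed entries left in place, and the opcodes in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical erase type entry: size in bytes and command opcode. */
struct erase_type {
	unsigned int size;
	unsigned char opcode;
};

/*
 * Pick the largest erase type that fits in len and is aligned to addr.
 * Entries with size 0 (no opcode available) are skipped; without that
 * check the alignment test below would divide by zero.
 */
static const struct erase_type *pick_erase_type(const struct erase_type *types,
						size_t n, unsigned int addr,
						unsigned int len)
{
	size_t i;

	assert(types != NULL);
	for (i = n; i-- > 0; ) {
		const struct erase_type *t = &types[i];

		if (!t->size)
			continue;	/* zeroed entry, not usable */
		if (t->size > len)
			continue;	/* would erase more than requested */
		if (addr % t->size)
			continue;	/* address not aligned to this type */
		return t;
	}
	return NULL;
}
```

With a table of 4K, a zeroed 32K slot, and 64K, an aligned 64K request selects the 64K type, and a 32K request at offset 4K falls back to the 4K type instead of tripping over the zeroed entry.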
Hamish Martin [Wed, 9 Nov 2022 23:13:25 +0000 (12:13 +1300)]
mtd: rawnand: marvell: Enable NFC/DEVBUS arbiter
The CN9130 SoC (an ARMADA 8K type) has both a NAND Flash Controller and
a generic local bus controller (Device Bus Controller) that share common
pins.
With a board design that incorporates a NAND flash and also uses
the Device Bus (in our case for an SRAM), accessing the Device Bus device
fails unless the NfArbiterEn bit is set. Setting the bit enables
arbitration between the Device Bus and the NAND flash.
Since there is no obvious downside in enabling this for designs that
don't require arbitration, we always enable it.
Lukas Bulwahn [Wed, 16 Nov 2022 12:49:32 +0000 (13:49 +0100)]
mtd: parsers: refer to ARCH_BCMBCA instead of ARCH_BCM4908
Commit dd5c672d7ca9 ("arm64: bcmbca: Merge ARCH_BCM4908 to ARCH_BCMBCA")
removes config ARCH_BCM4908 as config ARCH_BCMBCA has the same intent.
Probably due to concurrent development, commit 002181f5b150 ("mtd: parsers:
add Broadcom's U-Boot parser") introduces 'Broadcom's U-Boot partition
parser' that depends on ARCH_BCM4908, but this use was not visible during
the config refactoring from the commit above. Hence, these two changes
create a reference to a non-existing config symbol.
Adjust the MTD_BRCM_U_BOOT definition to refer to ARCH_BCMBCA instead of
ARCH_BCM4908 to remove the reference to the non-existing config symbol
ARCH_BCM4908.
The schema for 'sercomm,scpart-id' is broken. The 'if' condition is
never true because 'compatible' is in the parent node, not the child
node the sub-schema applies to. The example passes as there are no
constraints on additional/unevaluated properties. That's a secondary
issue which is complicated due to nested partitions.
Drop the if/then schema and the unnecessary 'allOf' so that the
'sercomm,scpart-id' property is at least defined.
Miquel Raynal [Mon, 14 Nov 2022 09:03:12 +0000 (10:03 +0100)]
dt-bindings: mtd: nvmem-cells: Inherit from MTD partitions
The aim of MTD nvmem-cells is to treat MTD partitions as NVMEM
providers. Hence, MTD partition properties are valid here. Let's
reference mtd/partition.yaml which gives us a chance to drop
"additionalProperties: true" in favor of "unevaluatedProperties:
false".
Miquel Raynal [Mon, 14 Nov 2022 09:03:11 +0000 (10:03 +0100)]
dt-bindings: mtd: nvmem-cells: Drop range property from example
Memory mapped devices such as parallel NOR flash could make use of the
'ranges' property to translate a nvmem 'reg' cell address to a CPU
address but in practice there is no upstream user nor any declaration of
this property being valid in this case yet, leading to a warning when
constraining a bit more the schema:
.../mtd/partitions/nvmem-cells.example.dtb: calibration@f00000:
Unevaluated properties are not allowed ('ranges' was unexpected)
So let's drop the property from the example, knowing that someone might
actually properly define it some day.
Miquel Raynal [Mon, 14 Nov 2022 09:03:10 +0000 (10:03 +0100)]
dt-bindings: mtd: partitions: Change qcom,smem-part partition type
As described in dd638202dfb6 ("dt-bindings: mtd: partitions: add additional
example for qcom,smem-part"), the aim of documenting the subnodes was to be
able to declare nvmem cells. Hence, the partition property does not
really apply directly here, let's instead reference nvmem-cells.yaml
first.
Miquel Raynal [Mon, 14 Nov 2022 09:03:09 +0000 (10:03 +0100)]
dt-bindings: mtd: partitions: Constrain the list of parsers
Parser compatibles cannot be used anywhere, and the list is limited. In
order to constrain this list, enumerate them all under the top
"partitions" subnode. New parsers will have to add their own compatible
here as well.
Miquel Raynal [Mon, 14 Nov 2022 09:03:08 +0000 (10:03 +0100)]
dt-bindings: mtd: physmap: Reuse the generic definitions
The memory mapped MTD devices also share a lot with all the other MTD
devices, so let's share the properties by referencing mtd.yaml. We can
then drop mentioning the properties, to the cost of mentioning the
possible "sram" node name prefix.
Miquel Raynal [Mon, 14 Nov 2022 09:03:07 +0000 (10:03 +0100)]
dt-bindings: mtd: spi-nor: Drop common properties
When redefining common properties does not bring any additional
information, just drop them from the SPI-NOR bindings because these
properties are already defined in mtd.yaml.
Miquel Raynal [Mon, 14 Nov 2022 09:03:05 +0000 (10:03 +0100)]
dt-bindings: mtd: onenand: Mention the expected node name
The chip node name in this driver is expected to be different and should
be prefixed with onenand instead of the regular "flash" string, so
mention it.
Miquel Raynal [Mon, 14 Nov 2022 09:03:04 +0000 (10:03 +0100)]
dt-bindings: mtd: ingenic: Mark partitions in the controller node as deprecated
Defining partitions as subnodes of the controller was deprecated a
long time ago, but unlike having partitions within the controller node,
having an envelope named "partitions" (which is not itself within a
chip subnode) is not that common, so keep this deprecated definition in
this file.
Miquel Raynal [Mon, 14 Nov 2022 09:03:03 +0000 (10:03 +0100)]
dt-bindings: mtd: nand: Standardize the child node name
In almost all the schema mentioning a NAND chip child node, the name of
the subnode contains a single index number.
In practice there are currently no controllers supporting more than 8 CS
so even the [a-f] digits are not needed. But let's be safe and limit
the number of touched files by just allowing a single digit everywhere, so
in practice up to 16 CS at most. This value can anyway be limited in
each schema.
Miquel Raynal [Mon, 14 Nov 2022 09:03:02 +0000 (10:03 +0100)]
dt-bindings: mtd: nand: Drop common properties already defined in generic files
generic files, so let's drop these properties from the individual NAND
controller bindings when no additional information is provided rather
than the possible presence of the property.
Miquel Raynal [Mon, 14 Nov 2022 09:03:01 +0000 (10:03 +0100)]
dt-bindings: mtd: nand-chip: Reference mtd.yaml
A NAND chip is an MTD device. mtd.yaml already defines many useful and
relevant properties, let's reference this file here to get access to
these additional property definitions.
Miquel Raynal [Mon, 14 Nov 2022 09:02:59 +0000 (10:02 +0100)]
dt-bindings: mtd: Clarify all partition subnodes
Over time the various ways to define MTD partitions have evolved. Most of
the controllers support several different bindings. Let's define all
possible choices in one file and mark the legacy ones deprecated. This
way, we can just reference this file and avoid duplicating these
definitions.
TP-Link SafeLoader partitioning means flash contains multiple partitions
defined in the on-flash table. Some of those partitions may have a
special meaning and may require describing additionally. Allow that.
Geert Uytterhoeven [Thu, 27 Oct 2022 13:10:28 +0000 (15:10 +0200)]
mtd: rawnand: lpc32xx_slc: Switch to using pm_ptr()
The switch to using the gpiod API removed the last user of
lpc32xx_wp_disable() outside #ifdef CONFIG_PM, causing build failures if
CONFIG_PM=n:
drivers/mtd/nand/raw/lpc32xx_slc.c:318:13: error: ‘lpc32xx_wp_disable’ defined but not used [-Werror=unused-function]
318 | static void lpc32xx_wp_disable(struct lpc32xx_nand_host *host)
| ^~~~~~~~~~~~~~~~~~
Fix this by switching from #ifdef CONFIG_PM to pm_ptr(), increasing
compile-coverage as a side-effect.
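The idea behind a pm_ptr() style helper can be sketched in userspace: the callbacks are always compiled and always referenced, and a constant condition decides whether the pointer or NULL ends up in the driver structure. All names below (DEMO_CONFIG_PM, DEMO_PTR_IF, demo_driver) are hypothetical and only mimic the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a Kconfig switch; 0 models a CONFIG_PM=n build. */
#ifndef DEMO_CONFIG_PM
#define DEMO_CONFIG_PM 0
#endif

/* Evaluates to ptr when cond is true and to NULL otherwise. */
#define DEMO_PTR_IF(cond, ptr)	((cond) ? (ptr) : NULL)
#define demo_pm_ptr(ptr)	DEMO_PTR_IF(DEMO_CONFIG_PM, (ptr))

struct demo_driver {
	int (*suspend)(void);
	int (*resume)(void);
};

/* Always compiled and always referenced, so no unused-function warning. */
static int demo_suspend(void) { return 1; }
static int demo_resume(void) { return 2; }

static const struct demo_driver demo = {
	.suspend = demo_pm_ptr(demo_suspend),
	.resume = demo_pm_ptr(demo_resume),
};

/* Callers treat a missing hook as "nothing to do". */
static int demo_call_suspend(const struct demo_driver *drv)
{
	assert(drv != NULL);
	return drv->suspend ? drv->suspend() : 0;
}
```

Compared with wrapping the callbacks in #ifdef, the compiler still type-checks them in every configuration and can discard them when the condition is a constant 0.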
Geert Uytterhoeven [Thu, 27 Oct 2022 13:10:27 +0000 (15:10 +0200)]
mtd: rawnand: lpc32xx_mlc: Switch to using pm_ptr()
The switch to using the gpiod API removed the last user of
lpc32xx_wp_disable() outside #ifdef CONFIG_PM, causing build failures if
CONFIG_PM=n:
drivers/mtd/nand/raw/lpc32xx_mlc.c:380:13: error: ‘lpc32xx_wp_disable’ defined but not used [-Werror=unused-function]
380 | static void lpc32xx_wp_disable(struct lpc32xx_nand_host *host)
| ^~~~~~~~~~~~~~~~~~
Fix this by switching from #ifdef CONFIG_PM to pm_ptr(), increasing
compile-coverage as a side-effect.
Gaosheng Cui [Mon, 24 Oct 2022 06:51:09 +0000 (14:51 +0800)]
mtd: core: fix possible resource leak in init_mtd()
I got the following error report while injecting a fault in init_mtd():
sysfs: cannot create duplicate filename '/devices/virtual/bdi/mtd-0'
Call Trace:
<TASK>
dump_stack_lvl+0x67/0x83
sysfs_warn_dup+0x60/0x70
sysfs_create_dir_ns+0x109/0x120
kobject_add_internal+0xce/0x2f0
kobject_add+0x98/0x110
device_add+0x179/0xc00
device_create_groups_vargs+0xf4/0x100
device_create+0x7b/0xb0
bdi_register_va.part.13+0x58/0x2d0
bdi_register+0x9b/0xb0
init_mtd+0x62/0x171 [mtd]
do_one_initcall+0x6c/0x3c0
do_init_module+0x58/0x222
load_module+0x268e/0x27d0
__do_sys_finit_module+0xd5/0x140
do_syscall_64+0x37/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
kobject_add_internal failed for mtd-0 with -EEXIST, don't try to register
things with the same name in the same directory.
Error registering mtd class or bdi: -17
If init_mtdchar() fails in init_mtd(), mtd_bdi will not be unregistered.
As a result, we can't load the mtd module again. Fix this by calling
bdi_unregister(mtd_bdi) after the out_procfs label.
Rafał Miłecki [Sat, 22 Oct 2022 21:13:18 +0000 (23:13 +0200)]
mtd: core: set ROOT_DEV for partitions marked as rootfs in DT
This adds support for "linux,rootfs" binding that is used to mark flash
partition containing rootfs. It's useful for devices using device tree
that don't have bootloader passing root info in cmdline.
Rafał Miłecki [Sat, 22 Oct 2022 21:13:17 +0000 (23:13 +0200)]
dt-bindings: mtd: partitions: support marking rootfs partition
Linux needs to know what to use as root device. On embedded devices with
flash the only common way to specify that is cmdline & root= parameter.
That solution works with U-Boot which is Linux & cmdline aware but isn't
available with all market bootloaders. Also that method is fragile:
1. Requires specific probing order on multi-flash devices
2. Uses hardcoded partitions indexes
A lot of devices use different partitioning methods. It may be
"fixed-partitions" or some dynamic partitioning (e.g. based on parts
table). For such cases allow "linux,rootfs" property to mark correct
flash partition.
On 64 bit systems, the highest 32 bits of the "offset" variable are
not initialized. Also the existing code is not endian safe (it will
fail on big endian systems). Change the type of "offset" to a u32.
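A userspace sketch of the problem, assuming the original code stored a 32-bit property value into a 64-bit variable through a pointer cast. The function names are hypothetical, memcpy() models the 4 byte store, and the stale parameter models whatever happened to be on the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Buggy pattern: only 4 of the 8 bytes of "offset" are written. The
 * other 4 keep their previous contents, and which half gets written
 * depends on the byte order of the machine.
 */
static uint64_t read_offset_buggy(uint32_t prop, uint64_t stale)
{
	uint64_t offset = stale;

	assert(sizeof(offset) > sizeof(prop));
	memcpy(&offset, &prop, sizeof(prop));
	return offset;
}

/* Fixed pattern: the variable has the same width as the value read. */
static uint64_t read_offset_fixed(uint32_t prop)
{
	uint32_t offset = prop;

	return offset;
}
```

On a little endian machine the buggy variant happens to work when the stale bytes are zero, which is one way such a bug can go unnoticed for a while.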
Most TP-Link home routers use the same partitioning system based on a
custom ASCII table.
It doesn't seem to have any official name. GPL sources contain tool
named simply "make_flash" and Makefile target "FlashMaker".
This partitions table format was first found in devices with a custom
SafeLoader bootloader so it was called SafeLoader by the community. Later
it was ported to other bootloaders but it seems the name stuck.
Add binding for describing flashes with SafeLoader partitions table. It
allows operating systems to parse it properly and register proper flash
layout.
Ray Zhang [Mon, 10 Oct 2022 04:55:49 +0000 (04:55 +0000)]
mtd: mtdoops: panic caused mtdoops to call mtdoops_erase function immediately
The panic function disables the local interrupts, preemption, and all
other processors. When the invoked mtdoops needs to erase a used page,
calling schedule_work() to do it will not work. Instead, just call
mtdoops_erase function immediately.
Tested:
~# echo c > /proc/sysrq-trigger
[ 171.654759] sysrq: Trigger a crash
[ 171.658325] Kernel panic - not syncing: sysrq triggered crash
......
[ 172.406423] mtdoops: not ready 34, 35 (erase immediately)
[ 172.432285] mtdoops: ready 34, 35
[ 172.435633] Rebooting in 10 seconds..
Ray Zhang [Mon, 10 Oct 2022 04:55:47 +0000 (04:55 +0000)]
mtd: mtdoops: change printk() to counterpart pr_ functions
To comply with latest kernel code requirement, change printk() to
counterpart pr_ functions in mtdoops driver:
- change printk(INFO) to pr_info()
- change printk(DEBUG) to pr_debug()
- change printk(WARNING) to pr_warn()
- change printk(ERR) to pr_err()
Note that printk(KERN_DEBUG) and pr_debug() are equivalent only if
dynamic debugging is enabled or DEBUG is defined; otherwise pr_debug()
is a no-op, causing different behavior.
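The behavioral difference can be sketched in userspace with two hypothetical macros. The sketch ignores dynamic debugging entirely and only models the DEBUG compile-time switch (called DEMO_DEBUG here); a counter records how many messages were actually emitted.

```c
#include <assert.h>
#include <stdio.h>

static int demo_messages;	/* number of messages actually printed */

/* Always prints, like a debug-level printk(). */
#define demo_printk_debug(...)	(demo_messages++, printf(__VA_ARGS__))

/* Prints only when DEMO_DEBUG is defined; otherwise compiles to a no-op
 * that still lets the compiler check the format arguments. */
#ifdef DEMO_DEBUG
#define demo_pr_debug(...)	demo_printk_debug(__VA_ARGS__)
#else
#define demo_pr_debug(...)	do { if (0) printf(__VA_ARGS__); } while (0)
#endif

/* Emit one message through each macro and report how many got out. */
static int demo_log_both(const char *what)
{
	assert(what != NULL);
	demo_messages = 0;
	demo_printk_debug("always: %s\n", what);
	demo_pr_debug("maybe: %s\n", what);
	return demo_messages;
}
```

Built without DEMO_DEBUG, demo_log_both() reports a single message, so a conversion of this kind can silently remove output that used to reach the log.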
Rafał Miłecki [Tue, 4 Oct 2022 08:37:10 +0000 (10:37 +0200)]
mtd: core: try to find OF node for every MTD partition
So far this feature was limited to the top-level "nvmem-cells" node.
There are multiple parsers creating partitions and subpartitions
dynamically. Extend that code to handle them too.
This allows finding partition-* node for every MTD (sub)partition.
Linus Torvalds [Sun, 30 Oct 2022 18:31:14 +0000 (11:31 -0700)]
Merge tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev
Pull fbdev fixes from Helge Deller:
"A use-after-free bugfix in the smscufx driver and various minor error
path fixes, smaller build fixes, sysfs fixes and typos in comments in
the stifb, sisfb, da8xxfb, xilinxfb, sm501fb, gbefb and cyber2000fb
drivers"
* tag 'fbdev-for-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: cyber2000fb: fix missing pci_disable_device()
fbdev: sisfb: use explicitly signed char
fbdev: smscufx: Fix several use-after-free bugs
fbdev: xilinxfb: Make xilinxfb_release() return void
fbdev: sisfb: fix repeated word in comment
fbdev: gbefb: Convert sysfs snprintf to sysfs_emit
fbdev: sm501fb: Convert sysfs snprintf to sysfs_emit
fbdev: stifb: Fall back to cfb_fillrect() on 32-bit HCRX cards
fbdev: da8xx-fb: Fix error handling in .remove()
fbdev: MIPS supports iomem addresses
Linus Torvalds [Sun, 30 Oct 2022 17:35:07 +0000 (10:35 -0700)]
Merge tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"A few small USB fixes for 6.1-rc3. Included in here are:
- MAINTAINERS update, including a big one for the USB gadget
subsystem. Many thanks to Felipe for all of the years of hard work
he has done on this codebase, it was greatly appreciated.
- dwc3 driver fixes for reported problems.
- xhci driver fixes for reported problems.
- typec driver fixes for minor issues
- uvc gadget driver change, and then revert as it wasn't relevant for
6.1-final, as it is a new feature and people are still reviewing
and modifying it.
All of these have been in the linux-next tree with no reported issues"
* tag 'usb-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: dwc3: gadget: Don't set IMI for no_interrupt
usb: dwc3: gadget: Stop processing more requests on IMI
Revert "usb: gadget: uvc: limit isoc_sg to super speed gadgets"
xhci: Remove device endpoints from bandwidth list when freeing the device
xhci-pci: Set runtime PM as default policy on all xHC 1.2 or later devices
xhci: Add quirk to reset host back to default state at shutdown
usb: xhci: add XHCI_SPURIOUS_SUCCESS to ASM1042 despite being a V0.96 controller
usb: dwc3: st: Rely on child's compatible instead of name
usb: gadget: uvc: limit isoc_sg to super speed gadgets
usb: bdc: change state when port disconnected
usb: typec: ucsi: acpi: Implement resume callback
usb: typec: ucsi: Check the connection on resume
usb: gadget: aspeed: Fix probe regression
usb: gadget: uvc: fix sg handling during video encode
usb: gadget: uvc: fix sg handling in error case
usb: gadget: uvc: fix dropped frame after missed isoc
usb: dwc3: gadget: Don't delay End Transfer on delayed_status
usb: dwc3: Don't switch OTG -> peripheral if extcon is present
MAINTAINERS: Update maintainers for broadcom USB
MAINTAINERS: move USB gadget and phy entries under the main USB entry
Linus Torvalds [Sun, 30 Oct 2022 17:21:42 +0000 (10:21 -0700)]
Merge tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- convert gpio-tegra to using an immutable irqchip
- MAINTAINERS update
* tag 'gpio-fixes-for-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
MAINTAINERS: Change myself to a maintainer
gpio: tegra: Convert to immutable irq chip
Linus Torvalds [Sun, 30 Oct 2022 16:49:18 +0000 (09:49 -0700)]
Merge tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Rename a perf memory level event define to denote it is of CXL type
- Add Alder and Raptor Lakes support to RAPL
- Make sure raw sample data is output with tracepoints
* tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL
perf/x86/rapl: Add support for Intel Raptor Lake
perf/x86/rapl: Add support for Intel AlderLake-N
perf: Fix missing raw data on tracepoint events
Linus Torvalds [Sun, 30 Oct 2022 16:44:06 +0000 (09:44 -0700)]
Merge tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Remove unused kernel stack padding, fix some build errors/warnings and
two bugs in laptop platform driver"
* tag 'loongarch-fixes-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
platform/loongarch: laptop: Fix possible UAF and simplify generic_acpi_laptop_init()
platform/loongarch: laptop: Adjust resume order for loongson_hotkey_resume()
LoongArch: BPF: Avoid declare variables in switch-case
LoongArch: Use flexible-array member instead of zero-length array
LoongArch: Remove unused kernel stack padding
Linus Torvalds [Sun, 30 Oct 2022 16:40:04 +0000 (09:40 -0700)]
Merge tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
- use after free fix for reconnect race
- two memory leak fixes
* tag '6.1-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix use-after-free caused by invalid pointer `hostname`
cifs: Fix pages leak when writedata alloc failed in cifs_write_from_iter()
cifs: Fix pages array leak when writedata alloc failed in cifs_writedata_alloc()
Linus Torvalds [Sun, 30 Oct 2022 01:33:03 +0000 (18:33 -0700)]
Merge tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fix from Jason Donenfeld:
"One fix from Jean-Philippe Brucker, addressing a regression in which
early boot code on ARM64 would use the non-_early variant of the
arch_get_random family of functions, resulting in the architectural
random number generator appearing unavailable during that early phase
of boot.
The fix simply changes arch_get_random*() to arch_get_random*_early().
This distinction between these two functions is a bit of an old wart
I'm not a fan of, and for 6.2 I'll see if I can make obsolete the
_early variant, so that one function does the right thing in all
contexts without overhead"
* tag 'random-6.1-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: use arch_get_random*_early() in random_init()
Linus Torvalds [Sun, 30 Oct 2022 01:12:45 +0000 (18:12 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Various small fixes, all in drivers.
Some of these arrived during the merge window and got held over to
make sure of testing on the -rc tree.
The biggest change is for standards conformance in the target driver,
closely followed by a set of bug fixes in megaraid_sas"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits)
scsi: ufs: core: Fix typo in comment
scsi: mpi3mr: Select CONFIG_SCSI_SAS_ATTRS
scsi: ufs: core: Fix typo for register name in comments
scsi: pm80xx: Display proc_name in sysfs
scsi: ufs: core: Fix the error log in ufshcd_query_flag_retry()
scsi: ufs: core: Remove unneeded casts from void *
scsi: lpfc: Fix spelling mistake "unsolicted" -> "unsolicited"
scsi: qla2xxx: Use transport-defined speed mask for supported_speeds
scsi: target: iblock: Fold iblock_emulate_read_cap_with_block_size() into iblock_get_blocks()
scsi: qla2xxx: Fix serialization of DCBX TLV data request
scsi: ufs: qcom: Remove redundant dev_err() call
scsi: megaraid_sas: Move megasas_dbg_lvl init to megasas_init()
scsi: megaraid_sas: Remove unnecessary memset()
scsi: megaraid_sas: Simplify megasas_update_device_list
scsi: megaraid_sas: Correct an error message
scsi: megaraid_sas: Correct value passed to scsi_device_lookup()
scsi: target: core: UA on all LUNs after reset
scsi: target: core: New key must be used for moved PR
scsi: target: core: Abort all preempted regs if requested
scsi: target: core: Fix memory leak in preempt_and_abort
...
Linus Torvalds [Sun, 30 Oct 2022 01:06:52 +0000 (18:06 -0700)]
Merge tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Christoph:
- make the multipath dma alignment match the non-multipath one
(Keith Busch)
- fix a bogus use of sg_init_marker() (Nam Cao)
- fix circular locking in nvme-tcp (Sagi Grimberg)
- Initialization fix for requests allocated via the special hw queue
allocator (John)
- Fix for a regression added in this release with the batched
completions of end_io backed requests (Ming)
- Error handling leak fix for rbd (Yang)
- Error handling leak fix for add_disk() failure (Yu)
* tag 'block-6.1-2022-10-28' of git://git.kernel.dk/linux:
blk-mq: Properly init requests from blk_mq_alloc_request_hctx()
blk-mq: don't add non-pt request with ->end_io to batch
rbd: fix possible memory leak in rbd_sysfs_init()
nvme-multipath: set queue dma alignment to 3
nvme-tcp: fix possible circular locking when deleting a controller under memory pressure
nvme-tcp: replace sg_init_marker() with sg_init_table()
block: fix memory leak for elevator on add_disk failure
Linus Torvalds [Sun, 30 Oct 2022 01:01:16 +0000 (18:01 -0700)]
Merge tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
"Just a fix for a locking regression introduced with the deferred
task_work running from this merge window"
* tag 'io_uring-6.1-2022-10-28' of git://git.kernel.dk/linux:
io_uring: unlock if __io_run_local_work locked inside
io_uring: use io_run_local_work_locked helper
Linus Torvalds [Sun, 30 Oct 2022 00:49:33 +0000 (17:49 -0700)]
Merge tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Eight fix pre-6.0 bugs and the remainder address issues which were
introduced in the 6.1-rc merge cycle, or address issues which aren't
considered sufficiently serious to warrant a -stable backport"
* tag 'mm-hotfixes-stable-2022-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lib: maple_tree: remove unneeded initialization in mtree_range_walk()
mmap: fix remap_file_pages() regression
mm/shmem: ensure proper fallback if page faults
mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
x86: fortify: kmsan: fix KMSAN fortify builds
x86: asm: make sure __put_user_size() evaluates pointer once
Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
x86/purgatory: disable KMSAN instrumentation
mm: kmsan: export kmsan_copy_page_meta()
mm: migrate: fix return value if all subpages of THPs are migrated successfully
mm/uffd: fix vma check on userfault for wp
mm: prep_compound_tail() clear page->private
mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
mm/page_isolation: fix clang deadcode warning
fs/ext4/super.c: remove unused `deprecated_msg'
ipc/msg.c: fix percpu_counter use after free
memory tier, sysfs: rename attribute "nodes" to "nodelist"
MAINTAINERS: git://github.com -> https://github.com for nilfs2
mm/kmemleak: prevent soft lockup in kmemleak_scan()'s object iteration loops
...
Linus Torvalds [Sat, 29 Oct 2022 17:35:17 +0000 (10:35 -0700)]
Merge tag 'powerpc-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a case of rescheduling with user access unlocked, when preempt is
enabled.
- A follow-up fix for a recent fix, which could lead to IRQ state
assertions firing incorrectly.
- Two fixes for lockdep warnings seen when using kfence with the Hash
MMU.
- Two fixes for preempt warnings seen when using the Hash MMU.
- Two fixes for the VAS coprocessor mechanism used on pseries.
- Prevent building some of our older KVM backends when
CONTEXT_TRACKING_USER is enabled, as it's known to cause crashes.
- A couple of fixes for issues seen with PMU NMIs.
Thanks to Nicholas Piggin, Guenter Roeck, Frederic Barrat, Haren Myneni,
Sachin Sant, and Samuel Holland.
* tag 'powerpc-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/interrupt: Fix clear of PACA_IRQS_HARD_DIS when returning to soft-masked context
powerpc/64s/interrupt: Perf NMI should not take normal exit path
powerpc/64/interrupt: Prevent NMI PMI causing a dangerous warning
KVM: PPC: BookS PR-KVM and BookE do not support context tracking
powerpc: Fix reschedule bug in KUAP-unlocked user copy
powerpc/64s: Fix hash__change_memory_range preemption warning
powerpc/64s: Disable preemption in hash lazy mmu mode
powerpc/64s: make linear_map_hash_lock a raw spinlock
powerpc/64s: make HPTE lock and native_tlbie_lock irq-safe
powerpc/64s: Add lockdep for HPTE lock
powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPU
powerpc/pseries/vas: Add VAS IRQ primary handler
Yang Yingliang [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
platform/loongarch: laptop: Fix possible UAF and simplify generic_acpi_laptop_init()
Currently the return value of 'sub_driver->init' is not checked. If
sparse_keymap_setup(), called in the init function, fails, then
'generic_inputdev' is freed, which leads to a UAF when it is later used in
generic_acpi_laptop_init(). Fix it by checking the return value and setting
generic_inputdev to NULL after the free, which also avoids a double free.
The error code in generic_subdriver_init() is always negative, so the
return of generic_subdriver_init() can be simplified.
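The pattern described above can be sketched in userspace C. This is only an
illustration under stated assumptions, not the driver's code: every name
ending in _stub is hypothetical, and calloc()/free() stand in for the input
device allocation and release.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's input device. */
struct input_stub {
	int registered;
};

struct input_stub *generic_inputdev_stub;

/* Hypothetical keymap setup step that can be forced to fail. */
int setup_keymap_stub(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Init step: on failure, free the device and clear the pointer. */
int subdriver_init_stub(int should_fail)
{
	int ret;

	generic_inputdev_stub = calloc(1, sizeof(*generic_inputdev_stub));
	if (!generic_inputdev_stub)
		return -ENOMEM;

	ret = setup_keymap_stub(should_fail);
	if (ret < 0) {
		free(generic_inputdev_stub);
		generic_inputdev_stub = NULL;	/* no dangling pointer is left */
		return ret;
	}
	return 0;
}

/* Caller: check the return value before touching the device. */
int laptop_init_stub(int should_fail)
{
	int ret = subdriver_init_stub(should_fail);

	if (ret < 0)
		return ret;	/* the device is already gone; do not use it */

	generic_inputdev_stub->registered = 1;
	return 0;
}
```

Because the error code is always negative, the caller can return it
unchanged, and clearing the pointer means a later cleanup path cannot free
the same object twice.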
Huacai Chen [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
platform/loongarch: laptop: Adjust resume order for loongson_hotkey_resume()
Some laptops don't support SW_LID but still have backlight control.
Move backlight resuming before the SW_LID event handling so that the
early return cannot leave the backlight in the wrong state.
Huacai Chen [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
LoongArch: BPF: Avoid declare variables in switch-case
Not all compilers support declaring variables in switch-case, so move
the declarations to the beginning of the function. Otherwise we may get
build errors such as:
arch/loongarch/net/bpf_jit.c: In function ‘emit_atomic’:
arch/loongarch/net/bpf_jit.c:362:3: error: a label can only be part of a statement and a declaration is not a statement
u8 r0 = regmap[BPF_REG_0];
^~
arch/loongarch/net/bpf_jit.c: In function ‘build_insn’:
arch/loongarch/net/bpf_jit.c:727:3: error: a label can only be part of a statement and a declaration is not a statement
u8 t7 = -1;
^~
arch/loongarch/net/bpf_jit.c:778:3: error: a label can only be part of a statement and a declaration is not a statement
int ret;
^~~
arch/loongarch/net/bpf_jit.c:779:3: error: expected expression before ‘u64’
u64 func_addr;
^~~
arch/loongarch/net/bpf_jit.c:780:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
bool func_addr_fixed;
^~~~
arch/loongarch/net/bpf_jit.c:784:11: error: ‘func_addr’ undeclared (first use in this function); did you mean ‘in_addr’?
&func_addr, &func_addr_fixed);
^~~~~~~~~
in_addr
arch/loongarch/net/bpf_jit.c:784:11: note: each undeclared identifier is reported only once for each function it appears in
arch/loongarch/net/bpf_jit.c:814:3: error: a label can only be part of a statement and a declaration is not a statement
u64 imm64 = (u64)(insn + 1)->imm << 32 | (u32)insn->imm;
^~~
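The errors above come from a declaration placed directly after a case
label. A minimal sketch of the hoisted form follows; the enum, the function
name and its parameters are hypothetical and are not taken from bpf_jit.c.

```c
#include <stdint.h>

/* Hypothetical opcodes, for illustration only. */
enum op_stub { OP_LOAD_IMM64_STUB, OP_NOT_STUB };

/*
 * ISO C before C23 requires a label to be followed by a statement, and a
 * declaration is not a statement, so older compilers reject:
 *
 *	case OP_LOAD_IMM64_STUB:
 *		uint64_t imm64 = (uint64_t)hi << 32 | lo;
 */
uint64_t emit_stub(enum op_stub op, uint32_t lo, uint32_t hi)
{
	/* Declaration hoisted to the beginning of the function. */
	uint64_t imm64;

	switch (op) {
	case OP_LOAD_IMM64_STUB:
		imm64 = (uint64_t)hi << 32 | lo;
		return imm64;
	case OP_NOT_STUB:
		return ~(uint64_t)lo;
	}
	return 0;
}
```

Wrapping each case body in braces would also compile, but hoisting matches
the approach the commit message describes.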
Jinyang He [Sat, 29 Oct 2022 08:29:31 +0000 (16:29 +0800)]
LoongArch: Remove unused kernel stack padding
The current LoongArch kernel stack is padded as if obeying the MIPS o32
calling convention (32 bytes), signifying the port's MIPS lineage but no
longer making sense. Remove the padding for clarity.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Linus Torvalds [Sat, 29 Oct 2022 00:03:00 +0000 (17:03 -0700)]
Merge tag 'riscv-for-linus-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for a build warning in the jump_label code
- One of the git://github -> https://github cleanups, for the SiFive
drivers
- A fix for the kasan initialization code, this still likely warrants
some cleanups but that's a bigger problem and at least this fixes the
crashes in the short term
- A pair of fixes for extension support detection on mixed LLVM/GNU
toolchains
- A fix for a runtime warning in the /proc/cpuinfo code
* tag 'riscv-for-linus-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix /proc/cpuinfo cpumask warning
riscv: fix detection of toolchain Zihintpause support
riscv: fix detection of toolchain Zicbom support
riscv: mm: add missing memcpy in kasan_init
MAINTAINERS: git://github.com -> https://github.com for sifive
riscv: jump_label: mark arguments as const to satisfy asm constraints
Linus Torvalds [Fri, 28 Oct 2022 23:48:29 +0000 (16:48 -0700)]
Merge tag 'acpi-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and device properties fixes from Rafael Wysocki:
"These fix device properties documentation and the ACPI PCC code, add a
new IRQ override quirk for resource handling and add one more item to
the list of device IDs to be ignored when returned by _DEP.
Specifics:
- Fix the documentation of the *_match_string() family of functions
to properly cover the return value (Andy Shevchenko)
- Fix a possible integer overflow during multiplication in the ACPI
PCC code (Manank Patel)
- Make the ACPI device resources code skip IRQ override on Asus
Vivobook S5602ZA (Tamim Khan)
- Add LATT2021 to the list of device IDs that are ignored when
returned by _DEP, because there are no drivers for them in the
kernel and no plans to add such drivers (Hans de Goede)"
* tag 'acpi-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: scan: Add LATT2021 to acpi_ignore_dep_ids[]
ACPI: resource: Skip IRQ override on Asus Vivobook S5602ZA
ACPI: PCC: Fix unintentional integer overflow
device property: Fix documentation for *_match_string() APIs
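For the integer overflow item in the list above, the general hazard can be
sketched as follows. This is an assumption-laden illustration rather than
the PCC code itself: the function and parameter names are hypothetical, and
it assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* The multiply is done in 32 bits, so the product wraps before widening. */
uint64_t scaled_wrapping_stub(uint32_t latency, uint32_t factor)
{
	return latency * factor;
}

/* Widening one operand first makes the multiply happen in 64 bits. */
uint64_t scaled_widened_stub(uint32_t latency, uint32_t factor)
{
	return (uint64_t)latency * factor;
}
```

With both arguments set to 0x10000, the first function returns 0 and the
second returns 0x100000000.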
Linus Torvalds [Fri, 28 Oct 2022 23:44:12 +0000 (16:44 -0700)]
Merge tag 'pm-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These make the intel_pstate driver work as expected on all hybrid
platforms to date (regardless of possible platform firmware issues),
fix hybrid sleep on systems using suspend-to-idle by default, make the
generic power domains code handle disabled idle states properly and
update pm-graph.
Specifics:
- Make intel_pstate use what is known about the hardware instead of
relying on information from the platform firmware (ACPI CPPC in
particular) to establish the relationship between the HWP CPU
performance levels and frequencies on all hybrid platforms
available to date (Rafael Wysocki)
- Allow hybrid sleep to use suspend-to-idle as a system suspend
method if it is the current suspend method of choice (Mario
Limonciello)
- Fix handling of unavailable/disabled idle states in the generic
power domains code (Sudeep Holla)
- Update the pm-graph suite of utilities to version 5.10 which is
fixes-mostly and does not add any new features (Todd Brandt)"
* tag 'pm-6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: domains: Fix handling of unavailable/disabled idle states
pm-graph v5.10
cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores
cpufreq: intel_pstate: Read all MSRs on the target CPU
PM: hibernate: Allow hybrid sleep to work with s2idle
Jean-Philippe Brucker [Fri, 28 Oct 2022 16:00:42 +0000 (17:00 +0100)]
random: use arch_get_random*_early() in random_init()
While reworking the archrandom handling, commit d349ab99eec7 ("random:
handle archrandom with multiple longs") switched to the non-early
archrandom helpers in random_init(), which broke initialization of the
entropy pool from the arm64 random generator.
Indeed at that point the arm64 CPU features, which verify that all CPUs
have compatible capabilities, are not finalized so arch_get_random_seed_longs()
is unsuccessful. Instead random_init() should use the _early functions,
which check only the boot CPU on arm64. On other architectures the
_early functions directly call the normal ones.
Fixes: d349ab99eec7 ("random: handle archrandom with multiple longs")
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Sebastian Andrzej Siewior [Wed, 26 Oct 2022 13:48:30 +0000 (15:48 +0200)]
mm: multi-gen LRU: move lru_gen_add_mm() out of IRQ-off region
lru_gen_add_mm() has been added within an IRQ-off region in the commit
mentioned below. The other invocations of lru_gen_add_mm() are not within
an IRQ-off region.
The invocation within an IRQ-off region is problematic on PREEMPT_RT because
the function uses a spinlock_t, which must not be used within
IRQ-disabled regions.
The other invocations of lru_gen_add_mm() occur while
task_struct::alloc_lock is acquired. Move lru_gen_add_mm() after
interrupts are enabled and before task_unlock().
Link: https://lkml.kernel.org/r/20221026134830.711887-1-bigeasy@linutronix.de
Fixes: bd74fdaea1460 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lukas Bulwahn [Wed, 26 Oct 2022 12:00:29 +0000 (14:00 +0200)]
lib: maple_tree: remove unneeded initialization in mtree_range_walk()
Before the do-while loop in mtree_range_walk(), the variables next, min,
max need to be initialized. The variables last, prev_min and prev_max are
set within the loop body before they are eventually used after exiting the
loop body.
As it is a do-while loop, the loop body is executed at least once, so the
variables last, prev_min and prev_max do not need to be initialized before
the loop body.
Remove unneeded initialization of last and prev_min.
The needless initialization was reported by clang-analyzer as Dead Stores.
As the compiler already identifies these assignments as unneeded, it
optimizes the assignments away. Hence:
Liam Howlett [Tue, 25 Oct 2022 16:12:49 +0000 (16:12 +0000)]
mmap: fix remap_file_pages() regression
When using the VMA iterator, the final iteration sets the variable
'next' to NULL, which causes the function to fail out. Restoring the break
in the loop, so that the VMA iterator exits early without setting 'next' to
NULL, fixes the issue.
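The effect of the missing break can be illustrated with a small userspace
loop. This is only a sketch: struct range_stub, find_covering_stub() and
the use_break switch are hypothetical, and a plain array walk stands in for
the VMA iterator.

```c
#include <stddef.h>

/* Hypothetical stand-in for a VMA: covers [start, end). */
struct range_stub {
	unsigned long start, end;
};

/*
 * Walk ranges that begin below 'end_addr'. Returns the range that reaches
 * 'end_addr', or NULL if the walk ran until the iterator was exhausted.
 */
const struct range_stub *find_covering_stub(const struct range_stub *v,
					    size_t n, unsigned long end_addr,
					    int use_break)
{
	const struct range_stub *next = NULL;
	size_t i;

	for (i = 0; ; i++) {
		/* Iterator step: yields NULL once nothing is left to visit. */
		next = (i < n && v[i].start < end_addr) ? &v[i] : NULL;
		if (!next)
			break;
		/* The early exit keeps 'next' pointing at a valid range. */
		if (use_break && next->end >= end_addr)
			break;
	}
	return next;
}
```

Without the early exit the loop always ends with next == NULL, so a caller
that treats NULL as failure will fail even when the ranges cover the
request.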
Ira Weiny [Tue, 25 Oct 2022 22:01:08 +0000 (15:01 -0700)]
mm/shmem: ensure proper fallback if page faults
The kernel test robot flagged a recursive lock as a result of a conversion
from kmap_atomic() to kmap_local_folio().[Link]
The cause was due to the code depending on the kmap_atomic() side effect
of disabling page faults. In that case the code expects the fault to fail
and take the fallback case.
git archaeology implied that the recursion may not be an actual bug.[1]
However, depending on the implementation of the mmap_lock and the
condition of the call there may still be a deadlock.[2] So this is not
purely a lockdep issue. Considering a single threaded call stack there
are 3 options.
1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play
(no issue)
3) Readlock implementation is _not_ recursive (issue)
The mmap_lock is recursive so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when you consider
additional processes and threads:
"The readlock implementation is only recursive if nobody else has taken a
write lock. If you have a multithreaded process, one of the other threads
can call mmap() and that will prevent recursion (due to fairness). Even
if it's a different process that you're trying to acquire the mmap read
lock on, you can still get into a deadly embrace. eg:
process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A
Now all four threads are blocked waiting for each other."
Regardless, using pagefault_disable() ensures that no matter what locking
implementation is used a deadlock will not occur. Add an explicit
pagefault_disable() and a big comment to explain this for future souls
looking at this code.
Link: https://lkml.kernel.org/r/20221025220108.2366043-1-ira.weiny@intel.com
Link: https://lore.kernel.org/r/202210211215.9dc6efb5-yujie.liu@intel.com
Fixes: 7a7256d5f512 ("shmem: convert shmem_mfill_atomic_pte() to use a folio")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: kernel test robot <yujie.liu@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ira Weiny [Mon, 24 Oct 2022 04:34:52 +0000 (21:34 -0700)]
mm/userfaultfd: replace kmap/kmap_atomic() with kmap_local_page()
kmap() and kmap_atomic() are being deprecated in favor of
kmap_local_page() which is appropriate for any thread local context.[1]
A recent locking bug report with userfaultfd showed that the conversion of
the kmap_atomic()'s in those code flows requires care with regard to the
prevention of deadlock.[2]
git archaeology implied that the recursion may not be an actual bug.[3]
However, depending on the implementation of the mmap_lock and the
condition of the call there may still be a deadlock.[4] So this is not
purely a lockdep issue. Considering a single threaded call stack there
are 3 options.
1) Different mm's are in play (no issue)
2) Readlock implementation is recursive and same mm is in play
(no issue)
3) Readlock implementation is _not_ recursive (issue)
The mmap_lock is recursive so with a single thread there is no issue.
However, Matthew pointed out a deadlock scenario when you consider
additional processes and threads:
"The readlock implementation is only recursive if nobody else has taken a
write lock. If you have a multithreaded process, one of the other threads
can call mmap() and that will prevent recursion (due to fairness). Even
if it's a different process that you're trying to acquire the mmap read
lock on, you can still get into a deadly embrace. eg:
process A thread 1 takes read lock on own mmap_lock
process A thread 2 calls mmap, blocks taking write lock
process B thread 1 takes page fault, read lock on own mmap lock
process B thread 2 calls mmap, blocks taking write lock
process A thread 1 blocks taking read lock on process B
process B thread 1 blocks taking read lock on process A
Now all four threads are blocked waiting for each other."
Regardless, using pagefault_disable() ensures that no matter what locking
implementation is used a deadlock will not occur.
Complete kmap conversion in userfaultfd by replacing the kmap() and
kmap_atomic() calls with kmap_local_page(). When replacing the
kmap_atomic() call ensure page faults continue to be disabled to support
the correct fall back behavior and add a comment to inform future souls of
the requirement.
Alexander Potapenko [Mon, 24 Oct 2022 21:21:44 +0000 (23:21 +0200)]
x86: fortify: kmsan: fix KMSAN fortify builds
Ensure that KMSAN builds replace memset/memcpy/memmove calls with the
respective __msan_XXX functions, and that none of the macros are redefined
twice. This should allow building the kernel with both CONFIG_KMSAN and
CONFIG_FORTIFY_SOURCE.
Alexander Potapenko [Mon, 24 Oct 2022 21:21:43 +0000 (23:21 +0200)]
x86: asm: make sure __put_user_size() evaluates pointer once
User access macros must ensure their arguments are evaluated only once if
they are used more than once in the macro body. Adding
instrument_put_user() to __put_user_size() resulted in double evaluation
of the `ptr` argument, which led to correctness issues when performing
e.g. unsafe_put_user(..., p++, ...).
To fix those issues, evaluate the `ptr` argument of __put_user_size() at
the beginning of the macro.
Link: https://lkml.kernel.org/r/20221024212144.2852069-4-glider@google.com
Fixes: 888f84a6da4d ("x86: asm: instrument usercopy in get_user() and put_user()")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: youling257 <youling257@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
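The double-evaluation problem in the __put_user_size() commit above is easy
to reproduce in plain C. The macros below are hypothetical stand-ins, not
the x86 implementation; hook_stub() plays the role of an instrumentation
call added to the macro body.

```c
/* Hypothetical instrumentation hook; counts how often it is called. */
int hook_calls;

void hook_stub(const int *p)
{
	(void)p;
	hook_calls++;
}

/* Expands 'ptr' twice: once for the hook and once for the store. */
#define PUT_TWICE_STUB(x, ptr)			\
	do {					\
		hook_stub(ptr);			\
		*(ptr) = (x);			\
	} while (0)

/* Evaluates 'ptr' once into a local and then reuses the local. */
#define PUT_ONCE_STUB(x, ptr)			\
	do {					\
		int *once_p = (ptr);		\
		hook_stub(once_p);		\
		*once_p = (x);			\
	} while (0)

/* Store 7 through p++ and report where the pointer ended up. */
int *fill_twice_stub(int *buf)
{
	int *p = buf;

	PUT_TWICE_STUB(7, p++);
	return p;
}

int *fill_once_stub(int *buf)
{
	int *p = buf;

	PUT_ONCE_STUB(7, p++);
	return p;
}
```

With the first macro, p++ runs twice, so the value lands in buf[1] and the
pointer advances by two elements. With the second, the store hits buf[0] and
the pointer advances by one, which is what a caller passing p++ expects.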
Alexander Potapenko [Mon, 24 Oct 2022 21:21:42 +0000 (23:21 +0200)]
Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default
KMSAN adds a lot of instrumentation to the code, which results in
increased stack usage (up to 2048 bytes and more in some cases). It's
hard to predict how big the stack frames can be, so we disable the
warnings for KMSAN instead.
Baolin Wang [Mon, 24 Oct 2022 08:34:21 +0000 (16:34 +0800)]
mm: migrate: fix return value if all subpages of THPs are migrated successfully
During THP migration, if THPs are not migrated but they are split and all
subpages are migrated successfully, migrate_pages() will still return the
number of THP pages that were not migrated. This will confuse the callers
of migrate_pages(). For example, longterm pinning will fail even though
all pages are migrated successfully.
Thus we should return 0 to indicate that all pages are migrated in this
case.
Link: https://lkml.kernel.org/r/de386aa864be9158d2f3b344091419ea7c38b2f7.1666599848.git.baolin.wang@linux.alibaba.com
Fixes: b5bade978e9b ("mm: migrate: fix the return value of migrate_pages()")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I just got time to revisit this and found that the root cause is we simply
messed up with the vma check, so that for !PTE_MARKER_UFFD_WP system, we
will allow UFFDIO_REGISTER of MINOR & WP upon shmem as the check was
wrong:
if (vm_flags & VM_UFFD_MINOR)
return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
Where we'll allow anything to pass on shmem as long as minor mode is
requested.
Axel did it right when introducing minor mode but I messed it up in b1f9e876862d when moving code around. Fix it.
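The shape of the mistake can be sketched outside the kernel. Everything
below is a hypothetical model: the flag values, the enum and the
wp_markers parameter (standing in for whether PTE_MARKER_UFFD_WP is
available) are assumptions, and the fixed variant shows one way to order
the checks rather than the exact upstream code.

```c
#include <stdbool.h>

/* Hypothetical flag bits and VMA kinds, for illustration only. */
#define UFFD_MINOR_STUB 0x1u
#define UFFD_WP_STUB    0x2u

enum vma_kind_stub { VMA_ANON_STUB, VMA_SHMEM_STUB, VMA_HUGETLB_STUB };

/* Buggy shape: once MINOR is requested, the WP check is never reached. */
bool can_register_buggy_stub(enum vma_kind_stub kind, unsigned int flags,
			     bool wp_markers)
{
	if (flags & UFFD_MINOR_STUB)
		return kind == VMA_HUGETLB_STUB || kind == VMA_SHMEM_STUB;
	if (!wp_markers && (flags & UFFD_WP_STUB))
		return kind == VMA_ANON_STUB;
	return true;
}

/* Fixed shape: MINOR can only reject; the WP check still runs after it. */
bool can_register_fixed_stub(enum vma_kind_stub kind, unsigned int flags,
			     bool wp_markers)
{
	if ((flags & UFFD_MINOR_STUB) &&
	    kind != VMA_HUGETLB_STUB && kind != VMA_SHMEM_STUB)
		return false;
	if (!wp_markers && (flags & UFFD_WP_STUB))
		return kind == VMA_ANON_STUB;
	return true;
}
```

In this model, a MINOR and WP request on shmem without WP marker support is
accepted by the buggy variant and rejected by the fixed one.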
Hugh Dickins [Sat, 22 Oct 2022 07:51:06 +0000 (00:51 -0700)]
mm: prep_compound_tail() clear page->private
Although page allocation always clears page->private in the first page or
head page of an allocation, it has never made a point of clearing
page->private in the tails (though 0 is often what is already there).
But now commit 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t
during THP split") issues a warning when page_tail->private is found to be
non-0 (unless it's swapcache).
We could just delete the warning, but today's consensus appears to want
page->private to be 0, unless there's a good reason for it to be set: so
now clear it in prep_compound_tail() (more general than just for THP; but
not for high order allocation, which makes no pass down the tails).
Link: https://lkml.kernel.org/r/1c4233bb-4e4d-5969-fbd4-96604268a285@google.com
Fixes: 71e2d666ef85 ("mm/huge_memory: do not clobber swp_entry_t during THP split")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rik van Riel [Fri, 21 Oct 2022 23:28:05 +0000 (19:28 -0400)]
mm,madvise,hugetlb: fix unexpected data loss with MADV_DONTNEED on hugetlbfs
A common use case for hugetlbfs is for the application to create
memory pools backed by huge pages, which then get handed over to
some malloc library (e.g. jemalloc) for further management.
That malloc library may be doing MADV_DONTNEED calls on memory
that is no longer needed, expecting those calls to happen on
PAGE_SIZE boundaries.
However, currently the MADV_DONTNEED code rounds up any such
requests to HPAGE_PMD_SIZE boundaries. This leads to undesired
outcomes when jemalloc expects a 4kB MADV_DONTNEED, but 2MB of
memory gets zeroed out instead.
Use of pre-built shared libraries means that user code does not
always know the page size of every memory arena in use.
Avoid unexpected data loss with MADV_DONTNEED by rounding up
only to PAGE_SIZE (in do_madvise), and rounding down to huge
page granularity.
That way programs will only get as much memory zeroed out as
they requested.
Link: https://lkml.kernel.org/r/20221021192805.366ad573@imladris.surriel.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
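The rounding change in the MADV_DONTNEED commit above can be worked through
with plain arithmetic. This is a sketch under assumptions: the 4kB and 2MB
sizes are the ones the message mentions, the helper names are hypothetical,
and 'start' is assumed to be aligned to the huge page size already.

```c
/* Sizes taken from the commit message's example (4kB and 2MB). */
#define PAGE_SIZE_STUB  4096UL
#define HPAGE_SIZE_STUB (2UL * 1024 * 1024)

/* Round 'x' up or down to a power-of-two boundary 'a'. */
unsigned long align_up_stub(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

unsigned long align_down_stub(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

/* Old behaviour: the end of the range is rounded up to a huge page. */
unsigned long zap_len_old_stub(unsigned long start, unsigned long len)
{
	return align_up_stub(start + len, HPAGE_SIZE_STUB) - start;
}

/* New behaviour: round up to a base page, then down to a huge page. */
unsigned long zap_len_new_stub(unsigned long start, unsigned long len)
{
	unsigned long end = align_up_stub(start + len, PAGE_SIZE_STUB);

	return align_down_stub(end, HPAGE_SIZE_STUB) - start;
}
```

For a 4kB request at a huge-page-aligned address, the old rule discards a
full 2MB while the new rule discards nothing. A request of 2MB plus one byte
discards exactly one huge page under the new rule, never more than was
asked for.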
Maria Yu [Fri, 21 Oct 2022 10:15:55 +0000 (18:15 +0800)]
mm/page_isolation: fix clang deadcode warning
When !CONFIG_VM_BUG_ON, there is a warning from
clang-analyzer-deadcode.DeadStores:
Value stored to 'mt' during its initialization is never read.
Link: https://lkml.kernel.org/r/20221021101555.7992-2-quic_aiquny@quicinc.com
Signed-off-by: Maria Yu <quic_aiquny@quicinc.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
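A dead store of this kind can be sketched as follows. The names are
hypothetical and this is not the page_isolation code: CHECK_STUB() models a
debug assertion that, when disabled, only type-checks its argument through
sizeof and never evaluates it.

```c
#include <stdlib.h>

/* Hypothetical lookup; counts calls so the difference is observable. */
int type_lookups;

int lookup_type_stub(int page)
{
	type_lookups++;
	return page & 1;
}

#ifdef DEBUG_CHECKS_STUB
#define CHECK_STUB(cond) do { if (!(cond)) abort(); } while (0)
#else
/* Disabled: the condition is type-checked but never evaluated. */
#define CHECK_STUB(cond) ((void)sizeof((long)(cond)))
#endif

/* 'mt' is stored at initialization but never read when checks are off. */
int process_dead_store_stub(int page)
{
	int mt = lookup_type_stub(page);

	CHECK_STUB(mt == 1);
	return page + 1;
}

/* Folding the lookup into the check removes the unused store. */
int process_fixed_stub(int page)
{
	CHECK_STUB(lookup_type_stub(page) == 1);
	return page + 1;
}
```

With checks disabled, the first function still performs the lookup and
throws the result away, which is what the analyzer reports; the second
performs no lookup at all.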