Stanislav Jakubek [Wed, 3 Jul 2024 12:02:46 +0000 (14:02 +0200)]
dt-bindings: timer: sprd-timer: convert to YAML
Convert the Spreadtrum SC9860 timer bindings to DT schema.
Changes during conversion:
- rename file to match compatible
- add sprd,sc9860-suspend-timer which was previously undocumented
- minor grammar fix in description
Signed-off-by: Stanislav Jakubek <stano.jakubek@gmail.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://lore.kernel.org/r/ZoU95lBgoyF/8Md3@standask-GA-A55M-S2HP Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Krzysztof Kozlowski [Fri, 12 Jul 2024 12:11:46 +0000 (14:11 +0200)]
dt-bindings: incomplete-devices: document devices without bindings
There are devices in the wild with non-updatable firmware coming with
ACPI tables with rejected compatibles, e.g. "ltr,ltrf216a". Linux
kernel still supports this device via ACPI PRP0001, however the
compatible was never accepted to bindings.
There are also several early PowerPC or SPARC platforms using
compatibles for their OpenFirmware, but without in-tree DTS. Often the
legacy compatible is not correct in terms of current Devicetree
specification, e.g. missing vendor prefix.
Finally there are also Linux-specific tools and test code with
compatibles.
Add a schema covering above cases purely to satisfy the DT schema and
scripts/checkpatch.pl checks for undocumented compatibles. For
ltr,ltrf216a this also documents the consensus: compatible is allowed
only via ACPI PRP0001, but not bindings.
Neil Armstrong [Wed, 10 Jul 2024 17:02:52 +0000 (19:02 +0200)]
dt-bindings: trivial-devices: document the Sierra Wireless mangOH Green SPI IoT interface
Document the Sierra Wireless mangOH Green SPI IoT interface as a trivial
device.
This fixes the following check:
qcom-mdm9615-wp8548-mangoh-green.dtb: /soc/gsbi@16200000/spi@16280000/spi@0: failed to match any schema with compatible: ['swir,mangoh-iotport-spi']
1df7b047fe43 pylibfdt/Makefile.pylibfdt: use project's flags to compile the extension 61e88fdcec52 libfdt: overlay: Fix phandle overwrite check for new subtrees 49d30894466e meson: fix installation with meson-python d54aaf93673c pylibfdt: clean up python build directory ab86f1e9fda8 pylibfdt: add VERSION.txt to Python sdist 7b8a30eceabe pylibfdt: fix Python version ff4f17eb5865 pylibfdt/Makefile.pylibfdt: fix Python library being rebuild during install 9e313b14e684 pylibfdt/meson.build: fix Python library being rebuilt during install d598fc3648ec tests/run_tests.sh: fix Meson library path being dropped b98239da2f18 tests/meson.build: fix python and yaml tests not running c17d76ab5e84 checks: Check the overall length of "interrupt-map" ae26223a056e libfdt: overlay: Refactor overlay_fixup_phandle 4dd831affd01 libfdt: tests: Update test case for overlay_bad_fixup e6d294200837 tests: Remove two_roots and named_root from LIBTREE_TESTS_L and add all dtb filenames generated by dumptrees to TESTS_TREES_L in Makefile.tests 855c934e26ae tests: fix tests broken under Meson 4fd3f4f0a95d github: enforce testing pylibfdt and yaml support 9ca7d62dbf0b meson: split run-tests by type bb51223083a4 meson: fix dependencies of tests e81900635c95 meson: fix pylibfdt missing dependency on libfdt 822123856980 pylibfdt: fix get_mem_rsv for newer Python versions 1fad065080e6 libfdt: overlay: ensure that existing phandles are not overwritten b0aacd0a7735 github: add windows/msys CI build ae97d9745862 github: Don't accidentally suppress test errors 057a7dbbb777 github: Display meson test logs on failure 92b5d4e91678 pylibfdt: Remove some apparently deprecated options from setup.py 417e3299dbd1 github: Update to newer checkout action 5e6cefa17e2d fix MinGW format attribute 24f60011fd43 libfdt: Simplify adjustment of values for local fixups da39ee0e68b6 libfdt: rework shared/static libraries a669223f7a60 Makefile: do not hardcode the `install` program path 3fbfdd08afd2 libfdt: fix duplicate meson target dcef5f834ea3 tests: use correct pkg-config when cross compiling 0b8026ff254f meson: allow building from shallow clones 95c74d71f090 treesource: Restore string list output when no type markers 2283dd78eff5 libfdt: fdt_path_offset_namelen: Reject empty path 79b9e326a162 libfdt: fdt_get_alias_namelen: Validate aliases 52157f13ef3d pylibfdt: Support boolean properties d77433727566 dtc: fix missing string in usage_opts_help ad8bf9f9aa39 libfdt: Fix fdt_appendprop_addrrange documentation 6c5e189fb952 github: add workflow for Meson builds a3dc9f006a78 libfdt: rename libfdt-X.Y.Z.so to libfdt.so.X.Y.Z 35019949c4c7 workflows: build: remove setuptools_scm hack cd3e2304f4a9 pylibfdt: use fallback version in tarballs 0f5864567745 move release version into VERSION.txt 38165954c13b libfdt: add missing version symbols 5e98b5979354 editorconfig: use tab indentation for version.lds d030a893be25 tests: generate dtbs in Meson build directory 8d8372b13706 tests: fix use of deprecated meson methods 761114effaf7 pylibtfdt: fix use of deprecated meson method bf6377a98d97 meson: set minimum Meson version to 0.56.0 4c68e4b16b22 libfdt: fix library version to match project version bdc5c8793a13 meson: allow disabling tests f088e381f29e Makefile: allow to install libfdt without building executables 6df5328a902c Fix use of <ctype.h> functions ccf1f62d59ad libfdt: Fix a typo in libfdt.h 71a8b8ef0adf libfdt: meson: Fix linking on macOS linker 589d8c7653c7 dtc: Add an option to generate __local_fixups__ and __fixups__ e8364666d5ac CI: Add build matrix with multiple Linux distributions 3b02a94b486f dtc: Correct invalid dts output with mixed phandles and integers d4888958d64b tests: Add additional tests for device graph checks ea3b9a1d2c5a checks: Fix crash in graph_child_address if 'reg' cell size != 1 b2b9671583e9 livetree: fix off-by-one in propval_cell_n() bounds check ab481e483061 Add definition for a GitHub Actions CI job c88038c9b8ca Drop obsolete/broken CI definitions 0ac8b30ba5a1 yaml: Depend on libyaml >= 0.2.3 f1657b2fb5be tests: Add test cases for bad endpoint node and remote-endpoint prop checks 44bb89cafd3d checks: Fix segmentation fault in check_graph_node 60bcf1cde1a8 improve documentation for fdt_path_offset() a6f997bc77d4 add fdt_get_symbol() and fdt_get_symbol_namelen() functions 18f5ec12a10e use fdt_path_getprop_namelen() in fdt_get_alias_namelen() df093279282c add fdt_path_getprop_namelen() helper 129bb4b78bc6 doc: dt-object-internal: Fix a typo 390f481521c3 fdtoverlay: Drop a a repeated article 9f8b382ed45e manual: Fix and improve documentation about -@ 2cdf93a6d402 fdtoverlay: Fix usage string to not mention "<type>" 72fc810c3025 build-sys: add -Wwrite-strings 083ab26da83b tests: fix leaks spotted by ASAN 6f8b28f49609 livetree: fix leak spotted by ASAN fd68bb8c5658 Make name_node() xstrdup its name argument 4718189c4ca8 Delay xstrdup() of node and property names coming from a flat tree 0b842c3c8199 Make build_property() xstrdup its name argument 9cceabea1ee0 checks: correct I2C 10-bit address check 0d56145938fe yamltree.c: fix -Werror=discarded-qualifiers & -Werror=cast-qual 61fa22b05f69 checks: make check.data const 7a1d72a788e0 checks.c: fix check_msg() leak ee5799938697 checks.c: fix heap-buffer-overflow 44c9b73801c1 tests: fix -Wwrite-strings 5b60f5104fcc srcpos.c: fix -Wwrite-strings 32174a66efa4 meson: Fix cell overflow tests when running from meson 64a907f08b9b meson.build: bump version to 1.7.0 e3cde0613bfd Add -Wsuggest-attribute=format warning, correct warnings thus generated 41821821101a Use #ifdef NO_VALGRIND 71c19f20b3ef Do not redefine _GNU_SOURCE if already set 039a99414e77 Bump version to v1.7.0 9b62ec84bb2d Merge remote-tracking branch 'gitlab/main' 3f29d6d85c24 pylibfdt: add size_hint parameter for get_path 2022bb10879d checks: Update #{size,address}-cells check for 'dma-ranges'
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Frank Li [Wed, 3 Jul 2024 18:08:11 +0000 (14:08 -0400)]
dt-bindings: soc: fsl: Add fsl,ls1028a-reset for reset syscon node
ls1028a has a reset module that includes reboot, reset control word, and
service processor control.
Add platform specific compatible string to fix the below warning.
syscon@1e60000: compatible: 'anyOf' conditional failed, one must be fixed:
['syscon'] is too short
'syscon' is not one of ['al,alpine-sysfabric-service', ...]
Frank Li [Wed, 3 Jul 2024 16:49:39 +0000 (12:49 -0400)]
dt-bindings: soc: fsl: cpm_qe: convert to yaml format
Convert binding doc qe.txt to yaml format. Split it to
fsl,qe-firmware.yaml, fsl,qe-ic.yaml, fsl,qe-muram.yaml, fsl,qe-si.yaml
fsl,qe-siram.yaml, fsl,qe.yaml.
Additional Changes:
- Fix error in example.
- Change to low case for hex value.
- Remove fsl,qe-num-riscs and fsl,qe-snums from required list.
- Add #address-cell and #size-cell.
- Add interrupts description for qe-ic.
- Add compatible string fsl,ls1043-qe-si for fsl,qe-si.yaml
- Add compatible string fsl,ls1043-qe-siram for fsl,qe-siram.yaml
- Add child node for fsl,qe.yaml
Fix below warning:
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000: failed to match any schema with compatible: ['fsl,qe-muram', 'fsl,cpm-muram']
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000: failed to match any schema with compatible: ['fsl,qe', 'simple-bus']
arch/arm64/boot/dts/freescale/fsl-ls1043a-rdb.dtb: /soc/uqe@2400000/muram@10000/data-only@0: failed to match any schema with compatible: ['fsl,qe-muram-data', 'fsl,cpm-muram-data']
arch/arm64/boot/dts/freescale/fsl-ls1043a-qds.dtb: /soc/uqe@2400000/qeic@80: failed to match any schema with compatible: ['fsl,qe-ic']
Eddie James [Wed, 22 May 2024 19:25:15 +0000 (14:25 -0500)]
dt-bindings: i2c: i2c-fsi: Convert to json-schema
Convert to json-schema for the FSI-attached I2C controller.
Signed-off-by: Eddie James <eajames@linux.ibm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Ninad Palsule <ninad@linux.ibm.com> Link: https://lore.kernel.org/r/20240522192524.3286237-12-eajames@linux.ibm.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Eddie James [Wed, 22 May 2024 19:25:13 +0000 (14:25 -0500)]
dt-bindings: fsi: Document the AST2700 FSI controller
Add the appropriate compatible string, and document the new reg
properties of the AST2700 FSI controller, which can directly access
the FSI controller registers as well as the FSI link address space.
Eddie James [Wed, 22 May 2024 19:25:10 +0000 (14:25 -0500)]
dt-bindings: fsi: Document the FSI controller common properties
Since there are multiple FSI controllers documented, the common
properties should be documented separately and then referenced
from the specific controller documentation. Add bus-frequency for
the FSI bus and CFAM local bus frequencies. Add interrupt
controller properties.
Eddie James [Wed, 22 May 2024 19:25:09 +0000 (14:25 -0500)]
dt-bindings: fsi: Document the IBM SBEFIFO engine
The SBEFIFO engine provides an interface to the POWER processor
Self Boot Engine (SBE).
Signed-off-by: Eddie James <eajames@linux.ibm.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au> Link: https://lore.kernel.org/r/20240522192524.3286237-6-eajames@linux.ibm.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
The FSI2SPI bridge has several SPI controllers behind it, which
should be documented. Also, therefore the node needs to specify
address and size cells.
Frank Li [Thu, 27 Jun 2024 14:42:07 +0000 (10:42 -0400)]
dt-bindings: interrupt-controller: convert fsl,ls-scfg-msi to yaml
Convert device tree binding fsl,ls-scfg-msi to yaml format.
Additional changes:
- Include gic.h and use predefined macro in example.
- Remove label in example.
- Change node name to interrupt-controller in example.
- Fix error in example.
- ls1046a allow 4 irqs, other platform only 1 irq.
- Add $ref: msi-controller.yaml
- Add #msi-cells.
Frank Li [Wed, 26 Jun 2024 19:37:53 +0000 (15:37 -0400)]
dt-bindings: soc: fsl: Convert q(b)man-* to yaml format
Convert qman, bman, qman-portals, bman-portals to yaml format.
Additional Change for fsl,q(b)man-portal:
- Only keep one example.
- Add fsl,qman-channel-id property.
- Use interrupt type macro.
- Remove top level qman-portals@ff4200000 at example.
Additional change for fsl,q(b)man:
- Fixed example error.
- Remove redundent part, only keep fsl,qman node.
- Change memory-regions to memory-region.
- fsl,q(b)man-portals is not required property
Additional change for fsl,qman-fqd.yaml:
- Fixed example error.
- Only keep one example.
- Ref to reserve-memory.yaml
- Merge fsl,bman reserver memory part
Frank Li [Mon, 17 Jun 2024 17:09:34 +0000 (13:09 -0400)]
dt-bindings: misc: fsl,qoriq-mc: convert to yaml format
Convert fsl,qoriq-mc from txt to yaml format.
Addition changes:
- Child node name allow 'ethernet'.
- Use 32bit address in example.
- Fixed missed ';' in example.
- Allow dma-coherent.
- Remove smmu, its part in example.
- Change child node name as 'ethernet'
Add IMX platform maintainers for bindings which would become orphaned.
Acked-by: Uwe Kleine-König <ukleinek@kernel.org> Reviewed-by: Fabio Estevam <festevam@gmail.com> Acked-by: Peng Fan <peng.fan@nxp.com> Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com> # for I2C Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # for IIO Acked-by: Andi Shyti <andi.shyti@kernel.org> Acked-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240617065828.9531-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Dmitry Baryshkov [Mon, 27 May 2024 11:34:24 +0000 (14:34 +0300)]
kbuild: verify dtoverlay files against schema
Currently only the single part device trees are validated against DT
schema. For the multipart DT files only the base DTB is validated.
Extend the fdtoverlay commands to validate the resulting DTB file
against schema.
Andre Przywara [Tue, 18 Jun 2024 16:04:50 +0000 (17:04 +0100)]
dt-bindings: arm: cpus: Add new Cortex and Neoverse names
Add compatible strings for the Arm Cortex-A725 and Cortex-A925 CPUs, as
well as new Neoverse cores: Arm Neoverse N3, Neoverse V2, Neoverse V3,
and Neoverse V3AE.
Herve Codina [Mon, 27 May 2024 16:14:44 +0000 (18:14 +0200)]
PCI: of_property: Add interrupt-controller property in PCI device nodes
PCI devices and bridges DT nodes created during the PCI scan are created
with the interrupt-map property set to handle interrupts.
In order to set this interrupt-map property at a specific level, a
phandle to the parent interrupt controller is needed. On systems that
are not fully described by a device-tree, the parent interrupt
controller may be unavailable (i.e. not described by the device-tree).
As mentioned in the [1], avoiding the use of the interrupt-map property
and considering a PCI device as an interrupt controller itself avoid the
use of a parent interrupt phandle.
In that case, the PCI device itself as an interrupt controller is
responsible for routing the interrupts described in the device-tree
world (DT overlay) to the PCI interrupts.
Add the 'interrupt-controller' property in the PCI device DT node.
Herve Codina [Mon, 27 May 2024 16:14:41 +0000 (18:14 +0200)]
of: unittest: Add tests for changeset properties adding
No test cases are present to test the of_changes_add_prop_*() function
family.
Add a new test to fill this lack.
Functions tested are:
- of_changes_add_prop_string()
- of_changes_add_prop_string_array()
- of_changeset_add_prop_u32()
- of_changeset_add_prop_u32_array()
Herve Codina [Mon, 27 May 2024 16:14:40 +0000 (18:14 +0200)]
of: dynamic: Constify parameter in of_changeset_add_prop_string_array()
The str_array parameter has no reason to be an un-const array.
Indeed, elements of the 'str_array' array are not changed by the code.
Constify the 'str_array' array parameter.
With this const qualifier added, the following construction is allowed:
static const char * const tab_str[] = { "string1", "string2" };
of_changeset_add_prop_string_array(..., tab_str, ARRAY_SIZE(tab_str));
Geert Uytterhoeven [Wed, 29 May 2024 12:32:32 +0000 (14:32 +0200)]
dt-bindings: timer: renesas,tmu: Make interrupt-names required
Now all in-tree users have been updated with interrupt-names properties
according to commit 0076a37a426b6c85 ("dt-bindings: timer: renesas,tmu:
Document input capture interrupt"), make interrupt-names required.
Krzysztof Kozlowski [Wed, 5 Jun 2024 10:56:59 +0000 (12:56 +0200)]
dt-bindings: display: panel: constrain 'reg' in DSI panels (part two)
DSI-attached devices could respond to more than one virtual channel
number, thus their bindings are supposed to constrain the 'reg' property
to match hardware. Add missing 'reg' constrain for DSI-attached display
panels, based on DTS sources in Linux kernel (assume all devices take
only one channel number).
Few bindings missed previous fixup: LG SW43408 and Raydium RM69380.
Dmitry Baryshkov [Tue, 28 May 2024 13:36:48 +0000 (16:36 +0300)]
dt-bindings: ufs: qcom,ufs: drop source clock entries
There is no need to mention and/or to touch in any way the intermediate
(source) clocks. Drop them from MSM8996 UFSHCD schema, making it follow
the example lead by all other platforms.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240528-msm8996-fix-ufs-v5-1-b475c341126e@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Rob Herring [Wed, 29 Aug 2018 22:20:46 +0000 (17:20 -0500)]
of/fdt: Scan the root node properties earlier
Scan the root node properties (#{size,address}-cells) earlier, so that
the dt_root_addr_cells and dt_root_size_cells variables are initialized
and can be used.
Kent Overstreet [Fri, 24 May 2024 15:42:09 +0000 (11:42 -0400)]
mm: percpu: Include smp.h in alloc_tag.h
percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.
This fixes a randconfig build error on mips:
include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]
Linus Torvalds [Sun, 26 May 2024 16:54:26 +0000 (09:54 -0700)]
Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tool fix from Arnaldo Carvalho de Melo:
"Revert a patch causing a regression.
This made a simple 'perf record -e cycles:pp make -j199' stop working
on the Ampere ARM64 system Linus uses to test ARM64 kernels".
* tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.
The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sun, 26 May 2024 05:33:10 +0000 (22:33 -0700)]
Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- two important netfs integration fixes - including for a data
corruption and also fixes for multiple xfstests
- reenable swap support over SMB3
* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix missing set of remote_i_size
cifs: Fix smb3_insert_range() to move the zero_point
cifs: update internal version number
smb3: reenable swapfiles over SMB3 mounts
Linus Torvalds [Sat, 25 May 2024 22:10:33 +0000 (15:10 -0700)]
Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests
fixes, various singletons fixing various issues in various parts"
* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/ksm: fix possible UAF of stable_node
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
nilfs2: fix potential hang in nilfs_detach_log_writer()
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
nilfs2: fix use-after-free of timer for log writer thread
selftests/mm: fix build warnings on ppc64
arm64: patching: fix handling of execmem addresses
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
selftests/mm: compaction_test: fix bogus test success on Aarch64
mailmap: update email address for Satya Priya
mm/huge_memory: don't unpoison huge_zero_folio
kasan, fortify: properly rename memintrinsics
lib: add version into /proc/allocinfo output
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
Linus Torvalds [Sat, 25 May 2024 21:48:40 +0000 (14:48 -0700)]
Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
Linus Torvalds [Sat, 25 May 2024 21:40:09 +0000 (14:40 -0700)]
Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix regressions of the new x86 CPU VFM (vendor/family/model)
enumeration/matching code
- Fix crash kernel detection on buggy firmware with
non-compliant ACPI MADT tables
- Address Kconfig warning
* tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
crypto: x86/aes-xts - switch to new Intel CPU model defines
x86/topology: Handle bogus ACPI tables correctly
x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
Linus Torvalds [Sat, 25 May 2024 21:23:58 +0000 (14:23 -0700)]
Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A series from Xiubo that adds support for additional access checks
based on MDS auth caps which were recently made available to clients.
This is needed to prevent scenarios where the MDS quietly discards
updates that a UID-restricted client previously (wrongfully) acked to
the user.
Other than that, just a documentation fixup"
* tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
doc: ceph: update userspace command to get CephFS metadata
ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
ceph: check the cephx mds auth access for async dirop
ceph: check the cephx mds auth access for open
ceph: check the cephx mds auth access for setattr
ceph: add ceph_mds_check_access() helper
ceph: save cap_auths in MDS client when session is opened
Linus Torvalds [Sat, 25 May 2024 21:19:01 +0000 (14:19 -0700)]
Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Fixes:
- reusing of the file index (could cause the file to be trimmed)
- infinite dir enumeration
- taking DOS names into account during link counting
- le32_to_cpu conversion, 32 bit overflow, NULL check
- some code was refactored
Changes:
- removed max link count info display during driver init
Remove:
- atomic_open has been removed for lack of use"
* tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Break dir enumeration if directory contents error
fs/ntfs3: Fix case when index is reused during tree transformation
fs/ntfs3: Mark volume as dirty if xattr is broken
fs/ntfs3: Always make file nonresident on fallocate call
fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
fs/ntfs3: Use variable length array instead of fixed size
fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
fs/ntfs3: Check 'folio' pointer for NULL
fs/ntfs3: Missed le32_to_cpu conversion
fs/ntfs3: Remove max link count info display during driver init
fs/ntfs3: Taking DOS names into account during link counting
fs/ntfs3: remove atomic_open
fs/ntfs3: use kcalloc() instead of kzalloc()
Linus Torvalds [Sat, 25 May 2024 21:15:39 +0000 (14:15 -0700)]
Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two ksmbd server fixes, both for stable"
* tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: ignore trailing slashes in share paths
ksmbd: avoid to send duplicate oplock break notifications
Linus Torvalds [Sat, 25 May 2024 20:33:53 +0000 (13:33 -0700)]
Merge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There is one new driver and then most of the changes are the device
tree bindings conversions to yaml.
New driver:
- Epson RX8111
Drivers:
- Many Device Tree bindings conversions to dtschema
- pcf8563: wakeup-source support"
* tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
pcf8563: add wakeup-source support
rtc: rx8111: handle VLOW flag
rtc: rx8111: demote warnings to debug level
rtc: rx6110: Constify struct regmap_config
dt-bindings: rtc: convert trivial devices into dtschema
dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
dt-bindings: rtc: pxa-rtc: convert to dtschema
rtc: Add driver for Epson RX8111
dt-bindings: rtc: Add Epson RX8111
rtc: mcp795: drop unneeded MODULE_ALIAS
rtc: nuvoton: Modify part number value
rtc: test: Split rtc unit test into slow and normal speed test
dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
dt-bindings: rtc: armada-380-rtc: convert to dtschema
rtc: cros-ec: provide ID table for avoiding fallback match
Linus Torvalds [Sat, 25 May 2024 20:28:29 +0000 (13:28 -0700)]
Merge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Runtime PM (power management) is improved and hot-join support has
been added to the dw controller driver.
Core:
- Allow device driver to trigger controller runtime PM
Drivers:
- dw: hot-join support
- svc: better IBI handling"
* tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: dw: Add hot-join support.
i3c: master: Enable runtime PM for master controller
i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
Linus Torvalds [Sat, 25 May 2024 20:23:42 +0000 (13:23 -0700)]
Merge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull jffs2 updates from Richard Weinberger:
- Fix illegal memory access in jffs2_free_inode()
- Kernel-doc fixes
- print symbolic error names
* tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: Fix potential illegal address access in jffs2_free_inode
jffs2: Simplify the allocation of slab caches
jffs2: nodemgmt: fix kernel-doc comments
jffs2: print symbolic error name instead of error code
Linus Torvalds [Sat, 25 May 2024 20:17:48 +0000 (13:17 -0700)]
Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Fixes for -Wmissing-prototypes warnings and further cleanup
- Remove callback returning void from rtc and virtio drivers
- Fix bash location
* tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
um: virtio_uml: Convert to platform remove callback returning void
um: rtc: Convert to platform remove callback returning void
um: Remove unused do_get_thread_area function
um: Fix -Wmissing-prototypes warnings for __vdso_*
um: Add an internal header shared among the user code
um: Fix the declaration of kasan_map_memory
um: Fix the -Wmissing-prototypes warning for get_thread_reg
um: Fix the -Wmissing-prototypes warning for __switch_mm
um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
um: Stop tracking host PID in cpu_tasks
um: process: remove unused 'n' variable
um: vector: remove unused len variable/calculation
um: vector: fix bpfflash parameter evaluation
um: slirp: remove set but unused variable 'pid'
um: signal: move pid variable where needed
um: Makefile: use bash from the environment
um: Add winch to winch_handlers before registering winch IRQ
um: Fix -Wmissing-prototypes warnings for __warp_* and foo
um: Fix -Wmissing-prototypes warnings for text_poke*
um: Move declarations to proper headers
...
Linus Torvalds [Sat, 25 May 2024 00:28:02 +0000 (17:28 -0700)]
Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Some fixes for the end of the merge window, mostly amdgpu and panthor,
with one nouveau uAPI change that fixes a bad decision we made a few
months back.
nouveau:
- fix bo metadata uAPI for vm bind
panthor:
- Fixes for panthor's heap logical block.
- Reset on unrecoverable fault
- Fix VM references.
- Reset fix.
xlnx:
- xlnx compile and doc fixes.
amdgpu:
- Handle vbios table integrated info v2.3
amdkfd:
- Handle duplicate BOs in reserve_bo_and_cond_vms
- Handle memory limitations on small APUs
dp/mst:
- MST null deref fix.
bridge:
- Don't let next bridge create connector in adv7511 to make probe
work"
* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu/atomfirmware: add intergrated info v2.3 table
drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
drm/bridge: adv7511: Attach next bridge without creating connector
drm/buddy: Fix the warn on's during force merge
drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
drm/panthor: Call panthor_sched_post_reset() even if the reset failed
drm/panthor: Reset the FW VM to NULL on unplug
drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
drm/panthor: Force an immediate reset on unrecoverable faults
drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
drm/panthor: Fix an off-by-one in the heap context retrieval logic
drm/panthor: Relax the constraints on the tiler chunk size
drm/panthor: Make sure the tiler initial/max chunks are consistent
drm/panthor: Fix tiler OOM handling to allow incremental rendering
drm: xlnx: zynqmp_dpsub: Fix compilation error
drm: xlnx: zynqmp_dpsub: Fix few function comments
David Howells [Fri, 24 May 2024 14:23:36 +0000 (15:23 +0100)]
cifs: Fix missing set of remote_i_size
Occasionally, the generic/001 xfstest will fail indicating corruption in
one of the copy chains when run on cifs against a server that supports
FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs). The
problem is that the remote_i_size value isn't updated by cifs_setsize()
when called by smb2_duplicate_extents(), but i_size *is*.
This may cause cifs_remap_file_range() to then skip the bit after calling
->duplicate_extents() that sets sizes.
Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
calling cifs_setsize() to set i_size.
This means we don't then need to call netfs_resize_file() upon return from
->duplicate_extents(), but we also fix the test to compare against the pre-dup
inode size.
[Note that this goes back before the addition of remote_i_size with the
netfs_inode struct. It should probably have been setting cifsi->server_eof
previously.]
Fixes: cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>
David Howells [Wed, 22 May 2024 08:38:48 +0000 (09:38 +0100)]
cifs: Fix smb3_insert_range() to move the zero_point
Fix smb3_insert_range() to move the zero_point over to the new EOF.
Without this, generic/147 fails as reads of data beyond the old EOF point
return zeroes.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>
Chengming Zhou [Mon, 13 May 2024 03:07:56 +0000 (11:07 +0800)]
mm/ksm: fix possible UAF of stable_node
The commit 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page
deduplication limit") introduced a possible failure case in the
stable_tree_insert(), where we may free the new allocated stable_node_dup
if we fail to prepare the missing chain node.
Then that kfolio return and unlock with a freed stable_node set... And
any MM activities can come in to access kfolio->mapping, so UAF.
Fix it by moving folio_set_stable_node() to the end after stable_node
is inserted successfully.
Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev Fixes: 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit") Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memory_failure
try_memory_failure_hugetlb
me_huge_page
__page_handle_poison
dissolve_free_hugetlb_folio
drain_all_pages -- Buddy page can be isolated e.g. for compaction.
take_page_off_buddy -- Failed as page is not in the buddy list.
-- Page can be putback into buddy after compaction.
page_ref_inc -- Leads to buddy page with refcnt = 1.
Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning. And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.
Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yuanyuan Zhong [Thu, 23 May 2024 18:35:31 +0000 (12:35 -0600)]
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop. So the current
VMA needs to be addressed before the continue statement.
Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.
Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end") Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 20 May 2024 13:26:21 +0000 (22:26 +0900)]
nilfs2: fix potential hang in nilfs_detach_log_writer()
Syzbot has reported a potential hang in nilfs_detach_log_writer() called
during nilfs2 unmount.
Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call trace
below:
Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and by
forcing tasks that are already waiting to complete once after the thread
terminates.
The skipped inode metadata flushout will then be processed together in the
subsequent cleanup work in nilfs_segctor_destroy().
Ryusuke Konishi [Mon, 20 May 2024 13:26:20 +0000 (22:26 +0900)]
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.
This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.
The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically. There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.
Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock. Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.
Ryusuke Konishi [Mon, 20 May 2024 13:26:19 +0000 (22:26 +0900)]
nilfs2: fix use-after-free of timer for log writer thread
Patch series "nilfs2: fix log writer related issues".
This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.
Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu> Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michael Ellerman [Tue, 21 May 2024 03:02:19 +0000 (13:02 +1000)]
selftests/mm: fix build warnings on ppc64
Fix warnings like:
In file included from uffd-unit-tests.c:8:
uffd-unit-tests.c: In function `uffd_poison_handle_fault':
uffd-common.h:45:33: warning: format `%llu' expects argument of type
`long long unsigned int', but argument 3 has type `__u64' {aka `long
unsigned int'} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.
Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reported-by: Klara Modin <klarasmodin@gmail.com> Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com Tested-by: Klara Modin <klarasmodin@gmail.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:58 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:57 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:56 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success on Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt. But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.
Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Fri, 17 May 2024 13:01:18 +0000 (15:01 +0200)]
kasan, fortify: properly rename memintrinsics
After commit 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments meminstrinsics by generating calls to
__asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.
As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy.
This also makes KASAN tests corrupt kernel memory and cause crashes.
To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate. Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).
Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Fixes: 51287dcb00cc ("kasan: emit different calls for instrumentable memintrinsics") Fixes: 36be5cba99f6 ("kasan: treat meminstrinsic as builtins in uninstrumented files") Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Reported-by: Nico Pache <npache@redhat.com> Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/ Reviewed-by: Marco Elver <elver@google.com> Tested-by: Nico Pache <npache@redhat.com> Acked-by: Nico Pache <npache@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hailong.Liu [Fri, 10 May 2024 10:01:31 +0000 (18:01 +0800)]
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit dd544141b9eb ("vmalloc: back off when the current task is OOM-killed"). A
possible scenario is as follows:
process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
__vmalloc_area_node()
vm_area_alloc_pages()
--> oom-killer send SIGKILL to process-a
if (fatal_signal_pending(current)) break;
--> return NULL;
To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.
This issue occurred during OPLUS KASAN TEST. Below is part of the log
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198
Linus Torvalds [Fri, 24 May 2024 17:46:35 +0000 (10:46 -0700)]
Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The compression format used for boot images is now configurable at
build time, and these formats are shown in `make help`
- access_ok() has been optimized
- A pair of performance bugs have been fixed in the uaccess handlers
- Various fixes and cleanups, including one for the IMSIC build failure
and one for the early-boot ftrace illegal NOPs bug
* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix early ftrace nop patching
irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
riscv: selftests: Add signal handling vector tests
riscv: mm: accelerate pagefault when badaccess
riscv: uaccess: Relax the threshold for fast path
riscv: uaccess: Allow the last potential unrolled copy
riscv: typo in comment for get_f64_reg
Use bool value in set_cpu_online()
riscv: selftests: Add hwprobe binaries to .gitignore
riscv: stacktrace: fixed walk_stackframe()
ftrace: riscv: move from REGS to ARGS
riscv: do not select MODULE_SECTIONS by default
riscv: show help string for riscv-specific targets
riscv: make image compression configurable
riscv: cpufeature: Fix extension subset checking
riscv: cpufeature: Fix thead vector hwcap removal
riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
riscv: Define TASK_SIZE_MAX for __access_ok()
riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
Linus Torvalds [Fri, 24 May 2024 17:24:49 +0000 (10:24 -0700)]
Merge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a small cleanup in the drivers/xen/xenbus Makefile
- a fix of the Xen xenstore driver to improve connecting to a late
started Xenstore
- an enhancement for better support of ballooning in PVH guests
- a cleanup using try_cmpxchg() instead of open coding it
* tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
drivers/xen: Improve the late XenStore init protocol
xen/xenbus: Use *-y instead of *-objs in Makefile
xen/x86: add extra pages to unpopulated-alloc if available
locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
Linus Torvalds [Fri, 24 May 2024 16:40:31 +0000 (09:40 -0700)]
Merge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs updates from David Sterba:
"A few more updates, mostly stability fixes or user visible changes:
- fix race in zoned mode during device replace that can lead to
use-after-free
- update return codes and lower message levels for quota rescan where
it's causing false alerts
- fix unexpected qgroup id reuse under some conditions
- fix condition when looking up extent refs
- add option norecovery (removed in 6.8), the intended replacements
haven't been used and some aplications still rely on the old one
- build warning fixes"
* tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: re-introduce 'norecovery' mount option
btrfs: fix end of tree detection when searching for data extent ref
btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
btrfs: zoned: fix use-after-free due to race with dev replace
btrfs: qgroup: fix qgroup id collision across mounts
btrfs: qgroup: update rescan message levels and error codes
Linus Torvalds [Fri, 24 May 2024 16:31:50 +0000 (09:31 -0700)]
Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull more erofs updates from Gao Xiang:
"The main ones are metadata API conversion to byte offsets by Al Viro.
Another patch gets rid of unnecessary memory allocation out of DEFLATE
decompressor. The remaining one is a trivial cleanup.
- Convert metadata APIs to byte offsets
- Avoid allocating DEFLATE streams unnecessarily
- Some erofs_show_options() cleanup"
* tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: avoid allocating DEFLATE streams before mounting
z_erofs_pcluster_begin(): don't bother with rounding position down
erofs: don't round offset down for erofs_read_metabuf()
erofs: don't align offset for erofs_read_metabuf() (simple cases)
erofs: mechanically convert erofs_read_metabuf() to offsets
erofs: clean up erofs_show_options()
Linus Torvalds [Fri, 24 May 2024 16:01:21 +0000 (09:01 -0700)]
Merge tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a change to input core to trim amount of keys data in modalias string
in case when a device declares too many keys and they do not fit in
uevent buffer instead of reporting an error which results in uevent
not being generated at all
- support for Machenike G5 Pro Controller added to xpad driver
- support for FocalTech FT5452 and FT8719 added to edt-ft5x06
- support for new SPMI vibrator added to pm8xxx-vibrator driver
- missing locking added to cyapa touchpad driver
- removal of unused fields in various driver structures
- explicit initialization of i2c_device_id::driver_data to 0 dropped
from input drivers
- other assorted fixes and cleanups.
* tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
Input: xpad - add support for Machenike G5 Pro Controller
Input: try trimming too long modalias strings
Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
Input: zet6223 - remove an unused field in struct zet6223_ts
Input: chipone_icn8505 - remove an unused field in struct icn8505_data
Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
Input: matrix_keypad - remove an unused field in struct matrix_keypad
Input: tca6416-keypad - remove unused struct tca6416_drv_data
Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
Input: da7280 - remove an unused field in struct da7280_haptic
Input: ff-core - prefer struct_size over open coded arithmetic
Input: cyapa - add missing input core locking to suspend/resume functions
input: pm8xxx-vibrator: add new SPMI vibrator support
dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
input: pm8xxx-vibrator: refactor to support new SPMI vibrator
Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
Input: sur40 - convert le16 to cpu before use
...
Linus Torvalds [Fri, 24 May 2024 15:48:51 +0000 (08:48 -0700)]
Merge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes for 6.10-rc1. Most of changes are various
device-specific fixes and quirks, while there are a few small changes
in ALSA core timer and module / built-in fixes"
* tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
ALSA: core: Enable proc module when CONFIG_MODULES=y
ALSA: core: Fix NULL module pointer assignment at card init
ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
ASoC: tas2781: Fix wrong loading calibrated data sequence
ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
ALSA: usb-audio: Fix for sampling rates support for Mbox3
Documentation: sound: Fix trailing whitespaces
ALSA: timer: Set lower bound of start tick time
ASoC: codecs: ES8326: solve hp and button detect issue
ASoC: rt5645: mic-in detection threshold modification
ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
Linus Torvalds [Fri, 24 May 2024 15:43:25 +0000 (08:43 -0700)]
Merge tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
"Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
merge window, and has been sitting in my tree and linux-next for quite
a while now, but wasn't sent to you (my fault, travels...)
It is a bugfix to resolve an error in the speakup code that could
overflow a buffer.
It has been in linux-next for a while with no reported problems"
* tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
speakup: Fix sizeof() vs ARRAY_SIZE() bug
Linus Torvalds [Fri, 24 May 2024 15:38:28 +0000 (08:38 -0700)]
Merge tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small TTY and Serial driver fixes that missed the
6.9-final merge window, but have been in my tree for weeks (my fault,
travel caused me to miss this)
These fixes include:
- more n_gsm fixes for reported problems
- 8520_mtk driver fix
- 8250_bcm7271 driver fix
- sc16is7xx driver fix
All of these have been in linux-next for weeks without any reported
problems"
* tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
serial: 8250_bcm7271: use default_mux_rate if possible
serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
tty: n_gsm: fix missing receive state reset after mode switch
tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
Linus Torvalds [Fri, 24 May 2024 15:33:44 +0000 (08:33 -0700)]
Merge tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
decompression (Stephen Boyd)
- ubsan: Restore dependency on ARCH_HAS_UBSAN
- kunit/fortify: Fix memcmp() test to be amplitude agnostic
* tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kunit/fortify: Fix memcmp() test to be amplitude agnostic
ubsan: Restore dependency on ARCH_HAS_UBSAN
loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
Linus Torvalds [Fri, 24 May 2024 15:27:34 +0000 (08:27 -0700)]
Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracefs/eventfs updates from Steven Rostedt:
"Bug fixes:
- The eventfs directories need to have unique inode numbers. Make
sure that they do not get the default file inode number.
- Update the inode uid and gid fields on remount.
When a remount happens where a uid and/or gid is specified, all the
tracefs files and directories should get the specified uid and/or
gid. But this can be sporadic when some uids were assigned already.
There's already a list of inodes that are allocated. Just update
their uid and gid fields at the time of remount.
- Update the eventfs_inodes on remount from the top level "events"
descriptor.
There was a bug where not all the eventfs files or directories
where getting updated on remount. One fix was to clear the
SAVED_UID/GID flags from the inode list during the iteration of the
inodes during the remount. But because the eventfs inodes can be
freed when the last referenced is released, not all the
eventfs_inodes were being updated. This lead to the ownership
selftest to fail if it was run a second time (the first time would
leave eventfs_inodes with no corresponding tracefs_inode).
Instead, for eventfs_inodes, only process the "events"
eventfs_inode from the list iteration, as it is guaranteed to have
a tracefs_inode (it's never freed while the "events" directory
exists). As it has a list of its children, and the children have a
list of their children, just iterate all the eventfs_inodes from
the "events" descriptor and it is guaranteed to get all of them.
- Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
Currently the EVENTFS_INODE FLAG is cleared in the tracefs_d_iput()
callback. But this is the wrong location. The iput() callback is
called when the last reference to the dentry inode is hit. There
could be a case where two dentry's have the same inode, and the
flag will be cleared prematurely. The flag needs to be cleared when
the last reference of the inode is dropped and that happens in the
inode's drop_inode() callback handler.
Cleanups:
- Consolidate the creation of a tracefs_inode for an eventfs_inode
A tracefs_inode is created for both files and directories of the
eventfs system. It is open coded. Instead, consolidate it into a
single eventfs_get_inode() function call.
- Remove the eventfs getattr and permission callbacks.
The permissions for the eventfs files and directories are updated
when the inodes are created, on remount, and when the user sets
them (via setattr). The inodes hold the current permissions so
there is no need to have custom getattr or permissions callbacks as
they will more likely cause them to be incorrect. The inode's
permissions are updated when they should be updated. Remove the
getattr and permissions inode callbacks.
- Do not update eventfs_inode attributes on creation of inodes.
The eventfs_inodes attribute field is used to store the permissions
of the directories and files for when their corresponding inodes
are freed and are created again. But when the creation of the
inodes happen, the eventfs_inode attributes are recalculated. The
recalculation should only happen when the permissions change for a
given file or directory. Currently, the attribute changes are just
being set to their current files so this is not a bug, but it's
unnecessary and error prone. Stop doing that.
- The events directory inode is created once when the events
directory is created and deleted when it is deleted. It is now
updated on remount and when the user changes the permissions.
There's no need to use the eventfs_inode of the events directory to
store the events directory permissions. But using it to store the
default permissions for the files within the directory that have
not been updated by the user can simplify the code"
* tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Do not use attributes for events directory
eventfs: Cleanup permissions in creation of inodes
eventfs: Remove getattr and permission callbacks
eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
eventfs: Update all the eventfs_inodes from the events descriptor
tracefs: Update inode permissions on remount
eventfs: Keep the directories from having the same inode number as files
dicken.ding [Fri, 24 May 2024 09:17:39 +0000 (17:17 +0800)]
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
irq_find_at_or_after() dereferences the interrupt descriptor which is
returned by mt_find() while neither holding sparse_irq_lock nor RCU read
lock, which means the descriptor can be freed between mt_find() and the
dereference:
Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management") Signed-off-by: dicken.ding <dicken.ding@mediatek.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240524091739.31611-1-dicken.ding@mediatek.com
Konstantin Komarov [Tue, 23 Apr 2024 12:31:56 +0000 (15:31 +0300)]
fs/ntfs3: Fix case when index is reused during tree transformation
In most cases when adding a cluster to the directory index,
they are placed at the end, and in the bitmap, this cluster corresponds
to the last bit. The new directory size is calculated as follows:
data_size = (u64)(bit + 1) << indx->index_bits;
In the case of reusing a non-final cluster from the index,
data_size is calculated incorrectly, resulting in the directory size
differing from the actual size.
A check for cluster reuse has been added, and the size update is skipped.
Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block") Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: stable@vger.kernel.org
Jeff Xu [Mon, 15 Apr 2024 16:35:21 +0000 (16:35 +0000)]
mseal: add mseal syscall
The new mseal() is an syscall on 64 bit CPU, and with following signature:
int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
Following input during RFC are incooperated into this patch:
Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Linus Torvalds: assisting in defining system call signature and scope.
Liam R. Howlett: perf optimization.
Theo de Raadt: sharing the experiences and insight gained from
implementing mimmutable() in OpenBSD.
Finally, the idea that inspired this patch comes from Stephen Röttger's
work in Chrome V8 CFI.
[jeffxu@chromium.org: add branch prediction hint, per Pedro] Link: https://lkml.kernel.org/r/20240423192825.1273679-2-jeffxu@chromium.org Link: https://lkml.kernel.org/r/20240415163527.626541-3-jeffxu@chromium.org Signed-off-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Stephen Röttger <sroettger@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Amer Al Shanawany <amer.shanawany@gmail.com> Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jeff Xu [Mon, 15 Apr 2024 16:35:20 +0000 (16:35 +0000)]
mseal: wire up mseal syscall
Patch series "Introduce mseal", v10.
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory range
against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW) and
no-execute (NX) bits. Linux has supported NX since the release of kernel
version 2.6.8 in August 2004 [1]. The memory permission feature improves
the security stance on memory corruption bugs, as an attacker cannot
simply write to arbitrary memory and point the code to it. The memory
must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects the
VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped. Memory sealing can automatically be
applied by the runtime loader to seal .text and .rodata pages and
applications can additionally seal security critical data at runtime. A
similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the mimmutable syscall
[4]. Also, Chrome wants to adopt this feature for their CFI work [2] and
this patchset has been designed to be compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is an syscall on 64 bit CPU, and with following signature:
int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in the
case of libc, sealing is only applied to read-only (RO) or read-execute
(RX) memory segments (such as .text and .RELRO) to prevent them from
becoming writable, the lifetime of those mappings are tied to the lifetime
of the process.
Chrome wants to seal two large address space reservations that are managed
by different allocators. The memory is mapped RW- and RWX respectively
but write access to it is restricted using pkeys (or in the future ARM
permission overlay extensions). The lifetime of those mappings are not
tied to the lifetime of the process, therefore, while the memory is
sealed, the allocators still need to free or discard the unused memory.
For example, with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a security
risk. For example if a jump instruction crosses a page boundary and the
second page gets discarded, it will overwrite the target bytes with zeros
and change the control flow. Checking write-permission before the discard
operation allows us to control when the operation is valid. In this case,
the madvise will only succeed if the executing thread has PKEY write
permissions and PKRU changes are protected in software by control-flow
integrity.
Although the initial version of this patch series is targeting the Chrome
browser as its first user, it became evident during upstream discussions
that we would also want to ensure that the patch set eventually is a
complete solution for memory sealing and compatible with other use cases.
The specific scenario currently in mind is glibc's use case of loading and
sealing ELF executables. To this end, Stephen is working on a change to
glibc to add sealing support to the dynamic linker, which will seal all
non-writable segments at startup. Once this work is completed, all
applications will be able to automatically benefit from these new
protections.
In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental in
shaping this patch:
Jann Horn: raising awareness and providing valuable insights on the
destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Theo de Raadt: sharing the experiences and insight gained from
implementing mimmutable() in OpenBSD.
MM perf benchmarks
==================
This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
check the VMAs’ sealing flag, so that no partial update can be made,
when any segment within the given memory range is sealed.
To measure the performance impact of this loop, two tests are developed.
[8]
The first is measuring the time taken for a particular system call,
by using clock_gettime(CLOCK_MONOTONIC). The second is using
PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
similar results.
The tests have roughly below sequence:
for (i = 0; i < 1000, i++)
create 1000 mappings (1 page per VMA)
start the sampling
for (j = 0; j < 1000, j++)
mprotect one mapping
stop and save the sample
delete 1000 mappings
calculates all samples.
Below tests are performed on Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
4G memory, Chromebook.
From 5.10 to 6.8
munmap: added 250-550 ns in time, or 500-1100 in cpu cycle, per vma.
mprotect: added 200-750 ns in time, or 500-1200 in cpu cycle, per vma.
madvise: added 33-50 ns in time, or 70-110 in cpu cycle, per vma.
In comparison to mseal, which adds 20-40 ns or 50-100 CPU cycles, the
increase from 5.10 to 6.8 is significantly larger, approximately ten times
greater for munmap and mprotect.
When I discuss the mm performance with Brian Makin, an engineer who worked
on performance, it was brought to my attention that such performance
benchmarks, which measuring millions of mm syscall in a tight loop, may
not accurately reflect real-world scenarios, such as that of a database
service. Also this is tested using a single HW and ChromeOS, the data
from another HW or distribution might be different. It might be best to
take this data with a grain of salt.
This patch (of 5):
Wire up mseal syscall for all architectures.
Link: https://lkml.kernel.org/r/20240415163527.626541-1-jeffxu@chromium.org Link: https://lkml.kernel.org/r/20240415163527.626541-2-jeffxu@chromium.org Signed-off-by: Jeff Xu <jeffxu@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jann Horn <jannh@google.com> [Bug #2] Cc: Jeff Xu <jeffxu@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Pedro Falcato <pedro.falcato@gmail.com> Cc: Stephen Röttger <sroettger@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Amer Al Shanawany <amer.shanawany@gmail.com> Cc: Javier Carrasco <javier.carrasco.cruz@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Thu, 23 May 2024 20:51:09 +0000 (13:51 -0700)]
Merge tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Stable fixes:
- nfs: fix undefined behavior in nfs_block_bits()
- NFSv4.2: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Bugfixes:
- Fix mixing of the lock/nolock and local_lock mount options
- NFSv4: Fixup smatch warning for ambiguous return
- NFSv3: Fix remount when using the legacy binary mount api
- SUNRPC: Fix the handling of expired RPCSEC_GSS contexts
- SUNRPC: fix the NFSACL RPC retries when soft mounts are enabled
- rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Features and cleanups:
- NFSv3: Use the atomic_open API to fix open(O_CREAT|O_TRUNC)
- pNFS/filelayout: S layout segment range in LAYOUTGET
- pNFS: rework pnfs_generic_pg_check_layout to check IO range
- NFSv2: Turn off enabling of NFS v2 by default"
* tag 'nfs-for-6.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix undefined behavior in nfs_block_bits()
pNFS: rework pnfs_generic_pg_check_layout to check IO range
pNFS/filelayout: check layout segment range
pNFS/filelayout: fixup pNfs allocation modes
rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
NFS: Don't enable NFS v2 by default
NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
sunrpc: fix NFSACL RPC retry on soft mount
SUNRPC: fix handling expired GSS context
nfs: keep server info for remounts
NFSv4: Fixup smatch warning for ambiguous return
NFS: make sure lock/nolock overriding local_lock mount option
NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.
pNFS/filelayout: Specify the layout segment range in LAYOUTGET
pNFS/filelayout: Remove the whole file layout requirement
Linus Torvalds [Thu, 23 May 2024 20:44:47 +0000 (13:44 -0700)]
Merge tag 'block-6.10-20240523' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
"Followup block updates, mostly due to NVMe being a bit late to the
party. But nothing major in there, so not a big deal.
* tag 'block-6.10-20240523' of git://git.kernel.dk/linux: (24 commits)
null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
blk-throttle: remove unused struct 'avg_latency_bucket'
block: fix lost bio for plug enabled bio based device
block: t10-pi: add MODULE_DESCRIPTION()
blk-mq: add helper for checking if one CPU is mapped to specified hctx
blk-cgroup: Properly propagate the iostat update up the hierarchy
blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
blk-cgroup: fix list corruption from resetting io stat
cdrom: rearrange last_media_change check to avoid unintentional overflow
nbd: Fix signal handling
nbd: Remove a local variable from nbd_send_cmd()
nbd: Improve the documentation of the locking assumptions
nbd: Remove superfluous casts
nbd: Use NULL to represent a pointer
brd: implement discard support
null_blk: Fix two sparse warnings
ublk_drv: set DMA alignment mask to 3
nvme-rdma, nvme-tcp: include max reconnects for reconnect logging
nvmet-rdma: Avoid o(n^2) loop in delete_ctrl
nvme: do not retry authentication failures
...
Linus Torvalds [Thu, 23 May 2024 20:41:49 +0000 (13:41 -0700)]
Merge tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Single fix here for a regression in 6.9, and then a simple cleanup
removing some dead code"
* tag 'io_uring-6.10-20240523' of git://git.kernel.dk/linux:
io_uring: remove checks for NULL 'sq_offset'
io_uring/sqpoll: ensure that normal task_work is also run timely
Linus Torvalds [Thu, 23 May 2024 20:39:42 +0000 (13:39 -0700)]
Merge tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A bunch of fixes that came in during the merge window.
Matti found several issues with some of the more complexly configured
Rohm regulators and the helpers they use and there were some errors in
the specification of tps6594 when regulators are grouped together"
* tag 'regulator-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps6594-regulator: Correct multi-phase configuration
regulator: tps6287x: Force writing VSEL bit
regulator: pickable ranges: don't always cache vsel
regulator: rohm-regulator: warn if unsupported voltage is set
regulator: bd71828: Don't overwrite runtime voltages
Linus Torvalds [Thu, 23 May 2024 20:38:31 +0000 (13:38 -0700)]
Merge tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"Guenter ran with memory sanitisers and found an issue in the new KUnit
tests that Richard added where an assumption in older test code was
exposed, this was fixed quickly by Richard"
* tag 'regmap-fix-v6.10-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: kunit: Fix array overflow in stride() test
Dongli Zhang [Wed, 22 May 2024 22:02:18 +0000 (15:02 -0700)]
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
The absence of IRQD_MOVE_PCNTXT prevents immediate effectiveness of
interrupt affinity reconfiguration via procfs. Instead, the change is
deferred until the next instance of the interrupt being triggered on the
original CPU.
When the interrupt next triggers on the original CPU, the new affinity is
enforced within __irq_move_irq(). A vector is allocated from the new CPU,
but the old vector on the original CPU remains and is not immediately
reclaimed. Instead, apicd->move_in_progress is flagged, and the reclaiming
process is delayed until the next trigger of the interrupt on the new CPU.
Upon the subsequent triggering of the interrupt on the new CPU,
irq_complete_move() adds a task to the old CPU's vector_cleanup list if it
remains online. Subsequently, the timer on the old CPU iterates over its
vector_cleanup list, reclaiming old vectors.
However, a rare scenario arises if the old CPU is outgoing before the
interrupt triggers again on the new CPU.
In that case irq_force_complete_move() is not invoked on the outgoing CPU
to reclaim the old apicd->prev_vector because the interrupt isn't currently
affine to the outgoing CPU, and irq_needs_fixup() returns false. Even
though __vector_schedule_cleanup() is later called on the new CPU, it
doesn't reclaim apicd->prev_vector; instead, it simply resets both
apicd->move_in_progress and apicd->prev_vector to 0.
As a result, the vector remains unreclaimed in vector_matrix, leading to a
CPU vector leak.
To address this issue, move the invocation of irq_force_complete_move()
before the irq_needs_fixup() call to reclaim apicd->prev_vector, if the
interrupt is currently or used to be affine to the outgoing CPU.
Additionally, reclaim the vector in __vector_schedule_cleanup() as well,
following a warning message, although theoretically it should never see
apicd->move_in_progress with apicd->prev_cpu pointing to an offline CPU.
Fixes: f0383c24b485 ("genirq/cpuhotplug: Add support for cleaning up move in progress") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240522220218.162423-1-dongli.zhang@oracle.com