Jonas Karlman [Thu, 27 Jun 2024 21:17:31 +0000 (21:17 +0000)]
arm64: dts: rockchip: Add Radxa ROCK 3B
The Radxa ROCK 3B is a single-board computer based on the Pico-ITX form
factor (100mm x 75mm). Two versions of the ROCK 3B exists, a community
version based on the RK3568 SoC and an industrial version based on the
RK3568J SoC.
Add initial support for eMMC, SD-card, Ethernet, HDMI, PCIe and USB.
Jonas Karlman [Thu, 27 Jun 2024 21:17:30 +0000 (21:17 +0000)]
dt-bindings: arm: rockchip: Add Radxa ROCK 3B
Add devicetree binding documentation for the Radxa ROCK 3B board.
The Radxa ROCK 3B is a single-board computer based on the Pico-ITX form
factor (100mm x 75mm). Two versions of the ROCK 3B exists, a community
version based on the RK3568 SoC and an industrial version based on the
RK3568J SoC.
The ROCK 5 ITX as the name suggests is made in the ITX form factor and
actually built in a form to be used in a regular case even providing
connectors for regular front-panel io.
It can be powered either by 12V, ATX power-supply or PoE.
Notable peripherals are the 4 SATA ports, M.2 M-Key slot, M.2 E-key slot,
2*2.5Gb PCIe-connected Ethernet NICs.
As of yet unsupported display options consist of 2*HDMI, DP via USB-c,
eDP + 2*DSI via PCB connectors.
USB ports are 4*USB3 + 2*USB2 on the back panel and 2-port front-panel
connector.
Schematics for the board can be found on
- https://dl.radxa.com/rock5/5itx/radxa_rock_5_itx_X1100_schematic.pdf
- https://dl.radxa.com/rock5/5itx/v1110/radxa_rock_5itx_v1110_schematic.pdf
arm64: dts: rockchip: Add dma-names to uart1 on Pine64 rk3566 devices
Similar to bf6f26deb0e8 ("arm64: dts: rockchip: Add dma-names to uart1
on quartz64-b") also add the dma-names property to the other rk3566
devices from Pine64 with bluetooth functionality.
arm64: dts: rockchip: Add avdd supplies to hdmi on rock64
Pine64's Rock64 was missing the avdd supply properties on the hdmi node,
causing the following warnings:
dwhdmi-rockchip ff3c0000.hdmi: supply avdd-0v9 not found, using dummy regulator
dwhdmi-rockchip ff3c0000.hdmi: supply avdd-1v8 not found, using dummy regulator
In the Rock64 Schematic document version 2.0 those supplies are marked
as DVIDEO_AVDD_1V0 and DVIDEO_AVDD_1V8 respectively, but in version 3.0
those are named HDMI_AVDD_1V0 and HDMI_AVDD_1V8, which is a bit clearer.
In both versions those are connected to LDO3 and LDO1 respectively.
While the DeviceTree property is named 'avdd-0v9-supply' the
'rockchip,dw-hdmi.yaml' binding document notes the following:
A 0.9V supply that powers up the SoC internal circuitry. The actual
pin name varies between the different SoCs and is usually
HDMI_TX_AVDD_0V9 or sometimes HDMI_AVDD_1V0.
Chukun Pan [Sun, 30 Jun 2024 15:00:04 +0000 (23:00 +0800)]
arm64: dts: rockchip: fix pmu_io supply for Lunzn Fastrhino R6xS
Fixes pmu_io_domains supply according to the schematic. Among them,
the vccio3 is responsible for the io voltage of sdcard. There is no
sdcard slot on the R68S, and it's connected to vcc_3v3, so describe
the supply of vccio3 separately.
Chukun Pan [Mon, 1 Jul 2024 14:30:26 +0000 (22:30 +0800)]
arm64: dts: rockchip: fix usb regulator for Lunzn Fastrhino R6xS
Remove the non-existent usb_host regulator and fix the supply according
to the schematic. Also remove the unnecessary always-on and boot-on for
the usb_otg regulator.
Diederik de Haas [Fri, 28 Jun 2024 12:00:43 +0000 (14:00 +0200)]
arm64: dts: rockchip: Add dma-names to uart1 on quartz64-b
There have been several attempts to set the dma-names property on the
SoC level (in rk356x.dtsi), but that appears to cause problems when set
on channels without flow control.
Quoting part of a previous attempt for clarification:
> Nah, enabling it for bluetooth is fine because you have flow control.
> My issues have been on channels without flow control. Without DMA it
> simply drops messages or the channel hangs until you close and reopen
> it. With DMA, when an overflow locks up the channel it is usually
> unavailable until the board is rebooted.
Setting it on the board level for the bluetooth connection was deemed
safe, so do so for the Quartz64 Model B.
This fixes the following error/warning:
of_dma_request_slave_channel: dma-names property of node
'/serial@fe650000' missing or empty
dw-apb-uart fe650000.serial: failed to request DMA
Dragan Simic [Sun, 30 Jun 2024 16:00:41 +0000 (18:00 +0200)]
arm64: dts: rockchip: Update GPU OPP voltages in RK356x SoC dtsi
Update the values for the exact Rockchip RK356x GPU OPP voltages and the
lower limits for the GPU OPP voltage ranges, using the most conservative
values (i.e. the highest per-OPP voltages) found in the vendor kernel source
(cf. downstream commit f8b9431ee38e ("arm64: dts: rockchip: rk3568: support
adjust opp-table by otp")). [1][2]
Using the most conservative per-OPP voltages ensures reliable GPU operation
regardless of the actual GPU binning, with the downside of possibly using
a bit more power than absolutely needed.
Dragan Simic [Sun, 30 Jun 2024 16:00:40 +0000 (18:00 +0200)]
arm64: dts: rockchip: Add GPU OPP voltage ranges to RK356x SoC dtsi
Add support for voltage ranges to the GPU OPPs defined in the SoC dtsi for
Rockchip RK356x. This is, for example, useful for RK356x-based boards that
are designed to use the same power supply for the GPU and NPU portions of
the SoC, which is described further in the following documents:
- Rockchip RK3566 Hardware Design Guide, version 1.1.0, page 37
- Rockchip RK3568 Hardware Design Guide, version 1.2, page 78
The values for the exact GPU OPP voltages and the lower limits for the GPU
OPP voltage ranges differ from the values found in the vendor kernel source
(cf. downstream commit f8b9431ee38e ("arm64: dts: rockchip: rk3568: support
adjust opp-table by otp")), [1][2] and present the exact GPU OPP voltage
values that have served us well so far.
Marek Vasut [Sun, 30 Jun 2024 03:48:42 +0000 (05:48 +0200)]
arm64: dts: rockchip: Drop ethernet-phy-ieee802.3-c22 from PHY compatible string on all RK3588 boards
The rtl82xx DT bindings do not require ethernet-phy-ieee802.3-c22
as the fallback compatible string. There are fewer users of the
Realtek PHY compatible string with fallback compatible string than
there are users without fallback compatible string, so drop the
fallback compatible string from the few remaining users:
arm64: dts: rockchip: Add missing power-domains for rk356x vop_mmu
The iommu@fe043e00 on RK356x SoC shares the VOP power domain, but the
power-domains property was not provided when the node has been added.
The consequence is that an attempt to reload the rockchipdrm module will
freeze the entire system. That is because on probe time,
pm_runtime_get_suppliers() gets called for vop@fe040000, which blocks
when pm_runtime_get_sync() is being invoked for iommu@fe043e00.
Dragan Simic [Sun, 2 Jun 2024 06:25:38 +0000 (08:25 +0200)]
arm64: dts: rockchip: Delete the SoC variant dtsi for RK3399Pro
The commit 587b4ee24fc7 ("arm64: dts: rockchip: add core dtsi file for
RK3399Pro SoCs") describes the RK3399Pro's PCI Express interface as the way
built-in NPU communicates with the rest of the SoC. All available evidence
shows this not to be accurate, as described in detail below. Moreover, the
rk3399pro.dtsi isn't used anywhere, so let's delete it.
The publicly available schematics of the Radxa Rock Pi N10 carrier board [1]
and the Vamrs VMARC RK3399Pro SoM, [2] which put together form the currently
single supported RK3399Pro-based board, clearly show that the PCI Express x4
interface of this SoC is fully functional and actually not used by the SoC
to communicate with the built-in NPU. In more detail, the VMARC SoM exports
the SoC's PCI Express interface at its board-to-board connector, and the Rock
Pi N10 routes it to an M.2 M-key slot with four PCI Express lanes.
Both the Rockchip RK3399Pro datasheet, version 1.1, [3] and the Rockchip
RK3399Pro technical reference manual (TRM), first part of the version 1.0, [4]
don't describe that the SoC's PCI Express interface is reserved for the NPU.
Instead, the RK3399Pro TRM describes that the NPU uses AHB and AXI interfaces
as the host interface (HIF). The RK3399Pro datasheet clearly describes that
the PCI Express x4 interface is available for general-purpose use, just like
it's the case with the standard Rockchip RK3399 SoC, [5] albeit with a bit
shorter feature list provided in the RK3399Pro datasheet.
Even the publicly available reference RK3399Pro schematic from Rockchip [6]
shows the availability of a standard PCI Express slot with four lanes, which
would be pretty much impossible if the PCI Express interface was reserved
for the communication with the built-in NPU.
Based on the RK3399Pro datasheet [3] and the board schematics, [2][6] the
built-in NPU actually exports NPU_PCIE as a separate PCI Express x2 interface
that's partially pinmuxed with the NPU's separate USB 3.0 interface, which is
described further in the next paragraph. However, the NPU's separate PCI
Express x2 interface is left undocumented in the publicly available RK3399Pro
documentation, in which it's clearly described as reserved for internal use
and not intended for the communication with the NPU. Finally, the evidently
independent nature of the separate NPU_PCIE x2 interface makes ignoring it
safe when it comes to determining the nature and the availability of the
RK3399Pro's main PCI Express x4 interface.
The actual application-level communication with the built-in NPU, including
powering it up and down and uploading the NPU firmware, is performed through
the separate USB 2.0 and USB 3.0 interfaces exported by the NPU, [7] which
the VMARC SoM [2] and the reference board design from Rockchip [6] route to
the SoC's standard USB 2.0 and USB 3.0 interfaces, to make the NPU accessible
to software running on the SoC's ARM cores.
rk3568-evb1-v10.dtb: pmic@20: codec: 'mic-in-differential' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/mfd/rockchip,rk809.yaml#
Cristian Ciocaltea [Fri, 21 Jun 2024 21:57:21 +0000 (00:57 +0300)]
arm64: dts: rockchip: Fix mic-in-differential usage on rk3566-roc-pc
The 'mic-in-differential' DT property supported by the RK809/RK817 audio
codec driver is actually valid if prefixed with 'rockchip,':
DTC_CHK arch/arm64/boot/dts/rockchip/rk3566-roc-pc.dtb
rk3566-roc-pc.dtb: pmic@20: codec: 'mic-in-differential' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/mfd/rockchip,rk809.yaml#
Cristian Ciocaltea [Fri, 21 Jun 2024 21:57:20 +0000 (00:57 +0300)]
arm64: dts: rockchip: Drop invalid mic-in-differential on rk3568-rock-3a
The 'mic-in-differential' DT property supported by the RK809/RK817 audio
codec driver is actually valid if prefixed with 'rockchip,':
DTC_CHK arch/arm64/boot/dts/rockchip/rk3568-rock-3a.dtb
rk3568-rock-3a.dtb: pmic@20: codec: 'mic-in-differential' does not match any of the regexes: 'pinctrl-[0-9]+'
from schema $id: http://devicetree.org/schemas/mfd/rockchip,rk809.yaml#
However, the board doesn't make use of differential signaling, hence
drop the incorrect property and the now unnecessary 'codec' node.
Fixes: 22a442e6586c ("arm64: dts: rockchip: add basic dts for the radxa rock3 model a") Reported-by: Jonas Karlman <jonas@kwiboo.se> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20240622-rk809-fixes-v2-3-c0db420d3639@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Niklas Cassel [Fri, 7 Jun 2024 11:14:33 +0000 (13:14 +0200)]
arm64: dts: rockchip: Add rock5b overlays for PCIe endpoint mode
Add rock5b overlays for PCIe endpoint mode support.
If using the rock5b as an endpoint against a normal PC, only the
rk3588-rock-5b-pcie-ep.dtbo needs to be applied.
If using two rock5b:s, with one board as EP and the other board as RC,
rk3588-rock-5b-pcie-ep.dtbo and rk3588-rock-5b-pcie-srns.dtbo has to
be applied to the respective boards.
Trevor Woerner [Thu, 20 Jun 2024 01:32:49 +0000 (21:32 -0400)]
arm64: dts: rockchip: add gpio-line-names to radxa-zero-3
Add names to the pins of the general-purpose expansion header as given
in the Radxa documentation[1] following the conventions in the kernel[2]
to make it easier for users to correlate pins with functions when using
utilities such as 'gpioinfo'.
Alexey Charkov [Mon, 17 Jun 2024 18:28:58 +0000 (22:28 +0400)]
arm64: dts: rockchip: Split GPU OPPs of RK3588 and RK3588j
RK3588j uses a different set of OPPs for its GPU, both in terms of
allowed frequencies and in terms of voltages.
Move the GPU OPPs table into per-variant .dtsi files to accommodate
for this difference.
The table for RK3588j is adapted from Rockchip downstream sources [1],
while RK3588 one is moved verbatim into the per-variant .dtsi file.
The values provided for RK3588 in the downstream sources match those
in the original commit.
Alexey Charkov [Mon, 17 Jun 2024 18:28:57 +0000 (22:28 +0400)]
arm64: dts: rockchip: Add OPP data for CPU cores on RK3588j
RK3588j is the 'industrial' variant of RK3588, and it uses a different
set of OPPs both in terms of allowed frequencies and in terms of
applicable voltages at each frequency setpoint.
Add the OPPs that apply to RK3588j (and apparently RK3588m too) to
enable dynamic CPU frequency scaling.
OPP values are derived from Rockchip downstream sources [1] by taking
only those OPPs which have the highest frequency for a given voltage
level and dropping the rest (if they are included, the kernel complains
at boot time about them being inefficient)
Alexey Charkov [Mon, 17 Jun 2024 18:28:56 +0000 (22:28 +0400)]
arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
By default the CPUs on RK3588 start up in a conservative performance
mode. Add frequency and voltage mappings to the device tree to enable
dynamic scaling via cpufreq.
OPP values are adapted from Radxa's downstream kernel for Rock 5B [1],
stripping them down to the minimum frequency and voltage combinations
as expected by the generic upstream cpufreq-dt driver, and also dropping
those OPPs that don't differ in voltage but only in frequency (keeping
the top frequency OPP in each case).
Note that this patch ignores voltage scaling for the CPU memory
interface which the downstream kernel does through a custom cpufreq
driver, and which is why the downstream version has two sets of voltage
values for each OPP (the second one being meant for the memory
interface supply regulator). This is done instead via regulator
coupling between CPU and memory interface supplies on affected boards.
This has been tested on Rock 5B with u-boot 2023.11 compiled from
Collabora's integration tree [2] with binary bl31 and appears to be
stable both under active cooling and passive cooling (with throttling)
RK3588 chips allow for their CPU cores to be powered by a different
supply vs. their corresponding memory interfaces, and two of the
boards currently upstream do that (EVB1 and QuartzPro64).
The voltage of the memory interface though has to match that of the
CPU cores that use it, which downstream kernels achieve by the means
of a custom cpufreq driver which adjusts both at the same time.
It seems that regulator coupling is a more appropriate generic
interface for it, so this patch introduces coupling to affected
device trees to ensure that memory interface voltage is also updated
whenever cpufreq switches between CPU OPPs.
Note that other boards, such as Radxa Rock 5B, define both the CPU
and memory interface regulators as aliases to the same DT node, so
this doesn't apply there.
FUKAUMI Naoki [Thu, 20 Jun 2024 22:44:35 +0000 (07:44 +0900)]
arm64: dts: rockchip: fix mmc aliases for Radxa ZERO 3E/3W
align with other Radxa products.
- mmc0 is eMMC
- mmc1 is microSD
for ZERO 3E, there is no eMMC, but aliases should start at 0, so mmc0
is microSD as exception.
Fixes: 1a5c8d307c83 ("arm64: dts: rockchip: Add Radxa ZERO 3W/3E") Signed-off-by: FUKAUMI Naoki <naoki@radxa.com>
Changes in v3:
- fix syntax error in rk3566-radxa-zero-3e.dts
Changes in v2:
- microSD is mmc0 instead of mmc1 for ZERO 3E
Peter Robinson [Sun, 23 Jun 2024 16:53:20 +0000 (17:53 +0100)]
arm64: dts: rockchip: Add Pinephone Pro support for GPIO LEDs
The PinePhone Pro has a cluster of 3 single RGB GPIO LEDs.
Add the GPIO entries for the 3 red/green/blue LEDs and an
entry for the multi-color group to allow them to be used
as a combined RGB LED.
Alexey Charkov [Mon, 17 Jun 2024 18:28:54 +0000 (22:28 +0400)]
arm64: dts: rockchip: enable automatic fan control on Rock 5B
This links the PWM fan on Radxa Rock 5B as an active cooling device
managed automatically by the thermal subsystem, with a target SoC
temperature of 65C and a minimum-spin interval from 55C to 65C to
ensure airflow when the system gets warm
Alexey Charkov [Mon, 17 Jun 2024 18:28:52 +0000 (22:28 +0400)]
arm64: dts: rockchip: enable thermal management on all RK3588 boards
This enables the on-chip thermal monitoring sensor (TSADC) on all
RK3588(s) boards that don't have it enabled yet. It provides temperature
monitoring for the SoC and emergency thermal shutdowns, and is thus
important to have in place before CPU DVFS is enabled, as high CPU
operating performance points can overheat the chip quickly in the
absence of thermal management.
Alexey Charkov [Mon, 17 Jun 2024 18:28:51 +0000 (22:28 +0400)]
arm64: dts: rockchip: add thermal zones information on RK3588
This includes the necessary device tree data to allow thermal
monitoring on RK3588(s) using the on-chip TSADC device, along with
trip points for automatic thermal management.
Each of the CPU clusters (one for the little cores and two for
the big cores) get a passive cooling trip point at 85C, which
will trigger DVFS throttling of the respective cluster upon
reaching a high temperature condition.
All zones also have a critical trip point at 115C, which will
trigger a reset.
Rename the Rockchip RK3588 SoC dtsi files and, consequently, adjust their
contents appropriately, to prepare them for the ability to specify different
CPU and GPU OPPs for each of the supported RK3588 SoC variants.
As already discussed, [1][2][3][4] some of the RK3588 SoC variants require
different OPPs, and it makes more sense to have the OPPs already defined when
a board dts(i) file includes one of the SoC variant dtsi files (rk3588.dtsi,
rk3588j.dtsi or rk3588s.dtsi), rather than requiring the board dts(i) file
to also include a separate rk3588*-opp.dtsi file. The choice of the SoC
variant is already made by the inclusion of the SoC dtsi file into the board
dts(i) file, and it doesn't make much sense to, effectively, allow the board
dts(i) file to include and use an incompatible set of OPPs for the already
selected RK3588 SoC variant.
The new naming scheme for the RK3588 SoC dtsi files uses "-base" and "-extra"
suffixes to denote the DT data shared between all RK5588 SoC variants, and
the DT data shared between the unrestricted SoC variants, respectively.
For example, the DT data for the RK3588 includes both rk3588-base.dtsi and
rk3588-extra.dtsi, because it's an unrestricted SoC variant, while the DT
data for the RK3588S variant includes rk3588-base.dtsi only, because it's
a restricted SoC variant, feature- and interface-wise. This achieves a more
logical naming of the RK3588 SoC dtsi files, which reflects the way DT data
for the SoC variants is built by "stacking" the SoC variant features made
available through the "-base" and "-extra" SoC dtsi files. Additionally,
the SoC variant dtsi files (rk3588.dtsi, rk3588j.dtsi and rk3588s.dtsi) are
no longer parents to any other SoC variant dtsi files, which should help with
making the new "stacking" approach cleaner and easier to follow.
The RK3588 pinctrl dtsi files are also renamed in the same way, for the sake
of consistency. This also keeps the "-base" and "-extra" groups of the dtsi
files together when looked at in a directory listing, which is helpful.
The per-SoC-variant OPPs should go directly into the SoC dtsi files, if no
more than one SoC variant uses those OPPs, or be put into a separate "-opp"
dtsi file that's shared between and included from two or more SoC variant
dtsi files. An example for the former is the non-shared OPP data that should
go directly into the RK3588J SoC variant dtsi file (i.e. rk3588j.dtsi), and
an example for the latter is the shared OPP data that should be put into
rk3588-opp.dtsi and be included from the RK3588 and RK3588S SoC variant dtsi
files (i.e. rk3588.dtsi and rk3588s.dtsi, respectively). Consequently, if
the OPPs for the RK3588 and RK3588S SoC variants are ever made different,
the shared rk3588-opp.dtsi file should be deleted and the new OPPs should
be put directly into rk3588.dtsi and rk3588s.dtsi. [4]
No functional changes are introduced, which was validated by decompiling and
comparing all affected dtb files before and after these changes.
As a side note, due to the nature of introduced changes, this commit is best
viewed using the --break-rewrites option for git-log(1).
Sebastian Kropatsch [Sun, 16 Jun 2024 21:48:30 +0000 (23:48 +0200)]
arm64: dts: rockchip: Add FriendlyElec CM3588 NAS board
The CM3588 NAS by FriendlyElec pairs the CM3588 compute module, based on
the Rockchip RK3588 SoC, with the CM3588 NAS Kit carrier board.
To reflect the hardware setup, add device tree sources for the SoM and
the NAS daughter board as separate files.
Hardware features:
- Rockchip RK3588 SoC
- 4GB/8GB/16GB LPDDR4x RAM
- 64GB eMMC
- MicroSD card slot
- 1x RTL8125B 2.5G Ethernet
- 4x M.2 M-Key with PCIe 3.0 x1 (via bifurcation) for NVMe SSDs
- 2x USB 3.0 (USB 3.1 Gen1) Type-A, 1x USB 2.0 Type-A
- 1x USB 3.0 Type-C with DP AltMode support
- 2x HDMI 2.1 out, 1x HDMI in
- MIPI-CSI Connector, MIPI-DSI Connector
- 40-pin GPIO header
- 4 buttons: power, reset, recovery, MASK, user button
- 3.5mm Headphone out, 2.0mm PH-2A Mic in
- 5V Fan connector, PWM beeper, IR receiver, RTC battery connector
PCIe bifurcation is used to handle all four M.2 sockets at PCIe 3.0 x1
speed. Data lane mapping in the DT is done like described in commit f8020dfb311d ("phy: rockchip-snps-pcie3: fix bifurcation on rk3588").
This device tree includes support for eMMC, SD card, ethernet, all USB2
and USB3 ports, all four M.2 slots, GPU, beeper, IR, RTC, UART debugging
as well as the buttons and LEDs.
The GPIOs are labeled according to the schematics.
Alexey Charkov [Fri, 17 May 2024 12:25:08 +0000 (16:25 +0400)]
arm64: dts: rockchip: add rfkill node for M.2 Key E Bluetooth on Rock 5B
By default the BT WAKE signal inside the M.2 key E connector on Radxa
Rock 5B is driven low, which results in the Bluetooth function being
disabled even if the inserted M.2 card supports it. Expose this signal
as an RFKILL device so that it can be enabled by the userspace.
Tested with an Intel AX210 card, which connects a Bluetooth device over
the USB 2.0 bus.
Jonas Karlman [Tue, 21 May 2024 21:10:07 +0000 (21:10 +0000)]
arm64: dts: rockchip: Add sdmmc related properties on rk3308-rock-pi-s
Add cap-mmc-highspeed to allow use of high speed MMC mode using an eMMC
to uSD board. Use disable-wp to signal that no physical write-protect
line is present. Also add vcc_io used for card and IO line power as
vmmc-supply.
Jonas Karlman [Tue, 21 May 2024 20:28:05 +0000 (20:28 +0000)]
arm64: dts: rockchip: Add Radxa ZERO 3W/3E
The Radxa ZERO 3W/3E is an ultra-small, high-performance single board
computer based on the Rockchip RK3566, with a compact form factor and
rich interfaces.
The ZERO 3W and ZERO 3E are basically the same size and model, but
differ only in storage and network interfaces.
Jonas Karlman [Tue, 21 May 2024 20:28:04 +0000 (20:28 +0000)]
dt-bindings: arm: rockchip: Add Radxa ZERO 3W/3E
Add devicetree binding documentation for Radxa ZERO 3W/3E boards.
The Radxa ZERO 3W/3E is an ultra-small, high-performance single board
computer based on the Rockchip RK3566, with a compact form factor and
rich interfaces.
Kent Overstreet [Fri, 24 May 2024 15:42:09 +0000 (11:42 -0400)]
mm: percpu: Include smp.h in alloc_tag.h
percpu.h depends on smp.h, but doesn't include it directly because of
circular header dependency issues; percpu.h is needed in a bunch of low
level headers.
This fixes a randconfig build error on mips:
include/linux/alloc_tag.h: In function '__alloc_tag_ref_set':
include/asm-generic/percpu.h:31:40: error: implicit declaration of function 'raw_smp_processor_id' [-Werror=implicit-function-declaration]
Linus Torvalds [Sun, 26 May 2024 16:54:26 +0000 (09:54 -0700)]
Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tool fix from Arnaldo Carvalho de Melo:
"Revert a patch causing a regression.
This made a simple 'perf record -e cycles:pp make -j199' stop working
on the Ampere ARM64 system Linus uses to test ARM64 kernels".
* tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"
This made a simple 'perf record -e cycles:pp make -j199' stop working on
the Ampere ARM64 system Linus uses to test ARM64 kernels, as discussed
at length in the threads in the Link tags below.
The fix provided by Ian wasn't acceptable and work to fix this will take
time we don't have at this point, so lets revert this and work on it on
the next devel cycle.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Ethan Adams <j.ethan.adams@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tycho Andersen <tycho@tycho.pizza> Cc: Yang Jihong <yangjihong@bytedance.com> Link: https://lore.kernel.org/lkml/CAHk-=wi5Ri=yR2jBVk-4HzTzpoAWOgstr1LEvg_-OXtJvXXJOA@mail.gmail.com Link: https://lore.kernel.org/lkml/CAHk-=wiWvtFyedDNpoV7a8Fq_FpbB+F5KmWK2xPY3QoYseOf_A@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sun, 26 May 2024 05:33:10 +0000 (22:33 -0700)]
Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- two important netfs integration fixes - including for a data
corruption and also fixes for multiple xfstests
- reenable swap support over SMB3
* tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix missing set of remote_i_size
cifs: Fix smb3_insert_range() to move the zero_point
cifs: update internal version number
smb3: reenable swapfiles over SMB3 mounts
Linus Torvalds [Sat, 25 May 2024 22:10:33 +0000 (15:10 -0700)]
Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes, 11 of which are cc:stable.
A few nilfs2 fixes, the remainder are for MM: a couple of selftests
fixes, various singletons fixing various issues in various parts"
* tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/ksm: fix possible UAF of stable_node
mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
nilfs2: fix potential hang in nilfs_detach_log_writer()
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
nilfs2: fix use-after-free of timer for log writer thread
selftests/mm: fix build warnings on ppc64
arm64: patching: fix handling of execmem addresses
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
selftests/mm: compaction_test: fix bogus test success on Aarch64
mailmap: update email address for Satya Priya
mm/huge_memory: don't unpoison huge_zero_folio
kasan, fortify: properly rename memintrinsics
lib: add version into /proc/allocinfo output
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
Linus Torvalds [Sat, 25 May 2024 21:48:40 +0000 (14:48 -0700)]
Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
- Fix x86 IRQ vector leak caused by a CPU offlining race
- Fix build failure in the riscv-imsic irqchip driver
caused by an API-change semantic conflict
- Fix use-after-free in irq_find_at_or_after()
* tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
Linus Torvalds [Sat, 25 May 2024 21:40:09 +0000 (14:40 -0700)]
Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix regressions of the new x86 CPU VFM (vendor/family/model)
enumeration/matching code
- Fix crash kernel detection on buggy firmware with
non-compliant ACPI MADT tables
- Address Kconfig warning
* tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
crypto: x86/aes-xts - switch to new Intel CPU model defines
x86/topology: Handle bogus ACPI tables correctly
x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
Linus Torvalds [Sat, 25 May 2024 21:23:58 +0000 (14:23 -0700)]
Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A series from Xiubo that adds support for additional access checks
based on MDS auth caps which were recently made available to clients.
This is needed to prevent scenarios where the MDS quietly discards
updates that a UID-restricted client previously (wrongfully) acked to
the user.
Other than that, just a documentation fixup"
* tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
doc: ceph: update userspace command to get CephFS metadata
ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
ceph: check the cephx mds auth access for async dirop
ceph: check the cephx mds auth access for open
ceph: check the cephx mds auth access for setattr
ceph: add ceph_mds_check_access() helper
ceph: save cap_auths in MDS client when session is opened
Linus Torvalds [Sat, 25 May 2024 21:19:01 +0000 (14:19 -0700)]
Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"Fixes:
- reusing of the file index (could cause the file to be trimmed)
- infinite dir enumeration
- taking DOS names into account during link counting
- le32_to_cpu conversion, 32 bit overflow, NULL check
- some code was refactored
Changes:
- removed max link count info display during driver init
Remove:
- atomic_open has been removed for lack of use"
* tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
fs/ntfs3: Break dir enumeration if directory contents error
fs/ntfs3: Fix case when index is reused during tree transformation
fs/ntfs3: Mark volume as dirty if xattr is broken
fs/ntfs3: Always make file nonresident on fallocate call
fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
fs/ntfs3: Use variable length array instead of fixed size
fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
fs/ntfs3: Check 'folio' pointer for NULL
fs/ntfs3: Missed le32_to_cpu conversion
fs/ntfs3: Remove max link count info display during driver init
fs/ntfs3: Taking DOS names into account during link counting
fs/ntfs3: remove atomic_open
fs/ntfs3: use kcalloc() instead of kzalloc()
Linus Torvalds [Sat, 25 May 2024 21:15:39 +0000 (14:15 -0700)]
Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Two ksmbd server fixes, both for stable"
* tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: ignore trailing slashes in share paths
ksmbd: avoid to send duplicate oplock break notifications
Linus Torvalds [Sat, 25 May 2024 20:33:53 +0000 (13:33 -0700)]
Merge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"There is one new driver and then most of the changes are the device
tree bindings conversions to yaml.
New driver:
- Epson RX8111
Drivers:
- Many Device Tree bindings conversions to dtschema
- pcf8563: wakeup-source support"
* tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
pcf8563: add wakeup-source support
rtc: rx8111: handle VLOW flag
rtc: rx8111: demote warnings to debug level
rtc: rx6110: Constify struct regmap_config
dt-bindings: rtc: convert trivial devices into dtschema
dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
dt-bindings: rtc: pxa-rtc: convert to dtschema
rtc: Add driver for Epson RX8111
dt-bindings: rtc: Add Epson RX8111
rtc: mcp795: drop unneeded MODULE_ALIAS
rtc: nuvoton: Modify part number value
rtc: test: Split rtc unit test into slow and normal speed test
dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
dt-bindings: rtc: armada-380-rtc: convert to dtschema
rtc: cros-ec: provide ID table for avoiding fallback match
Linus Torvalds [Sat, 25 May 2024 20:28:29 +0000 (13:28 -0700)]
Merge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Runtime PM (power management) is improved and hot-join support has
been added to the dw controller driver.
Core:
- Allow device driver to trigger controller runtime PM
Drivers:
- dw: hot-join support
- svc: better IBI handling"
* tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: dw: Add hot-join support.
i3c: master: Enable runtime PM for master controller
i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
Linus Torvalds [Sat, 25 May 2024 20:23:42 +0000 (13:23 -0700)]
Merge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull jffs2 updates from Richard Weinberger:
- Fix illegal memory access in jffs2_free_inode()
- Kernel-doc fixes
- print symbolic error names
* tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
jffs2: Fix potential illegal address access in jffs2_free_inode
jffs2: Simplify the allocation of slab caches
jffs2: nodemgmt: fix kernel-doc comments
jffs2: print symbolic error name instead of error code
Linus Torvalds [Sat, 25 May 2024 20:17:48 +0000 (13:17 -0700)]
Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:
- Fixes for -Wmissing-prototypes warnings and further cleanup
- Remove callback returning void from rtc and virtio drivers
- Fix bash location
* tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
um: virtio_uml: Convert to platform remove callback returning void
um: rtc: Convert to platform remove callback returning void
um: Remove unused do_get_thread_area function
um: Fix -Wmissing-prototypes warnings for __vdso_*
um: Add an internal header shared among the user code
um: Fix the declaration of kasan_map_memory
um: Fix the -Wmissing-prototypes warning for get_thread_reg
um: Fix the -Wmissing-prototypes warning for __switch_mm
um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
um: Stop tracking host PID in cpu_tasks
um: process: remove unused 'n' variable
um: vector: remove unused len variable/calculation
um: vector: fix bpfflash parameter evaluation
um: slirp: remove set but unused variable 'pid'
um: signal: move pid variable where needed
um: Makefile: use bash from the environment
um: Add winch to winch_handlers before registering winch IRQ
um: Fix -Wmissing-prototypes warnings for __warp_* and foo
um: Fix -Wmissing-prototypes warnings for text_poke*
um: Move declarations to proper headers
...
Linus Torvalds [Sat, 25 May 2024 00:28:02 +0000 (17:28 -0700)]
Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Some fixes for the end of the merge window, mostly amdgpu and panthor,
with one nouveau uAPI change that fixes a bad decision we made a few
months back.
nouveau:
- fix bo metadata uAPI for vm bind
panthor:
- Fixes for panthor's heap logical block.
- Reset on unrecoverable fault
- Fix VM references.
- Reset fix.
xlnx:
- xlnx compile and doc fixes.
amdgpu:
- Handle vbios table integrated info v2.3
amdkfd:
- Handle duplicate BOs in reserve_bo_and_cond_vms
- Handle memory limitations on small APUs
dp/mst:
- MST null deref fix.
bridge:
- Don't let next bridge create connector in adv7511 to make probe
work"
* tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
drm/amdgpu/atomfirmware: add intergrated info v2.3 table
drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
drm/bridge: adv7511: Attach next bridge without creating connector
drm/buddy: Fix the warn on's during force merge
drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
drm/panthor: Call panthor_sched_post_reset() even if the reset failed
drm/panthor: Reset the FW VM to NULL on unplug
drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
drm/panthor: Force an immediate reset on unrecoverable faults
drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
drm/panthor: Fix an off-by-one in the heap context retrieval logic
drm/panthor: Relax the constraints on the tiler chunk size
drm/panthor: Make sure the tiler initial/max chunks are consistent
drm/panthor: Fix tiler OOM handling to allow incremental rendering
drm: xlnx: zynqmp_dpsub: Fix compilation error
drm: xlnx: zynqmp_dpsub: Fix few function comments
David Howells [Fri, 24 May 2024 14:23:36 +0000 (15:23 +0100)]
cifs: Fix missing set of remote_i_size
Occasionally, the generic/001 xfstest will fail indicating corruption in
one of the copy chains when run on cifs against a server that supports
FSCTL_DUPLICATE_EXTENTS_TO_FILE (eg. Samba with a share on btrfs). The
problem is that the remote_i_size value isn't updated by cifs_setsize()
when called by smb2_duplicate_extents(), but i_size *is*.
This may cause cifs_remap_file_range() to then skip the bit after calling
->duplicate_extents() that sets sizes.
Fix this by calling netfs_resize_file() in smb2_duplicate_extents() before
calling cifs_setsize() to set i_size.
This means we don't then need to call netfs_resize_file() upon return from
->duplicate_extents(), but we also fix the test to compare against the pre-dup
inode size.
[Note that this goes back before the addition of remote_i_size with the
netfs_inode struct. It should probably have been setting cifsi->server_eof
previously.]
Fixes: cfc63fc8126a ("smb3: fix cached file size problems in duplicate extents (reflink)") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>
David Howells [Wed, 22 May 2024 08:38:48 +0000 (09:38 +0100)]
cifs: Fix smb3_insert_range() to move the zero_point
Fix smb3_insert_range() to move the zero_point over to the new EOF.
Without this, generic/147 fails as reads of data beyond the old EOF point
return zeroes.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev Signed-off-by: Steve French <stfrench@microsoft.com>
Chengming Zhou [Mon, 13 May 2024 03:07:56 +0000 (11:07 +0800)]
mm/ksm: fix possible UAF of stable_node
The commit 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page
deduplication limit") introduced a possible failure case in the
stable_tree_insert(), where we may free the new allocated stable_node_dup
if we fail to prepare the missing chain node.
Then that kfolio return and unlock with a freed stable_node set... And
any MM activities can come in to access kfolio->mapping, so UAF.
Fix it by moving folio_set_stable_node() to the end after stable_node
is inserted successfully.
Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev Fixes: 2c653d0ee2ae ("ksm: introduce ksm_max_page_sharing per page deduplication limit") Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Stefan Roesch <shr@devkernel.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
memory_failure
try_memory_failure_hugetlb
me_huge_page
__page_handle_poison
dissolve_free_hugetlb_folio
drain_all_pages -- Buddy page can be isolated e.g. for compaction.
take_page_off_buddy -- Failed as page is not in the buddy list.
-- Page can be putback into buddy after compaction.
page_ref_inc -- Leads to buddy page with refcnt = 1.
Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning. And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.
Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com Fixes: ceaf8fbea79a ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yuanyuan Zhong [Thu, 23 May 2024 18:35:31 +0000 (12:35 -0600)]
mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
After switching smaps_rollup to use VMA iterator, searching for next entry
is part of the condition expression of the do-while loop. So the current
VMA needs to be addressed before the continue statement.
Otherwise, with some VMAs skipped, userspace observed memory
consumption from /proc/pid/smaps_rollup will be smaller than the sum of
the corresponding fields from /proc/pid/smaps.
Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end") Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Mon, 20 May 2024 13:26:21 +0000 (22:26 +0900)]
nilfs2: fix potential hang in nilfs_detach_log_writer()
Syzbot has reported a potential hang in nilfs_detach_log_writer() called
during nilfs2 unmount.
Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call trace
below:
Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and by
forcing tasks that are already waiting to complete once after the thread
terminates.
The skipped inode metadata flushout will then be processed together in the
subsequent cleanup work in nilfs_segctor_destroy().
Ryusuke Konishi [Mon, 20 May 2024 13:26:20 +0000 (22:26 +0900)]
nilfs2: fix unexpected freezing of nilfs_segctor_sync()
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.
This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.
The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically. There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.
Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock. Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.
Ryusuke Konishi [Mon, 20 May 2024 13:26:19 +0000 (22:26 +0900)]
nilfs2: fix use-after-free of timer for log writer thread
Patch series "nilfs2: fix log writer related issues".
This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.
Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu> Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Michael Ellerman [Tue, 21 May 2024 03:02:19 +0000 (13:02 +1000)]
selftests/mm: fix build warnings on ppc64
Fix warnings like:
In file included from uffd-unit-tests.c:8:
uffd-unit-tests.c: In function `uffd_poison_handle_fault':
uffd-common.h:45:33: warning: format `%llu' expects argument of type
`long long unsigned int', but argument 3 has type `__u64' {aka `long
unsigned int'} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
The problem is because bpf_arch_text_copy() silently fails to write to the
read-only area as a result of patch_map() faulting and the resulting
-EFAULT being chucked away.
Update patch_map() to use CONFIG_EXECMEM instead of
CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reported-by: Klara Modin <klarasmodin@gmail.com> Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com Tested-by: Klara Modin <klarasmodin@gmail.com> Cc: Björn Töpel <bjorn@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:58 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:57 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain <dev.jain@arm.com> Cc: <stable@vger.kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sri Jayaramappa <sjayaram@akamai.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dev Jain [Tue, 21 May 2024 07:43:56 +0000 (13:13 +0530)]
selftests/mm: compaction_test: fix bogus test success on Aarch64
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt. But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.
Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Xu Yu <xuyu@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrey Konovalov [Fri, 17 May 2024 13:01:18 +0000 (15:01 +0200)]
kasan, fortify: properly rename memintrinsics
After commit 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*()
functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
even though the compiler instruments meminstrinsics by generating calls to
__asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
uninstrumented memset/memmove/memcpy as the underlying functions.
As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy.
This also makes KASAN tests corrupt kernel memory and cause crashes.
To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
functions whenever appropriate. Do this only for the instrumented code
(as indicated by __SANITIZE_ADDRESS__).
Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions") Fixes: 51287dcb00cc ("kasan: emit different calls for instrumentable memintrinsics") Fixes: 36be5cba99f6 ("kasan: treat meminstrinsic as builtins in uninstrumented files") Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com> Reported-by: Erhard Furtner <erhard_f@mailbox.org> Reported-by: Nico Pache <npache@redhat.com> Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/ Reviewed-by: Marco Elver <elver@google.com> Tested-by: Nico Pache <npache@redhat.com> Acked-by: Nico Pache <npache@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Daniel Axtens <dja@axtens.net> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hailong.Liu [Fri, 10 May 2024 10:01:31 +0000 (18:01 +0800)]
mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
commit a421ef303008 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
includes support for __GFP_NOFAIL, but it presents a conflict with commit dd544141b9eb ("vmalloc: back off when the current task is OOM-killed"). A
possible scenario is as follows:
process-a
__vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
__vmalloc_area_node()
vm_area_alloc_pages()
--> oom-killer send SIGKILL to process-a
if (fatal_signal_pending(current)) break;
--> return NULL;
To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
if __GFP_NOFAIL set.
This issue occurred during OPLUS KASAN TEST. Below is part of the log
-> oom-killer sends signal to process
[65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198
Linus Torvalds [Fri, 24 May 2024 17:46:35 +0000 (10:46 -0700)]
Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The compression format used for boot images is now configurable at
build time, and these formats are shown in `make help`
- access_ok() has been optimized
- A pair of performance bugs have been fixed in the uaccess handlers
- Various fixes and cleanups, including one for the IMSIC build failure
and one for the early-boot ftrace illegal NOPs bug
* tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix early ftrace nop patching
irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
riscv: selftests: Add signal handling vector tests
riscv: mm: accelerate pagefault when badaccess
riscv: uaccess: Relax the threshold for fast path
riscv: uaccess: Allow the last potential unrolled copy
riscv: typo in comment for get_f64_reg
Use bool value in set_cpu_online()
riscv: selftests: Add hwprobe binaries to .gitignore
riscv: stacktrace: fixed walk_stackframe()
ftrace: riscv: move from REGS to ARGS
riscv: do not select MODULE_SECTIONS by default
riscv: show help string for riscv-specific targets
riscv: make image compression configurable
riscv: cpufeature: Fix extension subset checking
riscv: cpufeature: Fix thead vector hwcap removal
riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
riscv: Define TASK_SIZE_MAX for __access_ok()
riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
Linus Torvalds [Fri, 24 May 2024 17:24:49 +0000 (10:24 -0700)]
Merge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a small cleanup in the drivers/xen/xenbus Makefile
- a fix of the Xen xenstore driver to improve connecting to a late
started Xenstore
- an enhancement for better support of ballooning in PVH guests
- a cleanup using try_cmpxchg() instead of open coding it
* tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
drivers/xen: Improve the late XenStore init protocol
xen/xenbus: Use *-y instead of *-objs in Makefile
xen/x86: add extra pages to unpopulated-alloc if available
locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
Linus Torvalds [Fri, 24 May 2024 16:40:31 +0000 (09:40 -0700)]
Merge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs updates from David Sterba:
"A few more updates, mostly stability fixes or user visible changes:
- fix race in zoned mode during device replace that can lead to
use-after-free
- update return codes and lower message levels for quota rescan where
it's causing false alerts
- fix unexpected qgroup id reuse under some conditions
- fix condition when looking up extent refs
- add option norecovery (removed in 6.8), the intended replacements
haven't been used and some aplications still rely on the old one
- build warning fixes"
* tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: re-introduce 'norecovery' mount option
btrfs: fix end of tree detection when searching for data extent ref
btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
btrfs: zoned: fix use-after-free due to race with dev replace
btrfs: qgroup: fix qgroup id collision across mounts
btrfs: qgroup: update rescan message levels and error codes
Linus Torvalds [Fri, 24 May 2024 16:31:50 +0000 (09:31 -0700)]
Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull more erofs updates from Gao Xiang:
"The main ones are metadata API conversion to byte offsets by Al Viro.
Another patch gets rid of unnecessary memory allocation out of DEFLATE
decompressor. The remaining one is a trivial cleanup.
- Convert metadata APIs to byte offsets
- Avoid allocating DEFLATE streams unnecessarily
- Some erofs_show_options() cleanup"
* tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: avoid allocating DEFLATE streams before mounting
z_erofs_pcluster_begin(): don't bother with rounding position down
erofs: don't round offset down for erofs_read_metabuf()
erofs: don't align offset for erofs_read_metabuf() (simple cases)
erofs: mechanically convert erofs_read_metabuf() to offsets
erofs: clean up erofs_show_options()