The Tegra PMC driver does ungodly things with the interrupt hierarchy,
repeatedly corrupting it by pulling hwirq numbers out of thin air,
overriding existing IRQ mappings and changing the handling flow
of unsuspecting users.
All of this is done in the name of preserving the interrupt hierarchy
even when these levels do not exist in the HW. Together with the use
of proper IRQs for IPIs, this leads to an unbootable system as the
rescheduling IPI gets repeatedly repurposed for random drivers...
Instead, let's simply mark the level from which the hierarchy does
not make sense for the HW, and let the core code trim the usused
levels from the hierarchy.
Marc Zyngier [Tue, 6 Oct 2020 09:10:20 +0000 (10:10 +0100)]
genirq/irqdomain: Allow partial trimming of irq_data hierarchy
It appears that some HW is ugly enough that not all the interrupts
connected to a particular interrupt controller end up with the same
hierarchy depth (some of them are terminated early). This leaves
the irqchip hacker with only two choices, both equally bad:
- create discrete domain chains, one for each "hierarchy depth",
which is very hard to maintain
- create fake hierarchy levels for the shallow paths, leading
to all kind of problems (what are the safe hwirq values for these
fake levels?)
Implement the ability to cut short a single interrupt hierarchy
from a level marked as being disconnected by using the new
irq_domain_disconnect_hierarchy() helper.
The irqdomain allocation code will then perform the trimming
Maulik Shah [Mon, 28 Sep 2020 04:32:04 +0000 (10:02 +0530)]
irqchip/qcom-pdc: Reset PDC interrupts during init
Kexec can directly boot into a new kernel without going to complete
reboot. This can leave the previous kernel's configuration for PDC
interrupts as is.
Clear previous kernel's configuration during init by setting interrupts
in enable bank to zero. The IRQs specified in qcom,pdc-ranges property
are the only ones that can be used by the new kernel so clear only those
IRQs. The remaining ones may be in use by a different kernel and should
not be set by new kernel.
Suggested-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-7-git-send-email-mkshah@codeaurora.org
Maulik Shah [Mon, 28 Sep 2020 04:32:03 +0000 (10:02 +0530)]
irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag to enable/unmask the
wakeirqs during suspend entry.
Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-6-git-send-email-mkshah@codeaurora.org
Maulik Shah [Mon, 28 Sep 2020 04:32:02 +0000 (10:02 +0530)]
pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag to enable/unmask the
wakeirqs during suspend entry.
Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-5-git-send-email-mkshah@codeaurora.org
Maulik Shah [Mon, 28 Sep 2020 04:32:01 +0000 (10:02 +0530)]
genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
An interrupt that is disabled/masked but set for wakeup may still need to
be able to wake up the system from sleep states like "suspend to RAM".
To that effect, introduce the IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag.
If the irqchip have this flag set, the irq PM code will enable/unmask
the irqs that are marked for wakeup, but that are in a disabled state.
On resume, such irqs will be restored back to their disabled state.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Maulik Shah <mkshah@codeaurora.org>
[maz: commit message fix-up] Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/1601267524-20199-4-git-send-email-mkshah@codeaurora.org
Maulik Shah [Mon, 28 Sep 2020 04:32:00 +0000 (10:02 +0530)]
pinctrl: qcom: Use return value from irq_set_wake() call
msmgpio irqchip was not using return value of irq_set_irq_wake() callback
since previously GIC-v3 irqchip neither had IRQCHIP_SKIP_SET_WAKE flag nor
it implemented .irq_set_wake callback. This lead to irq_set_irq_wake()
return error -ENXIO.
However from 'commit 4110b5cbb014 ("irqchip/gic-v3: Allow interrupt to be
configured as wake-up sources")' GIC irqchip has IRQCHIP_SKIP_SET_WAKE
flag.
Use return value from irq_set_irq_wake() and irq_chip_set_wake_parent()
instead of always returning success.
Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-3-git-send-email-mkshah@codeaurora.org
Maulik Shah [Mon, 28 Sep 2020 04:31:59 +0000 (10:01 +0530)]
pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags
Both IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags are already
set for msmgpio's parent PDC irqchip but GPIO interrupts do not get masked
during suspend or during setting irq type since genirq checks irqchip flag
of msmgpio irqchip which forwards these calls to its parent PDC irqchip.
Add irqchip specific flags for msmgpio irqchip to mask non wakeirqs during
suspend and mask before setting irq type. Masking before changing type make
sures any spurious interrupt is not detected during this operation.
Fixes: e35a6ae0eb3a ("pinctrl/msm: Setup GPIO chip in hierarchy") Signed-off-by: Maulik Shah <mkshah@codeaurora.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/1601267524-20199-2-git-send-email-mkshah@codeaurora.org
This interrupt controller is found in the Actions Semi Owl SoCs (S500,
S700 and S900) and provides support for handling up to 3 external
interrupt lines.
Each line can be independently configured as interrupt and triggers on
either of the edges or either of the levels. Additionally, each line
can also be masked individually.
Actions Semi Owl SoCs SIRQ interrupt controller is found in S500, S700
and S900 SoCs and provides support for handling up to 3 external
interrupt lines.
Zhen Lei [Thu, 24 Sep 2020 07:17:49 +0000 (15:17 +0800)]
genirq: Add stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER
In order to avoid compilation errors when a driver references set_handle_irq(),
but that the architecture doesn't select GENERIC_IRQ_MULTI_HANDLER,
add a stub function that will just WARN_ON_ONCE() if ever used.
Marc Zyngier [Tue, 15 Sep 2020 13:03:51 +0000 (14:03 +0100)]
irqchip/gic: Cleanup Franken-GIC handling
Introduce a static key identifying Samsung's unique creation, allowing
to replace the indirect call to compute the base addresses with
a simple test on the static key.
Faster, cheaper, negative diffstat.
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 14 Sep 2020 16:21:16 +0000 (17:21 +0100)]
irqchip/bcm2836: Provide mask/unmask dummy methods for IPIs
Although it doesn't seem possible to disable individual mailbox
interrupts, we still need to provide some callbacks.
Fixes: 09eb672ce4fb ("irqchip/bcm2836: Configure mailbox interrupts as standard interrupts") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 22 Jun 2020 20:23:36 +0000 (21:23 +0100)]
irqchip/armada-370-xp: Configure IPIs as standard interrupts
To introduce IPIs as standard interrupts to the Armada 370-XP
driver, let's allocate a completely separate irqdomain and
irqchip combo that lives parallel to the "standard" one.
This effectively should be modelled as a chained interrupt
controller, but the code is in such a state that it is
pretty hard to shoehorn, as it would require the rewrite
of the MSI layer as well.
Marc Zyngier [Sat, 20 Jun 2020 19:02:18 +0000 (20:02 +0100)]
irqchip/hip04: Configure IPIs as standard interrupts
In order to switch the hip04 driver to provide standard interrupts
for IPIs, rework the way interrupts are allocated, making sure
the irqdomain covers the SGIs as well as the rest of the interrupt
range.
The driver is otherwise so old-school that it creates all interrupts
upfront (duh!), so there is hardly anything else to change, apart
from communicating the IPIs to the arch code.
Marc Zyngier [Sat, 25 Apr 2020 14:24:01 +0000 (15:24 +0100)]
irqchip/gic: Configure SGIs as standard interrupts
Change the way we deal with GIC SGIs by turning them into proper
IRQs, and calling into the arch code to register the interrupt range
instead of a callback.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sat, 25 Apr 2020 14:24:01 +0000 (15:24 +0100)]
irqchip/gic-v3: Configure SGIs as standard interrupts
Change the way we deal with GICv3 SGIs by turning them into proper
IRQs, and calling into the arch code to register the interrupt range
instead of a callback.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Suman Anna [Wed, 16 Sep 2020 16:36:38 +0000 (18:36 +0200)]
irqchip/irq-pruss-intc: Add support for ICSSG INTC on K3 SoCs
The K3 AM65x and J721E SoCs have the next generation of the PRU-ICSS IP,
commonly called ICSSG. The PRUSS INTC present within the ICSSG supports
more System Events (160 vs 64), more Interrupt Channels and Host Interrupts
(20 vs 10) compared to the previous generation PRUSS INTC instances. The
first 2 and the last 10 of these host interrupt lines are used by the
PRU and other auxiliary cores and sub-modules within the ICSSG, with 8
host interrupts connected to MPU. The host interrupts 5, 6, 7 are also
connected to the other ICSSG instances within the SoC and can be
partitioned as per system integration through the board dts files.
Enhance the PRUSS INTC driver to add support for this ICSSG INTC
instance.
Co-developed-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
This implements the irq_get_irqchip_state and irq_set_irqchip_state
callbacks for the TI PRUSS INTC driver. The set callback can be used
by drivers to "kick" a PRU by injecting a PRU system event.
Co-developed-by: Suman Anna <s-anna@ti.com> Co-developed-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: David Lechner <david@lechnology.com> Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Reviewed-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
Suman Anna [Wed, 16 Sep 2020 16:36:36 +0000 (18:36 +0200)]
irqchip/irq-pruss-intc: Add logic for handling reserved interrupts
The PRUSS INTC has a fixed number of output interrupt lines that are
connected to a number of processors or other PRUSS instances or other
devices (like DMA) on the SoC. The output interrupt lines 2 through 9
are usually connected to the main Arm host processor and are referred
to as host interrupts 0 through 7 from ARM/MPU perspective.
All of these 8 host interrupts are not always exclusively connected
to the Arm interrupt controller. Some SoCs have some interrupt lines
not connected to the Arm interrupt controller at all, while a few others
have the interrupt lines connected to multiple processors in which they
need to be partitioned as per SoC integration needs. For example, AM437x
and 66AK2G SoCs have 2 PRUSS instances each and have the host interrupt 5
connected to the other PRUSS, while AM335x has host interrupt 0 shared
between MPU and TSC_ADC and host interrupts 6 & 7 shared between MPU and
a DMA controller.
Add logic to the PRUSS INTC driver to ignore both these shared and
invalid interrupts.
Co-developed-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
Grzegorz Jaszczyk [Wed, 16 Sep 2020 16:36:03 +0000 (18:36 +0200)]
irqchip/irq-pruss-intc: Add a PRUSS irqchip driver for PRUSS interrupts
The Programmable Real-Time Unit Subsystem (PRUSS) contains a local
interrupt controller (INTC) that can handle various system input events
and post interrupts back to the device-level initiators. The INTC can
support upto 64 input events with individual control configuration and
hardware prioritization. These events are mapped onto 10 output interrupt
lines through two levels of many-to-one mapping support. Different
interrupt lines are routed to the individual PRU cores or to the host
CPU, or to other devices on the SoC. Some of these events are sourced
from peripherals or other sub-modules within that PRUSS, while a few
others are sourced from SoC-level peripherals/devices.
The PRUSS INTC platform driver manages this PRUSS interrupt controller
and implements an irqchip driver to provide a Linux standard way for
the PRU client users to enable/disable/ack/re-trigger a PRUSS system
event. The system events to interrupt channels and output interrupts
relies on the mapping configuration provided either through the PRU
firmware blob (for interrupts routed to PRU cores) or via the PRU
application's device tree node (for interrupt routed to the main CPU).
In the first case the mappings will be programmed on PRU remoteproc
driver demand (via irq_create_fwspec_mapping) during the boot of a PRU
core and cleaned up after the PRU core is stopped.
Reference counting is used to allow multiple system events to share a
single channel and to allow multiple channels to share a single host
event.
The PRUSS INTC module is reference counted during the interrupt
setup phase through the irqchip's irq_request_resources() and
irq_release_resources() ops. This restricts the module from being
removed as long as there are active interrupt users.
The driver currently supports and can be built for OMAP architecture
based AM335x, AM437x and AM57xx SoCs; Keystone2 architecture based
66AK2G SoCs and Davinci architecture based OMAP-L13x/AM18x/DA850 SoCs.
All of these SoCs support 64 system events, 10 interrupt channels and
10 output interrupt lines per PRUSS INTC with a few SoC integration
differences.
NOTE:
Each PRU-ICSS's INTC on AM57xx SoCs is preceded by a Crossbar that
enables multiple external events to be routed to a specific number
of input interrupt events. Any non-default external interrupt event
directed towards PRUSS needs this crossbar to be setup properly.
Co-developed-by: Suman Anna <s-anna@ti.com> Co-developed-by: Andrew F. Davis <afd@ti.com> Co-developed-by: Roger Quadros <rogerq@ti.com> Co-developed-by: David Lechner <david@lechnology.com> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: David Lechner <david@lechnology.com> Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
The Programmable Real-Time Unit and Industrial Communication Subsystem
(PRU-ICSS or simply PRUSS) contains an interrupt controller (INTC) that
can handle various system input events and post interrupts back to the
device-level initiators. The INTC can support up to 64 input events on
most SoCs with individual control configuration and h/w prioritization.
These events are mapped onto 10 output interrupt lines through two levels
of many-to-one mapping support. Different interrupt lines are routed to
the individual PRU cores or to the host CPU or to other PRUSS instances.
The K3 AM65x and J721E SoCs have the next generation of the PRU-ICSS IP,
commonly called ICSSG. The ICSSG interrupt controller on K3 SoCs provide
a higher number of host interrupts (20 vs 10) and can handle an increased
number of input events (160 vs 64) from various SoC interrupt sources.
Add the bindings document for these interrupt controllers on all the
applicable SoCs. It covers the OMAP architecture SoCs - AM33xx, AM437x
and AM57xx; the Keystone 2 architecture based 66AK2G SoC; the Davinci
architecture based OMAPL138 SoCs, and the K3 architecture based AM65x
and J721E SoCs.
Co-developed-by: Andrew F. Davis <afd@ti.com> Co-developed-by: Roger Quadros <rogerq@ti.com> Co-developed-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Signed-off-by: Andrew F. Davis <afd@ti.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: Grzegorz Jaszczyk <grzegorz.jaszczyk@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
Alexandru Elisei [Sat, 12 Sep 2020 15:37:07 +0000 (16:37 +0100)]
irqchip/gic-v3: Support pseudo-NMIs when SCR_EL3.FIQ == 0
The GIC's internal view of the priority mask register and the assigned
interrupt priorities are based on whether GIC security is enabled and
whether firmware routes Group 0 interrupts to EL3. At the moment, we
support priority masking when ICC_PMR_EL1 and interrupt priorities are
either both modified by the GIC, or both left unchanged.
Trusted Firmware-A's default interrupt routing model allows Group 0
interrupts to be delivered to the non-secure world (SCR_EL3.FIQ == 0).
Unfortunately, this is precisely the case that the GIC driver doesn't
support: ICC_PMR_EL1 remains unchanged, but the GIC's view of interrupt
priorities is different from the software programmed values.
Support pseudo-NMIs when SCR_EL3.FIQ == 0 by using a different value to
mask regular interrupts. All the other values remain the same.
Alexandru Elisei [Sat, 12 Sep 2020 15:37:06 +0000 (16:37 +0100)]
irqchip/gic-v3: Spell out when pseudo-NMIs are enabled
When NMIs cannot be enabled, the driver prints a message stating that
unambiguously. When they are enabled, the only feedback we get is a message
regarding the use of synchronization for ICC_PMR_EL1 writes, which is not
as useful for a user who is not intimately familiar with how NMIs are
implemented.
Let's make it obvious that pseudo-NMIs are enabled. Keep the message about
using a barrier for ICC_PMR_EL1 writes, because it has a non-negligible
impact on performance.
Marc Zyngier [Tue, 23 Jun 2020 19:38:41 +0000 (20:38 +0100)]
ARM: Allow IPIs to be handled as normal interrupts
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sat, 25 Apr 2020 14:03:47 +0000 (15:03 +0100)]
arm64: Allow IPIs to be handled as normal interrupts
In order to deal with IPIs as normal interrupts, let's add
a new way to register them with the architecture code.
set_smp_ipi_range() takes a range of interrupts, and allows
the arch code to request them as if the were normal interrupts.
A standard handler is then called by the core IRQ code to deal
with the IPI.
This means that we don't need to call irq_enter/irq_exit, and
that we don't need to deal with set_irq_regs either. So let's
move the dispatcher into its own function, and leave handle_IPI()
as a compatibility function.
On the sending side, let's make use of ipi_send_mask, which
already exists for this purpose.
One of the major difference is that we end up, in some cases
(such as when performing IRQ time accounting on the scheduler
IPI), end up with nested irq_enter()/irq_exit() pairs.
Other than the (relatively small) overhead, there should be
no consequences to it (these pairs are designed to nest
correctly, and the accounting shouldn't be off).
Marc Zyngier [Tue, 19 May 2020 13:58:13 +0000 (14:58 +0100)]
genirq: Allow interrupts to be excluded from /proc/interrupts
A number of architectures implement IPI statistics directly,
duplicating the core kstat_irqs accounting. As we move IPIs to
being actual IRQs, we would end-up with a confusing display
in /proc/interrupts (where the IPIs would appear twice).
In order to solve this, allow interrupts to be flagged as
"hidden", which excludes them from /proc/interrupts.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 19 May 2020 09:41:00 +0000 (10:41 +0100)]
genirq: Add fasteoi IPI flow
For irqchips using the fasteoi flow, IPIs are a bit special.
They need to be EOI'd early (before calling the handler), as
funny things may happen in the handler (they do not necessarily
behave like a normal interrupt).
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
YueHaibing [Wed, 26 Aug 2020 03:53:21 +0000 (11:53 +0800)]
irqchip/ti-sci-intr: Fix unsigned comparison to zero
ti_sci_intr_xlate_irq() return -ENOENT on fail, p_hwirq
should be int type.
Fixes: a5b659bd4bc7 ("irqchip/ti-sci-intr: Add support for INTR being a parent to INTR") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Lokesh Vutla <lokeshvutla@ti.com> Link: https://lore.kernel.org/r/20200826035321.18620-1-yuehaibing@huawei.com
Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block
Pull more io_uring fixes from Jens Axboe:
"Two followup fixes. One is fixing a regression from this merge window,
the other is two commits fixing cancelation of deferred requests.
Both have gone through full testing, and both spawned a few new
regression test additions to liburing.
- Don't play games with const, properly store the output iovec and
assign it as needed.
- Deferred request cancelation fix (Pavel)"
* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
io_uring: fix linked deferred ->files cancellation
io_uring: fix cancel of deferred reqs with ->files
io_uring: fix explicit async read/write mapping for large segments
Merge tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- three Intel VT-d fixes to fix address handling on 32bit, fix a NULL
pointer dereference bug and serialize a hardware register access as
required by the VT-d spec.
- two patches for AMD IOMMU to force AMD GPUs into translation mode
when memory encryption is active and disallow using IOMMUv2
functionality. This makes the AMDGPU driver work when memory
encryption is active.
- two more fixes for AMD IOMMU to fix updating the Interrupt Remapping
Table Entries.
- MAINTAINERS file update for the Qualcom IOMMU driver.
* tag 'iommu-fixes-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Handle 36bit addressing for x86-32
iommu/amd: Do not use IOMMUv2 functionality when SME is active
iommu/amd: Do not force direct mapping when SME is active
iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE
iommu/amd: Restore IRTE.RemapEn bit after programming IRTE
iommu/vt-d: Fix NULL pointer dereference in dev_iommu_priv_set()
iommu/vt-d: Serialize IOMMU GCMD register modifications
MAINTAINERS: Update QUALCOMM IOMMU after Arm SMMU drivers move
Merge tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- more generic entry code ABI fallout
- debug register handling bugfixes
- fix vmalloc mappings on 32-bit kernels
- kprobes instrumentation output fix on 32-bit kernels
- fix over-eager WARN_ON_ONCE() on !SMAP hardware
- NUMA debugging fix
- fix Clang related crash on !RETPOLINE kernels
* tag 'x86-urgent-2020-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry: Unbreak 32bit fast syscall
x86/debug: Allow a single level of #DB recursion
x86/entry: Fix AC assertion
tracing/kprobes, x86/ptrace: Fix regs argument order for i386
x86, fakenuma: Fix invalid starting node ID
x86/mm/32: Bring back vmalloc faulting on x86_32
x86/cmdline: Disable jump tables for cmdline.c
The GIC irqchips can now use a HW resend when a retrigger is invoked by
check_irq_resend(). However, should the HW resend fail, check_irq_resend()
will still attempt to trigger a SW resend, which is still a bad idea for
the GICs.
Prevent this from happening by setting IRQD_HANDLE_ENFORCE_IRQCTX on all
GIC IRQs. Technically per-cpu IRQs do not need this, as their flow handlers
never set IRQS_PENDING, but this aligns all IRQs wrt context enforcement:
this also forces all GIC IRQ handling to happen in IRQ context (as defined
by in_irq()).
While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come
to my attention that the IRQ resend situation seems a bit precarious for
the GIC(s).
When marking an IRQ with IRQS_PENDING, handle_fasteoi_irq() will bail out
and issue an irq_eoi(). Should the IRQ in question be re-enabled,
check_irq_resend() will trigger a SW resend, which will go through the flow
handler again and issue *another* irq_eoi() on the *same* IRQ
activation. This is something the GIC spec clearly describes as a bad idea:
any EOI must match a previous ACK.
Implement irq_chip.irq_retrigger() for the GIC chips by setting the GIC
pending bit of the relevant IRQ. After being called by check_irq_resend(),
this will eventually trigger a *new* interrupt which we will handle as usual.
Marc Zyngier [Wed, 26 Aug 2020 17:37:50 +0000 (18:37 +0100)]
genirq: Walk the irq_data hierarchy when resending an interrupt
On resending an interrupt, we only check the outermost irqchip for
a irq_retrigger callback. However, this callback could be implemented
at an inner level. Use irq_chip_retrigger_hierarchy() in this case.
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
Merge tags 'auxdisplay-for-linus-v5.9-rc4', 'clang-format-for-linus-v5.9-rc4' and 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux
Pull misc fixes from Miguel Ojeda:
"A trivial patch for auxdisplay:
- Replace HTTP links with HTTPS ones (Alexander A. Klimov)
The usual clang-format trivial update:
- Update with the latest for_each macro list (Miguel Ojeda)
And Luc requested me to pick a sparse fix on my queue, so here it goes
along with other two trivial Compiler Attributes ones (also from Luc).
- sparse: use static inline for __chk_{user,io}_ptr() (Luc Van
Oostenryck)
- Compiler Attributes: remove comment about sparse not supporting
__has_attribute (Luc Van Oostenryck)"
* tag 'auxdisplay-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
auxdisplay: Replace HTTP links with HTTPS ones
* tag 'clang-format-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
clang-format: Update with the latest for_each macro list
* tag 'compiler-attributes-for-linus-v5.9-rc4' of git://github.com/ojeda/linux:
sparse: use static inline for __chk_{user,io}_ptr()
Compiler Attributes: fix comment concerning GCC 4.6
Compiler Attributes: remove comment about sparse not supporting __has_attribute
Merge tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- HSDK-4xd Dev system: perf driver updates for sampling interrupt
- HSDK* Dev System: Ethernet broken [Evgeniy Didin]
- HIGHMEM broken (2 memory banks) [Mike Rapoport]
- show_regs() rewrite once and for all
- Other minor fixes
* tag 'arc-5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [plat-hsdk]: Switch ethernet phy-mode to rgmii-id
arc: fix memory initialization for systems with two memory banks
irqchip/eznps: Fix build error for !ARC700 builds
ARC: show_regs: fix r12 printing and simplify
ARC: HSDK: wireup perf irq
ARC: perf: don't bail setup if pct irq missing in device-tree
ARC: pgalloc.h: delete a duplicated word + other fixes
Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
hugetlb)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
mm/khugepaged.c: fix khugepaged's request size in collapse_file
mm/hugetlb: fix a race between hugetlb sysctl handlers
mm/hugetlb: try preferred node first when alloc gigantic page from cma
mm/migrate: preserve soft dirty in remove_migration_pte()
mm/migrate: remove unnecessary is_zone_device_page() check
mm/rmap: fixup copying of soft dirty and uffd ptes
mm/migrate: fixup setting UFFD_WP flag
mm: madvise: fix vma user-after-free
checkpatch: fix the usage of capture group ( ... )
fork: adjust sysctl_max_threads definition to match prototype
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
mm: track page table modifications in __apply_to_page_range()
MAINTAINERS: IA64: mark Status as Odd Fixes only
MAINTAINERS: add LLVM maintainers
MAINTAINERS: update Cavium/Marvell entries
mm: slub: fix conversion of freelist_corrupted()
mm: memcg: fix memcg reclaim soft lockup
memcg: fix use-after-free in uncharge_batch
Jason Gunthorpe [Fri, 4 Sep 2020 23:36:19 +0000 (16:36 -0700)]
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
Otherwise gcc generates warnings if the expression is complicated.
Fixes: 312a0c170945 ("[PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a constant") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lkml.kernel.org/r/0-v1-8a2697e3c003+41165-log_brackets_jgg@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 4 Sep 2020 23:36:16 +0000 (16:36 -0700)]
mm/khugepaged.c: fix khugepaged's request size in collapse_file
collapse_file() in khugepaged passes PAGE_SIZE as the number of pages to
be read to page_cache_sync_readahead(). The intent was probably to read
a single page. Fix it to use the number of pages to the end of the
window instead.
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Yang Shi <shy828301@gmail.com> Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Eric Biggers <ebiggers@google.com> Link: https://lkml.kernel.org/r/20200903140844.14194-2-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Muchun Song [Fri, 4 Sep 2020 23:36:13 +0000 (16:36 -0700)]
mm/hugetlb: fix a race between hugetlb sysctl handlers
There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.
Fix this by duplicating the `table`, and only update the duplicate of
it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to
simplify the code.
Li Xinhai [Fri, 4 Sep 2020 23:36:10 +0000 (16:36 -0700)]
mm/hugetlb: try preferred node first when alloc gigantic page from cma
Since commit cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic
hugepages using cma"), the gigantic page would be allocated from node
which is not the preferred node, although there are pages available from
that node. The reason is that the nid parameter has been ignored in
alloc_gigantic_page().
Besides, the __GFP_THISNODE also need be checked if user required to
alloc only from the preferred node.
After this patch, the preferred node is tried first before other allowed
nodes, and don't try to allocate from other nodes if __GFP_THISNODE is
specified. If user don't specify the preferred node, the current node
will be used as preferred node, which makes sure consistent behavior of
allocating gigantic and non-gigantic hugetlb page.
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Link: https://lkml.kernel.org/r/20200902025016.697260-1-lixinhai.lxh@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralph Campbell [Fri, 4 Sep 2020 23:36:07 +0000 (16:36 -0700)]
mm/migrate: preserve soft dirty in remove_migration_pte()
The code to remove a migration PTE and replace it with a device private
PTE was not copying the soft dirty bit from the migration entry. This
could lead to page contents not being marked dirty when faulting the page
back from device private memory.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Link: https://lkml.kernel.org/r/20200831212222.22409-3-rcampbell@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".
I happened to notice this from code inspection after seeing Alistair
Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").
This patch (of 2):
The check for is_zone_device_page() and is_device_private_page() is
unnecessary since the latter is sufficient to determine if the page is a
device private page. Simplify the code for easier reading.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/rmap: fixup copying of soft dirty and uffd ptes
During memory migration a pte is temporarily replaced with a migration
swap pte. Some pte bits from the existing mapping such as the soft-dirty
and uffd write-protect bits are preserved by copying these to the
temporary migration swap pte.
However these bits are not stored at the same location for swap and
non-swap ptes. Therefore testing these bits requires using the
appropriate helper function for the given pte type.
Unfortunately several code locations were found where the wrong helper
function is being used to test soft_dirty and uffd_wp bits which leads to
them getting incorrectly set or cleared during page-migration.
Fix these by using the correct tests based on pte type.
Fixes: a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in migration") Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages") Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Alistair Popple <alistair@popple.id.au> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.au Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration")
introduced support for tracking the uffd wp bit during page migration.
However the non-swap PTE variant was used to set the flag for zone device
private pages which are a type of swap page.
This leads to corruption of the swap offset if the original PTE has the
uffd_wp flag set.
Fixes: f45ec5ff16a75 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.au Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Fri, 4 Sep 2020 23:35:55 +0000 (16:35 -0700)]
mm: madvise: fix vma user-after-free
The syzbot reported the below use-after-free:
BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
It is because vma is accessed after releasing mmap_lock, but someone
else acquired the mmap_lock and the vma is gone.
Releasing mmap_lock after accessing vma should fix the problem.
Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()") Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> [5.4+] Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
checkpatch: fix the usage of capture group ( ... )
The usage of "capture group (...)" in the immediate condition after `&&`
results in `$1` being uninitialized. This issues a warning "Use of
uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
line 2638".
I noticed this bug while running checkpatch on the set of commits from
v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in
their commit message.
This bug was introduced in the script by commit e518e9a59ec3
("checkpatch: emit an error when there's a diff in a changelog"). It
has been in the script since then.
The author intended to store the match made by capture group in variable
`$1`. This should have contained the name of the file as `[\w/]+`
matched. However, this couldn't be accomplished due to usage of capture
group and `$1` in the same regular expression.
Fix this by placing the capture group in the condition before `&&`.
Thus, `$1` can be initialized to the text that capture group matches
thereby setting it to the desired and required value.
Fixes: e518e9a59ec3 ("checkpatch: emit an error when there's a diff in a changelog") Signed-off-by: Mrinal Pandey <mrinalmni@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Joe Perches <joe@perches.com> Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandey Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fork: adjust sysctl_max_threads definition to match prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of sysctl_max_threads to match its prototype in
linux/sysctl.h which fixes the following sparse error/warning:
kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47: expected void *
kernel/fork.c:3050:47: got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
include/linux/sysctl.h:242:5: note: previously declared as:
include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
fixes the following sparse error/warning:
ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
ipc/ipc_sysctl.c:94:47: expected void *buffer
ipc/ipc_sysctl.c:94:47: got void [noderef] __user *buffer
ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
ipc/ipc_sysctl.c:194:35: expected int ( [usertype] *proc_handler )( ... )
ipc/ipc_sysctl.c:194:35: got int ( * )( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.ch Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: track page table modifications in __apply_to_page_range()
__apply_to_page_range() is also used to change and/or allocate
page-table pages in the vmalloc area of the address space. Make sure
these changes get synchronized to other page-tables in the system by
calling arch_sync_kernel_mappings() when necessary.
The impact appears limited to x86-32, where apply_to_page_range may miss
updating the PMD. That leads to explosions in drivers like
Nick Desaulniers [Fri, 4 Sep 2020 23:35:37 +0000 (16:35 -0700)]
MAINTAINERS: add LLVM maintainers
Nominate Nathan and myself to be point of contact for clang/LLVM related
support, after a poll at the LLVM BoF at Linux Plumbers Conf 2020.
While corporate sponsorship is beneficial, its important to not entrust
the keys to the nukes with any one entity. Should Nathan and I find
ourselves at the same employer, I would gladly step down.
Robert Richter [Fri, 4 Sep 2020 23:35:33 +0000 (16:35 -0700)]
MAINTAINERS: update Cavium/Marvell entries
I am leaving Marvell and already do not have access to my @marvell.com
email address. So switching over to my korg mail address or removing my
address there another maintainer is already listed. For the entries
there no other maintainer is listed I will keep looking into patches for
Cavium systems for a while until someone from Marvell takes it over.
Since I might have limited access to hardware and also limited time I
changed state to 'Odd Fixes' for those entries.
Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Ganapatrao Kulkarni <gkulkarni@marvell.com> Cc: Sunil Goutham <sgoutham@marvell.com> CC: Borislav Petkov <bp@alien8.de> Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Wolfram Sang <wsa@kernel.org>, Link: https://lkml.kernel.org/r/20200824122050.31164-1-rric@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 52f23478081ae0 ("mm/slub.c: fix corrupted freechain in
deactivate_slab()") suffered an update when picked up from LKML [1].
Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
created a no-op statement. Fix it by sticking to the behavior intended
in the original patch [1]. In addition, make freelist_corrupted()
immune to passing NULL instead of &freelist.
The issue has been spotted via static analysis and code review.
It only happens on our 1-vcpu instances, because there's no chance for
oom reaper to run to reclaim the to-be-killed process.
Add a cond_resched() at the upper shrink_node_memcgs() to solve this
issue, this will mean that we will get a scheduling point for each memcg
in the reclaimed hierarchy without any dependency on the reclaimable
memory in that memcg thus making it more predictable.
Suggested-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1a3e1f40962c ("mm: memcontrol: decouple reference counting from
page accounting") reworked the memcg lifetime to be bound the the struct
page rather than charges. It also removed the css_put_many from
uncharge_batch and that is causing the above splat.
uncharge_batch() is supposed to uncharge accumulated charges for all
pages freed from the same memcg. The queuing is done by uncharge_page
which however drops the memcg reference after it adds charges to the
batch. If the current page happens to be the last one holding the
reference for its memcg then the memcg is OK to go and the next page to
be freed will trigger batched uncharge which needs to access the memcg
which is gone already.
Fix the issue by taking a reference for the memcg in the current batch.
Fixes: 1a3e1f40962c ("mm: memcontrol: decouple reference counting from page accounting") Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Roman Gushchin <guro@fb.com> Cc: Hugh Dickins <hughd@google.com> Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Fix a broken metadata verifier that would incorrectly validate attr
fork extents of a realtime file against the realtime volume"
* tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.
This breaks building of the Linux kernel. How to reproduce:
1. extract the Linux kernel tree on dax-mounted xfs filesystem
2. run make clean
3. run make -j12
4. run make -j12
at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).
The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.
This breaks building of the Linux kernel. How to reproduce:
1. extract the Linux kernel tree on dax-mounted ext2 filesystem
2. run make clean
3. run make -j12
4. run make -j12
at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).
The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.
io_uring: fix explicit async read/write mapping for large segments
If we exceed UIO_FASTIOV, we don't handle the transition correctly
between an allocated vec for requests that are queued with IOSQE_ASYNC.
Store the iovec appropriately and re-set it in the iter iov in case
it changed.
Fixes: ff6165b2d7f6 ("io_uring: retain iov_iter state over io_read/io_write calls") Reported-by: Nick Hill <nick@nickhill.org> Tested-by: Norman Maurer <norman.maurer@googlemail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 's390-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix GENERIC_LOCKBREAK dependency on PREEMPTION in Kconfig broken
because of a typo
- Update defconfigs
* tag 's390-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfigs
s390: fix GENERIC_LOCKBREAK dependency typo in Kconfig
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix the loading of modules built with binutils-2.35. This version
produces writable and executable .text.ftrace_trampoline section
which is rejected by the kernel.
- Remove the exporting of cpu_logical_map() as the Tegra driver has now
been fixed and no longer uses this function.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/module: set trampoline section flags regardless of CONFIG_DYNAMIC_FTRACE
arm64: Remove exporting cpu_logical_map symbol