Alexandre Belloni [Sun, 27 Aug 2023 22:15:31 +0000 (00:15 +0200)]
rtc: stop warning for invalid alarms when the alarm is disabled
When the alarm is not enabled, it may never have been set and so we can't
expect it to be valid. This will prevent the apparition of boot messages
like this one:
Mike Looijmans [Mon, 21 Aug 2023 07:20:13 +0000 (09:20 +0200)]
rtc: pcf85363: Allow to wake up system without IRQ
When wakeup-source is set in the devicetree, set up the device for
using the output as interrupt instead of clock. This is similar to
how other RTC devices handle this.
This allows the clock chip to turn on the board when wired to do
so in hardware.
Guenter Roeck [Thu, 17 Aug 2023 22:55:37 +0000 (15:55 -0700)]
rtc: rzn1: Report maximum alarm limit to rtc core
RZN1 only supports alarms up to one week in the future.
Report the limit to the RTC core and use the reported limit
to validate the requested alarm time when setting it.
Guenter Roeck [Thu, 17 Aug 2023 22:55:36 +0000 (15:55 -0700)]
rtc: ds1305: Report maximum alarm limit to rtc core
DS1305 only supports alarms up to 24 hours in the future.
Report the limit to the RTC core, and use the reported limit
to validate the requested alarm time when setting it.
If the alarm is too large when trying to set an alarm, return -ERANGE
instead of -EDOM to align with error codes returned by other rtc drivers.
Guenter Roeck [Thu, 17 Aug 2023 22:55:33 +0000 (15:55 -0700)]
rtc: cros-ec: Detect and report supported alarm window size
The RTC on some older Chromebooks can only handle alarms less than
24 hours in the future. The only way to find out is to try to set
an alarm further in the future. If that fails, assume that the RTC
connected to the EC can only handle less than 24 hours of alarm
window, and report that value to the RTC core.
After that change, it is no longer necessary to limit the alarm time
when setting it. Report any excessive alarms to the caller instead.
Guenter Roeck [Thu, 17 Aug 2023 22:55:31 +0000 (15:55 -0700)]
rtc: Add support for limited alarm timer offsets
Some alarm timers are based on time offsets, not on absolute times.
In some situations, the amount of time that can be scheduled in the
future is limited. This may result in a refusal to suspend the system,
causing substantial battery drain.
Some RTC alarm drivers remedy the situation by setting the alarm time
to the maximum supported time if a request for an out-of-range timeout
is made. This is not really desirable since it may result in unexpected
early wakeups.
To reduce the impact of this problem, let RTC drivers report the maximum
supported alarm timer offset. The code setting alarm timers can then
decide if it wants to reject setting alarm timers to a larger value, if it
wants to implement recurring alarms until the actually requested alarm
time is met, or if it wants to accept the limited alarm time.
Only introduce the necessary variable into struct rtc_device.
Code to set and use the variable will follow with subsequent patches.
Lukas Bulwahn [Fri, 25 Aug 2023 05:39:10 +0000 (07:39 +0200)]
MAINTAINERS: remove obsolete pattern in RTC SUBSYSTEM section
Commit d890cfc25fe9 ("rtc: ds2404: Convert to GPIO descriptors") removes
the rtc-ds2404.h platform data and with that, there is no file remaining
matching the pattern 'include/linux/platform_data/rtc-*'. Hence,
./scripts/get_maintainer.pl --self-test=patterns complains about a broken
reference.
Remove the obsolete file pattern in the REAL TIME CLOCK (RTC) SUBSYSTEM.
Ruan Jinjie [Thu, 3 Aug 2023 08:07:13 +0000 (16:07 +0800)]
rtc: tps65910: Remove redundant dev_warn() and do not check for 0 return after calling platform_get_irq()
It is not possible for platform_get_irq() to return 0. Use the
return value from platform_get_irq().
And there is no need to call the dev_warn() function directly to print
a custom message when handling an error from platform_get_irq()
function as it is going to display an appropriate error message
in case of a failure.
Samuel Holland [Mon, 17 Jul 2023 19:09:37 +0000 (12:09 -0700)]
rtc: da9063: Mark the alarm IRQ as a wake IRQ
This keeps the IRQ enabled during system suspend, if the RTC's wakeup
source is enabled. Since the IRQ is not required to wake from shutdown,
continue to add the wakeup source even if registering the wakeirq fails.
See commit 029d3a6f2f3c ("rtc: da9063: add as wakeup source").
Hugo Villeneuve [Fri, 28 Jul 2023 17:12:12 +0000 (13:12 -0400)]
rtc: pcf2127: add error checking when disabling POR0
If PCF2127 device is absent from the I2C bus, or if there is a
communication problem, disabling POR0 may fail silently and we
still continue with probing the device. In that case, abort probe
operation.
Linus Walleij [Mon, 7 Aug 2023 07:31:06 +0000 (09:31 +0200)]
rtc: ds2404: Convert to GPIO descriptors
This converts the DS2404 to use GPIO descriptors instead of
hard-coded global GPIO numbers.
The platform data can be deleted because there are no in-tree
users and it only contained GPIO numbers which are now
passed using descriptor tables (or device tree or ACPI).
The driver was rewritten to use a state container for the
device driver state (struct ds2404 *chip) and pass that
around instead of using a global singleton storage for the
GPIO handles.
When declaring GPIO descriptor tables or other hardware
descriptions for the RTC driver, implementers should take care
to flag the RESET line as active low, such as by using the
GPIOD_ACTIVE_LOW flag in the descriptor table.
Nathan Chancellor [Tue, 15 Aug 2023 22:16:41 +0000 (15:16 -0700)]
rtc: stm32: Use NOIRQ_SYSTEM_SLEEP_PM_OPS()
After the switch to SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() and a subsequent
fix, stm32_rtc_{suspend,resume}() are unused when CONFIG_PM_SLEEP is not
set because SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() is a no-op in that
configuration:
drivers/rtc/rtc-stm32.c:904:12: error: 'stm32_rtc_resume' defined but not used [-Werror=unused-function]
904 | static int stm32_rtc_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~
drivers/rtc/rtc-stm32.c:894:12: error: 'stm32_rtc_suspend' defined but not used [-Werror=unused-function]
894 | static int stm32_rtc_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
The non-"SET_" version of this macro, NOIRQ_SYSTEM_SLEEP_PM_OPS(), is
designed to handle this situation by only assigning the callbacks when
CONFIG_PM_SLEEP is set while allowing the functions to appear used to
the compiler. Switch to that macro to resolve the warnings. There is no
functional change with this, as SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() is
defined using NOIRQ_SYSTEM_SLEEP_PM_OPS() when CONFIG_PM_SLEEP is set.
Rasmus Villemoes [Thu, 15 Jun 2023 10:58:26 +0000 (12:58 +0200)]
rtc: isl12022: implement support for the #clock-cells DT property
If device tree implies that the chip's IRQ/F_OUT pin is used as a
clock, expose that in the driver. For now, pretend it is a
fixed-rate (32kHz) clock; if other use cases appear the driver can be
updated to provide its own clk_ops etc.
When the clock output is not used on a given board, one can prolong
the battery life by ensuring that the FOx bits are 0. For the hardware
I'm currently working on, the RTC draws 1.2uA with the FOx bits at
their default 0001 value, dropping to 0.88uA when those bits are
cleared.
Rasmus Villemoes [Thu, 15 Jun 2023 10:58:24 +0000 (12:58 +0200)]
rtc: isl12022: trigger battery level detection during probe
Since the meaning of the SR_LBAT85 and SR_LBAT75 bits are different in
battery backup mode, they may very well be set after power on, and
stay set for up to a minute (i.e. until the battery detection in VDD
mode happens when the seconds counter hits 59). This would mean that
userspace doing a ioctl(RTC_VL_READ) early on could get a false
positive.
The battery level detection can also be triggered by explicitly
writing a 1 to the TSE bit in the BETA register. Do that once during
boot. Empirically, this does not immediately update the bits in
the status register (i.e., an immediate read of SR after this write
can still show stale values), but the update is done after a few
milliseconds, so certainly before the RTC device gets registered and
userspace has a chance of doing the ioctl() on this device.
Rasmus Villemoes [Thu, 15 Jun 2023 10:58:23 +0000 (12:58 +0200)]
rtc: isl12022: implement RTC_VL_READ ioctl
Hook up support for reading the values of the SR_LBAT85 and SR_LBAT75
bits. Translate the former to "battery low", and the latter to
"battery empty or not-present".
Rasmus Villemoes [Thu, 15 Jun 2023 10:58:22 +0000 (12:58 +0200)]
rtc: isl12022: add support for trip level DT binding
Implement support for using the values given in the
isil,battery-trip-levels-microvolt property to set appropriate values
in the VB85TP/VB75TP bits in the PWR_VBAT register.
Rasmus Villemoes [Thu, 15 Jun 2023 10:58:21 +0000 (12:58 +0200)]
dt-bindings: rtc: isl12022: add bindings for battery alarm trip levels
The isl12022 has a built-in support for monitoring the voltage of the
backup battery, and setting bits in the status register when that
voltage drops below two predetermined levels (usually 85% and 75% of
the nominal voltage). However, since it can operate at wide range of
battery voltages (2.5V - 5.5V), one must configure those trip levels
according to which battery is used on a given board.
Add bindings for defining these two trip levels. While the register
and bit names suggest that they should correspond to 85% and 75% of
the nominal battery voltage, the data sheet also says
There are total of 7 levels that could be selected for the first
alarm. Any of the of levels could be selected as the first alarm
with no reference as to nominal Battery voltage level.
Hence this provides the hardware designer the ability to choose values
based on the discharge characteristics of the battery chosen for the
given product, rather than just having one battery-microvolt property
and having the driver choose levels close to 0.85/0.75 times that.
Rasmus Villemoes [Thu, 15 Jun 2023 10:58:19 +0000 (12:58 +0200)]
rtc: isl12022: remove wrong warning for low battery level
There are multiple problems with this warning.
First of all, it triggers way too often, in fact nearly on every boot,
because the SR_LBAT85/SR_LBAT75 bits have another meaning when in
battery backup mode. Quoting from the data sheet:
LOW BATTERY INDICATOR 85% BIT (LBAT85)
In Normal Mode (VDD), this bit indicates when the battery level has
dropped below the pre-selected trip levels. [...] The LBAT85
detection happens automatically once every minute when seconds
register reaches 59.
In Battery Mode (VBAT), this bit indicates the device has entered
into battery mode by polling once every 10 minutes. The LBAT85
detection happens automatically once when the minute register
reaches x9h or x0h minutes.
Similar wording applies to the LBAT75 bit.
This means that if the device is powered off for more than 10 minutes,
the LBAT85 bit is guaranteed to be set. Upon power-on, unless we're
close enough to the end of a minute and/or the boot is slow enough
that the second register passes 59, the LBAT85 bit is still set when
the kernel (or early userspace) reads the RTC to set the system's
wallclock time.
Another minor problem is with the bit logic. If the 75% level is
reached, logically we're also below 85%, so both bits would most
likely be set. So even if the battery is below 75%, the warning would
still say "voltage dropped below 85%".
A third problem is that the driver and current DT binding offer no way
to indicate the nominal battery level and/or settings of the Battery
Level Monitor Trip Bits. Since the default value of the VB85TP[2:0] and
VB75TP[2:0] bits are 000, this means the actual setting of the
LBAT85/LBAT75 bits in VDD mode doesn't happen until the battery is below
2.125V/1.875V, which for a standard 3V battery is way too late.
A fourth problem is emitting this warning from ->read_time:
util-linux' hwclock will, in the absence of support for getting an
interrupt when the seconds counter is updated, issue
ioctl(RTC_RD_TIME) in a busy-loop until it sees a change in the
seconds field. In that case, if the battery low bits are set (either
genuinely, more than a minute after boot, due to the battery actually
being low, or as above, bogusly shortly after boot), the kernel log is
swamped with hundreds of identical warnings.
Subsequent patches will add such bindings and driver support, and also
proper support for RTC_VL_READ. For now, remove the broken warning.
Chen Jiahao [Wed, 2 Aug 2023 09:36:50 +0000 (17:36 +0800)]
rtc: sunplus: Clean up redundant dev_err_probe()
Referring to platform_get_irq()'s definition, the return value has
already been checked if ret < 0, and printed via dev_err_probe().
Calling dev_err_probe() one more time outside platform_get_irq()
is obviously redundant.
Removing dev_err_probe() outside platform_get_irq() to clean up
above problem.
Arnd Bergmann [Tue, 1 Aug 2023 10:59:15 +0000 (12:59 +0200)]
rtc: stm32: remove incorrect #ifdef check
After a previous commit changed the driver over to
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(), the suspend/resume
functions must no longer be hidden behind an #ifdef:
In file included from include/linux/clk.h:13,
from drivers/rtc/rtc-stm32.c:8:
drivers/rtc/rtc-stm32.c:927:39: error: 'stm32_rtc_suspend' undeclared here (not in a function); did you mean 'stm32_rtc_probe'?
927 | SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(stm32_rtc_suspend, stm32_rtc_resume)
| ^~~~~~~~~~~~~~~~~
include/linux/kernel.h:58:44: note: in definition of macro 'PTR_IF'
58 | #define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)
| ^~~
include/linux/pm.h:329:26: note: in expansion of macro 'pm_sleep_ptr'
329 | .suspend_noirq = pm_sleep_ptr(suspend_fn), \
| ^~~~~~~~~~~~
rtc: isl12026: Drop "_new" from probe callback name
The driver was introduced when .probe_new was the right probe callback
to use for i2c drivers. Today .probe is the right one (again) and the
driver was already switched in commit 31b0cecb4042 ("rtc: Switch i2c
drivers back to use .probe()") but the name continued to include "_new"
in its name.
To prevent code readers wondering about what might be new here, drop
that irritating part of the name.
Rob Herring [Mon, 24 Jul 2023 20:54:54 +0000 (14:54 -0600)]
rtc: Explicitly include correct DT includes
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Andrej Picej [Fri, 23 Jun 2023 08:15:33 +0000 (10:15 +0200)]
rtc: rv3028: Add support for "aux-voltage-chargeable" property
Property "trickle-resistor-ohms" allows us to set trickle charger
resistor. However there is no possibility to disable it afterwards.
Add support for "aux-voltage-chargeable" property which can be used to
enable/disable the trickle charger circuit explicitly. The default
behavior of the code is kept as it is!
Additionally, lets make sure we only update internal EEPROM in case of a
change. This prevents wear due to excessive EEPROM writes on each probe.
The added HAS_IOPORT dependency might not actually be necessary as Geert
points out, but the driver is also only used on one architecture. Sparc
is also a special case here since it converts port numbers into virtual
addresses rather than having them mapped into a particular part of the
__iomem address space, so the difference is actually not important here.
Add a dependency on sparc, but allow compile-testing otherwise, to
make this clearer without anyone having to spend much time modernizing
the driver beyond that.
Biju Das [Mon, 17 Jul 2023 12:40:59 +0000 (13:40 +0100)]
rtc: pcf85063: Drop enum pcf85063_type and split pcf85063_cfg[]
Drop enum pcf85063_type and split the array pcf85063_cfg[] as individual
variables, and make lines shorter by referring to e.g. &pcf85063_cfg
instead of &pcf85063_cfg[PCF85063].
Biju Das [Mon, 17 Jul 2023 12:40:58 +0000 (13:40 +0100)]
rtc: pcf85063: Simplify probe()
The pcf85063_ids[].driver_data could store a pointer to the config,
like for DT-based matching, making I2C and DT-based matching
more similar.
After that, we can simplify the probe() by replacing of_device_get_
match_data() and i2c_match_id() by i2c_get_match_data() as we have
similar I2C and DT-based matching table.
Biju Das [Mon, 10 Jul 2023 11:47:47 +0000 (12:47 +0100)]
rtc: isl1208: Simplify probe()
Simplify the probe() by replacing of_device_get_match_data() and
i2c_match_id() by i2c_get_match_data() as we have similar I2C
and DT-based matching table.
rtc: stm32: fix issues of stm32_rtc_valid_alrm function
stm32_rtc_valid_alrm function has some issues :
- arithmetical operations are impossible on BCD values
- "cur_mon + 1" can overflow
- the use case with the next month, the same day/hour/minutes went wrong
To solve that, we prefer to use timestamp comparison.
e.g. : On 5 Dec. 2021, the alarm limit is 5 Jan. 2022 (+31 days)
On 31 Jan 2021, the alarm limit is 28 Feb. 2022 (+28 days)
Gabriel Fernandez [Wed, 5 Jul 2023 17:43:55 +0000 (19:43 +0200)]
rtc: stm32: change PM callbacks to "_noirq()"
The RTC driver stops the RTCAPB clock during suspend, but the
irq handler from RTC is called before starting clock. Then we are
blocked while accessing RTC registers.
We changes PM callbacks to '_no_irq()' to disable irq during
resume callback and so irq handler will be called after the enable
of RTCAPB clock.
The rtc is used to update the stgen counter on wake up from
low power modes, so it needs to be as much accurate as possible.
The maximization of asynchronous divider leads to a 4ms rtc
precision clock.
By decreasing pred_a to 0, it will have pred_s=32767 (when
need_accuracy is true), so stgen clock becomes more accurate
with 30us precision.
Nevertheless this will leads to an increase of power consumption.
Antonio Borneo [Wed, 5 Jul 2023 17:43:52 +0000 (19:43 +0200)]
rtc: stm32: don't stop time counter if not needed
RTC counters are stopped when INIT bit in ISR register is set and
start counting from the (eventual) new value when INIT is reset.
In stm32_rtc_init(), called during probe, the INIT bit is set to
program the prescaler and the 24h mode. This halts the RTC counter
at each probe tentative causing the RTC time to loose from 0.3s to
0.8s at each kernel boot.
If the RTC is battery powered, both prescaler value and 24h mode
are kept during power cycle and there is no need to program them
again.
Check if the desired prescaler value and the 24h mode are already
programmed, then skip reprogramming them to avoid halting the time
counter.
Antonio Borneo [Wed, 5 Jul 2023 17:43:51 +0000 (19:43 +0200)]
rtc: stm32: use the proper register sequence to read date/time
Date and time are read from two separate RTC registers.
To ensure consistency between the two registers, reading the time
register locks the values in the shadow date register until the
date register is read.
Thus, the whole date/time read requires reading the time register
first, followed by reading the date register.
If the reads are done in reversed order, the shadow date register
will remain locked until a future read operation. The future read
will read the former date value that could be already invalid.
Fix the read order of date/time registers in stm32_rtc_valid_alrm()
Hugo Villeneuve [Thu, 22 Jun 2023 14:57:58 +0000 (10:57 -0400)]
rtc: pcf2127: add flag for watchdog register value read support
The watchdog value register cannot be read on the PCF2131 after being
set.
Add a new flag to identify which variant has read access to this
register, and use this flag to selectively test if watchdog timer was
started by bootloader.
Hugo Villeneuve [Thu, 22 Jun 2023 14:57:57 +0000 (10:57 -0400)]
rtc: pcf2127: support generic watchdog timing configuration
Introduce in the configuration structure two new values to hold the
watchdog clock source and the min_hw_heartbeat_ms value.
The minimum and maximum timeout values are automatically computed from
the watchdog clock source value for each variant.
The PCF2131 has no 1Hz watchdog clock source, as is the case for
PCF2127/29.
The next best choice is using a 1/4Hz clock, giving a watchdog timeout
range between 4 and 1016s. By using the same register configuration as
for the PCF2127/29, the 1/4Hz clock source is selected.
Note: the PCF2127 datasheet gives a min/max range between 1 and 255s,
but it should be between 2 and 254s, because the watchdog is triggered
when the timer value reaches 1, not 0.
Hugo Villeneuve [Thu, 22 Jun 2023 14:57:56 +0000 (10:57 -0400)]
rtc: pcf2127: adapt time/date registers write sequence for PCF2131
The sequence for updating the time/date registers is slightly
different between PCF2127/29 and PCF2131.
For PCF2127/29, during write operations, the time counting
circuits (memory locations 03h through 09h) are automatically blocked.
For PCF2131, time/date registers write access requires setting the
STOP bit and sending the clear prescaler instruction (CPR). STOP then
needs to be released once write operation is completed.
Hugo Villeneuve [Thu, 22 Jun 2023 14:57:55 +0000 (10:57 -0400)]
rtc: pcf2127: add support for PCF2131 interrupts on output INT_A
The PCF2127 and PCF2129 have one output interrupt pin. The PCF2131 has
two, named INT_A and INT_B. The hardware support that any interrupt
source can be routed to either one or both of them.
Force all interrupt sources to go to the INT A pin.
Support to route any interrupt source to INT A/B pins is not supported
by this driver at the moment.
Hugo Villeneuve [Thu, 22 Jun 2023 14:57:54 +0000 (10:57 -0400)]
rtc: pcf2127: add support for PCF2131 RTC
This RTC is very similar in functionality to the PCF2127/29.
Basically it:
-supports two new control registers at offsets 4 and 5
-supports a new reset register (not implemented in this driver)
-supports 4 tamper detection functions instead of 1
-has no nvmem (like the PCF2129)
-has two output interrupt pins
Because of that, most of the register addresses are very different,
although they still follow the same layout. For example, the tamper
registers have a different base address, but the offsets are all the same.
Create variant-specific configuration structures to simplify the
implementation of new variants into this driver. It will also avoid
to have too many tests for a specific variant, or a list of variants
for new devices, inside the code itself.
Add configuration options for the support of the NVMEM, bit CD0 in
register WD_CTL as well as the maximum number of registers for each
variant, instead of hardcoding the variant (PCF2127) inside the
i2c_device_id and spi_device_id structures.
Also specify a different maximum number of registers (max_register)
for the PCF2129.
Reading the 7 timetamp registers currently involves reading 25 registers
solely to be able to print the content of the three control registers,
in addition to the 7 timestamp registers. This print never occurs,
unless the user enables dynamic debug in this driver or set
CONFIG_RTC_DEBUG.
Reading the timestamp registers should consist of reading 7
consecutive timestamp registers.
This patch optimize the performance of reading the timestamp registers
by reading 7 consecutive registers instead of 25, and dropping the
print of the control registers.
Hugo Villeneuve [Thu, 22 Jun 2023 14:57:44 +0000 (10:57 -0400)]
rtc: pcf2127: improve rtc_read_time() performance
Improve performance and readability of rtc_read_time() by reading only
the 7 time registers, instead of reading 8 registers (additional CTRL3
register).
We drop reading of CTRL3 to monitor the low battery flag, as this
check is already available in the ioctl. Anyway, this check only
display an info message and has no other impacts.
The code readability also improves as we do not have to fiddle with
buffer pointer and size arithmetic.
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats: 80e62bc8487b ("MAINTAINERS: re-sort all entries and fields") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Merge tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Merge tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Merge tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"A few late arriving patches that missed the initial pull request. It's
mostly bug fixes (the dt-bindings is a fix for the initial pull)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove unused function declaration
scsi: target: docs: Remove tcm_mod_builder.py
scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
scsi: dt-bindings: ufs: qcom: Fix ICE phandle
scsi: core: Simplify scsi_cdl_check_cmd()
scsi: isci: Fix comment typo
scsi: smartpqi: Replace one-element arrays with flexible-array members
scsi: target: tcmu: Replace strlcpy() with strscpy()
scsi: ncr53c8xx: Replace strlcpy() with strscpy()
scsi: lpfc: Fix lpfc_name struct packing
Merge tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- xiic patch should have been in the original pull but slipped through
- mpc patch fixes a build regression
- nomadik cleanup
* tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mpc: Drop unused variable
i2c: nomadik: Remove a useless call in the remove function
i2c: xiic: Don't try to handle more interrupt events after error
Merge tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- Check for NULL bdev in LoadPin (Matthias Kaehlcke)
- Revert unwanted KUnit FORTIFY build default
- Fix 1-element array causing boot warnings with xhci-hub
* tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
Revert "fortify: Allow KUnit test to build without FORTIFY"
dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com> Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
The tests that don't use expect_eq() macro to determine that a test
is failured must increment failed_tests explicitly.
- lib/bitmap: drop optimization of bitmap_{from,to}_arr64
bitmap_{from,to}_arr64() optimization is overly optimistic
on 32-bit LE architectures when it's wired to
bitmap_copy_clear_tail().
- nodemask: Drop duplicate check in for_each_node_mask()
As the return value type of first_node() became unsigned, the node
>= 0 became unnecessary.
- cpumask: fix function description kernel-doc notation
- MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record
Add linux/bits.h and linux/bitfield.h for visibility"
* tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
MAINTAINERS: Add bitfield.h to the BITMAP API record
MAINTAINERS: Add bits.h to the BITMAP API record
cpumask: fix function description kernel-doc notation
nodemask: Drop duplicate check in for_each_node_mask()
lib/bitmap: drop optimization of bitmap_{from,to}_arr64
lib/test_bitmap: increment failure counter properly
Commit 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") added precise kmalloc redzone poisoning to
the slub_debug functionality.
However, this commit didn't account for HW_TAGS KASAN fully initializing
the object via its built-in memory initialization feature. Even though
HW_TAGS KASAN memory initialization contains special memory initialization
handling for when slub_debug is enabled, it does not account for in-object
slub_debug redzones. As a result, HW_TAGS KASAN can overwrite these
redzones and cause false-positive slub_debug reports.
To fix the issue, avoid HW_TAGS KASAN memory initialization when
slub_debug is enabled altogether. Implement this by moving the
__slub_debug_enabled check to slab_post_alloc_hook. Common slab code
seems like a more appropriate place for a slub_debug check anyway.
Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.1688561016.git.andreyknvl@google.com Fixes: 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated kmalloc space than requested") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: Will Deacon <will@kernel.org> Acked-by: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: kasan-dev@googlegroups.com Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13
builtins") introduced a bug into the memory_is_poisoned_n implementation:
it effectively removed the cast to a signed integer type after applying
KASAN_GRANULE_MASK.
As a result, KASAN started failing to properly check memset, memcpy, and
other similar functions.
Fix the bug by adding the cast back (through an additional signed integer
variable to make the code more readable).
I am going to lose my vrull.eu address at the end of july, and while
adding it to mailmap I also realised that there are more old addresses
from me dangling, so update .mailmap for all of them.
Liu Shixin [Tue, 4 Jul 2023 10:19:42 +0000 (18:19 +0800)]
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in
put_page_bootmem") fix an overlaps existing problem of kmemleak. But the
problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in
this case, free_bootmem_page() will call free_reserved_page() directly.
Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when
HAVE_BOOTMEM_INFO_NODE is disabled.
Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com Fixes: f41f2ed43ca5 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Wed, 28 Jun 2023 18:55:48 +0000 (19:55 +0100)]
writeback: account the number of pages written back
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>