Guenter Roeck [Thu, 13 Mar 2025 11:43:26 +0000 (11:43 +0000)]
sh: add support for suppressing warning backtraces
Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.
To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).
Link: https://lkml.kernel.org/r/20250313114329.284104-12-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:25 +0000 (11:43 +0000)]
s390: add support for suppressing warning backtraces
Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.
To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).
Link: https://lkml.kernel.org/r/20250313114329.284104-11-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kees Cook <keescook@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:24 +0000 (11:43 +0000)]
parisc: add support for suppressing warning backtraces
Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.
To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).
While at it, declare assembler parameters as constants where possible.
Refine .blockz instructions to calculate the necessary padding instead of
using fixed values.
Link: https://lkml.kernel.org/r/20250313114329.284104-10-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kees Cook <keescook@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:23 +0000 (11:43 +0000)]
loongarch: add support for suppressing warning backtraces
Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.
To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).
Link: https://lkml.kernel.org/r/20250313114329.284104-9-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kees Cook <keescook@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:22 +0000 (11:43 +0000)]
arm64: add support for suppressing warning backtraces
Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.
To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).
Link: https://lkml.kernel.org/r/20250313114329.284104-8-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kees Cook <keescook@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:21 +0000 (11:43 +0000)]
x86: add support for suppressing warning backtraces
Add name of functions triggering warning backtraces to the __bug_table
object section to enable support for suppressing WARNING backtraces.
To limit image size impact, the pointer to the function name is only added
to the __bug_table section if both CONFIG_KUNIT_SUPPRESS_BACKTRACE and
CONFIG_DEBUG_BUGVERBOSE are enabled. Otherwise, the __func__ assembly
parameter is replaced with a (dummy) NULL parameter to avoid an image size
increase due to unused __func__ entries (this is necessary because
__func__ is not a define but a virtual variable).
Link: https://lkml.kernel.org/r/20250313114329.284104-7-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kees Cook <keescook@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:20 +0000 (11:43 +0000)]
drm: suppress intentional warning backtraces in scaling unit tests
The drm_test_rect_calc_hscale and drm_test_rect_calc_vscale unit tests
intentionally trigger warning backtraces by providing bad parameters to
the tested functions. What is tested is the return value, not the
existence of a warning backtrace. Suppress the backtraces to avoid
clogging the kernel log and distraction from real problems.
Link: https://lkml.kernel.org/r/20250313114329.284104-6-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Maíra Canal <mcanal@igalia.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kees Cook <keescook@chromium.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:18 +0000 (11:43 +0000)]
kunit: add test cases for backtrace warning suppression
Add unit tests to verify that warning backtrace suppression works.
If backtrace suppression does _not_ work, the unit tests will likely
trigger unsuppressed backtraces, which should actually help to get the
affected architectures / platforms fixed.
Link: https://lkml.kernel.org/r/20250313114329.284104-4-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Gow <davidgow@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Guenter Roeck [Thu, 13 Mar 2025 11:43:16 +0000 (11:43 +0000)]
bug/kunit: core support for suppressing warning backtraces
Patch series "Add support for suppressing warning backtraces", v4.
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to kernel API functions. Such unit tests typically check the
return value from such calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Support suppressing
multiple backtraces while at the same time limiting changes to generic
code to the absolute minimum. Architecture specific changes are kept at
minimum by retaining function names only if both CONFIG_DEBUG_BUGVERBOSE
and CONFIG_KUNIT are enabled.
The first patch of the series introduces the necessary infrastructure.
The second patch introduces support for counting suppressed backtraces.
This capability is used in patch three to implement unit tests. Patch
four documents the new API.
The next two patches add support for suppressing backtraces in drm_rect
and dev_addr_lists unit tests. These patches are intended to serve as
examples for the use of the functionality introduced with this series.
The remaining patches implement the necessary changes for all
architectures with GENERIC_BUG support.
With CONFIG_KUNIT enabled, image size increase with this series applied is
approximately 1%. The image size increase (and with it the functionality
introduced by this series) can be avoided by disabling
CONFIG_KUNIT_SUPPRESS_BACKTRACE.
This patch (of 14):
Some unit tests intentionally trigger warning backtraces by passing bad
parameters to API functions. Such unit tests typically check the return
value from those calls, not the existence of the warning backtrace.
Such intentionally generated warning backtraces are neither desirable
nor useful for a number of reasons.
- They can result in overlooked real problems.
- A warning that suddenly starts to show up in unit tests needs to be
investigated and has to be marked to be ignored, for example by
adjusting filter scripts. Such filters are ad-hoc because there is
no real standard format for warnings. On top of that, such filter
scripts would require constant maintenance.
One option to address problem would be to add messages such as "expected
warning backtraces start / end here" to the kernel log. However, that
would again require filter scripts, it might result in missing real
problematic warning backtraces triggered while the test is running, and
the irrelevant backtrace(s) would still clog the kernel log.
Solve the problem by providing a means to identify and suppress specific
warning backtraces while executing test code. Since the new functionality
results in an image size increase of about 1% if CONFIG_KUNIT is enabled,
provide configuration option KUNIT_SUPPRESS_BACKTRACE to be able to
disable the new functionality. This option is by default enabled since
almost all systems with CONFIG_KUNIT enabled will want to benefit from it.
Link: https://lkml.kernel.org/r/20250313114329.284104-1-acarmina@redhat.com Link: https://lkml.kernel.org/r/20250313114329.284104-2-acarmina@redhat.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alessandro Carminati <acarmina@redhat.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Daniel Diaz <daniel.diaz@linaro.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: David Gow <davidgow@google.com> Cc: Guenetr Roeck <linux@roeck-us.net> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maíra Canal <mcanal@igalia.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimemrmann <tzimmermann@suse.de> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Rae Moar <rmoar@google.com> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The UEFI Specification version 2.9 introduces the concept of memory
acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, require memory to be accepted before it can be used by the guest.
Accepting memory is expensive. The memory must be allocated by the VMM
and then brought to a known safe state: cache must be flushed, memory must
be zeroed with the guest's encryption key, and associated metadata must be
manipulated. These operations must be performed from a trusted
environment (firmware or TDX module). Switching context to and from it
also takes time.
This cost adds up. On large confidential VMs, memory acceptance alone can
take minutes. It is better to delay memory acceptance until the memory is
actually needed.
The kernel accepts memory when it is allocated from buddy allocator for
the first time. This reduces boot time and decreases memory overhead as
the VMM can allocate memory as needed.
It does not work when the guest attempts to kexec into a new kernel.
The kexec segments' destination addresses are not allocated by the buddy
allocator. Instead, they are searched from normal system RAM (top-down or
bottom-up) and exclude driver-managed memory, ACPI, persistent, and
reserved memory. Unaccepted memory is normal system RAM from kernel point
of view and kexec can place segments there.
Kexec bypasses the code path in buddy allocator where memory gets accepted
and it leads to a crash when kexec accesses segments' memory.
Accept the destination addresses during the kexec load, immediately after
they pass sanity checks. This ensures the code is located in a common
place shared by both the kexec_load and kexec_file_load system calls.
This will not conflict with the accounting in try_to_accept_memory_one()
since the accounting is set during kernel boot and decremented when pages
are moved to the freelists. There is no harm in invoking accept_memory()
on a page before making it available to the buddy allocator.
No need to worry about re-accepting memory since accept_memory() checks
the unaccepted bitmap before accepting a memory page.
Although a user may perform kexec loading without ever triggering the
jump, it doesn't impact much since kexec loading is not in a
performance-critical path. Additionally, the destination addresses are
always searched and found in the same location on a given system.
Changes to the destination address searching logic to locate only memory in
either unaccepted or accepted status are unnecessary and complicated.
[kirill.shutemov@linux.intel.com: update the commit message] Link: https://lkml.kernel.org/r/20250307084411.2150367-1-kirill.shutemov@linux.intel.com Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ashish Kalra <Ashish.Kalra@amd.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jianxiong Gao <jxgao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Thorsten Blum [Thu, 13 Mar 2025 13:30:02 +0000 (14:30 +0100)]
watchdog/perf: optimize bytes copied and remove manual NUL-termination
Currently, up to 23 bytes of the source string are copied to the
destination buffer (including the comma and anything after it), only to
then manually NUL-terminate the destination buffer again at index 'len'
(where the comma was found).
Fix this by calling strscpy() with 'len' instead of the destination buffer
size to copy only as many bytes from the source string as needed.
Change the length check to allow 'len' to be less than or equal to the
destination buffer size to fill the whole buffer if needed.
Remove the if-check for the return value of strscpy(), because calling
strscpy() with 'len' always truncates the source string at the comma as
expected and NUL-terminates the destination buffer at the corresponding
index instead. Remove the manual NUL-termination.
Wei Yang [Mon, 10 Mar 2025 07:49:38 +0000 (07:49 +0000)]
lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
The comment of interval_tree_span_iter_next_gap() is not exact, nodes[1]
is not always !NULL.
There are threes cases here. If there is an interior hole, the statement
is correct. If there is a tailing hole or the contiguous used range span
to the end, nodes[1] is NULL.
Link: https://lkml.kernel.org/r/20250310074938.26756-8-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When we come to the right node, we are sure the first two don't
contribute to node->ITSUBTREE and it must be the right node does the
job.
So skip the check before go to the right subtree.
Link: https://lkml.kernel.org/r/20250310074938.26756-7-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wei Yang [Mon, 10 Mar 2025 07:49:36 +0000 (07:49 +0000)]
lib/interval_tree: add test case for span iteration
Verify interval_tree_span_iter_xxx() helpers works as expected.
Link: https://lkml.kernel.org/r/20250310074938.26756-6-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wei Yang [Mon, 10 Mar 2025 07:49:35 +0000 (07:49 +0000)]
lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
Verify interval_tree_iter_xxx() helpers could find intersection ranges
as expected.
Link: https://lkml.kernel.org/r/20250310074938.26756-5-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wei Yang [Mon, 10 Mar 2025 07:49:34 +0000 (07:49 +0000)]
lib/rbtree: add random seed
Current test use pseudo rand function with fixed seed, which means the
test data is the same pattern each time.
Add random seed parameter to randomize the test.
Link: https://lkml.kernel.org/r/20250310074938.26756-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wei Yang [Mon, 10 Mar 2025 07:49:33 +0000 (07:49 +0000)]
lib/rbtree: split tests
Current tests are gathered in one big function.
Split tests into its own function for better understanding and also it
is a preparation for introducing new test cases.
Link: https://lkml.kernel.org/r/20250310074938.26756-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michel Lespinasse <michel@lespinasse.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Wei Yang [Mon, 10 Mar 2025 07:49:32 +0000 (07:49 +0000)]
lib/rbtree: enable userland test suite for rbtree related data structure
Patch series "lib/interval_tree: add some test cases and cleanup", v2.
Since rbtree/augmented tree/interval tree share similar data structure,
besides new cases for interval tree, this patch set also does cleanup for
others.
This patch (of 7):
Currently we have some tests for rbtree related data structure, e.g.
rbtree, augmented rbtree, interval tree, in lib/ as kernel module.
To facilitate the test and debug for those fundamental data structure,
this patch enable those tests in userland.
Philipp Hahn [Mon, 10 Mar 2025 11:22:27 +0000 (12:22 +0100)]
checkpatch: describe --min-conf-desc-length
Neither the warning nor the help message gives any hint on the unit for
length: Could be meters, inches, bytes, characters or ... lines.
Extend the output of `--help` to name the unit "lines" and the default:
- --min-conf-desc-length=n set the min description length, if shorter, warn
+ --min-conf-desc-length=n set the minimum description length for config symbols
+ in lines, if shorter, warn (default 4)
Include the minimum number of lines as other error messages already do:
- WARNING: please write a help paragraph that fully describes the config symbol
+ WARNING: please write a help paragraph that fully describes the config symbol with at least 4 lines
It turns out it not only leaks memory, also fails to unwind changes to the
resource tree (@base, usually iomem_resource).
Fix this by consolidating the devres and devm paths into just devres, and
move those details to a wrapper function. So now __get_free_mem_region()
only needs to worry about alloc_resource() unwinding, and the devres
failure path is resolved before touching the resource tree.
Link: https://lkml.kernel.org/r/174114125011.1367354.2670882864492961789.stgit@dwillia2-xfh.jf.intel.com Fixes: 14b80582c43e ("resource: Introduce alloc_free_mem_region()") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Li Zhijian <lizhijian@fujitsu.com> Closes: http://lore.kernel.org/20250304043415.610286-1-lizhijian@fujitsu.com Tested-by: Li Zhijian <lizhijian@fujitsu.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Dan Willaims <dan.j.williams@intel.com> Cc: "Huang, Ying" <huang.ying.caritas@gmail.com> Cc: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Mika Westeberg <mika.westerberg@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ilya Leoshkevich [Mon, 3 Mar 2025 11:03:58 +0000 (12:03 +0100)]
scripts/gdb/symbols: determine KASLR offset on s390
Use QEMU's qemu.PhyMemMode [1] functionality to read vmcore from the
physical memory the same way the existing dump tooling does this.
Gracefully handle non-QEMU targets, early boot, and memory corruptions;
print a warning if such situation is detected.
Mateusz Guzik [Mon, 3 Mar 2025 13:49:08 +0000 (14:49 +0100)]
signal: avoid clearing TIF_SIGPENDING in recalc_sigpending() if unset
Clearing is an atomic op and the flag is not set most of the time.
When creating and destroying threads in the same process with the pthread
family, the primary bottleneck is calls to sigprocmask which take the
process-wide sighand lock.
Avoiding the atomic gives me a 2% bump in start/teardown rate at 24-core
scale.
Antonio Quartulli [Fri, 21 Feb 2025 20:40:34 +0000 (21:40 +0100)]
scripts/gdb/linux/symbols.py: address changes to module_sect_attrs
When loading symbols from kernel modules we used to iterate
from 0 to module_sect_attrs::nsections, in order to
retrieve their name and address.
However module_sect_attrs::nsections has been removed from
the struct by a previous commit.
Re-arrange the iteration by accessing all items in
module_sect_attrs::grp::bin_attrs[] until NULL is found
(it's a NULL terminated array).
At the same time the symbol address cannot be extracted
from module_sect_attrs::attrs[]::address anymore because
it has also been deleted. Fetch it from
module_sect_attrs::grp::bin_attrs[]::private as described
in 4b2c11e4aaf7.
Link: https://lkml.kernel.org/r/20250221204034.4430-1-antonio@mandelbit.com Fixes: d8959b947a8d ("module: sysfs: Drop member 'module_sect_attrs::nsections'") Fixes: 4b2c11e4aaf7 ("module: sysfs: Drop member 'module_sect_attr::address'") Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Easwar Hariharan [Tue, 25 Feb 2025 20:17:30 +0000 (20:17 +0000)]
RDMA/bnxt_re: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:29 +0000 (20:17 +0000)]
platform/x86: thinkpad_acpi: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:28 +0000 (20:17 +0000)]
platform/x86/amd/pmf: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:27 +0000 (20:17 +0000)]
spi: spi-imx: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:26 +0000 (20:17 +0000)]
spi: spi-fsl-lpspi: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:25 +0000 (20:17 +0000)]
nvme: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:24 +0000 (20:17 +0000)]
power: supply: da9030: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:23 +0000 (20:17 +0000)]
xfs: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:22 +0000 (20:17 +0000)]
ata: libata-zpodd: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:21 +0000 (20:17 +0000)]
libceph: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:19 +0000 (20:17 +0000)]
btrfs: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:18 +0000 (20:17 +0000)]
ALSA: ac97: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:17 +0000 (20:17 +0000)]
accel/habanalabs: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:16 +0000 (20:17 +0000)]
scsi: lpfc: convert timeouts to secs_to_jiffies()
Commit b35108a51cf7 ("jiffies: Define secs_to_jiffies()") introduced
secs_to_jiffies(). As the value here is a multiple of 1000, use
secs_to_jiffies() instead of msecs_to_jiffies() to avoid the
multiplication
This is converted using scripts/coccinelle/misc/secs_to_jiffies.cocci with
the following Coccinelle rules:
Easwar Hariharan [Tue, 25 Feb 2025 20:17:15 +0000 (20:17 +0000)]
coccinelle: misc: secs_to_jiffies: Patch expressions too
Patch series "Converge on using secs_to_jiffies() part two", v3.
This is the second series that converts users of msecs_to_jiffies() that
either use the multiply pattern of either of:
- msecs_to_jiffies(N*1000) or
- msecs_to_jiffies(N*MSEC_PER_SEC)
where N is a constant or an expression, to avoid the multiplication.
The conversion is made with Coccinelle with the secs_to_jiffies() script
in scripts/coccinelle/misc. Attention is paid to what the best change can
be rather than restricting to what the tool provides.
This patch (of 15):
Teach the script to suggest conversions for timeout patterns where the
arguments to msecs_to_jiffies() are expressions as well.
Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-0-a43967e36c88@linux.microsoft.com Link: https://lkml.kernel.org/r/20250225-converge-secs-to-jiffies-part-two-v3-1-a43967e36c88@linux.microsoft.com Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Cc: Mark Brown <broonie@kernel.org> Cc: Carlos Maiolino <cem@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Maol <dlemoal@kernel.org> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: Dongsheng Yang <dongsheng.yang@easystack.cn> Cc: Fabio Estevam <festevam@gmail.com> Cc: Frank Li <frank.li@nxp.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: James Bottomley <james.bottomley@HansenPartnership.com> Cc: James Smart <james.smart@broadcom.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jens Axboe <axboe@kernel.dk> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Leon Romanovsky <leon@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Nicolas Palix <nicolas.palix@imag.fr> Cc: Niklas Cassel <cassel@kernel.org> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Sebastian Reichel <sre@kernel.org> Cc: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> Cc: Takashi Iwai <tiwai@suse.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Brendan Jackman [Thu, 20 Feb 2025 12:23:40 +0000 (12:23 +0000)]
scripts/gdb: add $lx_per_cpu_ptr()
We currently have $lx_per_cpu() which works fine for stuff that kernel
code would access via per_cpu(). But this doesn't work for stuff that
kernel code accesses via per_cpu_ptr():
(gdb) p $lx_per_cpu(node_data[1].node_zones[2]->per_cpu_pageset)
Cannot access memory at address 0xffff11105fbd6c28
This is because we take the address of the pointer and use that as the
offset, instead of using the stored value.
Add a GDB version that mirrors the kernel API, which uses the pointer
value.
To be consistent with per_cpu_ptr(), we need to return the pointer value
instead of dereferencing it for the user. Therefore, move the existing
dereference out of the per_cpu() Python helper and do that only in the
$lx_per_cpu() implementation.
Kuan-Wei Chiu [Sat, 15 Feb 2025 16:56:18 +0000 (00:56 +0800)]
lib min_heap: use size_t for array size and index variables
Replace the int type with size_t for variables representing array sizes
and indices in the min-heap implementation. Using size_t aligns with
standard practices for size-related variables and avoids potential issues
on platforms where int may be insufficient to represent all valid sizes or
indices.
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:52 +0000 (21:39 +0100)]
reboot: retire hw_protection_reboot and hw_protection_shutdown helpers
The hw_protection_reboot and hw_protection_shutdown functions mix
mechanism with policy: They let the driver requesting an emergency action
for hardware protection also decide how to deal with it.
This is inadequate in the general case as a driver reporting e.g. an
imminent power failure can't know whether a shutdown or a reboot would be
more appropriate for a given hardware platform.
With the addition of the hw_protection parameter, it's now possible to
configure at runtime the default emergency action and drivers are expected
to use hw_protection_trigger to have this parameter dictate policy.
As no current users of either hw_protection_shutdown or
hw_protection_shutdown helpers remain, remove them, as not to tempt driver
authors to call them.
Existing users now either defer to hw_protection_trigger or call
__hw_protection_trigger with a suitable argument directly when they have
inside knowledge on whether a reboot or shutdown would be more
appropriate.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-12-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:51 +0000 (21:39 +0100)]
thermal: core: allow user configuration of hardware protection action
In the general case, we don't know which of system shutdown or reboot is
the better action to take to protect hardware in an emergency situation.
We thus allow the policy to come from the device-tree in the form of an
optional critical-action OF property, but so far there was no way for the
end user to configure this.
With recent addition of the hw_protection parameter, the user can now
choose a default action for the case, where the driver isn't fully sure
what's the better course of action.
Let's make use of this by passing HWPROT_ACT_DEFAULT in absence of the
critical-action OF property.
As HWPROT_ACT_DEFAULT is shutdown by default, this introduces no
functional change for users, unless they start using the new parameter.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-11-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:50 +0000 (21:39 +0100)]
dt-bindings: thermal: give OS some leeway in absence of critical-action
An operating system may allow its user to configure the action to be
undertaken on critical overtemperature events.
However, the bindings currently mandate an absence of the critical-action
property to be equal to critical-action = "shutdown", which would mean any
differing user configuration would violate the bindings.
Resolve this by documenting the absence of the property to mean that the
OS gets to decide.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-10-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Acked-by: Rob Herring (Arm) <robh@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Tzung-Bi Shih <tzungbi@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:49 +0000 (21:39 +0100)]
platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal
In the general case, a driver doesn't know which of system shutdown or
reboot is the better action to take to protect hardware in an emergency
situation. For this reason, hw_protection_shutdown is going to be removed
in favor of hw_protection_trigger, which defaults to shutdown, but may be
configured at kernel runtime to be a reboot instead.
The ChromeOS EC situation is different as we do know that shutdown is the
correct action as the EC is programmed to force reset after the short
period, thus replace hw_protection_shutdown with __hw_protection_trigger
with HWPROT_ACT_SHUTDOWN as argument to maintain the same behavior.
No functional change.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-9-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Acked-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:48 +0000 (21:39 +0100)]
regulator: allow user configuration of hardware protection action
When the core detects permanent regulator hardware failure or imminent
power failure of a critical supply, it will call hw_protection_shutdown in
an attempt to do a limited orderly shutdown followed by powering off the
system.
This doesn't work out well for many unattended embedded systems that don't
have support for shutdown and that power on automatically when power is
supplied:
- A brief power cycle gets detected by the driver
- The kernel powers down the system and SoC goes into shutdown mode
- Power is restored
- The system remains oblivious to the restored power
- System needs to be manually power cycled for a duration long enough
to drain the capacitors
Allow users to fix this by calling the newly introduced
hw_protection_trigger() instead: This way the hw_protection commandline or
sysfs parameter is used to dictate the policy of dealing with the
regulator fault.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-8-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:47 +0000 (21:39 +0100)]
reboot: add support for configuring emergency hardware protection action
We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.
This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.
In the general case, however, the driver detecting the issue can't know
what the appropriate course of action is and shouldn't be dictating the
policy of dealing with it.
Therefore, add a global hw_protection toggle that allows the user to
specify whether shutdown or reboot should be the default action when the
driver doesn't set policy.
This introduces no functional change yet as hw_protection_trigger() has no
callers, but these will be added in subsequent commits.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-7-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:46 +0000 (21:39 +0100)]
reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown
It currently depends on the caller, whether we attempt a hardware
protection shutdown (poweroff) or a reboot. A follow-up commit will make
this partially user-configurable, so it's a good idea to have the
emergency message clearly state whether the kernel is going for a reboot
or a shutdown.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-6-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:45 +0000 (21:39 +0100)]
reboot: rename now misleading __hw_protection_shutdown symbols
The __hw_protection_shutdown function name has become misleading since it
can cause either a shutdown (poweroff) or a reboot depending on its
argument.
To avoid further confusion, let's rename it, so it doesn't suggest that a
poweroff is all it can do.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-5-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:43 +0000 (21:39 +0100)]
docs: thermal: sync hardware protection doc with code
Originally, the thermal framework's only hardware protection action was to
trigger a shutdown. This has been changed a little over a year ago to
also support rebooting as alternative hardware protection action.
Update the documentation to reflect this.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-3-e1c09b090c0c@pengutronix.de Fixes: 62e79e38b257 ("thermal/thermal_of: Allow rebooting after critical temp") Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:42 +0000 (21:39 +0100)]
reboot: reboot, not shutdown, on hw_protection_reboot timeout
hw_protection_shutdown() will kick off an orderly shutdown and if that
takes longer than a configurable amount of time, an emergency shutdown
will occur.
Recently, hw_protection_reboot() was added for those systems that don't
implement a proper shutdown and are better served by rebooting and having
the boot firmware worry about doing something about the critical
condition.
On timeout of the orderly reboot of hw_protection_reboot(), the system
would go into shutdown, instead of reboot. This is not a good idea, as
going into shutdown was explicitly not asked for.
Fix this by always doing an emergency reboot if hw_protection_reboot() is
called and the orderly reboot takes too long.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-2-e1c09b090c0c@pengutronix.de Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()") Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com> Cc: Benson Leung <bleung@chromium.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matteo Croce <teknoraver@meta.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ahmad Fatoum [Mon, 17 Feb 2025 20:39:41 +0000 (21:39 +0100)]
reboot: replace __hw_protection_shutdown bool action parameter with an enum
Patch series "reboot: support runtime configuration of emergency
hw_protection action", v3.
We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.
This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.
This is inadequate in the general case though as a driver reporting e.g.
an imminent power failure can't know whether a shutdown or a reboot would
be more appropriate for a given hardware platform.
To address this, this series adds a hw_protection kernel parameter and
sysfs toggle that can be used to change the action from the shutdown
default to reboot. A new hw_protection_trigger API then makes use of this
default action.
My particular use case is unattended embedded systems that don't have
support for shutdown and that power on automatically when power is
supplied:
- A brief power cycle gets detected by the driver
- The kernel powers down the system and SoC goes into shutdown mode
- Power is restored
- The system remains oblivious to the restored power
- System needs to be manually power cycled for a duration long enough
to drain the capacitors
With this series, such systems can configure the kernel with
hw_protection=reboot to have the boot firmware worry about critical
conditions.
This patch (of 12):
Currently __hw_protection_shutdown() either reboots or shuts down the
system according to its shutdown argument.
To make the logic easier to follow, both inside __hw_protection_shutdown
and at caller sites, lets replace the bool parameter with an enum.
This will be extra useful, when in a later commit, a third action is added
to the enumeration.
No functional change.
Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-0-e1c09b090c0c@pengutronix.de Link: https://lkml.kernel.org/r/20250217-hw_protection-reboot-v3-1-e1c09b090c0c@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Cc: Benson Leung <bleung@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Fabio Estevam <festevam@denx.de> Cc: Guenter Roeck <groeck@chromium.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Lukasz Luba <lukasz.luba@arm.com> Cc: Matteo Croce <teknoraver@meta.com> Cc: Matti Vaittinen <mazziesaccount@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Rui Zhang <rui.zhang@intel.com> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Thu, 13 Feb 2025 21:45:31 +0000 (21:45 +0000)]
ocfs2: remove reference to bh->b_page
Buffer heads are attached to folios, not to pages. Also
flush_dcache_page() is now deprecated in favour of flush_dcache_folio().
Link: https://lkml.kernel.org/r/20250213214533.2242224-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Mark Tinguely <mark.tinguely@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Thu, 13 Feb 2025 21:45:30 +0000 (21:45 +0000)]
ocfs2: use memcpy_to_folio() in ocfs2_symlink_get_block()
Replace use of kmap_atomic() with the higher-level construct
memcpy_to_folio(). This removes a use of b_page and supports large folios
as well as being easier to understand. It also removes the check for
kmap_atomic() failing (because it can't).
Link: https://lkml.kernel.org/r/20250213214533.2242224-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Tinguely <mark.tinguely@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alban Kurti [Fri, 7 Feb 2025 18:39:06 +0000 (18:39 +0000)]
checkpatch: add warning for pr_* and dev_* macros without a trailing newline
Add a new check in scripts/checkpatch.pl to detect usage of pr_(level) and
dev_(level) macros (for both C and Rust) when the string literal does not
end with '\n'. Missing trailing newlines can lead to incomplete log lines
that do not appear properly in dmesg or in console output. To show an
example of this working after applying the patch we can run the script on
the commit that likely motivated this need/issue:
Also, the patch is able to handle correctly if there is a printing call
without a newline which then has a newline printed via pr_cont for both
Rust and C alike. If there is no newline printed and the patch ends or
there is another pr_* call before a newline with pr_cont is printed it
will show a warning. Not implemented for dev_cont because it is not clear
to me if that is used at all.
One false warning that will be generated due to this change is in case we
have a patch that modifies a `pr_* call without a newline` which has a
pr_cont with a newline following it. In this case there will be a warning
but because the patch does not include the following pr_cont it will warn
there is nothing creating a newline. I have modified the warning to be
softer due to this known problem.
I have tested with comments, whitespace, differen orders of pr_* calls and
pr_cont and the only case that I suspect to be a problem is the one
outlined above.
Link: https://lkml.kernel.org/r/20250207-checkpatch-newline2-v4-1-26d8e80d0059@invicto.ai Signed-off-by: Alban Kurti <kurti@invicto.ai> Suggested-by: Miguel Ojeda <ojeda@kernel.org> Closes: https://github.com/Rust-for-Linux/linux/issues/1140 Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Gary Guo <gary@garyguo.net> Cc: Joe Perches <joe@perches.com> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Trevor Gross <tmgross@umich.edu> Cc: Charalampos Mitrodimas <charmitro@posteo.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sebastian Andrzej Siewior [Mon, 3 Feb 2025 15:05:25 +0000 (16:05 +0100)]
ucount: use rcuref_t for reference counting
Use rcuref_t for reference counting. This eliminates the cmpxchg loop in
the get and put path. This also eliminates the need to acquire the lock
in the put path because once the final user returns the reference, it can
no longer be obtained anymore.
Use rcuref_t for reference counting.
Link: https://lkml.kernel.org/r/20250203150525.456525-5-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mengen Sun <mengensun@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: YueHong Wu <yuehongwu@tencent.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sebastian Andrzej Siewior [Mon, 3 Feb 2025 15:05:24 +0000 (16:05 +0100)]
ucount: use RCU for ucounts lookups
The ucounts element is looked up under ucounts_lock. This can be
optimized by using RCU for a lockless lookup and return and element if the
reference can be obtained.
Replace hlist_head with hlist_nulls_head which is RCU compatible. Let
find_ucounts() search for the required item within a RCU section and
return the item if a reference could be obtained. This means
alloc_ucounts() will always return an element (unless the memory
allocation failed). Let put_ucounts() RCU free the element if the
reference counter dropped to zero.
Link: https://lkml.kernel.org/r/20250203150525.456525-4-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mengen Sun <mengensun@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: YueHong Wu <yuehongwu@tencent.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sebastian Andrzej Siewior [Mon, 3 Feb 2025 15:05:23 +0000 (16:05 +0100)]
ucount: replace get_ucounts_or_wrap() with atomic_inc_not_zero()
get_ucounts_or_wrap() increments the counter and if the counter is
negative then it decrements it again in order to reset the previous
increment. This statement can be replaced with atomic_inc_not_zero() to
only increment the counter if it is not yet 0.
This simplifies the get function because the put (if the get failed) can
be removed. atomic_inc_not_zero() is implement as a cmpxchg() loop which
can be repeated several times if another get/put is performed in parallel.
This will be optimized later.
Increment the reference counter only if not yet dropped to zero.
Link: https://lkml.kernel.org/r/20250203150525.456525-3-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mengen Sun <mengensun@tencent.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: YueHong Wu <yuehongwu@tencent.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sebastian Andrzej Siewior [Mon, 3 Feb 2025 15:05:22 +0000 (16:05 +0100)]
rcu: provide a static initializer for hlist_nulls_head
Patch series "ucount: Simplify refcounting with rcuref_t".
I noticed that the atomic_dec_and_lock_irqsave() in put_ucounts() loops
sometimes even during boot. Something like 2-3 iterations but still.
This series replaces the refcounting with rcuref_t and adds a RCU lookup.
This allows a lockless lookup in alloc_ucounts() if the entry is available
and a cmpxchg()less put of the item.
This patch (of 4):
Provide a static initializer for hlist_nulls_head so that it can be used
in statically defined data structures.
Link: https://lkml.kernel.org/r/20250203150525.456525-1-bigeasy@linutronix.de Link: https://lkml.kernel.org/r/20250203150525.456525-2-bigeasy@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mengen Sun <mengensun@tencent.com> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: YueHong Wu <yuehongwu@tencent.com> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yury Norov [Wed, 5 Feb 2025 21:29:32 +0000 (16:29 -0500)]
lib/zlib: drop EQUAL macro
The macro is prehistoric, and only exists to help those readers who don't
know what memcmp() returns if memory areas differ. This is pretty well
documented, so the macro looks excessive.
Now that the only user of the macro depends on DEBUG_ZLIB config, GCC
warns about unused macro if the library is built with W=2 against
defconfig. So drop it for good.
Vlastimil Babka [Tue, 11 Feb 2025 15:16:11 +0000 (16:16 +0100)]
get_maintainer: add --substatus for reporting subsystem status - fix
The automatically enabled --substatus can break existing scripts that do
not disable --rolestats. Require that script output goes to a terminal to
enable it automatically.
Vlastimil Babka [Mon, 3 Feb 2025 11:13:16 +0000 (12:13 +0100)]
get_maintainer: add --substatus for reporting subsystem status
Patch series "get_maintainer: report subsystem status separately", v2.
The subsystem status (S: field) can inform a patch submitter if the
subsystem is well maintained or e.g. maintainers are missing. In
get_maintainer, it is currently reported with --role(stats) by adjusting
the maintainer role for any status different from Maintained. This has
two downsides:
- if a subsystem has only reviewers or mailing lists and no maintainers,
the status is not reported. For example Orphan subsystems typically
have no maintainers so there's nobody to report as orphan minder.
- the Supported status means that someone is paid for maintaining, but
it is reported as "supporter" for all the maintainers, which can be
incorrect (only some of them may be paid). People (including myself)
have been also confused about what "supporter" means.
The second point has been brought up in 2022 and the discussion in the end
resulted in adjusting documentation only [1]. I however agree with Ted's
points that it's misleading to take the subsystem status and apply it to
all maintainers [2].
The attempt to modify get_maintainer output was retracted after Joe
objected that the status becomes not reported at all [3]. This series
addresses that concern by reporting the status (unless it's the most
common Maintained one) on separate lines that follow the reported emails,
using a new --substatus parameter. Care is taken to reduce the noise to
minimum by not reporting the most common Maintained status, by default
require no opt-in that would need the users to discover the new parameter,
and at the same time not to break existing git --cc-cmd usage.
The subsystem status is currently reported with --role(stats) by adjusting
the maintainer role for any status different from Maintained. This has
two downsides:
- if a subsystem has only reviewers or mailing lists and no maintainers,
the status is not reported (i.e. typically, Orphan subsystems have no
maintainers)
- the Supported status means that someone is paid for maintaining, but
it is reported as "supporter" for all the maintainers, which can be
incorrect. People have been also confused about what "supporter"
means.
This patch introduces a new --substatus option and functionality aimed to
report the subsystem status separately, without adjusting the reported
maintainer role. After the e-mails are output, the status of subsystems
will follow, for example:
Sourabh Jain [Fri, 31 Jan 2025 11:38:30 +0000 (17:08 +0530)]
powerpc/crash: use generic crashkernel reservation
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation")
added a generic function to reserve crashkernel memory. So let's use the
same function on powerpc and remove the architecture-specific code that
essentially does the same thing.
The generic crashkernel reservation also provides a way to split the
crashkernel reservation into high and low memory reservations, which can
be enabled for powerpc in the future.
Along with moving to the generic crashkernel reservation, the code related
to finding the base address for the crashkernel has been separated into
its own function name get_crash_base() for better readability and
maintainability.
Link: https://lkml.kernel.org/r/20250131113830.925179-8-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sourabh Jain [Fri, 31 Jan 2025 11:38:29 +0000 (17:08 +0530)]
powerpc: insert System RAM resource to prevent crashkernel conflict
The next patch in the series with title "powerpc/crash: use generic
crashkernel reservation" enables powerpc to use generic crashkernel
reservation instead of custom implementation. This leads to exporting of
`Crash Kernel` memory to iomem_resource (/proc/iomem) via
insert_crashkernel_resources():kernel/crash_reserve.c or at another place
in the same file if HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY is set.
The add_system_ram_resources():arch/powerpc/mm/mem.c adds `System RAM` to
iomem_resource using request_resource(). This creates a conflict with
`Crash Kernel`, which is added by the generic crashkernel reservation
code. As a result, the kernel ultimately fails to add `System RAM` to
iomem_resource. Consequently, it does not appear in /proc/iomem.
There are multiple approches tried to avoid this:
1. Don't add Crash Kernel to iomem_resource:
- This has two issues.
First, it requires adding an architecture-specific hook in the
generic code. There are already two code paths to choose when to
add `Crash Kernel` to iomem_resource. This adds one more code path
to skip it.
Second, what if `Crash Kernel` is required in /proc/iomem in the
future? Many architectures do export it.
2. Don't add `System RAM` to iomem_resource by reverting commit c40dd2f766440 ("powerpc: Add System RAM to /proc/iomem"):
- It's not ideal to export `System RAM` via /proc/iomem, but since
it already done ealier and userspace tools like kdump and
kdump-utils rely on `System RAM` from /proc/iomem, removing it
will break userspace.
3. Add Crash Kernel along with System RAM to /proc/iomem:
This patch takes the third approach by updating add_system_ram_resources()
to use insert_resource() instead of the request_resource() API to add the
`System RAM` resource to iomem_resource. insert_resource() allows
inserting resources even if they overlap with existing ones. Since `Crash
Kernel` and `System RAM` resources are added to iomem_resource early in
the boot, any other conflict is not expected.
With the changes introduced here and in the next patch, "powerpc/crash:
use generic crashkernel reservation," /proc/iomem now exports `System RAM`
and `Crash Kernel` as shown below:
Commit 59d58189f3d9 ("crash: fix crash memory reserve exceed system memory
bug") fails crashkernel parsing if the crash size is found to be higher
than system RAM, which makes the memory_limit adjustment code ineffective
due to an early exit from reserve_crashkernel().
Regardless lets not violate the user-specified memory limit by adjusting
it. Remove this adjustment to ensure all reservations stay within the
limit. Commit f94f5ac07983 ("powerpc/fadump: Don't update the
user-specified memory limit") did the same for fadump.
Link: https://lkml.kernel.org/r/20250131113830.925179-6-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sourabh Jain [Fri, 31 Jan 2025 11:38:27 +0000 (17:08 +0530)]
powerpc/crash: use generic APIs to locate memory hole for kdump
On PowerPC, the memory reserved for the crashkernel can contain components
like RTAS, TCE, OPAL, etc., which should be avoided when loading kexec
segments into crashkernel memory. Due to these special components,
PowerPC has its own set of APIs to locate holes in the crashkernel memory
for loading kexec segments for kdump. However, for loading kexec segments
in the kexec case, PowerPC already uses generic APIs to locate holes.
The previous patch in this series, titled "crash: Let arch decide usable
memory range in reserved area," introduced arch-specific hook to handle
such special regions in the crashkernel area. So, switch PowerPC to use
the generic APIs to locate memory holes for kdump and remove the redundant
PowerPC-specific APIs.
Link: https://lkml.kernel.org/r/20250131113830.925179-5-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Baoquan he <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sourabh Jain [Fri, 31 Jan 2025 11:38:26 +0000 (17:08 +0530)]
crash: let arch decide usable memory range in reserved area
Although the crashkernel area is reserved, on architectures like PowerPC,
it is possible for the crashkernel reserved area to contain components
like RTAS, TCE, OPAL, etc. To avoid placing kexec segments over these
components, PowerPC has its own set of APIs to locate holes in the
crashkernel reserved area.
Add an arch hook in the generic locate mem hole APIs so that architectures
can handle such special regions in the crashkernel area while locating
memory holes for kexec segments using generic APIs. With this, a lot of
redundant arch-specific code can be removed, as it performs the exact same
job as the generic APIs.
To keep the generic and arch-specific changes separate, the changes
related to moving PowerPC to use the generic APIs and the removal of
PowerPC-specific APIs for memory hole allocation are done in a subsequent
patch titled "powerpc/crash: Use generic APIs to locate memory hole for
kdump.
Link: https://lkml.kernel.org/r/20250131113830.925179-4-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sourabh Jain [Fri, 31 Jan 2025 11:38:24 +0000 (17:08 +0530)]
kexec: initialize ELF lowest address to ULONG_MAX
Patch series "powerpc/crash: use generic crashkernel reservation", v3.
Commit 0ab97169aa05 ("crash_core: add generic function to do reservation")
added a generic function to reserve crashkernel memory. So let's use the
same function on powerpc and remove the architecture-specific code that
essentially does the same thing.
The generic crashkernel reservation also provides a way to split the
crashkernel reservation into high and low memory reservations, which can
be enabled for powerpc in the future.
Additionally move powerpc to use generic APIs to locate memory hole for
kexec segments while loading kdump kernel.
This patch (of 7):
kexec_elf_load() loads an ELF executable and sets the address of the
lowest PT_LOAD section to the address held by the lowest_load_addr
function argument.
To determine the lowest PT_LOAD address, a local variable lowest_addr
(type unsigned long) is initialized to UINT_MAX. After loading each
PT_LOAD, its address is compared to lowest_addr. If a loaded PT_LOAD
address is lower, lowest_addr is updated. However, setting lowest_addr to
UINT_MAX won't work when the kernel image is loaded above 4G, as the
returned lowest PT_LOAD address would be invalid. This is resolved by
initializing lowest_addr to ULONG_MAX instead.
This issue was discovered while implementing crashkernel high/low
reservation on the PowerPC architecture.
I Hsin Cheng [Sun, 19 Jan 2025 06:24:08 +0000 (14:24 +0800)]
lib/plist.c: add shortcut for plist_requeue()
In the operation of plist_requeue(), "node" is deleted from the list
before queueing it back to the list again, which involves looping to find
the tail of same-prio entries.
If "node" is the head of same-prio entries which means its prio_list is on
the priority list, then "node_next" can be retrieve immediately by the
next entry of prio_list, instead of looping nodes on node_list.
The shortcut implementation can benefit plist_requeue() running the below
test, and the test result is shown in the following table.
One can observe from the test result that when the number of nodes of
same-prio entries is smaller, then the probability of hitting the shortcut
can be bigger, thus the benefit can be more significant.
While it tends to behave almost the same for long same-prio entries, since
the probability of taking the shortcut is much smaller.
The test is done on x86_64 architecture with v6.9 kernel and
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz.
Test script( executed in kernel module mode ):
int init_module(void)
{
unsigned int test_data[test_size];
/* Split the list into 10 different priority
* , when test_size is larger, the number of
* nodes within each priority is larger.
*/
for (i = 0; i < ARRAY_SIZE(test_data); i++) {
test_data[i] = i % 10;
}
for (i = 0; i < ARRAY_SIZE(test_node_local); i++) {
plist_node_init(test_node_local + i, 0);
test_node_local[i].prio = test_data[i];
}
for (i = 0; i < ARRAY_SIZE(test_node_local); i++) {
if (plist_node_empty(test_node_local + i)) {
plist_add(test_node_local + i, &test_head_local);
}
}
for (i = 0; i < ARRAY_SIZE(test_node_local); i += 1) {
start = ktime_get();
plist_requeue(test_node_local + i, &test_head_local);
end = ktime_get();
time_elapsed += (end - start);
}
Add a paragraph explaining what sort of capabilities a process would need
to read procfs data for some other process. Also mention that reading
data for its own process doesn't require any extra permissions.
Guilherme G. Piccoli [Mon, 20 Jan 2025 19:04:26 +0000 (16:04 -0300)]
scripts: add script to extract built-in firmware blobs
There is currently no tool to extract a firmware blob that is built-in
on vmlinux to the best of my knowledge. So if we have a kernel image
containing the blobs, and we want to rebuild the kernel with some debug
patches for example (and given that the image also has IKCONFIG=y), we
currently can't do that for the same versions for all the firmware
blobs, _unless_ we have exact commits of linux-firmware for the
specific versions for each firmware included.
Through the options CONFIG_EXTRA_FIRMWARE{_DIR} one is able to build a
kernel including firmware blobs in a built-in fashion. This is usually
the case of built-in drivers that require some blobs in order to work
properly, for example, like in non-initrd based systems.
Add hereby a script to extract these blobs from a non-stripped vmlinux,
similar to the idea of "extract-ikconfig". The firmware loader interface
saves such built-in blobs as rodata entries, having a field for the FW
name as "_fw_<module_name>_<firmware_name>_bin"; the tool extracts files
named "<module_name>_<firmware_name>" for each rodata firmware entry
detected. It makes use of awk, bash, dd and readelf, pretty standard
tooling for Linux development.
With this tool, we can blindly extract the FWs and easily re-add them
in the new debug kernel build, allowing a more deterministic testing
without the burden of "hunting down" the proper version of each
firmware binary.
Link: https://lkml.kernel.org/r/20250120190436.127578-1-gpiccoli@igalia.com Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Reviewed-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Russ Weight <russ.weight@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yang Yang [Fri, 17 Jan 2025 14:20:13 +0000 (22:20 +0800)]
MAINTAINERS: add Yang Yang as a co-maintainer of PER-TASK DELAY ACCOUNTING
Balbir Singh is the unique maintainer of PER-TASK DELAY ACCOUNTING, and he
had started work on cgroupstats a long time back, this subsystem then is
not growing at a very rapid pace. With their excellent work delay
accounting is still very useful for observing and optimizing system delay,
but still needs continuous improvement. Yang Yang with his team had
worked for most of the recent patches of the subsystem, and he has a
strong willing to help, Balbir Singh is glad to see that, so add him as a
co-maintainer.
Andrii Nakryiko [Mon, 27 Jan 2025 22:21:14 +0000 (14:21 -0800)]
mm,procfs: allow read-only remote mm access under CAP_PERFMON
It's very common for various tracing and profiling toolis to need to
access /proc/PID/maps contents for stack symbolization needs to learn
which shared libraries are mapped in memory, at which file offset, etc.
Currently, access to /proc/PID/maps requires CAP_SYS_PTRACE (unless we are
looking at data for our own process, which is a trivial case not too
relevant for profilers use cases).
Unfortunately, CAP_SYS_PTRACE implies way more than just ability to
discover memory layout of another process: it allows to fully control
arbitrary other processes. This is problematic from security POV for
applications that only need read-only /proc/PID/maps (and other similar
read-only data) access, and in large production settings CAP_SYS_PTRACE is
frowned upon even for the system-wide profilers.
On the other hand, it's already possible to access similar kind of
information (and more) with just CAP_PERFMON capability. E.g., setting up
PERF_RECORD_MMAP collection through perf_event_open() would give one
similar information to what /proc/PID/maps provides.
CAP_PERFMON, together with CAP_BPF, is already a very common combination
for system-wide profiling and observability application. As such, it's
reasonable and convenient to be able to access /proc/PID/maps with
CAP_PERFMON capabilities instead of CAP_SYS_PTRACE.
For procfs, these permissions are checked through common mm_access()
helper, and so we augment that with cap_perfmon() check *only* if
requested mode is PTRACE_MODE_READ. I.e., PTRACE_MODE_ATTACH wouldn't be
permitted by CAP_PERFMON. So /proc/PID/mem, which uses
PTRACE_MODE_ATTACH, won't be permitted by CAP_PERFMON, but /proc/PID/maps,
/proc/PID/environ, and a bunch of other read-only contents will be
allowable under CAP_PERFMON.
Besides procfs itself, mm_access() is used by process_madvise() and
process_vm_{readv,writev}() syscalls. The former one uses
PTRACE_MODE_READ to avoid leaking ASLR metadata, and as such CAP_PERFMON
seems like a meaningful allowable capability as well.
process_vm_{readv,writev} currently assume PTRACE_MODE_ATTACH level of
permissions (though for readv PTRACE_MODE_READ seems more reasonable, but
that's outside the scope of this change), and as such won't be affected by
this patch.
As has_unmovable_pages() call folio_hstate() without hugetlb_lock, there
is a race to free the HugeTLB page between PageHuge() and folio_hstate().
There is no need to add hugetlb_lock here as the HugeTLB page can be freed
in lot of places. So it's enough to unfold folio_hstate() and add a check
to avoid NULL pointer dereference for hugepage_migration_supported().
Link: https://lkml.kernel.org/r/20250122061151.578768-1-liushixin2@huawei.com Fixes: 464c7ffbcb16 ("mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yu Zhao [Wed, 8 Jan 2025 07:48:21 +0000 (00:48 -0700)]
mm/hugetlb_vmemmap: fix memory loads ordering
Using x86_64 as an example, for a 32KB struct page[] area describing a 2MB
hugeTLB, HVO reduces the area to 4KB by the following steps:
1. Split the (r/w vmemmap) PMD mapping the area into 512 (r/w) PTEs;
2. For the 8 PTEs mapping the area, remap PTE 1-7 to the page mapped
by PTE 0, and at the same time change the permission from r/w to
r/o;
3. Free the pages PTE 1-7 used to map, hence the reduction from 32KB
to 4KB.
However, the following race can happen due to improperly memory loads
ordering:
CPU 1 (HVO) CPU 2 (speculative PFN walker)
atomic_add_unless(&page->_refcount)
XXX: try to modify r/o struct page[]
Specifically, page_is_fake_head() must be ordered after page_ref_count()
on CPU 2 so that it can only return true for this case, to avoid the later
attempt to modify r/o struct page[].
This patch adds the missing memory barrier and makes the tests on
page_is_fake_head() and page_ref_count() done in the proper order.
Link: https://lkml.kernel.org/r/20250108074822.722696-1-yuzhao@google.com Fixes: bd225530a4c7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") Signed-off-by: Yu Zhao <yuzhao@google.com> Reported-by: Will Deacon <will@kernel.org> Closes: https://lore.kernel.org/20241128142028.GA3506@willie-the-truck/ Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: Will Deacon <will@kernel.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jinjiang Tu [Wed, 12 Mar 2025 08:47:05 +0000 (16:47 +0800)]
mm/contig_alloc: fix alloc_contig_range when __GFP_COMP and order < MAX_ORDER
When calling alloc_contig_range() with __GFP_COMP and the order of
requested pfn range is pageblock_order, less than MAX_ORDER, I triggered
WARNING as follows:
alloc_contig_range() marks pageblocks of the requested pfn range to be
isolated, migrate these pages if they are in use and will be freed to
MIGRATE_ISOLATED freelist.
Suppose two alloc_contig_range() calls at the same time and the requested
pfn range are [0x80280000, 0x80280200) and [0x80280200, 0x80280400)
respectively. Suppose the two memory range are in use, then
alloc_contig_range() will migrate and free these pages to MIGRATE_ISOLATED
freelist. __free_one_page() will merge MIGRATE_ISOLATE buddy to larger
buddy, resulting in a MAX_ORDER buddy. Finally, find_large_buddy() in
alloc_contig_range() returns a MAX_ORDER buddy and results in WARNING.
To fix it, call free_contig_range() to free the excess pfn range.
Link: https://lkml.kernel.org/r/20250312084705.2938220-1-tujinjiang@huawei.com Fixes: e98337d11bbd ("mm/contig_alloc: support __GFP_COMP") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Xu [Wed, 12 Mar 2025 14:51:31 +0000 (10:51 -0400)]
mm/userfaultfd: fix release hang over concurrent GUP
This patch should fix a possible userfaultfd release() hang during
concurrent GUP.
This problem was initially reported by Dimitris Siakavaras in July 2023
[1] in a firecracker use case. Firecracker has a separate process
handling page faults remotely, and when the process releases the
userfaultfd it can race with a concurrent GUP from KVM trying to fault in
a guest page during the secondary MMU page fault process.
A similar problem was reported recently again by Jinjiang Tu in March 2025
[2], even though the race happened this time with a mlockall() operation,
which does GUP in a similar fashion.
In 2017, commit 656710a60e36 ("userfaultfd: non-cooperative: closing the
uffd without triggering SIGBUS") was trying to fix this issue. AFAIU,
that fixes well the fault paths but may not work yet for GUP. In GUP, the
issue is NOPAGE will be almost treated the same as "page fault resolved"
in faultin_page(), then the GUP will follow page again, seeing page
missing, and it'll keep going into a live lock situation as reported.
This change makes core mm return RETRY instead of NOPAGE for both the GUP
and fault paths, proactively releasing the mmap read lock. This should
guarantee the other release thread make progress on taking the write lock
and avoid the live lock even for GUP.
When at it, rearrange the comments to make sure it's uptodate.
Kirill A. Shutemov [Mon, 10 Mar 2025 08:28:55 +0000 (10:28 +0200)]
mm/page_alloc: fix memory accept before watermarks gets initialized
Watermarks are initialized during the postcore initcall. Until then, all
watermarks are set to zero. This causes cond_accept_memory() to
incorrectly skip memory acceptance because a watermark of 0 is always met.
This can lead to a premature OOM on boot.
To ensure progress, accept one MAX_ORDER page if the watermark is zero.
Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.intel.com Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Reported-by: Farrah Chen <farrah.chen@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Wilcox (Oracle) [Mon, 10 Mar 2025 14:35:24 +0000 (14:35 +0000)]
mm: decline to manipulate the refcount on a slab page
Slab pages now have a refcount of 0, so nobody should be trying to
manipulate the refcount on them. Doing so has little effect; the object
could be freed and reallocated to a different purpose, although the slab
itself would not be until the refcount was put making it behave rather
like TYPESAFE_BY_RCU.
Unfortunately, __iov_iter_get_pages_alloc() does take a refcount. Fix
that to not change the refcount, and make put_page() silently not change
the refcount. get_page() warns so that we can fix any other callers that
need to be changed.
Long-term, networking needs to stop taking a refcount on the pages that it
uses and rely on the caller to hold whatever references are necessary to
make the memory stable. In the medium term, more page types are going to
hav a zero refcount, so we'll want to move get_page() and put_page() out
of line.
Link: https://lkml.kernel.org/r/20250310143544.1216127-1-willy@infradead.org Fixes: 9aec2fb0fd5e (slab: allocate frozen pages) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Hannes Reinecke <hare@suse.de> Closes: https://lore.kernel.org/all/08c29e4b-2f71-4b6d-8046-27e407214d8c@suse.com/ Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Shakeel Butt [Mon, 10 Mar 2025 23:09:34 +0000 (16:09 -0700)]
memcg: drain obj stock on cpu hotplug teardown
Currently on cpu hotplug teardown, only memcg stock is drained but we
need to drain the obj stock as well otherwise we will miss the stats
accumulated on the target cpu as well as the nr_bytes cached. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition we are leaking reference to struct obj_cgroup object.
Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Zi Yan [Mon, 10 Mar 2025 15:57:27 +0000 (11:57 -0400)]
mm/huge_memory: drop beyond-EOF folios with the right number of refs
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all
page cache refs. Otherwise, the folio will not be freed, causing memory
leak.
This leak would happen on a filesystem with blocksize > page_size and a
truncate is performed, where the blocksize makes folios split to >0 order
ones, causing truncated folios not being freed.
Link: https://lkml.kernel.org/r/20250310155727.472846-1-ziy@nvidia.com Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Hugh Dickins <hughd@google.com> Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/ Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We noticed that uffd-stress test was always failing to run when invoked
for the hugetlb profiles on x86_64 systems with a processor count of 64 or
bigger:
The problem boils down to how run_vmtests.sh (mis)calculates the size of
the region it feeds to uffd-stress. The latter expects to see an amount
of MiB while the former is just giving out the number of free hugepages
halved down. This measurement discrepancy ends up violating uffd-stress'
assertion on number of hugetlb pages allocated per CPU, causing it to bail
out with the error above.
This commit fixes that issue by adjusting run_vmtests.sh's
half_ufd_size_MB calculation so it properly renders the region size in
MiB, as expected, while maintaining all of its original constraints in
place.
Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com Fixes: 2e47a445d7b3 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation") Signed-off-by: Rafael Aquini <raquini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Raphael S. Carvalho [Mon, 24 Feb 2025 14:37:00 +0000 (11:37 -0300)]
mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT
original report:
https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=B-Dw@mail.gmail.com/T/
When doing buffered writes with FGP_NOWAIT, under memory pressure, the
system returned ENOMEM despite there being plenty of available memory, to
be reclaimed from page cache. The user space used io_uring interface,
which in turn submits I/O with FGP_NOWAIT (the fast path).
This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
from __filemap_get_folio"), which moved error handling from
io_map_get_folio() to __filemap_get_folio(), but broke FGP_NOWAIT
handling, so ENOMEM is being escaped to user space. Had it correctly
returned -EAGAIN with NOWAIT, either io_uring or user space itself would
be able to retry the request.
It's not enough to patch io_uring since the iomap interface is the one
responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must
return the proper error too.
The patch was tested with scylladb test suite (its original reproducer),
and the tests all pass now when memory is pressured.
Link: https://lkml.kernel.org/r/20250224143700.23035-1-raphaelsc@scylladb.com Fixes: 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio") Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Muchun Song [Thu, 6 Mar 2025 02:31:33 +0000 (10:31 +0800)]
mm: memcontrol: fix swap counter leak from offline cgroup
Commit 6769183166b3 removed the parameter of id from swap_cgroup_record()
and get the memcg id from mem_cgroup_id(folio_memcg(folio)). However, the
caller of it may update a different memcg's counter instead of
folio_memcg(folio).
E.g. in the caller of mem_cgroup_swapout(), @swap_memcg could be
different with @memcg and update the counter of @swap_memcg, but
swap_cgroup_record() records the wrong memcg's ID. When it is uncharged
from __mem_cgroup_uncharge_swap(), the swap counter will leak since the
wrong recorded ID.
Fix it by bringing the parameter of id back.
Link: https://lkml.kernel.org/r/20250306023133.44838-1-songmuchun@bytedance.com Fixes: 6769183166b3 ("mm/swap_cgroup: decouple swap cgroup recording and clearing") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Kairui Song <kasong@tencent.com> Cc: Chris Li <chrisl@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>