]> www.infradead.org Git - users/hch/uuid.git/log
users/hch/uuid.git
6 years agopowerpc/fadump: remove RMA_START and RMA_END macros
Hari Bathini [Wed, 11 Sep 2019 14:57:26 +0000 (20:27 +0530)]
powerpc/fadump: remove RMA_START and RMA_END macros

RMA_START is defined as '0' and there is even a BUILD_BUG_ON() to
make sure it is never anything else. Remove this macro and use '0'
instead as code change is needed anyway when it has to be something
else. Also, remove unused RMA_END macro.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821384096.5656.15026984053970204652.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: update documentation about option to release opalcore
Hari Bathini [Wed, 11 Sep 2019 14:57:12 +0000 (20:27 +0530)]
powerpc/fadump: update documentation about option to release opalcore

With /sys/firmware/opal/core support available on OPAL based machines
and an option to the release memory used by kernel in exporting this
core file, update FADump documentation with these details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821382786.5656.13173494907671241231.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: consider f/w load area
Hari Bathini [Wed, 11 Sep 2019 14:56:59 +0000 (20:26 +0530)]
powerpc/fadump: consider f/w load area

OPAL loads kernel & initrd at 512MB offset (256MB size), also exported
as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is
less than 768MB, kernel memory to be exported as '/proc/vmcore' would
be overwritten by f/w while loading kernel & initrd. To avoid such a
scenario, enforce a minimum boot memory size of 768MB on OPAL platform
and skip using FADump if a newer F/W version loads kernel & initrd
above 768MB.

Also, irrespective of RMA size, set the minimum boot memory size
expected on pseries platform at 320MB. This is to avoid inflating the
minimum memory requirements on systems with 512M/1024M RMA size.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821381414.5656.1592867278535469652.stgit@hbathini.in.ibm.com
6 years agopowerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
Hari Bathini [Wed, 11 Sep 2019 14:56:45 +0000 (20:26 +0530)]
powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file

Writing '1' to /sys/kernel/fadump_release_opalcore would release the
memory held by kernel in exporting /sys/firmware/opal/core file.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821380161.5656.17827032108471421830.stgit@hbathini.in.ibm.com
6 years agopowerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
Hari Bathini [Wed, 11 Sep 2019 14:56:33 +0000 (20:26 +0530)]
powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes

Export /sys/firmware/opal/core file to analyze opal crashes. Since OPAL
core can be generated independent of CONFIG_FA_DUMP support in kernel,
add this support under a new kernel config option CONFIG_OPAL_CORE.
Also, avoid code duplication by moving common code used while exporting
/proc/vmcore and/or /sys/firmware/opal/core file(s).

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821378503.5656.3693769384945087756.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
Hari Bathini [Wed, 11 Sep 2019 14:56:16 +0000 (20:26 +0530)]
powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP

Kernel config option CONFIG_PRESERVE_FA_DUMP is introduced to ensure
crash data, from previously crash'ed kernel, is preserved. Update
documentation with this details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821377195.5656.15840065629705958324.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
Hari Bathini [Wed, 11 Sep 2019 14:56:03 +0000 (20:26 +0530)]
powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel

Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP that ensures
that crash data, from previously crash'ed kernel, is preserved. This
helps in cases where FADump is not enabled but the subsequent memory
preserving kernel boot is likely to process this crash data. One
typical usecase for this config option is petitboot kernel.

As OPAL allows registering address with it in the first kernel and
retrieving it after MPIPL, use it to store the top of boot memory.
A kernel that intends to preserve crash data retrieves it and avoids
using memory beyond this address.

Move arch_reserved_kernel_pages() function as it is needed for both
FA_DUMP and PRESERVE_FA_DUMP configurations.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821375751.5656.11459483669542541602.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: improve how crashed kernel's memory is reserved
Hari Bathini [Wed, 11 Sep 2019 14:55:49 +0000 (20:25 +0530)]
powerpc/fadump: improve how crashed kernel's memory is reserved

The size parameter to fadump_reserve_crash_area() function is not needed
as all the memory above boot memory size must be preserved anyway. Update
the function by dropping this redundant parameter.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821374440.5656.2945512543806951766.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: consider reserved ranges while releasing memory
Hari Bathini [Wed, 11 Sep 2019 14:55:36 +0000 (20:25 +0530)]
powerpc/fadump: consider reserved ranges while releasing memory

Commit 0962e8004e97 ("powerpc/prom: Scan reserved-ranges node for
memory reservations") enabled support to parse 'reserved-ranges' DT
node to reserve kernel memory falling in these ranges for firmware
purposes. Along with the preserved area memory, ensure memory in
reserved ranges is not overlapped with memory released by capture
kernel aftering saving vmcore. Also, fix the off-by-one error in
fadump_release_reserved_area function while releasing memory.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821371358.5656.6061214942558818661.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: make crash memory ranges array allocation generic
Hari Bathini [Wed, 11 Sep 2019 14:55:05 +0000 (20:25 +0530)]
powerpc/fadump: make crash memory ranges array allocation generic

Make allocate_crash_memory_ranges() and free_crash_memory_ranges()
functions generic to reuse them for memory management of all types of
dynamic memory range arrays. This change helps in memory management
of reserved ranges array to be added later.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821369863.5656.4375667005352155892.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: process architected register state data provided by firmware
Hari Bathini [Wed, 11 Sep 2019 14:54:50 +0000 (20:24 +0530)]
powerpc/fadump: process architected register state data provided by firmware

Firmware provides architected register state data at the time of crash.
Process this data and build CPU notes to append to ELF core. In case
this data is missing or in unsupported format, at least append crashing
CPU's register data, to have something to work with in the vmcore file.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821367702.5656.5546683836236508389.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: make use of memblock's bottom up allocation mode
Hari Bathini [Wed, 11 Sep 2019 14:54:28 +0000 (20:24 +0530)]
powerpc/fadump: make use of memblock's bottom up allocation mode

Earlier, memblock_find_in_range() was not used to find the memory to
be reserved for FADump as bottom up allocation mode was not supported.
But since commit 79442ed189acb8b ("mm/memblock.c: introduce bottom-up
allocation mode") bottom up allocation mode is supported for memblock.
So, use it to find the memory to be reserved for FADump.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821364211.5656.14336025460336135194.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: Update documentation about OPAL platform support
Hari Bathini [Wed, 11 Sep 2019 14:53:53 +0000 (20:23 +0530)]
powerpc/fadump: Update documentation about OPAL platform support

With FADump support now available on both pseries and OPAL platforms,
update FADump documentation with these details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821361692.5656.11377757995827253404.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: handle invalidation of crashdump and re-registraion
Hari Bathini [Wed, 11 Sep 2019 14:53:28 +0000 (20:23 +0530)]
powerpc/fadump: handle invalidation of crashdump and re-registraion

Make OPAL call to indicate that the dump is processed and the metadata
area in OPAL can be cleared/released. Also, setup/initialize FADump
for re-registration.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821356046.5656.12270927048195494911.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: Warn before processing partial crashdump
Hari Bathini [Wed, 11 Sep 2019 14:52:32 +0000 (20:22 +0530)]
powerpc/fadump: Warn before processing partial crashdump

If all kernel boot memory regions are not registered for MPIPL before
system crashes, try processing the partial crashdump but warn the user
before proceeding.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821352793.5656.1734051341024721407.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: process the crashdump by exporting it as /proc/vmcore
Hari Bathini [Wed, 11 Sep 2019 14:51:59 +0000 (20:21 +0530)]
powerpc/fadump: process the crashdump by exporting it as /proc/vmcore

Add support in the kernel to process the crash'ed kernel's memory
preserved during MPIPL and export it as /proc/vmcore file for the
userland scripts to filter and analyze it later.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821351482.5656.6255805804744333073.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: support copying multiple kernel boot memory regions
Hari Bathini [Wed, 11 Sep 2019 14:51:46 +0000 (20:21 +0530)]
powerpc/fadump: support copying multiple kernel boot memory regions

Firmware uses a 32-bit field for size while copying/backing-up memory
during MPIPL. So, the maximum value that could be represented with
a PAGE_SIZE aligned 32-bit field will be the maximum copy size for a
region but FADump capture kernel usually needs more memory than that
to be preserved to avoid running into out of memory errors.

So, request firmware to copy multiple kernel boot memory regions
instead of just one (which worked fine for pseries as 64-bit field
was used for size there).

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821350193.5656.3664853158523582019.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: define OPAL register/un-register callback functions
Hari Bathini [Wed, 11 Sep 2019 14:51:33 +0000 (20:21 +0530)]
powerpc/fadump: define OPAL register/un-register callback functions

Make OPAL calls to register and un-register with firmware for MPIPL.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821348482.5656.13646250851483648241.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: reset metadata address during clean up
Hari Bathini [Wed, 11 Sep 2019 14:51:16 +0000 (20:21 +0530)]
powerpc/fadump: reset metadata address during clean up

During kexec boot, metadata address needs to be reset to avoid running
into errors interpreting stale metadata address, in case the kexec'ed
kernel crashes before metadata address could be setup again.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821346629.5656.10783321582005237813.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: register kernel metadata address with opal
Hari Bathini [Wed, 11 Sep 2019 14:50:57 +0000 (20:20 +0530)]
powerpc/fadump: register kernel metadata address with opal

OPAL allows registering address with it in the first kernel and
retrieving it after MPIPL. Setup kernel metadata and register its
address with OPAL to use it for processing the crash dump.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821345011.5656.13567765019032928471.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: improve fadump_reserve_mem()
Hari Bathini [Wed, 11 Sep 2019 14:50:41 +0000 (20:20 +0530)]
powerpc/fadump: improve fadump_reserve_mem()

Some code clean-up like using minimal assignments and updating printk
messages. Also, add an 'error_out' label for handling error cleanup
at one place.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821343485.5656.10202857091553646948.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: add fadump support on powernv
Hari Bathini [Wed, 11 Sep 2019 14:50:26 +0000 (20:20 +0530)]
powerpc/fadump: add fadump support on powernv

Add basic callback functions for FADump on PowerNV platform.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821342072.5656.4346362203141486452.stgit@hbathini.in.ibm.com
6 years agopowerpc/opal: add MPIPL interface definitions
Hari Bathini [Wed, 11 Sep 2019 14:50:12 +0000 (20:20 +0530)]
powerpc/opal: add MPIPL interface definitions

MPIPL is Memory Preserving IPL supported from POWER9. This enables the
kernel to reset the system with memory 'preserved'. Also, it supports
copying memory from a source address to some destination address during
MPIPL boot. Add MPIPL interface definitions here to leverage these f/w
features in adding FADump support for PowerNV platform.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821340710.5656.10071829040515662624.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: use FADump instead of fadump for how it is pronounced
Hari Bathini [Wed, 11 Sep 2019 14:49:58 +0000 (20:19 +0530)]
powerpc/fadump: use FADump instead of fadump for how it is pronounced

fadump is pronounced f-a-dump. Update documentation accordingly. Also,
update how fadump_region contents look like with recent changes.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821339317.5656.15852294223821732082.stgit@hbathini.in.ibm.com
6 years agopseries/fadump: move out platform specific support from generic code
Hari Bathini [Wed, 11 Sep 2019 14:49:44 +0000 (20:19 +0530)]
pseries/fadump: move out platform specific support from generic code

Move code that supports processing the crash'ed kernel's memory
preserved by firmware to platform specific callback functions.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821337690.5656.13050665924800177744.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: release all the memory above boot memory size
Hari Bathini [Wed, 11 Sep 2019 14:49:28 +0000 (20:19 +0530)]
powerpc/fadump: release all the memory above boot memory size

Except for Reserved dump area (see Documentation/powerpc/firmware-
assisted-dump.rst) which is permanent reserved, all memory above boot
memory size, where boot memory size is the memory required for the
kernel to boot successfully when booted with restricted memory (memory
for capture kernel), is released when the dump is invalidated. Make
this a bit more explicit in the code.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821336092.5656.1079046285366041687.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: add source info while displaying region contents
Hari Bathini [Wed, 11 Sep 2019 14:49:12 +0000 (20:19 +0530)]
powerpc/fadump: add source info while displaying region contents

Improve how fadump_region contents are displayed by adding source
information of memory regions that are to be dumped by f/w.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821334740.5656.5897097884010195405.stgit@hbathini.in.ibm.com
6 years agopseries/fadump: define RTAS register/un-register callback functions
Hari Bathini [Wed, 11 Sep 2019 14:48:57 +0000 (20:18 +0530)]
pseries/fadump: define RTAS register/un-register callback functions

Move platform specific register/un-register code, the RTAS calls, to
register/un-register callback functions. This would also mean moving
code that initializes and prints the platform specific FADump data.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821332856.5656.16380417702046411631.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: introduce callbacks for platform specific operations
Hari Bathini [Wed, 11 Sep 2019 14:48:40 +0000 (20:18 +0530)]
powerpc/fadump: introduce callbacks for platform specific operations

Introduce callback functions for platform specific operations like
register, unregister, invalidate & such. Also, define place-holders
for the same on pSeries platform.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821330286.5656.15538934400074110770.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: move rtas specific definitions to platform code
Hari Bathini [Wed, 11 Sep 2019 14:48:14 +0000 (20:18 +0530)]
powerpc/fadump: move rtas specific definitions to platform code

Currently, FADump is only supported on pSeries but that is going to
change soon with FADump support being added on PowerNV platform. So,
move rtas specific definitions to platform code to allow FADump
to have multiple platforms support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821328494.5656.16219929140866195511.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: use helper functions to reserve/release cpu notes buffer
Hari Bathini [Wed, 11 Sep 2019 14:47:56 +0000 (20:17 +0530)]
powerpc/fadump: use helper functions to reserve/release cpu notes buffer

Use helper functions to simplify memory allocation, pinning down and
freeing the memory used for CPU notes buffer.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821323555.5656.2486038022572739622.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: Improve fadump documentation
Hari Bathini [Wed, 11 Sep 2019 14:47:07 +0000 (20:17 +0530)]
powerpc/fadump: Improve fadump documentation

The figures depicting FADump's (Firmware-Assisted Dump) memory layout
are missing some finer details like different memory regions and what
they represent. Improve the documentation by updating those details.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821322070.5656.8194734198500730487.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: declare helper functions in internal header file
Hari Bathini [Wed, 11 Sep 2019 14:46:52 +0000 (20:16 +0530)]
powerpc/fadump: declare helper functions in internal header file

Declare helper functions, that can be reused by multiple platforms,
in the internal header file.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821320487.5656.2660730464212209984.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: add helper functions
Hari Bathini [Wed, 11 Sep 2019 14:46:36 +0000 (20:16 +0530)]
powerpc/fadump: add helper functions

Add helper functions to setup & free CPU notes buffer and to find if a
given memory area is contiguous. Also, use boolean as return type for
the function that finds if boot memory area is contiguous. While at
it, save the virtual address of CPU notes buffer instead of physical
address as virtual address is used often.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821318971.5656.9281936950510635858.stgit@hbathini.in.ibm.com
6 years agopowerpc/fadump: move internal macros/definitions to a new header
Hari Bathini [Wed, 11 Sep 2019 14:46:21 +0000 (20:16 +0530)]
powerpc/fadump: move internal macros/definitions to a new header

Though asm/fadump.h is meant to be used by other components dealing
with FADump, it also has macros/definitions internal to FADump code.
Move them to a new header file used within FADump code. This also
makes way for refactoring platform specific FADump code.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821313134.5656.6597770626574392140.stgit@hbathini.in.ibm.com
6 years agopowerpc: improve prom_init_check rule
Masahiro Yamada [Thu, 12 Sep 2019 07:40:37 +0000 (16:40 +0900)]
powerpc: improve prom_init_check rule

This slightly improves the prom_init_check rule.

[1] Avoid needless check

Currently, prom_init_check.sh is invoked every time you run 'make'
even if you have changed nothing in prom_init.c. With this commit,
the script is re-run only when prom_init.o is recompiled.

[2] Beautify the build log

Currently, the O= build shows the absolute path to the script:

  CALL    /abs/path/to/source/of/linux/arch/powerpc/kernel/prom_init_check.sh

With this commit, it is always a relative path to the timestamp file:

  PROMCHK arch/powerpc/kernel/prom_init_check

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190912074037.13813-1-yamada.masahiro@socionext.com
6 years agopowerpc/kvm: Add ifdefs around template code
Michael Ellerman [Wed, 11 Sep 2019 11:57:46 +0000 (21:57 +1000)]
powerpc/kvm: Add ifdefs around template code

Some of the templates used for KVM patching are only used on certain
platforms, but currently they are always built-in, fix that.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911115746.12433-4-mpe@ellerman.id.au
6 years agopowerpc/kvm: Explicitly mark kvm guest code as __init
Michael Ellerman [Wed, 11 Sep 2019 11:57:45 +0000 (21:57 +1000)]
powerpc/kvm: Explicitly mark kvm guest code as __init

All the code in kvm.c can be marked __init. Most of it is already
inlined into the initcall, but not all. So instead of relying on the
inlining, mark it all as __init. This saves ~280 bytes of text for my
configuration.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911115746.12433-3-mpe@ellerman.id.au
6 years agopowerpc/64s: Remove overlaps_kvm_tmp()
Michael Ellerman [Wed, 11 Sep 2019 11:57:44 +0000 (21:57 +1000)]
powerpc/64s: Remove overlaps_kvm_tmp()

kvm_tmp is now in .text and so doesn't need a special overlap check.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911115746.12433-2-mpe@ellerman.id.au
6 years agopowerpc/kvm: Move kvm_tmp into .text, shrink to 64K
Michael Ellerman [Wed, 11 Sep 2019 11:57:43 +0000 (21:57 +1000)]
powerpc/kvm: Move kvm_tmp into .text, shrink to 64K

In some configurations of KVM, guests binary patch themselves to
avoid/reduce trapping into the hypervisor. For some instructions this
requires replacing one instruction with a sequence of instructions.

For those cases we need to write the sequence of instructions
somewhere and then patch the location of the original instruction to
branch to the sequence. That requires that the location of the
sequence be within 32MB of the original instruction.

The current solution for this is that we create a 1MB array in BSS,
write sequences into there, and then free the remainder of the array.

This has a few problems:
 - it confuses kmemleak.
 - it confuses lockdep.
 - it requires mapping kvm_tmp executable, which can cause adjacent
   areas to also be mapped executable if we're using 16M pages for the
   linear mapping.
 - the 32MB limit can be exceeded if the kernel is big enough,
   especially with STRICT_KERNEL_RWX enabled, which then prevents the
   patching from working at all.

We can fix all those problems by making kvm_tmp just a region of
regular .text. However currently it's 1MB in size, and we don't want
to waste 1MB of text. In practice however I only see ~30KB of kvm_tmp
being used even for an allyes_config. So shrink kvm_tmp to 64K, which
ought to be enough for everyone, and move it into .text.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190911115746.12433-1-mpe@ellerman.id.au
6 years agopowerpc/powernv: Fix build with IOMMU_API=n
Michael Ellerman [Fri, 13 Sep 2019 13:50:06 +0000 (23:50 +1000)]
powerpc/powernv: Fix build with IOMMU_API=n

The builds breaks when IOMMU_API=n, eg. skiroot_defconfig:

  arch/powerpc/platforms/powernv/npu-dma.c:96:28: error: 'get_gpu_pci_dev_and_pe' defined but not used
  arch/powerpc/platforms/powernv/npu-dma.c:126:13: error: 'pnv_npu_set_window' defined but not used

Fixes: b4d37a7b6934 ("powerpc/powernv: Remove unused pnv_npu_try_dma_set_bypass() function")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/eeh: Fix build with STACKTRACE=n
Michael Ellerman [Fri, 13 Sep 2019 13:32:13 +0000 (23:32 +1000)]
powerpc/eeh: Fix build with STACKTRACE=n

The build breaks when STACKTRACE=n, eg. skiroot_defconfig:

  arch/powerpc/kernel/eeh_event.c:124:23: error: implicit declaration of function 'stack_trace_save'

Fix it with some ifdefs for now.

Fixes: 25baf3d81614 ("powerpc/eeh: Defer printing stack trace")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agopowerpc/xive: Fix bogus error code returned by OPAL
Greg Kurz [Wed, 11 Sep 2019 15:52:18 +0000 (17:52 +0200)]
powerpc/xive: Fix bogus error code returned by OPAL

There's a bug in skiboot that causes the OPAL_XIVE_ALLOCATE_IRQ call
to return the 32-bit value 0xffffffff when OPAL has run out of IRQs.
Unfortunatelty, OPAL return values are signed 64-bit entities and
errors are supposed to be negative. If that happens, the linux code
confusingly treats 0xffffffff as a valid IRQ number and panics at some
point.

A fix was recently merged in skiboot:

e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")

but we need a workaround anyway to support older skiboots already
in the field.

Internally convert 0xffffffff to OPAL_RESOURCE which is the usual error
returned upon resource exhaustion.

Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821713818.1985334.14123187368108582810.stgit@bahia.lan
6 years agopowerpc/pseries: correctly track irq state in default idle
Nathan Lynch [Tue, 10 Sep 2019 22:52:44 +0000 (17:52 -0500)]
powerpc/pseries: correctly track irq state in default idle

prep_irq_for_idle() is intended to be called before entering
H_CEDE (and it is used by the pseries cpuidle driver). However the
default pseries idle routine does not call it, leading to mismanaged
lazy irq state when the cpuidle driver isn't in use. Manifestations of
this include:

* Dropped IPIs in the time immediately after a cpu comes
  online (before it has installed the cpuidle handler), making the
  online operation block indefinitely waiting for the new cpu to
  respond.

* Hitting this WARN_ON in arch_local_irq_restore():
/*
 * We should already be hard disabled here. We had bugs
 * where that wasn't the case so let's dbl check it and
 * warn if we are wrong. Only do that when IRQ tracing
 * is enabled as mfmsr() can be costly.
 */
if (WARN_ON_ONCE(mfmsr() & MSR_EE))
__hard_irq_disable();

Call prep_irq_for_idle() from pseries_lpar_idle() and honor its
result.

Fixes: 363edbe2614a ("powerpc: Default arch idle could cede processor on pseries")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
6 years agopowerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions
Ravi Bangoria [Tue, 10 Sep 2019 13:15:13 +0000 (18:45 +0530)]
powerpc/watchpoint: Disable watchpoint hit by larx/stcx instructions

If watchpoint exception is generated by larx/stcx instructions, the
reservation created by larx gets lost while handling exception, and
thus stcx instruction always fails. Generally these instructions are
used in a while(1) loop, for example spinlocks. And because stcx
never succeeds, it loops forever and ultimately hangs the system.

Note that ptrace anyway works in one-shot mode and thus for ptrace
we don't change the behaviour. It's up to ptrace user to take care
of this.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190910131513.30499-1-ravi.bangoria@linux.ibm.com
6 years agopowerpc/powernv: Add new opal message type
Vasant Hegde [Mon, 26 Aug 2019 06:57:01 +0000 (12:27 +0530)]
powerpc/powernv: Add new opal message type

We have OPAL_MSG_PRD message type to pass prd related messages from
OPAL to `opal-prd`. It can handle messages upto 64 bytes. We have a
requirement to send bigger than 64 bytes of data from OPAL to
`opal-prd`. Lets add new message type (OPAL_MSG_PRD2) to pass bigger
data.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Make the error string clear that it's the PRD2 event that failed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190826065701.8853-2-hegdevasant@linux.vnet.ibm.com
6 years agopowerpc/powernv: Enhance opal message read interface
Vasant Hegde [Mon, 26 Aug 2019 06:57:00 +0000 (12:27 +0530)]
powerpc/powernv: Enhance opal message read interface

Use "opal-msg-size" device tree property to allocate memory for
"opal_msg".

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: s/uint32_t/u32/ and mark opal_msg_size as __ro_after_init]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190826065701.8853-1-hegdevasant@linux.vnet.ibm.com
6 years agopowerpc/powernv: Remove unused pnv_npu_try_dma_set_bypass() function
Christoph Hellwig [Tue, 3 Sep 2019 16:51:47 +0000 (18:51 +0200)]
powerpc/powernv: Remove unused pnv_npu_try_dma_set_bypass() function

Neither pnv_npu_try_dma_set_bypass() nor the pnv_npu_dma_set_32() and
pnv_npu_dma_set_bypass() helpers called by it are used anywhere in the
kernel tree, so remove them.

mpe: They're unused since 2d6ad41b2c21 ("powerpc/powernv: use the
generic iommu bypass code") removed the last usage.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903165147.11099-1-hch@lst.de
6 years agoseltests/powerpc: Add a selftest for memcpy_mcsafe
Santosh Sivaraj [Tue, 3 Sep 2019 21:43:59 +0000 (03:13 +0530)]
seltests/powerpc: Add a selftest for memcpy_mcsafe

Appropriate self tests for memcpy_mcsafe

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903214359.23887-2-santosh@fossix.org
6 years agopowerpc/memcpy: Fix stack corruption for smaller sizes
Santosh Sivaraj [Tue, 3 Sep 2019 21:43:58 +0000 (03:13 +0530)]
powerpc/memcpy: Fix stack corruption for smaller sizes

For sizes lesser than 128 bytes, the code branches out early without saving
the stack frame, which when restored later drops frame of the caller.

Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903214359.23887-1-santosh@fossix.org
6 years agopowerpc: Add attributes for setjmp/longjmp
Segher Boessenkool [Wed, 4 Sep 2019 14:11:07 +0000 (14:11 +0000)]
powerpc: Add attributes for setjmp/longjmp

The setjmp function should be declared as "returns_twice", or bad
things can happen[1]. This does not actually change generated code in
my testing.

The longjmp function should be declared as "noreturn", so that the
compiler can optimise calls to it better. This makes the generated
code a little shorter.

1: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-returns_005ftwice-function-attribute

Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c02ce4a573f3bac907e2c70957a2d1275f910013.1567605586.git.segher@kernel.crashing.org
6 years agopowerpc: Remove empty comment
Jordan Niethe [Tue, 13 Aug 2019 05:12:12 +0000 (15:12 +1000)]
powerpc: Remove empty comment

Commit 2874c5fd2842 ("treewide: Replace GPLv2 boilerplate/reference with
SPDX - rule 152") left an empty comment in machdep.h, as the boilerplate
was the only text in the comment. Remove the empty comment.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813051212.6387-1-jniethe5@gmail.com
6 years agopowerpc/imc: Dont create debugfs files for cpu-less nodes
Madhavan Srinivasan [Tue, 27 Aug 2019 10:16:35 +0000 (15:46 +0530)]
powerpc/imc: Dont create debugfs files for cpu-less nodes

Commit <684d984038aa> ('powerpc/powernv: Add debugfs interface for
imc-mode and imc') added debugfs interface for the nest imc pmu
devices to support changing of different ucode modes. Primarily adding
this capability for debug. But when doing so, the code did not
consider the case of cpu-less nodes. So when reading the _cmd_ or
_mode_ file of a cpu-less node will create this crash.

  Faulting instruction address: 0xc0000000000d0d58
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-20190627+ #19
  NIP:  c0000000000d0d58 LR: c00000000049aa18 CTR:c0000000000d0d50
  REGS: c00020194548f9e0 TRAP: 0300   Not tainted  (5.2.0-rc6-next-20190627+)
  MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR:28022822  XER: 00000000
  CFAR: c00000000049aa14 DAR: 000000000003fc08 DSISR:40000000 IRQMASK: 0
  ...
  NIP imc_mem_get+0x8/0x20
  LR  simple_attr_read+0x118/0x170
  Call Trace:
    simple_attr_read+0x70/0x170 (unreliable)
    debugfs_attr_read+0x6c/0xb0
    __vfs_read+0x3c/0x70
     vfs_read+0xbc/0x1a0
    ksys_read+0x7c/0x140
    system_call+0x5c/0x70

Patch fixes the issue with a more robust check for vbase to NULL.

Before patch, ls output for the debugfs imc directory

  # ls /sys/kernel/debug/powerpc/imc/
  imc_cmd_0    imc_cmd_251  imc_cmd_253  imc_cmd_255  imc_mode_0    imc_mode_251  imc_mode_253  imc_mode_255
  imc_cmd_250  imc_cmd_252  imc_cmd_254  imc_cmd_8    imc_mode_250  imc_mode_252  imc_mode_254  imc_mode_8

After patch, ls output for the debugfs imc directory

  # ls /sys/kernel/debug/powerpc/imc/
  imc_cmd_0  imc_cmd_8  imc_mode_0  imc_mode_8

Actual bug here is that, we have two loops with potentially different
loop counts. That is, in imc_get_mem_addr_nest(), loop count is
obtained from the dt entries. But in case of export_imc_mode_and_cmd(),
loop was based on for_each_nid() count. Patch fixes the loop count in
latter based on the struct mem_info. Ideally it would be better to
have array size in struct imc_pmu.

Fixes: 684d984038aa ('powerpc/powernv: Add debugfs interface for imc-mode and imc')
Reported-by: Qian Cai <cai@lca.pw>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190827101635.6942-1-maddy@linux.vnet.ibm.com
6 years agopowerpc/64s/radix: introduce options to disable use of the tlbie instruction
Nicholas Piggin [Mon, 2 Sep 2019 15:29:31 +0000 (01:29 +1000)]
powerpc/64s/radix: introduce options to disable use of the tlbie instruction

Introduce two options to control the use of the tlbie instruction. A
boot time option which completely disables the kernel using the
instruction, this is currently incompatible with HASH MMU, KVM, and
coherent accelerators.

And a debugfs option can be switched at runtime and avoids using tlbie
for invalidating CPU TLBs for normal process and kernel address
mappings. Coherent accelerators are still managed with tlbie, as will
KVM partition scope translations.

Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a
basic implementation which does not attempt to make any optimisation
beyond the tlbie implementation.

This is useful for performance testing among other things. For example
in certain situations on large systems, using IPIs may be faster than
tlbie as they can be directed rather than broadcast. Later we may also
take advantage of the IPIs to do more interesting things such as trim
the mm cpumask more aggressively.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-7-npiggin@gmail.com
6 years agopowerpc/64s: remove unnecessary translation cache flushes at boot
Nicholas Piggin [Mon, 2 Sep 2019 15:29:30 +0000 (01:29 +1000)]
powerpc/64s: remove unnecessary translation cache flushes at boot

The various translation structure invalidations performed in early boot
when the MMU is off are not required, because everything is invalidated
immediately before a CPU first enables its MMU (see early_init_mmu
and early_init_mmu_secondary).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-6-npiggin@gmail.com
6 years agopowerpc/64s/pseries: radix flush translations before MMU is enabled at boot
Nicholas Piggin [Mon, 2 Sep 2019 15:29:29 +0000 (01:29 +1000)]
powerpc/64s/pseries: radix flush translations before MMU is enabled at boot

Radix guests are responsible for managing their own translation caches,
so make them match bare metal radix and hash, and make each CPU flush
all its translations right before enabling its MMU.

Radix guests may not flush partition scope translations, so in
tlbiel_all, make these flushes conditional on CPU_FTR_HVMODE. Process
scope translations are the only type visible to the guest.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-5-npiggin@gmail.com
6 years agopowerpc/64s: make mmu_partition_table_set_entry TLB flush optional
Nicholas Piggin [Mon, 2 Sep 2019 15:29:28 +0000 (01:29 +1000)]
powerpc/64s: make mmu_partition_table_set_entry TLB flush optional

No functional change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-4-npiggin@gmail.com
6 years agopowerpc/64s/radix: tidy up TLB flushing code
Nicholas Piggin [Mon, 2 Sep 2019 15:29:27 +0000 (01:29 +1000)]
powerpc/64s/radix: tidy up TLB flushing code

There should be no functional changes.

- Use calls to existing radix_tlb.c functions in flush_partition.

- Rename radix__flush_tlb_lpid to radix__flush_all_lpid and similar,
  because they flush everything, matching flush_all_mm rather than
  flush_tlb_mm for the lpid.

- Remove some unused radix_tlb.c flush primitives.

Signed-off: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-3-npiggin@gmail.com
6 years agopowerpc/64s: remove register_process_table callback
Nicholas Piggin [Mon, 2 Sep 2019 15:29:26 +0000 (01:29 +1000)]
powerpc/64s: remove register_process_table callback

This callback is only required because the partition table init comes
before process table allocation on powernv (aka bare metal aka native).

Change the order to allocate the process table first, and remove the
callback.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190902152931.17840-2-npiggin@gmail.com
6 years agoselftests/powerpc: Add basic EEH selftest
Oliver O'Halloran [Tue, 3 Sep 2019 10:16:05 +0000 (20:16 +1000)]
selftests/powerpc: Add basic EEH selftest

Use the new eeh_dev_check and eeh_dev_break interfaces to test EEH
recovery.  Historically this has been done manually using platform specific
EEH error injection facilities (e.g. via RTAS). However, documentation on
how to use these facilities is haphazard at best and non-existent at worst
so it's hard to develop a cross-platform test.

The new debugfs interfaces allow the kernel to handle the platform specific
details so we can write a more generic set of sets. This patch adds the
most basic of recovery tests where:

a) Errors are injected and recovered from sequentially,
b) Errors are not injected into PCI-PCI bridges, such as PCIe switches.
c) Errors are only injected into device function zero.
d) No errors are injected into Virtual Functions.

a), b) and c) are largely due to limitations of Linux's EEH support.  EEH
recovery is serialised in the EEH recovery thread which forces a).
Similarly, multi-function PCI devices are almost always grouped into the
same PE so injecting an error on one function exercises the same code
paths. c) is because we currently more or less ignore PCI bridges during
recovery and assume that the recovered topology will be the same as the
original.

d) is due to the limits of the eeh_dev_break interface. With the current
implementation we can't inject an error into a specific VF without
potentially causing additional errors on other VFs. Due to the serialised
recovery process we might end up timing out waiting for another function to
recover before the function of interest is recovered. The platform specific
error injection facilities are finer-grained and allow this capability, but
doing that requires working out how to use those facilities first.

Basicly, it's better than nothing and it's a base to build on.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-15-oohall@gmail.com
6 years agopowerpc/eeh: Add a eeh_dev_break debugfs interface
Oliver O'Halloran [Tue, 3 Sep 2019 10:16:04 +0000 (20:16 +1000)]
powerpc/eeh: Add a eeh_dev_break debugfs interface

Add an interface to debugfs for generating an EEH event on a given device.
This works by disabling memory accesses to and from the device by setting
the PCI_COMMAND register (or the VF Memory Space Enable on the parent PF).

This is a somewhat portable alternative to using the platform specific
error injection mechanisms since those tend to be either hard to use, or
straight up broken. For pseries the interfaces also requires the use of
/dev/mem which is probably going to go away in a post-LOCKDOWN world
(and it's a horrific hack to begin with) so moving to a kernel-provided
interface makes sense and provides a sane, cross-platform interface for
userspace so we can write more generic testing scripts.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-14-oohall@gmail.com
6 years agopowerpc/eeh: Add debugfs interface to run an EEH check
Oliver O'Halloran [Tue, 3 Sep 2019 10:16:03 +0000 (20:16 +1000)]
powerpc/eeh: Add debugfs interface to run an EEH check

Detecting an frozen EEH PE usually occurs when an MMIO load returns a 0xFFs
response. When performing EEH testing using the EEH error injection feature
available on some platforms there is no simple way to kick-off the kernel's
recovery process since any accesses from userspace (usually /dev/mem) will
bypass the MMIO helpers in the kernel which check if a 0xFF response is due
to an EEH freeze or not.

If a device contains a 0xFF byte in it's config space it's possible to
trigger the recovery process via config space read from userspace, but this
is not a reliable method. If a driver is bound to the device an in use it
will frequently trigger the MMIO check, but this is also inconsistent.

To solve these problems this patch adds a debugfs file called
"eeh_dev_check" which accepts a <domain>:<bus>:<dev>.<fn> string and runs
eeh_dev_check_failure() on it. This is the same check that's done when the
kernel gets a 0xFF result from an config or MMIO read with the added
benifit that it can be reliably triggered from userspace.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-13-oohall@gmail.com
6 years agopowerpc/eeh: Set attention indicator while recovering
Oliver O'Halloran [Tue, 3 Sep 2019 10:16:02 +0000 (20:16 +1000)]
powerpc/eeh: Set attention indicator while recovering

I am the RAS team. Hear me roar.

Roar.

On a more serious note, being able to locate failed devices can be helpful.
Set the attention indicator if the slot supports it once we've determined
the device is present and only clear it if the device is fully recovered.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-12-oohall@gmail.com
6 years agopci-hotplug/pnv_php: Add attention indicator support
Oliver O'Halloran [Tue, 3 Sep 2019 10:16:01 +0000 (20:16 +1000)]
pci-hotplug/pnv_php: Add attention indicator support

pnv_php is generally used with PCIe bridges which provide a native
interface for setting the attention and power indicator LEDs. Wire up
those interfaces even if firmware does not have support for them (yet...)

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-11-oohall@gmail.com
6 years agopci-hotplug/pnv_php: Add support for IODA3 Power9 PHBs
Oliver O'Halloran [Tue, 3 Sep 2019 10:16:00 +0000 (20:16 +1000)]
pci-hotplug/pnv_php: Add support for IODA3 Power9 PHBs

Currently we check that an IODA2 compatible PHB is upstream of this slot.
This is mainly to avoid pnv_php creating slots for the various "virtual
PHBs" that we create for NVLink. There's no real need for this restriction
so allow it on IODA3.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-10-oohall@gmail.com
6 years agopci-hotplug/pnv_php: Add a reset_slot() callback
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:59 +0000 (20:15 +1000)]
pci-hotplug/pnv_php: Add a reset_slot() callback

When performing EEH recovery of devices in a hotplug slot we need to use
the slot driver's ->reset_slot() callback to prevent spurious hotplug
events due to spurious DLActive and PresDet change interrupts. Add a
reset_slot() callback to pnv_php so we can handle recovery of devices
in pnv_php managed slots.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-9-oohall@gmail.com
6 years agopowernv/eeh: Use generic code to handle hot resets
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:58 +0000 (20:15 +1000)]
powernv/eeh: Use generic code to handle hot resets

When we reset PCI devices managed by a hotplug driver the reset may
generate spurious hotplug events that cause the PCI device we're resetting
to be torn down accidently. This is a problem for EEH (when the driver is
EEH aware) since we want to leave the OS PCI device state intact so that
the device can be re-set without losing any resources (network, disks,
etc) provided by the driver.

Generic PCI code provides the pci_bus_error_reset() function to handle
resetting a PCI Device (or bus) by using the reset method provided by the
hotplug slot driver. We can use this function if the EEH core has
requested a hot reset (common case) without tripping over the hotplug
driver.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-8-oohall@gmail.com
6 years agopowerpc/eeh: Remove stale CAPI comment
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:57 +0000 (20:15 +1000)]
powerpc/eeh: Remove stale CAPI comment

Support for switching CAPI cards into and out of CAPI mode was removed a
while ago. Drop the comment since it's no longer relevant.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-7-oohall@gmail.com
6 years agopowerpc/eeh: Defer printing stack trace
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:56 +0000 (20:15 +1000)]
powerpc/eeh: Defer printing stack trace

Currently we print a stack trace in the event handler to help with
debugging EEH issues. In the case of suprise hot-unplug this is unneeded,
so we want to prevent printing the stack trace unless we know it's due to
an actual device error. To accomplish this, we can save a stack trace at
the point of detection and only print it once the EEH recovery handler has
determined the freeze was due to an actual error.

Since the whole point of this is to prevent spurious EEH output we also
move a few prints out of the detection thread, or mark them as pr_debug
so anyone interested can get output from the eeh_check_dev_failure()
if they want.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-6-oohall@gmail.com
6 years agopowerpc/eeh: Check slot presence state in eeh_handle_normal_event()
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:55 +0000 (20:15 +1000)]
powerpc/eeh: Check slot presence state in eeh_handle_normal_event()

When a device is surprise removed while undergoing IO we will probably
get an EEH PE freeze due to MMIO timeouts and other errors. When a freeze
is detected we send a recovery event to the EEH worker thread which will
notify drivers, and perform recovery as needed.

In the event of a hot-remove we don't want recovery to occur since there
isn't a device to recover. The recovery process is fairly long due to
the number of wait states (required by PCIe) which causes problems when
devices are removed and replaced (e.g. hot swapping of U.2 NVMe drives).

To determine if we need to skip the recovery process we can use the
get_adapter_state() operation of the hotplug_slot to determine if the
slot contains a device or not, and if the slot is empty we can skip
recovery entirely.

One thing to note is that the slot being EEH frozen does not prevent the
hotplug driver from working. We don't have the EEH recovery thread
remove any of the devices since it's assumed that the hotplug driver
will handle tearing down the slot state.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-5-oohall@gmail.com
6 years agopowerpc/eeh: Make permanently failed devices non-actionable
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:54 +0000 (20:15 +1000)]
powerpc/eeh: Make permanently failed devices non-actionable

If a device is torn down by a hotplug slot driver it's marked as removed
and marked as permaantly failed. There's no point in trying to recover a
permernantly failed device so it should be considered un-actionable.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-4-oohall@gmail.com
6 years agopowerpc/eeh: Fix race when freeing PDNs
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:53 +0000 (20:15 +1000)]
powerpc/eeh: Fix race when freeing PDNs

When hot-adding devices we rely on the hotplug driver to create pci_dn's
for the devices under the hotplug slot. Converse, when hot-removing the
driver will remove the pci_dn's that it created. This is a problem because
the pci_dev is still live until it's refcount drops to zero. This can
happen if the driver is slow to tear down it's internal state. Ideally, the
driver would not attempt to perform any config accesses to the device once
it's been marked as removed, but sometimes it happens. As a result, we
might attempt to access the pci_dn for a device that has been torn down and
the kernel may crash as a result.

To fix this, don't free the pci_dn unless the corresponding pci_dev has
been released.  If the pci_dev is still live, then we mark the pci_dn with
a flag that indicates the pci_dev's release function should free it.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-3-oohall@gmail.com
6 years agopowerpc/eeh: Clean up EEH PEs after recovery finishes
Oliver O'Halloran [Tue, 3 Sep 2019 10:15:52 +0000 (20:15 +1000)]
powerpc/eeh: Clean up EEH PEs after recovery finishes

When the last device in an eeh_pe is removed the eeh_pe structure itself
(and any empty parents) are freed since they are no longer needed. This
results in a crash when a hotplug driver is involved since the following
may occur:

1. Device is suprise removed.
2. Driver performs an MMIO, which fails and queues and eeh_event.
3. Hotplug driver receives a hotplug interrupt and removes any
   pci_devs that were under the slot.
4. pci_dev is torn down and the eeh_pe is freed.
5. The EEH event handler thread processes the eeh_event and crashes
   since the eeh_pe pointer in the eeh_event structure is no
   longer valid.

Crashing is generally considered poor form. Instead of doing that use
the fact PEs are marked as EEH_PE_INVALID to keep them around until the
end of the recovery cycle, at which point we can safely prune any empty
PEs.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190903101605.2890-2-oohall@gmail.com
6 years agopowerpc/64s/exception: reduce page fault unnecessary loads
Nicholas Piggin [Fri, 2 Aug 2019 10:57:01 +0000 (20:57 +1000)]
powerpc/64s/exception: reduce page fault unnecessary loads

This avoids 3 loads in the radix page fault case, 1 load in the
hash fault case, and 2 loads in the hash miss page fault case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-37-npiggin@gmail.com
6 years agopowerpc/64s/exception: Remove pointless KVM handler name bifurcation
Nicholas Piggin [Fri, 2 Aug 2019 10:57:00 +0000 (20:57 +1000)]
powerpc/64s/exception: Remove pointless KVM handler name bifurcation

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-36-npiggin@gmail.com
6 years agopowerpc/64s/exception: program check handler do not branch into a macro
Nicholas Piggin [Fri, 2 Aug 2019 10:56:59 +0000 (20:56 +1000)]
powerpc/64s/exception: program check handler do not branch into a macro

It is clever, but the small code saving is not worth the spaghetti of
jumping to a label in an expanded macro, particularly when the label
is just a number rather than a descriptive name.

So expand the INT_COMMON macro twice, once for the stack and no stack
cases, and branch to those. The slight code size increase is worth
the improved clarity of branches for this non-performance critical
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-35-npiggin@gmail.com
6 years agopowerpc/64s/exception: move interrupt entry code above the common handler
Nicholas Piggin [Fri, 2 Aug 2019 10:56:58 +0000 (20:56 +1000)]
powerpc/64s/exception: move interrupt entry code above the common handler

This better reflects the order in which the code is executed.

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-34-npiggin@gmail.com
6 years agopowerpc/64s/exception: INT_COMMON add DAR, DSISR, reconcile options
Nicholas Piggin [Fri, 2 Aug 2019 10:56:57 +0000 (20:56 +1000)]
powerpc/64s/exception: INT_COMMON add DAR, DSISR, reconcile options

Move DAR and DSISR saving to pt_regs into INT_COMMON. Also add an
option to expand RECONCILE_IRQ_STATE.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-33-npiggin@gmail.com
6 years agopowerpc/64s/exception: Expand EXCEPTION_PROLOG_COMMON_1 and 2 into caller
Nicholas Piggin [Fri, 2 Aug 2019 10:56:56 +0000 (20:56 +1000)]
powerpc/64s/exception: Expand EXCEPTION_PROLOG_COMMON_1 and 2 into caller

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-32-npiggin@gmail.com
6 years agopowerpc/64s/exception: Expand EXCEPTION_COMMON macro into caller
Nicholas Piggin [Fri, 2 Aug 2019 10:56:55 +0000 (20:56 +1000)]
powerpc/64s/exception: Expand EXCEPTION_COMMON macro into caller

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-31-npiggin@gmail.com
6 years agopowerpc/64s/exception: Add INT_COMMON gas macro to generate common exception code
Nicholas Piggin [Fri, 2 Aug 2019 10:56:54 +0000 (20:56 +1000)]
powerpc/64s/exception: Add INT_COMMON gas macro to generate common exception code

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-30-npiggin@gmail.com
6 years agopowerpc/64s/exception: Merge EXCEPTION_PROLOG_COMMON_2/3
Nicholas Piggin [Fri, 2 Aug 2019 10:56:53 +0000 (20:56 +1000)]
powerpc/64s/exception: Merge EXCEPTION_PROLOG_COMMON_2/3

Merge EXCEPTION_PROLOG_COMMON_3 into EXCEPTION_PROLOG_COMMON_2.

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-29-npiggin@gmail.com
6 years agopowerpc/64s/exception: KVM_HANDLER reorder arguments to match other macros
Nicholas Piggin [Fri, 2 Aug 2019 10:56:52 +0000 (20:56 +1000)]
powerpc/64s/exception: KVM_HANDLER reorder arguments to match other macros

Also change argument name (n -> vec) to match others.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-28-npiggin@gmail.com
6 years agopowerpc/64s/exception: Add INT_KVM_HANDLER gas macro
Nicholas Piggin [Fri, 2 Aug 2019 10:56:51 +0000 (20:56 +1000)]
powerpc/64s/exception: Add INT_KVM_HANDLER gas macro

Replace the 4 variants of cpp macros with one gas macro.

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-27-npiggin@gmail.com
6 years agopowerpc/64s/exception: INT_HANDLER support HDAR/HDSISR and use it in HDSI
Nicholas Piggin [Fri, 2 Aug 2019 10:56:50 +0000 (20:56 +1000)]
powerpc/64s/exception: INT_HANDLER support HDAR/HDSISR and use it in HDSI

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-26-npiggin@gmail.com
6 years agopowerpc/64s/exception: Add the virt variant of the denorm interrupt handler
Nicholas Piggin [Fri, 2 Aug 2019 10:56:49 +0000 (20:56 +1000)]
powerpc/64s/exception: Add the virt variant of the denorm interrupt handler

All other virt handlers have the prolog code in the virt vector rather
than branch to the real vector. Follow this pattern in the denorm virt
handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-25-npiggin@gmail.com
6 years agopowerpc/64s/exception: remove EXCEPTION_PROLOG_0/1, rename _2
Nicholas Piggin [Fri, 2 Aug 2019 10:56:48 +0000 (20:56 +1000)]
powerpc/64s/exception: remove EXCEPTION_PROLOG_0/1, rename _2

EXCEPTION_PROLOG_0 and _1 have only a single caller, so expand them
into it.

Rename EXCEPTION_PROLOG_2_REAL to INT_SAVE_SRR_AND_JUMP and
EXCEPTION_PROLOG_2_VIRT to INT_VIRT_SAVE_SRR_AND_JUMP, which are
more descriptive.

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-24-npiggin@gmail.com
6 years agopowerpc/64s/exceptions: Use keyword params to shorten arg lists
Michael Ellerman [Thu, 29 Aug 2019 13:36:08 +0000 (23:36 +1000)]
powerpc/64s/exceptions: Use keyword params to shorten arg lists

The argument lists for the INT_HANDLER macro are getting a bit
unwieldy. Use keyword parameters with default values to shorten them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190830011426.16810-1-mpe@ellerman.id.au
6 years agopowerpc/64s/exception: Replace PROLOG macros and EXC helpers with a gas macro
Nicholas Piggin [Fri, 2 Aug 2019 10:56:47 +0000 (20:56 +1000)]
powerpc/64s/exception: Replace PROLOG macros and EXC helpers with a gas macro

This creates a single macro that generates the exception prolog code,
with variants specified by arguments, rather than assorted nested
macros for different variants.

The increasing length of macro argument list is not nice to read or
modify, but this is a temporary condition that will be improved in
later changes.

No generated code change except BUG line number constants and label
names.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-23-npiggin@gmail.com
6 years agopowerpc/64s/exception: remove 0xb00 handler
Nicholas Piggin [Fri, 2 Aug 2019 10:56:46 +0000 (20:56 +1000)]
powerpc/64s/exception: remove 0xb00 handler

This vector is not used by any supported processor, and has been
implemented as an unknown exception going back to 2.6. There is
nothing special about 0xb00, so remove it like other unused
vectors.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-22-npiggin@gmail.com
6 years agopowerpc/64s/exception: Fix performance monitor virt handler
Nicholas Piggin [Fri, 2 Aug 2019 10:56:45 +0000 (20:56 +1000)]
powerpc/64s/exception: Fix performance monitor virt handler

The perf virt handler uses EXCEPTION_PROLOG_2_REAL rather than _VIRT.
In practice this is okay because the _REAL variant is usable by virt
mode interrupts, but should be fixed (and is a performance win).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-21-npiggin@gmail.com
6 years agopowerpc/64s/exception: Add EXC_HV_OR_STD, which selects HSRR if HVMODE
Nicholas Piggin [Fri, 2 Aug 2019 10:56:44 +0000 (20:56 +1000)]
powerpc/64s/exception: Add EXC_HV_OR_STD, which selects HSRR if HVMODE

Add EXC_HV_OR_STD and use it to consolidate the 0x500 external
interrupt.

Executed code is unchanged.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-20-npiggin@gmail.com
6 years agopowerpc/64s/exception: move head-64.h exception code to exception-64s.S
Nicholas Piggin [Fri, 2 Aug 2019 10:56:43 +0000 (20:56 +1000)]
powerpc/64s/exception: move head-64.h exception code to exception-64s.S

The head-64.h code should deal only with the head code sections
and offset calculations.

No generated code change except BUG line number constants.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-19-npiggin@gmail.com
6 years agopowerpc/64s/exception: Fix DAR load for handle_page_fault error case
Nicholas Piggin [Fri, 2 Aug 2019 10:56:42 +0000 (20:56 +1000)]
powerpc/64s/exception: Fix DAR load for handle_page_fault error case

This buglet goes back to before the 64/32 arch merge, but it does not
seem to have had practical consequences because bad_page_fault does
not use the 2nd argument, but rather regs->dar/nip.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-18-npiggin@gmail.com
6 years agopowerpc/64s/exception: machine check improve labels and comments
Nicholas Piggin [Fri, 2 Aug 2019 10:56:41 +0000 (20:56 +1000)]
powerpc/64s/exception: machine check improve labels and comments

Short forward and backward branches can be given number labels,
but larger significant divergences in code path a more readable
if they're given descriptive names.

Also adjusts a comment to account for guest delivery.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-17-npiggin@gmail.com
6 years agopowerpc/64s/exception: untangle early machine check handler branch
Nicholas Piggin [Fri, 2 Aug 2019 10:56:40 +0000 (20:56 +1000)]
powerpc/64s/exception: untangle early machine check handler branch

machine_check_early_common now branches to machine_check_handle_early
which is its only caller.

Move interleaving code out of the way, and remove the branch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-16-npiggin@gmail.com
6 years agopowerpc/64s/exception: machine check move unrecoverable handling out of line
Nicholas Piggin [Fri, 2 Aug 2019 10:56:39 +0000 (20:56 +1000)]
powerpc/64s/exception: machine check move unrecoverable handling out of line

Similarly to the previous change, all callers of the unrecoverable
handler run relocated so can reach it with a direct branch. This makes
it easy to move out of line, which makes the "normal" path less
cluttered and easier to follow.

MSR[ME] manipulation still requires the rfi, so that is moved out of
line to its own function.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-15-npiggin@gmail.com
6 years agopowerpc/64s/exception: simplify machine check early path
Nicholas Piggin [Fri, 2 Aug 2019 10:56:38 +0000 (20:56 +1000)]
powerpc/64s/exception: simplify machine check early path

machine_check_handle_early_common can reach machine_check_handle_early
directly now that it runs at the relocated address, so just branch
directly.

The rfi sequence is required to enable MSR[ME] but that step is moved
into a helper function, making the code easier to follow.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-14-npiggin@gmail.com
6 years agopowerpc/64s/exception: machine check move tramp code
Nicholas Piggin [Fri, 2 Aug 2019 10:56:37 +0000 (20:56 +1000)]
powerpc/64s/exception: machine check move tramp code

Following convention, move the tramp code (unrelocated) above the
common handlers (relocated).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-13-npiggin@gmail.com
6 years agopowerpc/64s/exception: machine check restructure to reuse common macros
Nicholas Piggin [Fri, 2 Aug 2019 10:56:36 +0000 (20:56 +1000)]
powerpc/64s/exception: machine check restructure to reuse common macros

Follow the pattern of sreset and HMI handlers more closely: use
EXCEPTION_PROLOG_COMMON_1 rather than open-coding it, and run the
handler at the relocated location.

This helps later simplification and code sharing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190802105709.27696-12-npiggin@gmail.com